
Agent stops moving after so many steps of training

Discussion in 'ML-Agents' started by James_Adey, Mar 26, 2020.

  1. James_Adey

    Joined:
    Jul 4, 2018
    Posts:
    19
    I have been trying to train my agent to kill all of the enemies around the map; if it succeeds, the environment resets and the agent gets a positive reward. For some reason, however, after a certain number of steps the agent just seems to get confused and stops moving. Are there any reasons why an agent would get stuck and not do anything? Does it give up if it hasn't received any positive reward for a long time?
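    For context, a minimal sketch of this kind of "clear all enemies, then reward and reset" setup (not the poster's actual code; it assumes the 1.0+ Unity.MLAgents C# API, and the class name, enemy counter, and reward values are hypothetical):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Minimal sketch: small reward per kill, bonus for clearing the map,
    // then EndEpisode() so the environment resets cleanly.
    public class KillAllEnemiesAgent : Agent
    {
        public int enemiesPerEpisode = 5;   // hypothetical: match your map
        private int enemiesRemaining;

        public override void OnEpisodeBegin()
        {
            // Respawn enemies and reposition the agent at the start of each episode.
            enemiesRemaining = enemiesPerEpisode;
        }

        // Call this from whatever gameplay code detects an enemy death.
        public void OnEnemyKilled()
        {
            enemiesRemaining--;
            AddReward(0.1f);                // per-kill reward keeps the signal less sparse

            if (enemiesRemaining <= 0)
            {
                AddReward(1.0f);            // bonus for clearing the whole map
                EndEpisode();               // OnEpisodeBegin runs on the next step
            }
        }
    }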
     

    Attached Files:

  2. James_Adey

    Joined:
    Jul 4, 2018
    Posts:
    19
    Ignore the second screenshot; it didn't crop for some reason.
     
  3. TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,822
    Hey James, which versions of the Python and C# ML-Agents packages are you using?
     
  4. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @James_Adey,
    We need some important information in order to help you.
    What version of the ML-Agents toolkit are you using?
    What training configuration are you using?
    What are your observation and action spaces?
    How are you rewarding your agent?

    Without answers to these questions, we can't help you; the information you've given so far is too sparse.
     
  5. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    I have the same problem. What rewards do you have?
     
  6. ChrissCrass

    Joined:
    Mar 19, 2020
    Posts:
    31
    There are two common causes of this.

    1. You have the agent's max steps configured incorrectly, and you are not properly calling "ResetOnDone" (depending on the ML-Agents version you are on).

    2. Vanishing and exploding gradients:
    This is a peculiar phenomenon that occurs because of inconsistent and unbalanced hyperparameters relative to the observation-action pairs and the rewards associated with them. You can identify this issue by checking whether decisions are still being made by your agents: they will just be outputting a -1 or a 1 without change (see the diagnostic sketch below). TensorBoard shows this as a complete collapse of performance. Visually, agents show early signs that this is happening, since it usually takes multiple back-propagation passes to completely ruin the weights of the network.
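    To check for the saturation described in point 2, here is a small diagnostic sketch (class name hypothetical; it assumes the 1.0-era float[] action signature) that logs the raw actions on every decision:

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Diagnostic sketch: log the raw continuous actions so you can see whether
    // the policy has saturated at -1/+1 even though decisions are still being made.
    public class ActionLoggingAgent : Agent
    {
        public override void OnActionReceived(float[] vectorAction)
        {
            // If every value is pinned at -1 or 1 for long stretches, the policy has
            // likely collapsed (exploding/vanishing gradients) rather than "given up".
            Debug.Log($"frame {Time.frameCount}: actions = [{string.Join(", ", vectorAction)}]");

            // ... apply movement/attack actions here as usual ...
        }
    }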

    I can't say precisely what causes it, other than that the value estimator (among other things) goes haywire due to confusing data. If toying with lambda doesn't fix it, you can address this in several ways:

    1. Fix your observations (they should be normalized and relative to the agent; there's no excuse for poor observations). See the sketch after this list.
    2. Add more observation modalities (giving the agent more information helps clarify anomalous rewards, though it increases cost).
    3. Increase buffer and batch size (this ensures more accurate statistical sampling and avoids anomalies, but slows training).
    4. The learning rate might be too high, or you might be running too many epochs (bigger gradient steps are riskier).
    5. Epsilon might be making your policy exploration too aggressive, pushing the policy into irrecoverable territory.
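    For point 1, a sketch of agent-relative, normalized observations (hypothetical field names and range; it assumes the 1.0+ C# API with VectorSensor):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    // Sketch: observations expressed in the agent's local frame and scaled
    // into roughly [-1, 1] so they mean the same thing wherever the agent is.
    public class NormalizedObservationAgent : Agent
    {
        public Transform nearestEnemy;   // hypothetical: set by your targeting code
        public float maxRange = 50f;     // rough size of the playable area

        public override void CollectObservations(VectorSensor sensor)
        {
            // Enemy position relative to the agent, scaled by the arena size.
            Vector3 toEnemy = transform.InverseTransformPoint(nearestEnemy.position);
            sensor.AddObservation(toEnemy / maxRange);   // 3 floats

            // The agent's forward direction is already unit-length, so it's fine as-is.
            sensor.AddObservation(transform.forward);    // 3 floats
        }
    }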

    Assuming that you guys don't have a simple misconfiguration, you're describing stability issues.