
Agent stops moving after so many steps of training

Discussion in 'ML-Agents' started by James_Adey, Mar 26, 2020.

  1. James_Adey

    Joined:
    Jul 4, 2018
    Posts:
    19
    I have been trying to train my agent to kill all of the enemies around the map; if it succeeds, the environment resets and the agent gets a positive reward. For some reason, however, after a certain number of steps the agent just seems to get confused and stops moving. Are there any reasons why an agent would get stuck and not do anything? Does it give up if it hasn't received any positive reward for a long time?
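    For context, a minimal sketch of this kind of "clear all enemies, then reward and reset" setup (not the poster's actual code; it assumes the 1.0+ Unity.MLAgents C# API, and the class name, enemy counter, and reward values are hypothetical):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Minimal sketch: small reward per kill, bonus for clearing the map,
    // then EndEpisode() so the environment resets cleanly.
    public class KillAllEnemiesAgent : Agent
    {
        public int enemiesPerEpisode = 5;   // hypothetical: match your map
        private int enemiesRemaining;

        public override void OnEpisodeBegin()
        {
            // Respawn enemies and reposition the agent at the start of each episode.
            enemiesRemaining = enemiesPerEpisode;
        }

        // Call this from whatever gameplay code detects an enemy death.
        public void OnEnemyKilled()
        {
            enemiesRemaining--;
            AddReward(0.1f);                // per-kill reward keeps the signal less sparse

            if (enemiesRemaining <= 0)
            {
                AddReward(1.0f);            // bonus for clearing the whole map
                EndEpisode();               // OnEpisodeBegin runs on the next step
            }
        }
    }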
     

    Attached Files:

  2. James_Adey

    Joined:
    Jul 4, 2018
    Posts:
    19
    Ignore the second screenshot; it didn't crop for some reason.
     
  3. TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,822
    Hey James, which versions of the Python and C# ML-Agents packages are you using?
     
  4. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @James_Adey,
    We need some important information in order to help you.
    What version of the ML-Agents toolkit are you using?
    What training configuration are you using?
    What are your observation and action spaces?
    How are you rewarding your agent?

    Without answers to these questions, we can't help you; the information you've given so far is too sparse.
     
  5. LexVolkov

    Joined:
    Sep 14, 2014
    Posts:
    62
    I have the same problem. What rewards do you have?
     
  6. ChrissCrass

    Joined:
    Mar 19, 2020
    Posts:
    31
    There are two common causes of this.

    1. You have the agent's max steps configured incorrectly, and you are not properly calling "ResetOnDone" (depending on the ML-Agents version you are on).

    2. Vanishing and exploding gradients:
    This is a peculiar phenomenon that occurs because of inconsistent and unbalanced hyperparameters relative to the observation-action pairs and the rewards associated with them. You can identify this issue by checking whether decisions are still being made by your agents: they will just be outputting a -1 or a 1 without change (see the diagnostic sketch below). TensorBoard shows this as a complete collapse of performance. Visually, agents show early signs that this is happening, since it usually takes multiple back-propagation passes to completely ruin the weights of the network.
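    To check for the saturation described in point 2, here is a small diagnostic sketch (class name hypothetical; it assumes the 1.0-era float[] action signature) that logs the raw actions on every decision:

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    // Diagnostic sketch: log the raw continuous actions so you can see whether
    // the policy has saturated at -1/+1 even though decisions are still being made.
    public class ActionLoggingAgent : Agent
    {
        public override void OnActionReceived(float[] vectorAction)
        {
            // If every value is pinned at -1 or 1 for long stretches, the policy has
            // likely collapsed (exploding/vanishing gradients) rather than "given up".
            Debug.Log($"frame {Time.frameCount}: actions = [{string.Join(", ", vectorAction)}]");

            // ... apply movement/attack actions here as usual ...
        }
    }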

    I can't say precisely what causes it, other than that the value estimator (among other things) goes haywire due to confusing data. If toying with lambda doesn't fix it, you can address this in several ways:

    1. Fix your observations (they should be normalized and relative to the agent; there's no excuse for poor observations). See the sketch after this list.
    2. Add more observation modalities (giving the agent more information helps clarify anomalous rewards, though it increases cost).
    3. Increase buffer and batch size (this ensures more accurate statistical sampling and avoids anomalies, but slows training).
    4. The learning rate might be too high, or you might be running too many epochs (bigger gradient steps are riskier).
    5. Epsilon might be making your policy exploration too aggressive, pushing the policy into irrecoverable territory.
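    For point 1, a sketch of agent-relative, normalized observations (hypothetical field names and range; it assumes the 1.0+ C# API with VectorSensor):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    // Sketch: observations expressed in the agent's local frame and scaled
    // into roughly [-1, 1] so they mean the same thing wherever the agent is.
    public class NormalizedObservationAgent : Agent
    {
        public Transform nearestEnemy;   // hypothetical: set by your targeting code
        public float maxRange = 50f;     // rough size of the playable area

        public override void CollectObservations(VectorSensor sensor)
        {
            // Enemy position relative to the agent, scaled by the arena size.
            Vector3 toEnemy = transform.InverseTransformPoint(nearestEnemy.position);
            sensor.AddObservation(toEnemy / maxRange);   // 3 floats

            // The agent's forward direction is already unit-length, so it's fine as-is.
            sensor.AddObservation(transform.forward);    // 3 floats
        }
    }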

    Assuming that you guys don't have a simple misconfiguration, you're describing stability issues.