Question Does training multiple agents simultaneously using the same policy require the same environment?

Discussion in 'ML-Agents' started by GamesbyJP, Jan 11, 2021.

  1. GamesbyJP


    May 20, 2018
    Hello everyone,

    I'm training an Agent to learn how to drive a car around a track.
    Training goes well on one track, but not as smoothly on the others.

    So my question is this: when using multiple Agent instances to learn the same policy (to be clear, they all have the same behavior name), is it recommended to:
    1) Teach the agents all tracks simultaneously (for example, 4 agents on each track, all training at once), or
    2) Train on a track-by-track basis (all agents learn track 1, then track 2, then track 3, etc.)?
  2. mbaske


    Dec 31, 2017
    It's probably best to randomize your tracks during training, or to cycle through them with each new episode. This way you prevent the policy from overfitting to a particular track before a new one is introduced. If the tracks vary a lot in difficulty, then you might want to start with the easiest ones first. Training multiple agents simultaneously is a good way to collect more experiences in a given time span. I'd recommend having a couple of agents in the scene if that doesn't cause any performance issues.
    If you like, check out - the project contains a basic procedural track generator that's supposed to generalize the agent's behaviour.
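
The per-episode track selection described above could be sketched roughly as follows. This is an engine-agnostic Python illustration, not ML-Agents code: `TrackSelector`, `unlocked_tracks`, and the `episodes_per_stage` parameter are hypothetical names, and unlocking one more track after a fixed number of episodes is only one assumed way to "start with the easiest ones first".

```python
import random


class TrackSelector:
    """Chooses which track an agent drives at the start of each episode."""

    def __init__(self, tracks, mode="random", seed=None):
        if not tracks:
            raise ValueError("at least one track is required")
        if mode not in ("random", "cycle"):
            raise ValueError("mode must be 'random' or 'cycle'")
        self.tracks = list(tracks)
        self.mode = mode
        self._rng = random.Random(seed)
        self._next_index = 0

    def next_track(self):
        """Return the track for the next episode."""
        if self.mode == "random":
            # Uniform random choice: no track dominates recent experience.
            return self._rng.choice(self.tracks)
        # Round-robin: visit every track in order, then wrap around.
        track = self.tracks[self._next_index]
        self._next_index = (self._next_index + 1) % len(self.tracks)
        return track


def unlocked_tracks(tracks_easy_to_hard, episodes_done, episodes_per_stage):
    """Return the tracks available so far, unlocking one more per stage.

    Assumes the caller has already sorted the tracks from easiest to hardest.
    """
    count = min(len(tracks_easy_to_hard), 1 + episodes_done // episodes_per_stage)
    return tracks_easy_to_hard[:count]
```

In a Unity project the equivalent call would most likely sit wherever the episode is reset, so that each new episode starts on the track returned by the selector.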
  3. andrewcoh_unity


    Unity Technologies

    Sep 5, 2019
    It would be best to train agents on variants of the tracks simultaneously. RL agents will 'overfit' to their most recent training environment, so if agents are trained on track 1 and then on track 2, they will almost certainly 'forget' track 1 (this is called catastrophic forgetting).
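
One simple way to picture simultaneous training is to spread the agents evenly over the tracks, so that every batch of collected experience contains all tracks. The sketch below is plain Python under that assumption; `assign_agents_to_tracks` is a hypothetical helper, not part of ML-Agents.

```python
def assign_agents_to_tracks(agent_ids, tracks):
    """Distribute agents over tracks round-robin.

    Returns a dict mapping each track to the list of agents placed on it.
    """
    if not tracks:
        raise ValueError("at least one track is required")
    assignment = {track: [] for track in tracks}
    for i, agent_id in enumerate(agent_ids):
        # Agent i goes to track i modulo the number of tracks.
        assignment[tracks[i % len(tracks)]].append(agent_id)
    return assignment


# Example: 12 agents on 3 tracks gives 4 agents per track.
layout = assign_agents_to_tracks(list(range(12)), ["track_1", "track_2", "track_3"])
```

Because all agents share one behavior name, the experience from every track feeds the same policy at the same time, which is what keeps any single track from being the "most recent" one.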

    That said, it is very important that your observations are not ambiguous when the agent is training on multiple tracks. For example, if the agent's observations are just its coordinates, then at the same position it may be correct to turn left on one track and to turn right on another. The policy cannot tell the two situations apart, so this would not train correctly.
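
The ambiguity can be made concrete with a small check that looks for identical observations paired with different correct actions. This is an illustrative Python sketch: the sample values are made up, and the "track-relative" observation (distance to the left and right track edge) is only an assumed alternative to raw coordinates.

```python
from collections import defaultdict


def find_ambiguous_observations(samples):
    """Return observations that are paired with more than one correct action.

    `samples` is an iterable of (observation, correct_action) pairs, where
    each observation is a hashable tuple.
    """
    actions_by_observation = defaultdict(set)
    for observation, action in samples:
        actions_by_observation[observation].add(action)
    return {
        observation: actions
        for observation, actions in actions_by_observation.items()
        if len(actions) > 1
    }


# Same world position on two different tracks, opposite correct actions.
position_only = [
    ((10.0, 5.0), "left"),   # track 1 bends left here
    ((10.0, 5.0), "right"),  # track 2 bends right here
]

# Hypothetical track-relative observation: (distance to left edge, distance to right edge).
track_relative = [
    ((8.0, 2.0), "left"),    # track 1: more room on the left
    ((2.0, 8.0), "right"),   # track 2: more room on the right
]

ambiguous_positions = find_ambiguous_observations(position_only)
ambiguous_relative = find_ambiguous_observations(track_relative)
```

With position-only observations the same input demands two different outputs, while the track-relative version gives the policy distinct inputs for the two situations.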

    Let me know if you have any other questions. If you'd like to share your observations and reward function, I may be able to give you a better answer.