Question Does training multiple agents simultaneously using the same policy require the same environment?

Discussion in 'ML-Agents' started by GamesbyJP, Jan 11, 2021.

  1. GamesbyJP

    GamesbyJP

    Joined:
    May 20, 2018
    Posts:
    62
    Hello everyone,

    I'm training an Agent to learn how to drive a car around a track.
    One track goes well, but the others don't go as smoothly.

    So my question is this: when using multiple Agent instances to learn the same policy (to be clear: they all have the same behavior name), is it recommended to:
    1) Teach the agents all tracks simultaneously (e.g. have 4 agents on each track, all training at once)
    or
    2) Teach them on a track-by-track basis (have all agents learn track 1, then track 2, then track 3, etc.)?
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    It's probably best to randomize your tracks during training, or cycle through them with each new episode. This way you prevent the policy from overfitting to a particular track before a new one is introduced. If the tracks vary a lot in difficulty, you might want to start with the easiest ones first. Training multiple agents simultaneously is a good way to collect more experiences in a given time span. I'd recommend having a couple of agents in the scene if that doesn't cause any performance issues.
    If you like, check out https://github.com/mbaske/ml-simple-driver - the project contains a basic procedural track generator that's meant to help the agent's behaviour generalize.
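    A rough sketch of the per-episode approach could look like the code below. The field names (trackVariants, spawnPoints) are placeholders I made up, not something from the project above; the only ML-Agents API involved is overriding Agent.OnEpisodeBegin. With several agents in the scene you'd normally give each training area its own copy of the tracks so the agents don't toggle each other's layouts.

    using UnityEngine;
    using Unity.MLAgents;

    // Sketch only: trackVariants / spawnPoints are made-up fields, one entry per track layout.
    public class DriverAgent : Agent
    {
        [SerializeField] GameObject[] trackVariants; // root object of each track
        [SerializeField] Transform[] spawnPoints;    // matching start position per track

        public override void OnEpisodeBegin()
        {
            // Pick a track at random each episode (could also cycle through them in order).
            int pick = Random.Range(0, trackVariants.Length);
            for (int i = 0; i < trackVariants.Length; i++)
                trackVariants[i].SetActive(i == pick);

            // Reset the car to the chosen track's start.
            transform.SetPositionAndRotation(spawnPoints[pick].position, spawnPoints[pick].rotation);
        }
    }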
     
  3. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    It would be best to train agents on variants of the tracks simultaneously. RL agents will 'overfit' to their most recent training environment, so if agents are trained on track 1 and then track 2, they will almost certainly 'forget' track 1 (this is called catastrophic forgetting).

    That said, it is very important that your observations are not ambiguous when the agent is training with multiple tracks. For example, if the agent's observations are just its coordinates, on one track it may be correct to turn left and on another track it may be correct to turn right. With such ambiguous observations, the agent would not train correctly.
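    As a rough illustration (not your actual setup), observations expressed in the car's local frame stay meaningful across tracks. The "body" Rigidbody reference here is an assumption on my part:

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    // Illustration only: "body" is an assumed Rigidbody reference, not from your project.
    public class DriverAgent : Agent
    {
        [SerializeField] Rigidbody body;

        public override void CollectObservations(VectorSensor sensor)
        {
            // Velocity in the car's local frame (forward/sideways speed) instead of world coordinates.
            sensor.AddObservation(transform.InverseTransformDirection(body.velocity));

            // Distances to the track edges would typically come from a Ray Perception Sensor 3D
            // component added to the agent in the Inspector, rather than from code here.
        }
    }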

    Let me know if you have any other questions. If you'd like to share your observations and reward function, I may be able to give you a better answer.
     