Question --num-envs=<n> help - what are Unity instances?

Discussion in 'ML-Agents' started by StewedHarry, Sep 24, 2020.

  1. StewedHarry

    StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    I'm looking for ways to speed up training and found this in the documentation: Training Using Concurrent Unity Instances.

    What is meant here by concurrent instances? Is this just referring to many copies of a training scenario within the Unity engine, or is it actual Unity instances (multiple windows of the Unity Editor)?

    Is this safe to use and does anyone have experience with it?
     
  2. henrypeteet

    henrypeteet

    Unity Technologies

    Joined:
    Aug 19, 2020
    Posts:
    37
    Concurrent instances are separate copies of a Unity game that has been built; you can think of them as something like multiple editor windows running the same scene. Note that because multiple copies of the same game are running, the game needs to be built[1] and this can't be used in-editor. This is different from duplicating a single arena/environment inside Unity (as is done in many of the example environments).

    --num-envs is generally safe, but start with small numbers so you don't eat all your computer's resources. I would suggest starting with 3 and seeing if you get a speedup. You will likely get faster training, but your model may train slightly differently.

    [1] See https://github.com/Unity-Technologi...Executable.md#using-an-environment-executable
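    For reference, launching training against a built executable with concurrent instances looks roughly like this (the config path, build path, and run-id here are placeholders for illustration, so adjust them to your project):

        mlagents-learn config/trainer_config.yaml --env=Builds/MyEnvironment --run-id=num_envs_test --num-envs=3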
     
    Last edited: Sep 24, 2020
  3. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    Adding to the reply above: it's safe to use in general, but you might want to adjust your config as mentioned in the doc, since the trainer is collecting observations from all instances at the same time and training may behave differently.
     
  4. StewedHarry

    StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    In what scenarios would the agent be trained differently with multiple Unity instances? I currently have a scene training with multiple agents. Their observations are homogenised so they can give coherent data to the model. Would they be affected by multiple instances?
     
  5. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    The training behavior could be different regardless of how many agents there are. The difference comes from collecting observations from n instances instead of one, so the buffer fills up faster with slightly different data.
    In general this neither significantly hurts nor helps overall performance if you adjust the config (especially buffer size) properly, but you can get a big boost in training time.
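    To illustrate where that setting lives, here is a rough sketch of a trainer config; the behavior name RollerBall and the values are assumptions, and the exact layout depends on your ML-Agents version, so check the docs for the release you're on:

        behaviors:
          RollerBall:
            trainer_type: ppo
            hyperparameters:
              batch_size: 64
              buffer_size: 12000    # consider scaling this up when using --num-envs > 1
              learning_rate: 3.0e-4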
     
  6. StewedHarry

    StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    Sorry, I meant to say multiple instances of the same agent in the scene, each contributing to the same policy.
     
  7. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    Do you mean that you have multiple duplicated agents in your scene, and you're asking if it's safe to run multiple Unity instances of that scene?
     
  8. StewedHarry

    StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    I mean that I have multiple duplicated agents in a single scene, but within this scene multiple environments are training in parallel, like in the RollerBall example.
     
  9. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    For environments like RollerBall or the many other example environments that have duplicated agents, you can still use --num-envs with them to speed up training, and the explanation in my previous reply still applies. The training behavior could be different since you're collecting observations from num-envs * m agents instead of m agents.
     
  10. Visuallization

    Visuallization

    Joined:
    Mar 14, 2015
    Posts:
    8
    Hey there, I was also wondering about --num-envs, as it is actually affecting the learning performance in my case. E.g. if I train with --num-envs=1 I get a total accumulated reward of 100 or so, and if I train with --num-envs=8 and keep everything else the same I get a total accumulated reward of 10,000. So how would I be able to achieve the same high reward with only --num-envs=1?