
UnityEnvironment worker 0: environment stopping.

Discussion in 'ML-Agents' started by m4l4, Dec 5, 2020.

  1. m4l4

    m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
Hi everyone, I was training my agent yesterday and everything was going perfectly until I reached 10 million steps. Then I noticed a sudden drop in performance: reward and value loss plummeted, policy loss skyrocketed, and entropy started increasing.

    Here are some tensorboard graphs: https://imgur.com/a/RhzTlUJ

On top of that, this morning I discovered that the training stopped after 27 million steps, with the message:
    UnityEnvironment worker 0: environment stopping.

It's the first time I've seen something like this. What could be the reason for such a sudden change?
And what does "worker 0" mean? That no agent is responding? That no agent is present in the scene?

How is that possible? Currently the only thing that resets the agents is the maxStep parameter in the inspector: after 5k steps they start a new episode. There is no other line of code that resets the agents or the environment.

I don't know if the performance drop and the sim stopping are related; maybe I'm asking two separate questions.

My agent has an observation size of 229 and a continuous action space of size 20, with no stacked observations.
Here's the config file:

Code (YAML):
behaviors:
  Walker_4Legs:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2024
      buffer_size: 20240
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 50000000
    time_horizon: 1000
    summary_freq: 100000
    threaded: true
    Any idea about the issue?
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    They might be. Did you check the console log for NaN observation error messages?
     
  3. m4l4

    m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
There were no error messages in the console.
I've researched the topic and found that the performance drop might be related to the PPO algorithm itself.
Noobish explanation, in my own words:
If the agent reaches a high score too early (e.g. it has almost solved the environment perfectly, but still has 75% of the training to go), the high entropy may push it to try different things. If it does too many weird things, the bad experiences clog the next batches, corrupting the training performance beyond recovery.
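For what it's worth, the clipping mechanism that is supposed to limit these policy steps can be sketched in a few lines. This is a toy illustration of the PPO-Clip objective, not ML-Agents' actual code: the clip caps how much a single sample can pull the policy, but nothing stops a long run of bad batches from dragging it far from a good solution.

```python
def clipped_surrogate(prob_new, prob_old, advantage, epsilon=0.2):
    """PPO-Clip objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r is the probability ratio between the new and old policies."""
    ratio = prob_new / prob_old
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: the incentive to raise an action's probability is
# capped once the ratio exceeds 1 + epsilon.
print(clipped_surrogate(1.5, 1.0, advantage=1.0))   # -> 1.2

# Negative advantage: min() keeps the more pessimistic (clipped) value,
# so each update still pushes the policy away from the sampled action.
print(clipped_surrogate(0.5, 1.0, advantage=-1.0))  # -> -0.8
```

So each individual update is bounded, but many consecutive updates on "weird" experience can still accumulate into a collapse.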

I saw suggestions about lowering the learning rate and tweaking the batch/buffer sizes, to avoid taking steps that are too big during the policy update.

I've implemented curriculum learning, which should avoid the problem of getting to a solution too early.
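For anyone curious, the curriculum goes in the same trainer config file in recent ML-Agents releases. A hedged sketch (the parameter name `terrain_difficulty`, the lesson names, and the thresholds are made up for illustration; the schema follows the ML-Agents curriculum documentation):

```yaml
environment_parameters:
  terrain_difficulty:          # hypothetical parameter read by the environment
    curriculum:
      - name: FlatGround
        completion_criteria:
          measure: reward
          behavior: Walker_4Legs
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.6       # move on once the smoothed reward passes this
        value: 0.0
      - name: RoughGround      # final lesson: no completion criteria needed
        value: 1.0
```

The environment then reads `terrain_difficulty` through the `EnvironmentParameters` channel each episode.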

Still no idea what caused the premature stop of the previous training session.
     
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Interesting! Could you share the source where you found this?
    I've seen sudden performance drops a couple of times, but was assuming there's something wrong with my agent design.
     
  5. m4l4

    m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
I haven't found a specific paper on the subject, but I've read every reddit thread and article I could find about:
performance drops, reward collapse, PPO learning instability.
Playing around with those keywords, every now and then someone describes this exact problem.

I've noticed the issue in other simulations I wrote, but more often than not the agent recovered, so I just thought it was related to some exploration of the action space.

Tonight I ran another training session, this time with curriculum enabled, and it went on for 50 million steps with no problems or performance drops.
Last time it went crazy after 10M steps, having sat around max score for the last 4M (no curriculum).

Have you ever noticed the problem with curriculum enabled?
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
No, not specifically with a curriculum. I guess the problem being related to high entropy makes sense. Although my naive thinking so far was that the algorithm incrementally tries variations on the current policy all the time, and that it would always plateau if no better one can be found, rather than completely degrade all of a sudden.
But yeah, I think I've seen these kinds of drops more often with the learning rate set to constant instead of linear.
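A rough sketch of why the schedule matters (assuming the `linear` option anneals toward zero over `max_steps`, which is how I understand it; the exact ML-Agents implementation may differ):

```python
def linear_lr(initial_lr, step, max_steps):
    """Linearly anneal the learning rate from initial_lr toward 0
    as training progresses (a sketch of a 'linear' schedule)."""
    frac = max(0.0, 1.0 - step / max_steps)
    return initial_lr * frac

# With the config from this thread (lr = 3e-4, max_steps = 5e7), by 40M
# steps the rate has shrunk to a fifth of its initial value, damping any
# late destructive updates. A constant schedule keeps taking full-size
# steps forever, so a late collapse has more room to do damage.
print(linear_lr(3e-4, 40_000_000, 50_000_000))  # ~6e-05
```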
     
    m4l4 likes this.
  7. m4l4

    m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
There must be something that screws up the policy somehow. I understand "jumping off the cliff" to see what happens, but after doing that 100 times with no result, it should go back to the previous working strategy, not try different jump styles over and over.