Massive reward drop after resuming training

Discussion in 'ML-Agents' started by graphicDNA, Sep 8, 2020.

  1. graphicDNA

    graphicDNA

    Joined:
    Jun 26, 2017
    Posts:
    47
    Hi guys,

    It seems that pausing/resuming is not working well for me.

I interrupted my training after 46 h. When I resumed it (with no change at all in the code or configuration, just pause and resume), something is clearly different in my agent's behavior. You can see the massive drop in cumulative reward in the picture below (blue lines after resuming).

[Attached image upload_2020-9-8_14-58-54.png: cumulative reward plot showing the drop after resuming]

It's weird because it doesn't start from scratch either (as it would if I used the --force flag, which works well, by the way). It's as if the training went back in time, producing reward levels last seen after 2-3 hours of training. Then, after only 4-5 hours of re-training, it gets back to reward levels that the original run only reached after 20-22 hours.

What am I doing wrong? The command I use to resume is:

    mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --resume
Is that enough, just adding the --resume flag? Is there anything else I should keep in mind?
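For comparison, these are the two invocations I mean (same config and run-id, only the flag changes; as far as I understand, --resume should continue from the last checkpoint of that run-id, while --force discards the previous results and starts over):

    mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --resume
    mlagents-learn.exe configuration.yaml --run-id=STAND_STILL_SPRING1 --force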

    Thanks a lot in advance

PS: Analyzing the console output, it seems it properly loads the model and resumes training from the correct step number (see below), but the model still fails to pick up where it was paused:

    2020-09-08 20:22:34 INFO [tf_policy.py:218] Loading model from results\STAND_STILL_SPRING1\WalkerDynamic.
    2020-09-08 20:22:34 INFO [tf_policy.py:246] Resuming training from step 43697698.

    PS2: My training settings:

behaviors:
  WalkerDynamic:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 90000000
    time_horizon: 1000
    summary_freq: 30000
    threaded: true
     
    Last edited: Sep 8, 2020
  2. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @graphicDNA,
Coincidentally, a GitHub issue was posted about this. One of my colleagues is looking into it now.
     
    andrewcoh_unity likes this.
  3. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    christophergoy likes this.
  4. graphicDNA

    graphicDNA

    Joined:
    Jun 26, 2017
    Posts:
    47
Thanks a lot guys, I'll look forward to that fix.

The behavior I see in the agent is as if the model were partially correct, but for some reason some of the output values received in OnActionReceived(float[] continuousActions) were wrong.

I can tell because I'm training a NN to control a character's joints, and after resuming, some of the joints seem to behave correctly while others don't.

Maybe some of the values in that array are coming in a different order than in the previous training?

I mean, correct values but in the wrong order. Something like that would definitely be consistent with the weird behavior I'm experiencing (see the sketch below for what I mean).
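Roughly, this simplified, hypothetical sketch (not my actual code; assume the joints are just a flat array of ConfigurableJoints) shows how my agent consumes the array: each joint reads a fixed slice of indices, so if the values arrived permuted, every joint would receive another joint's targets:

    using Unity.MLAgents;
    using UnityEngine;

    public class JointAgentSketch : Agent
    {
        // Joints assigned in the Inspector; their order defines which action indices they read.
        public ConfigurableJoint[] joints;

        public override void OnActionReceived(float[] continuousActions)
        {
            for (int i = 0; i < joints.Length; i++)
            {
                // Each joint consumes a fixed 3-value slice of the action vector.
                // If the array arrived permuted, joint i would get joint j's values,
                // which matches the "some joints fine, others broken" symptom.
                joints[i].targetAngularVelocity = new Vector3(
                    continuousActions[i * 3 + 0],
                    continuousActions[i * 3 + 1],
                    continuousActions[i * 3 + 2]);
            }
        }
    }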

    Thanks a lot.
     
    Last edited: Sep 9, 2020
  5. graphicDNA

    graphicDNA

    Joined:
    Jun 26, 2017
    Posts:
    47
Update: this seems to have been fixed on master here: #4463.
     
    andrewcoh_unity likes this.