
unity-ml: Auto-retain highest reward model?

Discussion in 'ML-Agents' started by mentalgear, Jul 28, 2019.

  1. mentalgear

    Joined:
    Jul 19, 2019
    Posts:
    23
    I just tried out the 3D Ball example, and during training there was a step where the mean reward reached 100; however, I stopped training later when it was at 80.
    When training stops, does it automatically save the best model, i.e. the one with the highest reward?

    Thanks

    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 52000. Time Elapsed: 370.484 s Mean Reward: 100.000. Std of Reward: 0.000. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 53000. Time Elapsed: 377.597 s Mean Reward: 70.676. Std of Reward: 34.083. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 54000. Time Elapsed: 384.724 s Mean Reward: 62.453. Std of Reward: 39.364. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 55000. Time Elapsed: 391.841 s Mean Reward: 80.014. Std of Reward: 28.839. Training.
    ^CUnityEnvironment worker: keyboard interrupt
    INFO:mlagents.envs:Learning was interrupted. Please wait while the graph is generated.
    INFO:mlagents.envs:Saved Model
     
  2. dracolytch

    Joined:
    Jan 1, 2016
    Posts:
    19
    No, it does not. It saves the latest trained model. This may seem counter-intuitive, but the system doesn't know if it got a perfect score because it had a better policy, or if it just got lucky (which happens).
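    To see why picking the single best score is risky, here's a toy illustration in plain Python (nothing to do with the ML-Agents API): a noisy policy can post a higher one-off episode score than a steadier policy that is clearly better on average.

    import random

    random.seed(0)
    # Policy A is noisy; policy B is genuinely better on average.
    rewards_a = [random.gauss(60, 30) for _ in range(50)]
    rewards_b = [random.gauss(80, 5) for _ in range(50)]
    # A "keep the best single episode" rule would likely pick A here...
    print(max(rewards_a), max(rewards_b))
    # ...even though B wins decisively on mean reward.
    print(sum(rewards_a) / 50, sum(rewards_b) / 50)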
     
  3. celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    This isn't currently possible, but it's something that we'd like to add support for in the future. Our internal tracking ID for this is MLA-553.
     
  4. martinezalonso

    Joined:
    Nov 9, 2018
    Posts:
    5
    Holy god, this would be incredible! I spent a day crunching on something (it reached 100), and then something happened and it collapsed; now it's at -30. This seems so obvious that I'm surprised it's not part of TensorFlow training natively.
     
  5. celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    TensorFlow saves its own checkpoints. You can use
    --resume
    on the mlagents-learn command line, using the run ID from the previous run.

    The change to save .nn files at checkpoints should be merged next week.
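    For example, picking up the run from the log above where it left off would look something like this (assuming the run ID was firstRun; adjust the config path to whatever you trained with):

    mlagents-learn config/trainer_config.yaml --run-id=firstRun --resume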
     
  6. martinezalonso

    Joined:
    Nov 9, 2018
    Posts:
    5
    Hey Celion, I deeply appreciate the response and help. Is there an internal ticket tracking the idea of always keeping the highest-reward .nn file?
     
  7. celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Sorry, there hasn't been any progress on the issue since last time. The tracker ID is still MLA-553.
     
  8. ahsen35813

    Joined:
    Sep 15, 2019
    Posts:
    9
    I can't wait for this feature! For example, I am training some cars on a racetrack: they start off slowly and learn to make it all the way around the track, then they begin speeding up for more reward and get really good at it. But then they get too fast, start crashing, and their reward drops right back down to where they started. This is really causing problems for me.

    To normalize the luck factor, perhaps you could use a smoothed value of the reward history (like the chart you get from tensorboard --logdir) to decide when to make a backup. It could also run every 500k steps by default, the way the .onnx file is currently updated, with the frequency adjustable by the user.
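    For illustration, here's a rough sketch of that smoothing idea in plain Python (a hypothetical helper, not part of the ML-Agents API): keep an exponential moving average of the episode reward and copy the newest checkpoint aside only when the smoothed value reaches a new high.

    import shutil

    class BestCheckpointKeeper:
        # Hypothetical helper: backs up the newest checkpoint whenever a
        # smoothed reward sets a new high, so one lucky episode doesn't count.
        def __init__(self, alpha=0.05):
            self.alpha = alpha          # smoothing factor for the moving average
            self.smoothed = None        # exponential moving average of reward
            self.best = float("-inf")   # best smoothed reward seen so far

        def on_summary(self, mean_reward, checkpoint_path, backup_path):
            # Update the exponential moving average of the reward.
            if self.smoothed is None:
                self.smoothed = mean_reward
            else:
                self.smoothed = self.alpha * mean_reward + (1 - self.alpha) * self.smoothed
            # Back up only on sustained improvement, not on a single spike.
            if self.smoothed > self.best:
                self.best = self.smoothed
                shutil.copyfile(checkpoint_path, backup_path)

    You would call on_summary with the mean reward from each summary line of the training log, e.g. keeper.on_summary(80.014, "results/run/model.onnx", "results/run/best.onnx") (paths here are made up for the example).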

    Something like that would be incredibly helpful.
     
    Last edited: Mar 23, 2021