
unity-ml: Auto retain highest reward model ?

Discussion in 'ML-Agents' started by mentalgear, Jul 28, 2019.

  1. mentalgear

    Joined:
    Jul 19, 2019
    Posts:
    10
    I just tried out the 3D Ball example, and during training there was a step where the mean reward reached 100; however, I stopped training later, when it was at 80.
    When training stops, does it automatically save the best model, i.e. the one with the highest reward?

    Thanks

    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 52000. Time Elapsed: 370.484 s Mean Reward: 100.000. Std of Reward: 0.000. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 53000. Time Elapsed: 377.597 s Mean Reward: 70.676. Std of Reward: 34.083. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 54000. Time Elapsed: 384.724 s Mean Reward: 62.453. Std of Reward: 39.364. Training.
    INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 55000. Time Elapsed: 391.841 s Mean Reward: 80.014. Std of Reward: 28.839. Training.
    ^CUnityEnvironment worker: keyboard interrupt
    INFO:mlagents.envs:Learning was interrupted. Please wait while the graph is generated.
    INFO:mlagents.envs:Saved Model
     
  2. dracolytch

    Joined:
    Jan 1, 2016
    Posts:
    13
    No, it does not. It saves the latest trained model. This may seem counter-intuitive, but the system doesn't know if it got a perfect score because it had a better policy, or if it just got lucky (which happens).
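
For anyone who wants to see where a run peaked, here is a minimal sketch (not part of ML-Agents) that parses console log lines in the format quoted in the first post and picks the step with the best moving-average reward. Averaging over a few summaries is one simple way to discount a single lucky score. The names `parse_rewards` and `best_smoothed_step` and the `window` parameter are hypothetical, and the script only reports a step; it does not save or restore any model.

```python
import re
from collections import deque

# Matches the "Step", "Mean Reward" and "Std of Reward" fields of the
# mlagents.trainers log lines quoted in the first post.
LOG_PATTERN = re.compile(
    r"Step: (?P<step>\d+)\. .*?Mean Reward: (?P<mean>-?\d+\.\d+)\. "
    r"Std of Reward: (?P<std>-?\d+\.\d+)\."
)


def parse_rewards(lines):
    """Return (step, mean_reward, std_reward) tuples for matching log lines."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # lines such as "Saved Model" are skipped
            records.append(
                (int(match["step"]), float(match["mean"]), float(match["std"]))
            )
    return records


def best_smoothed_step(records, window=3):
    """Return (step, smoothed_reward) with the highest moving-average reward.

    `window` (hypothetical parameter) is the number of consecutive summaries
    averaged together. Returns None if there are fewer than `window` records.
    """
    recent = deque(maxlen=window)
    best = None
    for step, mean, _std in records:
        recent.append(mean)
        if len(recent) < window:
            continue
        smoothed = sum(recent) / window
        if best is None or smoothed > best[1]:
            best = (step, smoothed)
    return best


if __name__ == "__main__":
    log = [
        "INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 52000. Time Elapsed: 370.484 s Mean Reward: 100.000. Std of Reward: 0.000. Training.",
        "INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 53000. Time Elapsed: 377.597 s Mean Reward: 70.676. Std of Reward: 34.083. Training.",
        "INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 54000. Time Elapsed: 384.724 s Mean Reward: 62.453. Std of Reward: 39.364. Training.",
        "INFO:mlagents.trainers: firstRun-0: 3DBallHardLearning: Step: 55000. Time Elapsed: 391.841 s Mean Reward: 80.014. Std of Reward: 28.839. Training.",
    ]
    print(best_smoothed_step(parse_rewards(log), window=3))
```

With `window=1` this reduces to picking the raw maximum, which is exactly the case where a lucky episode batch can look like the best policy.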
     
  3. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    147
    This isn't currently possible, but it's something that we'd like to add support for in the future. Our internal tracking ID for this is MLA-553.
     
  4. martinezalonso

    Joined:
    Nov 9, 2018
    Posts:
    3
    Holy god, this would be incredible! I spent a day crunching on something (it reached 100), and then something happened where it collapsed, and now it's at -30. This seems so obvious that I'm surprised it's not part of TensorFlow training natively.
     
  5. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    147
    TensorFlow saves its own checkpoints. You can use --resume on the mlagents-learn command line, using the run ID for the previous run.

    The change to save .nn files at checkpoints should be merged next week.
     
  6. martinezalonso

    Joined:
    Nov 9, 2018
    Posts:
    3
    Hey Celion, I deeply appreciate the response and help. Is there an internal issue tracking the evaluation of always keeping the highest-reward .nn file?
     