Continue at last step for loaded training?

Discussion in 'ML-Agents' started by JPhilipp, Jan 31, 2020.

  1. JPhilipp

    JPhilipp

    Joined:
    Oct 7, 2014
    Posts:
    56
    I have three different runs (with different Unity and YAML config settings) for my agent training. When I continue training one of these past runs (using "... --train --load" on the blue run as pictured; it's using linear_rate, by the way), how can I ensure that it continues from the last step where it stopped, instead of jumping back to the far left of the graph?



    Thanks!
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
  3. JPhilipp

    The latest checkpoint file now contains:

    Code (CSharp):
    model_checkpoint_path: "model-135397.cptk"
    all_model_checkpoint_paths: "model-50000.cptk"
    all_model_checkpoint_paths: "model-100000.cptk"
    all_model_checkpoint_paths: "model-135397.cptk"
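
A minimal sketch of reading the last saved step out of a checkpoint index like the one above. It assumes the file is plain text in exactly the format shown and that the step count is the number embedded in the file name; the function name `latest_checkpoint_step` is hypothetical and not part of ML-Agents.

```python
import re

# Contents of the checkpoint index file, copied from the post above.
EXAMPLE = '''model_checkpoint_path: "model-135397.cptk"
all_model_checkpoint_paths: "model-50000.cptk"
all_model_checkpoint_paths: "model-100000.cptk"
all_model_checkpoint_paths: "model-135397.cptk"
'''


def latest_checkpoint_step(checkpoint_text):
    """Return (file_name, step) for the model_checkpoint_path entry, or None."""
    # Anchor at line start so "all_model_checkpoint_paths" lines are skipped.
    entry = re.search(
        r'^\s*model_checkpoint_path:\s*"([^"]+)"', checkpoint_text, re.MULTILINE
    )
    if entry is None:
        return None
    file_name = entry.group(1)
    # Assumption: the step is the integer between "-" and the extension.
    step = re.search(r"-(\d+)\.", file_name)
    return file_name, int(step.group(1)) if step else None


print(latest_checkpoint_step(EXAMPLE))
```

This only confirms which step the saved model corresponds to; it does not by itself make the trainer resume its step counter from that value.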
     
  4. JPhilipp

    I'm still having problems with this. Does anyone know how to continue the training exactly where it left off?

    As it is, the step counter resets to zero every time I use load, even though I know it does load the neural network (judging by its performance level). Picking "relative" in TensorBoard helps a bit, since it at least displays the lines chained side by side, but it still feels like the cumulative training success sometimes takes a brief but heavy fall before it recovers (I reckon that might be because it measures the training rate differently, as it thinks it's on step 0 again, and not, say, 100k).
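
The guess about the step counter can be illustrated with a small sketch. It assumes a plain linear decay of the learning rate to zero over `max_steps`; the function name `linear_lr` and the values 3e-4 and 500,000 are hypothetical, and whether ML-Agents computes its schedule this way is not confirmed in this thread. Only the step 135397 comes from the checkpoint file quoted earlier.

```python
def linear_lr(step, initial_lr, max_steps):
    """Linearly decay initial_lr to zero as step goes from 0 to max_steps."""
    remaining = 1.0 - step / max_steps
    return max(initial_lr * remaining, 0.0)


INITIAL_LR = 3e-4    # hypothetical starting learning rate
MAX_STEPS = 500_000  # hypothetical training length

# Learning rate if the counter had continued from the saved step...
resumed = linear_lr(135_397, INITIAL_LR, MAX_STEPS)
# ...versus the rate if the counter restarts at zero after loading.
restarted = linear_lr(0, INITIAL_LR, MAX_STEPS)

print(f"continued from step 135397: {resumed:.6f}")
print(f"restarted at step 0:        {restarted:.6f}")
```

Under these assumptions, a reset counter puts the learning rate back at its initial, larger value, which would be one possible explanation for a brief drop after loading.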
     
  5. JPhilipp

  6. MaksChojniak

    MaksChojniak

    Joined:
    Feb 3, 2022
    Posts:
    1
    When training an agent with ML-Agents, everything went well for the first 9 million steps, but after that the agent became even worse than before. How can I go back to a certain step? My chart looks like this: https://ibb.co/0VpY1b4