ml-agents train an already trained model

Discussion in 'ML-Agents' started by SirRolf, Mar 5, 2020.

  1. SirRolf

    Joined: Mar 5, 2020
    Posts: 2
    I might be saying something really stupid, but is it possible to keep training an already trained model instead of starting from scratch?
    If not, is it possible to increase the number of steps it takes before training stops?
     
  2. SmartMediaNL

    Joined: Sep 29, 2016
    Posts: 77
    I'd like to know this as well. P.S. I do know you can increase the training steps in the config file, trainer_config.yaml
    (or make a copy and change that).
    Find the section with the name you are using, for example default:

    and there change:

    max_steps: 5.0e5

    to, for example:

    max_steps: 8.0e5

    or, if you have the time:
    max_steps: 5.0e7
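
For anyone who would rather script the edit described above, here is a minimal Python sketch that rewrites every max_steps line in a config text. The helper name set_max_steps and the sample fragment are made up for illustration. It is a plain text substitution, not an ML-Agents API, and it changes max_steps in every section it finds.

```python
import re


def set_max_steps(config_text: str, new_value: str) -> str:
    """Return config_text with the value of every `max_steps:` line replaced.

    Indentation and the key itself are kept; only the value changes.
    This is a text substitution, so it does not validate the YAML.
    """
    pattern = re.compile(r"^([ \t]*max_steps:[ \t]*)\S+", flags=re.MULTILINE)
    # A function replacement avoids ambiguity between group references
    # and a new value that starts with a digit.
    return pattern.sub(lambda match: match.group(1) + new_value, config_text)


# Illustrative fragment only; a real trainer_config.yaml has more settings.
example = """default:
    trainer: ppo
    max_steps: 5.0e5
"""

updated = set_max_steps(example, "8.0e5")
print(updated)
```

To apply it to a real file, read the file into a string, pass it through set_max_steps, and write the result to a copy rather than overwriting the original.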
     
  3. ascrawfo

    Unity Technologies

    Joined: Aug 16, 2017
    Posts: 4
    To confirm, we currently don't support training an already trained model from step 0, but you can increase the number of steps in the trainer_config.yaml as described above!
     
  4. SirRolf

    Joined: Mar 5, 2020
    Posts: 2
    Thank you, that helps out a lot. Are there any plans to implement this in the future?
     
  5. Darian4

    Joined: Nov 5, 2018
    Posts: 3
    If you can't train a model starting from another model, what is the point of this command-line option:
    --load: If set, the training code loads an already trained model to initialize the neural network before training.

    I haven't tried it yet, but I thought it did exactly that.
     
  6. ervteng_unity

    Unity Technologies

    Joined: Dec 6, 2018
    Posts: 150
    The --load option is for resuming training, e.g. if you wanted to pause training and use your computer for something else.

    In principle, loading from a partially-trained model and resuming training is the same as training from an already trained model, so you're right - it does just initialize the NN from existing weights and continues training.
     
  7. ailuropoda0

    Joined: Feb 26, 2020
    Posts: 9
  8. radiantboy

    Joined: Nov 21, 2012
    Posts: 1,633
    Pretty sure you can now use --initialize-from=MoveToGoal --run-id=MoveToGoal2, where the new run uses data from MoveToGoal to build upon previous learning. But I'm not sure whether you need to pass the new ID in each time. For example:

    run one
    --initialize-from=MoveToGoal --run-id=MoveToGoal2
    run two
    --initialize-from=MoveToGoal2 --run-id=MoveToGoal3

    or whether you just do
    --initialize-from=MoveToGoal --run-id=MoveToGoal3

    Does anyone know?
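
One way to read the two options above, assuming that --initialize-from names an existing run whose weights are loaded and --run-id names the new run being created: each new run gets a fresh ID and initializes from whichever earlier run it should build on. Under that assumption the first variant chains the runs, while the second starts MoveToGoal3 from the original MoveToGoal weights and skips MoveToGoal2. The Python sketch below only builds the command lines for such a chain. The helper chained_commands and the config.yaml path are hypothetical, and nothing is executed.

```python
from typing import List


def chained_commands(config_path: str, run_ids: List[str]) -> List[List[str]]:
    """Build mlagents-learn argument lists where each run starts from the previous one.

    run_ids[0] is assumed to be an already finished run; every later ID is a
    new run initialized from the ID just before it.
    """
    commands = []
    for previous, current in zip(run_ids, run_ids[1:]):
        commands.append([
            "mlagents-learn",
            config_path,  # hypothetical config file path
            f"--initialize-from={previous}",
            f"--run-id={current}",
        ])
    return commands


commands = chained_commands("config.yaml", ["MoveToGoal", "MoveToGoal2", "MoveToGoal3"])
for command in commands:
    print(" ".join(command))
```

Each argument list could be passed to subprocess.run once the previous run has finished, though that part is left out here.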
     
  9. GB2102

    Joined: Jul 8, 2018
    Posts: 7
    May 21, 2023:
    Try using:
    mlagents-learn --run-id=yourID --resume
     
    Last edited: May 21, 2023
  10. plat8899

    Joined: Sep 7, 2021
    Posts: 3
    Just a few notes from the little that I found.

    --resume
    This option simply resumes a run as it is. It has the added advantage of preserving all the training history and stats.
    So make sure not to delete the folder and files of the run.
    Unless you edit the contents to conform to what you need (here be dragons), just use it as intended.

    --initialize-from
    This is probably the option to use if you want to resume training from any earlier model, at any step. Note that you will only have the model; the earlier history will be lost.
    But there are some important caveats to make it work.
    You need a 'checkpoint.pt' file to initialize the training. 'checkpoint.pt' seems to be basically the <modelName>.pt file that sits in the run folder beside the <model>.onnx one. Just rename it.
    I think the .pt file is the model in PyTorch format.
    So if you only have the .onnx file, you might be able to convert it to PyTorch before renaming it.
    To find out where to put that 'checkpoint.pt' file, look at any existing run and you will see the folder structure and the file's place in it.
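
The renaming step described above can be sketched in Python. This is only an illustration of copying a saved .pt file into place under the name checkpoint.pt. The helper stage_checkpoint is hypothetical, the folder layout has to be taken from one of your own runs, and whether the trainer accepts the copied file is the assumption made in this post, not something the script checks.

```python
import shutil
from pathlib import Path


def stage_checkpoint(source_pt: Path, target_dir: Path) -> Path:
    """Copy a saved .pt model into target_dir under the name checkpoint.pt.

    The source file is left untouched so the original run stays intact.
    """
    source_pt = Path(source_pt)
    target_dir = Path(target_dir)
    if source_pt.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got {source_pt.name}")
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / "checkpoint.pt"
    shutil.copy2(source_pt, destination)
    return destination


# Hypothetical usage; both paths are placeholders to adapt to your own runs:
# stage_checkpoint(Path("path/to/old_run/ModelName.pt"),
#                  Path("path/to/folder/where/checkpoint.pt/belongs"))
```

Copying instead of renaming in place keeps the earlier run usable with --resume.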
     
  11. CloudyVR

    Joined: Mar 26, 2017
    Posts: 715
    How do you restore a specific checkpoint? Is it even possible?