Question How does "initialize-from / init_path" actually work?

Discussion in 'ML-Agents' started by Sherlore, May 30, 2022.

  1. Sherlore

    Sherlore

    Joined:
    Feb 26, 2014
    Posts:
    13


    I am trying to train a Unity ragdoll to perform a task. As shown in the video, the ragdoll is capable of running.
    But I failed to start another training run from the trained running model, using "initialize-from" on the command line or "init_path" in the configuration.

    I even tested it with the same environment and configuration that the running model came from. The performance just dropped quickly and got stuck at a poor result. As the log below shows, the first 30000 steps were good with the trained running model, but it got worse as the new training progressed.

    Code (CSharp):
    INFO [tf_policy.py:118] Loading model for brain GurrenBattle?team=0 from ./models/GurrenRun/GurrenBattle.
    INFO [tf_policy.py:143] Starting training from step 0 and saving to ./models/GurrenChase2/GurrenBattle.
    tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 30000. Time Elapsed: 20.532 s Mean Reward: 76.215. Std of Reward: 30.253. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 60000. Time Elapsed: 33.016 s Mean Reward: 66.936. Std of Reward: 33.256. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 90000. Time Elapsed: 45.812 s Mean Reward: 18.489. Std of Reward: 31.908. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 120000. Time Elapsed: 60.078 s Mean Reward: 0.992. Std of Reward: 1.093. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 150000. Time Elapsed: 76.910 s Mean Reward: 0.119. Std of Reward: 0.780. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 180000. Time Elapsed: 94.182 s Mean Reward: -0.162. Std of Reward: 0.550. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 210000. Time Elapsed: 111.804 s Mean Reward: -0.309. Std of Reward: 0.469. Training.
    INFO [stats.py:111] GurrenChase2_GurrenBattle: Step: 240000. Time Elapsed: 127.550 s Mean Reward: -0.288. Std of Reward: 0.774. Training.

    The docs mention this option, but after trying it I am still confused about how to use initialize-from / init_path correctly. Could anyone share how "initialize-from / init_path" actually works, and how to use it properly?
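
    For reference, this is roughly how I understand the two options are supposed to be used, based on the docs (the config file name below is just a placeholder, and the exact keys may differ between ML-Agents versions):

    Code (Bash):
    # Option 1: point the new run at a previous run-id on the command line.
    mlagents-learn GurrenConfig.yaml --run-id=GurrenChase2 --initialize-from=GurrenRun

    # Option 2: instead of the CLI flag, set init_path for the behavior in the
    # trainer configuration, pointing at the folder of the saved checkpoints, e.g.
    #   GurrenBattle:
    #     init_path: ./models/GurrenRun/GurrenBattle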

    Many thanks

    (ML: Verified Package 1.0.8 Unity: 2020.3.34f1)

    Sincerely,
    Sherlore
     
  2. sarachan

    sarachan

    Joined:
    Jul 21, 2013
    Posts:
    43
    I am not the biggest expert on this, but I have had success using initialize-from when I make the environment more complex. In other words, I train the first time in a simplified environment, and then train a second time in a more complex environment, using initialize-from to start from the first model. Building up the more complex model step by step like this sometimes works better than training in the complex environment from the start.
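
    Roughly, the sequence looks like the sketch below (the run IDs and build paths are made up for illustration; the second run points at the more complex environment build and initializes from the first run):

    Code (Bash):
    # Stage 1: train from scratch in the simplified environment build.
    mlagents-learn config.yaml --env=Builds/SimpleEnv --run-id=SimpleRun

    # Stage 2: train in the more complex build, initializing from the first run's model.
    mlagents-learn config.yaml --env=Builds/ComplexEnv --run-id=ComplexRun --initialize-from=SimpleRun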

    I hope that comment is helpful. :)
     
  3. Sherlore

    Sherlore

    Joined:
    Feb 26, 2014
    Posts:
    13
    That sounds really great. Training a model for a complex scenario is exactly what I am trying to do with initialize-from. In your case it seems to work nicely and intuitively, so your comment is helpful information. Thank you!

    May I ask which version of ML-Agents you used to train the model above?

    And suppose you trained a model A in an environment X, and then started a new training run in the exact same environment X, initializing from model A.
    Does the new training keep the trained performance and continue to improve? Or does its performance drop quickly, as in my case?
     
  4. sarachan

    sarachan

    Joined:
    Jul 21, 2013
    Posts:
    43
    I am using ML Agents version 1.0.8.

    The example that I have been using starts with a nearly empty environment -- just the agent and a moving target -- to train the initial model. I then add more obstacles to the same environment and train again, initializing from the first model. The new training does keep the trained performance from the first run, though it may drop off a bit at first because the problem has become harder. As training continues, the agent improves at avoiding the new obstacles.

    I have not yet tried applying the same model in a completely different environment but will try that soon. It will be interesting to see whether the models are generalizable to a new environment or will require retraining, in which case I would try initialize-from again.
     
    Last edited: May 31, 2022