Question: Question about Initialize From

Discussion in 'ML-Agents' started by caileanfinn, Sep 20, 2023.

  1. caileanfinn

    caileanfinn

    Joined:
    Aug 19, 2022
    Posts:
    5
I'm curious about how initialize-from works. Take the Walker example: suppose I initially create a model where the agent can walk to a target location. If I then made the environment more complex, let's say the agent needs to search a set of rooms for the target without hitting a wall or touching the ground, the reward system would change from (matchspeedreward * lookattarget) to (+1 for touching the target and -0.001 for each step). Would I be able to use the weights of the previous model to train this, or would it be too big a change?

I'm just trying to figure out what is possible with --initialize-from, and what can/can't be changed. I'm aware that the observations, actions, and hyperparameters can't change.

    Any help would be appreciated
     
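For reference, the reward change described above could be sketched in the agent's script roughly like this. This is a hypothetical sketch, not code from the Walker example: the `SearchAgent` class name, the `Target`/`Wall`/`Ground` tags, and the penalty value are placeholders of my own.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical sparse-reward variant: +1 for reaching the target,
// a small per-step penalty to encourage finding it quickly,
// and episode termination on hitting a wall or the ground.
public class SearchAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // ... apply actions to the agent's body here ...
        AddReward(-0.001f); // existential penalty per decision step
    }

    private void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Target"))
        {
            AddReward(1f);  // found the target
            EndEpisode();
        }
        else if (collision.gameObject.CompareTag("Wall") ||
                 collision.gameObject.CompareTag("Ground"))
        {
            EndEpisode(); // failure: end the episode without a bonus
        }
    }
}
```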
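On the mechanics of the flag itself, seeding a new run from an old one is a CLI/config matter; a minimal sketch (the run IDs and config path here are my own placeholders):

```shell
# Train the original Walker model
mlagents-learn config/walker_config.yaml --run-id=WalkerBase

# Start the new run, initializing network weights from the previous run.
# Observation/action shapes and the network architecture must match.
mlagents-learn config/walker_config.yaml --run-id=WalkerSearch \
    --initialize-from=WalkerBase
```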
  2. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    6,922
As far as I understand, any changes beyond the technical requirement of keeping the parameters the same are experimental choices that may or may not give the desired results.

Training from the existing model may force your AI to relearn or unlearn too many things, but more likely it will give it a boost so that it can adapt more quickly to the changed environment.

I retrained the walker, without any changes, from its previous 4-hour run and trained it for another 8 hours. The result was essentially the same from observations alone, but according to the TensorBoard stats the reward scores were significantly lower. Upon closer inspection, the retrained AI did have the occasional issue where it walked very slowly, probably preferring to play it safe.
     
  3. caileanfinn

    caileanfinn

    Joined:
    Aug 19, 2022
    Posts:
    5
Thanks for getting back to me.

    That's good to know. Would this be considered Transfer Learning?

    I'm just curious what happens if you completely overhaul the reward system - I'll need to experiment with it!