Model behavior vs training

Discussion in 'ML-Agents' started by Neodamus, Apr 16, 2020.

  1. Neodamus

    Neodamus

    Joined:
    Jan 12, 2018
    Posts:
    14
    In my training environment, the theoretical max reward is 800 per session, and I have a brain that gets close to this in training: https://cl.ly/8dcfe1166282

    However, when I use this model for inference on the same agent in Unity, the mean reward I get is only around 20-100 per session, so it doesn't behave in use at all like it does during training.

    Does anyone have any insight into why this might be occurring? I've been trying to figure this out for days. Thank you for your help!
     
  2. TreyK-47

    TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,822
    I'll flag this for the team to give their thoughts. In case they ask, which version of C# & Python are you running? Additionally, do you have any console logs you can share? Thanks!
     
  3. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    Hi,

    I think this might be due to an issue with the simulation being run at large time scales. Could you try to run inference from Python? If you are using v0.15.X, run

    mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --load --time-scale=1

    Note the use of --load instead of --train, and the added --time-scale=1 flag.

    This should run your game using the model you trained with the run-id <run-identifier> (it will not train, just load the pre-trained model). If the reward reported is lower than 800, it probably means that your game behaves differently at low time scales than at high time scales.
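
    For example, with a hypothetical config file, build path, and run id (substitute your own values), the inference run would look like:

    mlagents-learn config/trainer_config.yaml --env=builds/MyGame --run-id=my_run --load --time-scale=1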
     
  4. ChrissCrass

    ChrissCrass

    Joined:
    Mar 19, 2020
    Posts:
    31
    If you're doing anything in Update rather than FixedUpdate (and are using physics at all), then you will probably see deviations...
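
    As a rough illustration (not code from this thread), here is a minimal MonoBehaviour sketch of that difference; the force value and the moveInput field are placeholders for whatever your agent's actions actually drive:

    using UnityEngine;

    // Sketch: keep physics work in FixedUpdate so behaviour matches the physics step
    // at any time scale (training typically runs at a high time scale, inference at 1).
    public class AgentMover : MonoBehaviour
    {
        public float force = 10f;      // placeholder tuning value
        private Rigidbody rb;
        private Vector3 moveInput;     // set from your agent's action callback

        void Awake()
        {
            rb = GetComponent<Rigidbody>();
        }

        // Avoid for physics: Update runs once per rendered frame, so the number of
        // Update calls per physics step changes with Time.timeScale.
        // void Update() { rb.AddForce(moveInput * force); }

        // Prefer this: FixedUpdate runs once per physics step regardless of time scale,
        // so the agent applies the same forces in training and in inference.
        void FixedUpdate()
        {
            rb.AddForce(moveInput * force);
        }
    }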