Do locomotion tasks use inference correctly on different update loops?

Discussion in 'ML-Agents' started by Creaturtle, Feb 25, 2021.

  1. Creaturtle

    Joined:
    Jan 24, 2018
    Posts:
    33
    Question regarding locomotion tasks in a static environment where the environment-space (not just local) position of the agent is important.

    My agent was trained to move using a DecisionRequester with a decision period of 5, and with actions requested in between decisions.

    It calls CharacterController.SimpleMove in FixedUpdate, driven by the output of the network.
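    Roughly, the training-side setup looks like this (a simplified sketch; observations and rewards are omitted, and names like LocomotionAgent and moveSpeed are placeholders):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using UnityEngine;

    // Simplified sketch of the training-time agent. A DecisionRequester on the
    // same GameObject has Decision Period = 5 and requests actions in between.
    public class LocomotionAgent : Agent
    {
        public float moveSpeed = 3f;          // placeholder speed
        CharacterController controller;
        Vector3 latestMove;                   // last direction output by the policy

        public override void Initialize()
        {
            controller = GetComponent<CharacterController>();
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            latestMove = new Vector3(actions.ContinuousActions[0], 0f,
                                     actions.ContinuousActions[1]);
        }

        void FixedUpdate()
        {
            // SimpleMove takes a velocity in units/second and handles
            // deltaTime and gravity internally.
            controller.SimpleMove(latestMove * moveSpeed);
        }
    }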

    I am using the trained network in a separate release project, where I moved the SimpleMove call into the Update loop, driven by a global displacement vector that is refreshed from the inference results every 5 FixedUpdate frames.
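    In the release project it's essentially this (again a sketch; InferenceResult.LatestMove just stands in for however the network output is surfaced there):

    Code (CSharp):
    using UnityEngine;

    // Sketch of the release-project mover. The inference code writes the latest
    // move direction into InferenceResult.LatestMove roughly every 5 FixedUpdate
    // frames; the movement itself is applied in Update.
    public static class InferenceResult
    {
        public static Vector3 LatestMove;
    }

    public class InferenceMover : MonoBehaviour
    {
        public float moveSpeed = 3f;          // same placeholder speed as in training
        CharacterController controller;

        void Awake()
        {
            controller = GetComponent<CharacterController>();
        }

        void Update()
        {
            // Still velocity-based, so the per-second displacement should match
            // the FixedUpdate version even though the call rate differs.
            controller.SimpleMove(InferenceResult.LatestMove * moveSpeed);
        }
    }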

    So far it seems to be performing the task as it was trained to.

    In principle, the agent shouldn't be dependent on the time scale or update loop, because it moves to get closer to some environment-space position based on its current position. It also has stacked observations, so it (might) have learned to recognize how its actions influence its future state.

    Does this generalize across static locomotion tasks that are environment-dependent, with the reward function linked to moving to environment-space areas?
     
  2. celion_unity

    Unity Technologies

    Joined:
    Jun 12, 2019
    Posts:
    289
    First, a general comment: machine learning models (not just reinforcement learning or deep neural networks) work well when the inference data is "similar to" the training data. Mathematically speaking, the inputs should be drawn from the same distribution; if they're not, inference performance will likely not be good.

    For reinforcement learning, this applies not just to the observations, but also to how they change between steps (since the state transitions are implicitly part of the training and inference data). So making decisions with a different amount of game time between them could potentially lead to worse inference.

    In your case, it sounds like the training and inference timings might be slightly different, since the output from inference might not be acted on right away (the next Update call could be a bit after the 5th FixedUpdate call).
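    If you want to see how big that gap actually is, a rough diagnostic like this (my own sketch, not something from ML-Agents) would log it:

    Code (CSharp):
    using UnityEngine;

    // Rough diagnostic sketch: logs how much game time passes between the
    // FixedUpdate step on which a decision would land (every 5th step here)
    // and the Update call that actually consumes it.
    public class DecisionLagProbe : MonoBehaviour
    {
        const int DecisionPeriod = 5;     // matches the DecisionRequester period
        int fixedStepCount;
        float lastDecisionTime = -1f;

        void FixedUpdate()
        {
            fixedStepCount++;
            if (fixedStepCount % DecisionPeriod == 0)
                lastDecisionTime = Time.time; // moment the new output is available
        }

        void Update()
        {
            if (lastDecisionTime >= 0f)
            {
                Debug.Log($"Output consumed {Time.time - lastDecisionTime:F3}s after the decision step");
                lastDecisionTime = -1f;
            }
        }
    }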

    I've never used Unity's CharacterController, but it looks like it works based on velocity, which should behave OK under a variable frame rate. Doing things like moving a fixed distance per call (instead of velocity * deltaTime) or applying impulses (instead of forces) will behave differently at different deltaTimes, so don't do that.
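    To make that concrete, here's a generic sketch (not specific to your agent) contrasting the two:

    Code (CSharp):
    using UnityEngine;

    // Generic illustration: velocity-based movement covers the same distance per
    // second regardless of frame rate; a fixed per-call step does not.
    public class MoveComparison : MonoBehaviour
    {
        public Vector3 direction = Vector3.forward;
        public float speed = 3f;        // units per second
        public float stepSize = 0.05f;  // units per call
        CharacterController controller;

        void Awake()
        {
            controller = GetComponent<CharacterController>();
        }

        void Update()
        {
            // Frame-rate independent: moves 'speed' units per second no matter
            // how many Update calls happen in that second.
            controller.SimpleMove(direction * speed);

            // Frame-rate dependent: moves stepSize * framerate units per second,
            // so 60 fps covers twice the distance of 30 fps. Avoid this.
            // controller.Move(direction * stepSize);
        }
    }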

    We definitely haven't done any experiments with this; AFAIK it's not possible to say conclusively that something will generalize to a different space.
     
    Creaturtle likes this.