Does varying episode length influence learning?

Discussion in 'ML-Agents' started by fedetask, Feb 12, 2020.

  1. fedetask

    Joined:
    Jan 17, 2020
    Posts:
    7
    I'll start with an example: an agent has to go around a map by following a path composed of N checkpoints. The state is the relative position of the next 4 checkpoints. The agent receives +1 every time it reaches a checkpoint, and -0.01 at every time-step. In training, maps have different sizes and numbers of checkpoints, so the total achievable reward varies from episode to episode with the number of checkpoints. Does this negatively affect training with PPO?
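
    For reference, the reward logic looks roughly like this (a minimal sketch assuming a recent ML-Agents version; CheckpointAgent, ReachedNextCheckpoint, and the movement code are simplified placeholders):

        using Unity.MLAgents;
        using Unity.MLAgents.Actuators;

        public class CheckpointAgent : Agent
        {
            int checkpointsLeft; // reset to N in OnEpisodeBegin (not shown)

            public override void OnActionReceived(ActionBuffers actions)
            {
                // apply steering/throttle from the actions here (omitted)

                AddReward(-0.01f); // small time penalty on every step

                if (ReachedNextCheckpoint())
                {
                    AddReward(1f); // +1 per checkpoint reached
                    if (--checkpointsLeft == 0)
                        EndEpisode(); // episode length and total reward vary with N
                }
            }

            bool ReachedNextCheckpoint() { /* trigger overlap test (omitted) */ return false; }
        }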

    I'm torn between two answers:
    A) Yes, it negatively affects the training because the value of the exact same state changes with the map.
    B) No, because PPO considers the temporal difference error, not the total reward from a given state to the end of the episode.

    However, if B is correct, how can the agent then learn to "go fast"?
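
    What makes me think B might still allow "going fast": with a discount factor (say \gamma = 0.99, a typical value), the return for reaching the next checkpoint in k steps under the reward scheme above is

        G(k) = \gamma^{k-1} \cdot 1 \;-\; 0.01 \sum_{t=0}^{k-1} \gamma^{t} \;=\; \gamma^{k-1} - 0.01\,\frac{1-\gamma^{k}}{1-\gamma}

    which decreases in k: for \gamma = 0.99, G(10) \approx 0.82 while G(50) \approx 0.22. So even if only TD errors are used, discounting plus the step penalty should still favor fewer steps, if I understand correctly.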
     
  2. caioc2

    Joined:
    May 11, 2018
    Posts:
    8
    If you have an environment with a terminal state (i.e., one that marks the agent as done after reaching a goal), the episode length will always vary; that is not something you have control over.

    Now, if you are talking about the agent's max steps, then yes, it can greatly influence the training result. You must make sure that max steps is long enough to capture the desired behavior, and account for the beginning of training, when the agent knows nothing and can take a really long time to do anything meaningful.
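
    If it helps, in recent ML-Agents versions that cap is the Agent's Max Step setting, which you can set in the Inspector or in code (the value 5000 below is just an arbitrary example):

        using Unity.MLAgents;

        public class CheckpointAgent : Agent
        {
            void Awake()
            {
                // Generous per-episode cap so an untrained agent still has time
                // to stumble onto checkpoints; 0 would mean no limit.
                MaxStep = 5000;
            }
        }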

    About map size, I would rather use it for curriculum learning: start with small, simple maps and move on to larger, more complex ones as the agent improves. A sketch of the scene side of that is below.
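
    Something like this, assuming a recent ML-Agents version where the trainer's curriculum drives environment parameters (the parameter name "map_size" and the default of 1 are made up here, and configured in the curriculum config; older versions exposed this through float/reset properties instead):

        using Unity.MLAgents;
        using UnityEngine;

        public class TrackGenerator : MonoBehaviour
        {
            public void ResetTrack()
            {
                // Read the current curriculum lesson's value for the map size.
                float mapSize = Academy.Instance.EnvironmentParameters
                                       .GetWithDefault("map_size", 1f);
                BuildTrack(mapSize);
            }

            void BuildTrack(float size) { /* spawn track + checkpoints (omitted) */ }
        }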

    Trying to predict what's better just by thinking it through is really hard, because it depends on a lot of variables you may not be accounting for.