

Episode max steps and decision period

Discussion in 'ML-Agents' started by fedetask, Apr 4, 2020.

  1. fedetask


    Joined: Jan 17, 2020
    I have read the documentation more than once, but the concepts of episode steps, decision periods, and time horizons are still a bit unclear to me:
    • If I have 4000 steps in an episode and a decision period of 5, there will be one observation-action-reward tuple every 5 environment steps. Therefore, each trajectory will be 800 steps long. Am I right?
    • If that is true, should I tune all the training parameters (e.g. buffer_size, batch_size, time_horizon, etc.) as if an episode lasted 800 steps? In that case, time_horizon=800 would cover the whole episode.
  2. awjuliani


    Unity Technologies

    Joined: Mar 1, 2017
    Hi fedetask,

    You are right on both counts. 4000 engine steps with a decision period of 5 gives you 800 episode steps from the agent's perspective. If you want the trajectories used for learning to span the entire episode, set time_horizon to 800.
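    The arithmetic above can be checked with a small sketch. The helpers `agent_steps_per_episode` and `trajectory_segments` are hypothetical illustrations, not part of the ML-Agents API. They assume the episode length divides evenly by the decision period and ignore any decision offset. They also simplify time_horizon to "cut the agent's experience into chunks of at most time_horizon decisions". The real trainer does more than this (for example, bootstrapping a value estimate when a chunk ends before the episode does).

```python
import math


def agent_steps_per_episode(max_env_steps: int, decision_period: int) -> int:
    """Hypothetical helper: number of decisions (observation-action-reward
    tuples) an agent makes if it requests one every `decision_period`
    environment steps. Assumes no decision offset."""
    if decision_period <= 0:
        raise ValueError("decision_period must be positive")
    return max_env_steps // decision_period


def trajectory_segments(agent_steps: int, time_horizon: int) -> int:
    """Hypothetical helper: simplified count of chunks an episode's experience
    is split into when at most `time_horizon` agent steps go into each chunk."""
    if time_horizon <= 0:
        raise ValueError("time_horizon must be positive")
    return math.ceil(agent_steps / time_horizon)


steps = agent_steps_per_episode(4000, 5)
print(steps)                            # 800
print(trajectory_segments(steps, 800))  # 1: the whole episode is one trajectory
print(trajectory_segments(steps, 64))   # 13: 800 / 64 = 12.5, rounded up
```

    With time_horizon=800 the whole episode fits in one trajectory. A smaller value splits it into several shorter pieces.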