Decision interval and rewards

Discussion in 'ML-Agents' started by fedetask, Jan 27, 2020.

  1. fedetask

    Joined:
    Jan 17, 2020
    Posts:
    7
    Hi,
    If I set a decision interval, what happens to the rewards during training? Are rewards assigned in "non-decision" calls to AgentAction() taken into account?

    I'm asking because I'm seeing strange behavior in my agent, which can be summed up like this:
    It moves around the map and follows a series of points. It gets +0.5 for each point and -0.001 at each timestep. After N steps, or when it reaches the last point, the episode ends.
    With decision interval = 1, rewards are consistent: whenever the agent collects ONLY the first checkpoint, GetCumulativeReward() returns 0.5 - 0.001 * N. With decision interval = 5, however, the result in the same situation is inconsistent: sometimes GetCumulativeReward() returns 0.5 - 0.001 * N, and sometimes only -0.001 * N.

    I suspect that rewards are only taken into account every 5 steps.
     
  2. fedetask

    I figured out the issue. I was using SetReward() instead of AddReward(), so if the agent collects a checkpoint on a step that is not a multiple of 5, the next action calls SetReward(-0.001) and overwrites the checkpoint reward.
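
    For anyone who hits the same thing, here is a rough Python model of the bookkeeping described above. It is only a sketch and not the ML-Agents source: RewardTracker, on_decision and run_episode are made-up names, and it assumes that SetReward() replaces the reward accumulated since the last decision, that AddReward() adds to it, and that the per-decision reward is reset each time a decision is requested. The exact totals depend on call order, so treat the numbers as illustrative.

```python
class RewardTracker:
    """Hypothetical stand-in for the agent's reward bookkeeping (assumed behavior)."""

    def __init__(self):
        self.step_reward = 0.0   # reward accumulated since the last decision
        self.cumulative = 0.0    # what GetCumulativeReward() would report

    def set_reward(self, reward):
        # Assumed SetReward() semantics: replace the pending reward.
        self.cumulative += reward - self.step_reward
        self.step_reward = reward

    def add_reward(self, reward):
        # Assumed AddReward() semantics: accumulate on top of the pending reward.
        self.step_reward += reward
        self.cumulative += reward

    def on_decision(self):
        # Assumed: the pending reward is handed to the trainer and reset.
        self.step_reward = 0.0


def run_episode(n_steps, decision_interval, checkpoint_step, use_add):
    """Simulate one episode in which only one checkpoint is collected."""
    tracker = RewardTracker()
    give = tracker.add_reward if use_add else tracker.set_reward
    for step in range(n_steps):
        if step % decision_interval == 0:
            tracker.on_decision()
        give(-0.001)                 # per-timestep penalty
        if step == checkpoint_step:
            give(0.5)                # checkpoint bonus
    return tracker.cumulative


if __name__ == "__main__":
    for checkpoint_step in (2, 4):
        for use_add in (False, True):
            total = run_episode(10, 5, checkpoint_step, use_add)
            name = "AddReward" if use_add else "SetReward"
            print(f"checkpoint at step {checkpoint_step}, {name}: {total:.3f}")
```

    In this model the set_reward variant keeps the 0.5 only when the checkpoint falls on the last step before a decision; on any other step the next penalty call wipes it out. The add_reward variant keeps it regardless of where the checkpoint lands.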