Decision interval and rewards

Discussion in 'ML-Agents' started by fedetask, Jan 27, 2020.

  1. fedetask

    Hi,
    If I set a decision interval, what happens to the rewards during training? Are rewards assigned in "non-decision" calls to AgentAction() taken into account?

    I'm asking because I'm seeing strange behavior in my agent, which can be summed up like this:
    It moves around the map and follows a series of points. It gets +0.5 for each point and -0.001 at each timestep. The episode ends after N steps or when the agent reaches the last point.
    With decision interval = 1, rewards are consistent: whenever the agent collects ONLY the first checkpoint, GetCumulativeReward() returns 0.5 - 0.001 * N. With decision interval = 5, however, when the agent collects ONLY the first point, the rewards are inconsistent: sometimes GetCumulativeReward() returns 0.5 - 0.001 * N, and sometimes only -0.001 * N.

    I suspect that rewards are only taken into account every 5 steps.
     
  2. fedetask

    I figured out the issue. I'm using SetReward() instead of AddReward(), so if the agent collects a checkpoint on a step that is not a multiple of 5, the next step calls SetReward(-0.001) and overwrites the checkpoint reward.
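
For anyone hitting the same thing, the overwrite can be reproduced with a small toy simulation. This is a Python sketch, not ML-Agents code: `run_episode`, `pending`, and `total` are made-up names, and the model assumes that the reward collected between two decisions is kept in a single value which "set" replaces, "add" increments, and which is only flushed into the cumulative total when a decision is requested.

```python
def run_episode(n_steps, decision_interval, checkpoint_step, use_add):
    """Toy model of set-style vs add-style rewards with a decision interval.

    Hypothetical simplification: `pending` is the reward gathered since the
    last decision, and it is moved into `total` only on decision steps.
    """
    pending = 0.0  # reward accumulated since the last decision
    total = 0.0    # cumulative reward reported by this toy model

    for step in range(1, n_steps + 1):
        # Every step applies the time penalty; the checkpoint step also
        # applies the +0.5 bonus afterwards.
        rewards = [-0.001]
        if step == checkpoint_step:
            rewards.append(0.5)

        for r in rewards:
            if use_add:
                pending += r  # add-style: keeps earlier rewards
            else:
                pending = r   # set-style: discards earlier rewards

        # A decision is requested every `decision_interval` steps.
        if step % decision_interval == 0:
            total += pending
            pending = 0.0

    # Flush whatever is left when the episode ends.
    return total + pending
```

In this model, with 10 steps and a decision interval of 5, a checkpoint collected on step 3 is lost with set-style rewards (the total is about -0.002) but kept with add-style rewards (about 0.49). A checkpoint collected on step 5 survives even with set-style rewards, which matches the "sometimes it works" behavior described above.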