Why is the STD of mean reward so high?

Discussion in 'ML-Agents' started by jasonrdunne, Mar 8, 2020.

  1. jasonrdunne

    Joined:
    Mar 24, 2019
    Posts:
    4


    I just ran the ball balance hard example scene. Can someone explain why the standard deviation (STD) of the mean reward is so high? At some points it is even higher than the mean reward itself. Does this just mean that the agents are very inconsistent? None of the agents looked like they were doing terribly, although some seemed worse than others.
     
  2. arminator444

    Joined:
    Aug 1, 2018
    Posts:
    3
    I have the same problem and don't know why.
     
  3. awjuliani

    Unity Technologies

    Joined:
    Mar 1, 2017
    Posts:
    69
    Hello. Yes, a high standard deviation means the agent's final rewards vary widely across training episodes. For tasks that are harder to learn, or that the agent has not completely mastered, this is not uncommon. In fact, you would expect the standard deviation to increase during training as the agent moves from a bad policy to a good one, because in that phase some episodes already end with a high reward while others still end with a low one. As long as the mean reward is consistently increasing and the final behavior looks good, it is nothing to worry about.
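To make the point above concrete, here is a minimal sketch in plain Python of how the standard deviation of episode rewards can exceed the mean. The reward values and the names `summarize`, `mid_training`, and `late_training` are made up for illustration; they are not ML-Agents output or part of its API. The sketch assumes a batch in which most episodes end early with a low reward while a few reach a reward of 100.

```python
import statistics


def summarize(episode_rewards):
    """Return (mean, population std) of the rewards in one batch of episodes."""
    mean = statistics.fmean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std


# Hypothetical batch partway through training: most episodes still fail
# early (low reward), but a couple already reach a high reward.
mid_training = [1.0, 2.0, 1.5, 0.5, 100.0, 3.0, 1.0, 100.0, 2.0, 1.0]

# Hypothetical batch late in training: nearly every episode does well.
late_training = [100.0, 98.0, 100.0, 95.0, 100.0, 100.0, 97.0, 100.0, 99.0, 100.0]

mid_mean, mid_std = summarize(mid_training)
late_mean, late_std = summarize(late_training)

print(f"mid-training:  mean={mid_mean:.1f}  std={mid_std:.1f}")
print(f"late-training: mean={late_mean:.1f}  std={late_std:.1f}")
```

In the mixed batch the two high-reward episodes pull the spread well above the mean, while in the consistent batch the spread is small relative to the mean. This is the pattern described above: a large standard deviation while the policy is in transition, not necessarily a sign that training is going wrong.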