Can't seem to normalize mean reward

Discussion in 'ML-Agents' started by Print_Hello_World, Feb 20, 2020.

  1. Print_Hello_World

    Joined:
    Jan 14, 2020
    Posts:
    12
    Hi guys,

    I am trying to train a drone to fly towards a given direction. I have a reward function that calls AddReward(totalreward), where I made sure that totalreward is normalized (I checked by running Debug.Log(totalreward) and could see that it was between -1 and 1). However, my mean reward is in the hundreds. I have set max step to 200 and the decision interval to 1. Can somebody help me?
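
    Roughly, the per-step reward I am describing looks like the sketch below (simplified; targetDirection and rb here are placeholder fields rather than my exact code):

    Code (CSharp):
    // Inside my Agent subclass; called once per decision step from the action callback.
    // Sketch only: targetDirection and rb are placeholder fields, not my exact code.
    void RewardStep()
    {
        Vector3 desiredDirection = targetDirection.normalized;

        // The dot product of two unit vectors stays in [-1, 1]:
        // +1 when flying straight at the target, -1 when flying directly away.
        float totalreward = Mathf.Clamp(Vector3.Dot(rb.velocity.normalized, desiredDirection), -1f, 1f);

        Debug.Log(totalreward);   // each step's reward is within [-1, 1]
        AddReward(totalreward);
    }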

    Here are some screenshots of my training progress and settings:
    [Attached screenshots of training progress and settings]

    Last edited: Feb 20, 2020
  2. jeffrey_unity538

    Unity Technologies

    Joined:
    Feb 15, 2018
    Posts:
    59
    hi - the mean reward is the cumulative reward, summed over the steps of each episode (not averaged per step). Let me know if that helps clarify.
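
    To put numbers on it: with a max step of 200 and a per-step reward bounded in [-1, 1], a single episode's cumulative reward can land anywhere in roughly [-200, 200], so mean rewards in the hundreds are expected even though each individual AddReward() call is normalized.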
     
  3. Print_Hello_World

    Joined:
    Jan 14, 2020
    Posts:
    12
    Oh ok, I suspected that was the case.
    So if I normalize my rewards per time step according to the best practices, it should be OK regardless of the cumulative/mean reward, right?
     
  4. jeffrey_unity538

    Unity Technologies

    Joined:
    Feb 15, 2018
    Posts:
    59
    yes, best practice is to normalize the way you have described.
     
    Print_Hello_World likes this.