
Shaping reward in Self-Play

Discussion in 'ML-Agents' started by Glolut, Feb 27, 2022.

  1. Glolut

    Joined:
    Jun 30, 2021
    Posts:
    7
    According to the docs, the reward in self-play should be +1 for winning, -1 for losing, and 0 for a draw. However, in my case, it's hard for the agents to learn when the only rewards are 1, -1, and 0, so I added extra shaping rewards, some beyond 1 and -1, to teach them useful actions. Then at the end of a match: if an agent wins => SetReward(1f), loses => SetReward(-1f), and at MaxStep => SetReward(0f) (sketched below).
    After 1 million steps of training, I saw only negative mean rewards (in each 50k-step summary), and the ELO rating decreased overall.
    I'm wondering if this is caused by my reward shaping.
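    For reference, a minimal sketch of what I mean, assuming an ML-Agents Agent subclass (the class name and the CheckWin/CheckLoss helpers are placeholders for my game logic):
    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class MatchAgent : Agent
    {
        // Placeholders for game-specific win/loss detection.
        bool CheckWin()  { return false; }
        bool CheckLoss() { return false; }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // ... apply actions; shaping rewards are added with AddReward(...) ...

            if (CheckWin())
            {
                SetReward(1f);   // SetReward overwrites any reward accumulated this step
                EndEpisode();
            }
            else if (CheckLoss())
            {
                SetReward(-1f);
                EndEpisode();
            }
            else if (StepCount >= MaxStep)
            {
                SetReward(0f);   // draw at the step limit
                EndEpisode();
            }
        }
    }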
     
  2. weight_theta

    Joined:
    Aug 23, 2020
    Posts:
    65
    This can happen. Especially when you give negative rewards, you may want to clip the reward so it never goes below a certain threshold (see the sketch below).
    Note that 1 million steps is not a lot; try 60 million and then adjust your negative rewards accordingly. It would also help if your rewards and penalties were a function of the agent's actions.
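    A minimal sketch of that kind of clipping, assuming you funnel all shaping rewards through one helper (the threshold of 0.1 is illustrative, not a recommendation):
    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class ClippedAgent : Agent
    {
        // Hypothetical shaping hook: clamp each shaping signal so a penalty
        // can never drag a single reward below -0.1 (or above +0.1), keeping
        // the final +1/-1 match outcome dominant.
        void AddShapedReward(float raw)
        {
            AddReward(Mathf.Clamp(raw, -0.1f, 0.1f));
        }
    }
    Keeping shaping rewards small relative to the +1/-1 outcome also matters for ELO, since in self-play the rating is driven by the final match result rather than the shaped signals.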