
Shaping reward in Self-Play

Discussion in 'ML-Agents' started by Glolut, Feb 27, 2022.

  1. Glolut

    Joined:
    Jun 30, 2021
    Posts:
    7
    According to the docs, the reward in self-play should be +1 for winning, -1 for losing, and 0 for a draw. However, in my case learning is hard if the rewards are only +1, -1, and 0. I added some extra shaping rewards, whose magnitudes go beyond +1 and -1, to teach the agents useful actions. Then at the end of an episode: if an agent wins => SetReward(1f), loses => SetReward(-1f), reaches MaxStep => SetReward(0f).
    After 1 million training steps, I see only negative mean rewards (summarized every 50k steps), and the ELO decreased overall.
    I'm wondering if this is caused by my reward shaping.
     
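    For reference, the scheme described above could be sketched roughly like this (a minimal Unity ML-Agents sketch; the shaping magnitude and the `UsefulActionTaken`/`HasWon`/`HasLost` helpers are hypothetical placeholders for your game logic). Note that SetReward overwrites any reward already accumulated for the current step, while AddReward accumulates:

    ```csharp
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Hypothetical agent illustrating the reward scheme from the post:
    // small shaping rewards during the episode, plus a terminal
    // win/lose/draw reward set with SetReward.
    public class DuelAgent : Agent
    {
        public override void OnActionReceived(ActionBuffers actions)
        {
            // Dense shaping signal for useful intermediate behaviour
            // (hypothetical helper; the 0.01f magnitude is an assumption).
            if (UsefulActionTaken(actions))
                AddReward(0.01f);

            // Terminal outcomes: SetReward replaces the shaping reward
            // accumulated this step, then the episode ends.
            if (HasWon())                  { SetReward(1f);  EndEpisode(); }
            else if (HasLost())            { SetReward(-1f); EndEpisode(); }
            else if (StepCount >= MaxStep) { SetReward(0f);  EndEpisode(); }
        }

        // Hypothetical game-specific predicates.
        bool UsefulActionTaken(ActionBuffers a) => false;
        bool HasWon()  => false;
        bool HasLost() => false;
    }
    ```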
  2. weight_theta

    Joined:
    Aug 23, 2020
    Posts:
    65
    This can occur, especially when you give negative rewards; you may want to clip your reward so it never goes below a certain threshold.
    Note that 1 million steps is not a lot; try 60 million, and then adjust your negative rewards accordingly. Ideally, your rewards and penalties should be a function of the agent's actions.
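    One way to apply the clipping suggestion is to route all shaping rewards through a helper that tracks the episode total (a sketch only; the -1 floor is an assumed threshold, pick one that suits your game):

    ```csharp
    using UnityEngine;
    using Unity.MLAgents;

    // Hypothetical agent whose shaping rewards are clipped so the
    // accumulated per-episode reward never drops below a chosen floor.
    public class ClippedRewardAgent : Agent
    {
        const float RewardFloor = -1f;   // assumed threshold
        float episodeReward;

        public override void OnEpisodeBegin() => episodeReward = 0f;

        // Call this instead of AddReward directly for shaping rewards.
        void AddClippedReward(float r)
        {
            // Only allow as much negative reward as keeps the
            // running total at or above RewardFloor.
            float clipped = Mathf.Max(r, RewardFloor - episodeReward);
            episodeReward += clipped;
            AddReward(clipped);
        }
    }
    ```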