Resolved self-play oscillates

Discussion in 'ML-Agents' started by GamerLordMat, Jan 1, 2023.

  1. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    185
    Hello all,

    When I train my self-play Boxing Agent, my reward turns negative every 200,000 steps, when the team changes, and then after another 200,000 steps it becomes strictly positive again. So it oscillates.
    I give symmetric rewards: if one agent hits the other, the other gets the same amount of points with the sign flipped. This continues until the timer ends or one agent has scored more than 1 point (one hit is worth about 0.07 points). It trains pretty badly.
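
    For readers following along, the symmetric reward scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual Unity code: the names Fighter, register_hit, HIT_REWARD and WIN_MARGIN are hypothetical, and the 0.07 value is the approximate per-hit figure from the post.

```python
from dataclasses import dataclass

HIT_REWARD = 0.07  # approximate per-hit value quoted in the post
WIN_MARGIN = 1.0   # episode ends once one side has scored more than 1 point


@dataclass
class Fighter:
    """Hypothetical stand-in for one boxing agent."""
    name: str
    score: float = 0.0


def register_hit(attacker: Fighter, defender: Fighter,
                 amount: float = HIT_REWARD) -> bool:
    """Apply a zero-sum hit reward; return True if the episode should end."""
    attacker.score += amount   # attacker gains the hit value
    defender.score -= amount   # defender loses exactly the same amount
    return attacker.score > WIN_MARGIN
```

    With this scheme the two scores always sum to zero, so if the curve for one side flips sign exactly at a team change, it is worth checking which agent is passed in as the attacker.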

    The agents' movement and almost all values are relative to the reference frame and thus correctly mirrored (except the local position relative to the root).

    Any idea why this keeps happening?

    https://drive.google.com/file/d/1DSV-L2SPSGi2ID3m59monjfaXgdljpfg/view?usp=share_link

    https://drive.google.com/file/d/1DE6wTR1TN-SvnZxlCBxZK_LBEaUa3UcE/view?usp=sharing
     
    Last edited: Jan 2, 2023
  2. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    185
    Okay, I fixed it. It was a bunch of smaller bugs leading to it:

    1. FindGameComponents seems random (thus giving Agent A points instead of Agent B).
    2. I did not reset everything properly, leading to input bugs at endEpisode().
    3. Other gameplay complications that I could not figure out but a human can.
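
    The first two fixes can be illustrated with a small sketch. It is written in Python for brevity and is not the actual Unity project code: Boxer, pair, land_hit and end_episode are hypothetical names, and the fields being reset are assumptions about what per-episode state might exist. The idea is to wire each agent to its opponent explicitly instead of relying on a lookup whose order may vary, and to clear all per-episode state when the episode ends.

```python
class Boxer:
    """Hypothetical stand-in for one agent in a two-agent match."""

    def __init__(self, name):
        self.name = name
        self.opponent = None             # assigned explicitly by pair()
        self.reward = 0.0
        self.hits_landed = 0
        self.pending_input = [0.0, 0.0]  # assumed per-step control input

    def reset(self):
        """Clear everything that must not leak into the next episode."""
        self.reward = 0.0
        self.hits_landed = 0
        self.pending_input = [0.0, 0.0]


def pair(a, b):
    """Wire the two agents together once, so rewards never reach the wrong one."""
    a.opponent = b
    b.opponent = a


def land_hit(attacker, amount=0.07):
    """Credit the attacker and debit its explicitly assigned opponent."""
    attacker.hits_landed += 1
    attacker.reward += amount
    attacker.opponent.reward -= amount


def end_episode(*boxers):
    """Reset every agent so stale inputs or scores do not carry over."""
    for boxer in boxers:
        boxer.reset()
```

    In this sketch the opponent reference is fixed at setup time, so a hit can only ever be credited to the agent that landed it.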
     
  3. KaushalAgrawal

    KaushalAgrawal

    Joined:
    Dec 18, 2019
    Posts:
    8
    Hi, please tell me what you did to solve the issue.
    I am facing the same thing: as soon as training hits 200,000 steps (team swap), Elo starts decreasing with negative mean group rewards. I have tried everything but could not find a solution yet.