

Resolved: Self-play oscillates

Discussion in 'ML-Agents' started by GamerLordMat, Jan 1, 2023.

  1. GamerLordMat

    Hello all,

    When I train my self-play boxing agent, my reward goes negative every 200,000 steps when the team changes, and after another 200,000 steps it becomes strictly positive again. So it oscillates.
    I give symmetric rewards: if one agent hits the other, the other agent gets the same amount of points with the sign flipped. This continues until the timer ends or one agent has scored more than 1 point (one hit gives about 0.07 points). It trains pretty badly.
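    To make the reward scheme above concrete, here is a minimal Python sketch of zero-sum reward bookkeeping. It is not ML-Agents code: the class `BoxingEpisode`, its methods, and the agent labels "A"/"B" are hypothetical, and it assumes "scored more than 1 point" refers to an agent's net cumulative reward. Only the 0.07 per hit and the threshold of 1 come from the post.

```python
from dataclasses import dataclass, field

# Values from the post: one hit is worth about 0.07, and the episode ends
# once an agent has scored more than 1 point (or the timer runs out).
HIT_REWARD = 0.07
WIN_THRESHOLD = 1.0


@dataclass
class BoxingEpisode:
    """Hypothetical zero-sum reward bookkeeping for two agents, "A" and "B"."""
    max_steps: int
    scores: dict = field(default_factory=lambda: {"A": 0.0, "B": 0.0})
    step_count: int = 0
    done: bool = False

    def register_hit(self, attacker: str) -> dict:
        """Give +HIT_REWARD to the attacker and -HIT_REWARD to the defender."""
        defender = "B" if attacker == "A" else "A"
        rewards = {attacker: HIT_REWARD, defender: -HIT_REWARD}
        for agent, reward in rewards.items():
            self.scores[agent] += reward
        # Assumption: the threshold is checked against the net cumulative score.
        if self.scores[attacker] > WIN_THRESHOLD:
            self.done = True
        return rewards

    def tick(self) -> None:
        """Advance the episode timer by one step; end the episode when it runs out."""
        self.step_count += 1
        if self.step_count >= self.max_steps:
            self.done = True


if __name__ == "__main__":
    episode = BoxingEpisode(max_steps=500)
    hits = 0
    while not episode.done:
        episode.register_hit("A")
        episode.tick()
        hits += 1
    print(hits, episode.scores)
```

    With only agent A landing hits, the episode ends after 15 hits (15 × 0.07 = 1.05 > 1), and agent B's score is always the exact negative of agent A's, which is what makes the setup symmetric.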

    The agents' movement and almost all values are relative to the agent's reference frame and are thus correctly mirrored (except the local position relative to the root).
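    A minimal Python sketch of what "relative to the reference frame" buys you: a ground-plane point is re-expressed in an agent's own frame, so two boxers standing in point-mirrored poses get identical observations. The function `to_local` is hypothetical, and the sketch assumes yaw is in radians with forward vector (sin yaw, cos yaw) and right vector (cos yaw, -sin yaw); the convention would need adapting to the actual scene setup.

```python
import math


def to_local(agent_pos, agent_yaw, point):
    """Express a ground-plane point (x, z) in an agent's local frame.

    Assumed convention: yaw in radians, forward = (sin(yaw), cos(yaw)),
    right = (cos(yaw), -sin(yaw)). Returns (local_x, local_z), where
    local_x is the distance to the right and local_z the distance ahead.
    """
    dx = point[0] - agent_pos[0]
    dz = point[1] - agent_pos[1]
    c, s = math.cos(agent_yaw), math.sin(agent_yaw)
    local_x = dx * c - dz * s  # projection onto the right vector
    local_z = dx * s + dz * c  # projection onto the forward vector
    return local_x, local_z


if __name__ == "__main__":
    # Two boxers facing each other across the ring centre (0, 2).
    a_pos, a_yaw = (0.0, 0.0), 0.0
    b_pos, b_yaw = (0.0, 4.0), math.pi
    print(to_local(a_pos, a_yaw, b_pos))  # A's view of B
    print(to_local(b_pos, b_yaw, a_pos))  # B's view of A
```

    Both prints give (approximately) (0, 4): each boxer sees its opponent 4 units straight ahead, up to floating-point noise from sin(π).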

    Any idea why this keeps happening?

    https://drive.google.com/file/d/1DSV-L2SPSGi2ID3m59monjfaXgdljpfg/view?usp=share_link

    https://drive.google.com/file/d/1DE6wTR1TN-SvnZxlCBxZK_LBEaUa3UcE/view?usp=sharing
     
    Last edited: Jan 2, 2023
  2. GamerLordMat

    Okay, I fixed it. It was a bunch of smaller bugs that together caused it:

    1. FindGameComponents seems to behave randomly (thus giving Agent A points instead of Agent B).
    2. I did not reset everything properly, which led to input bugs at endEpisode().
    3. Other gameplay complications that I could not figure out, but that a human can.
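    For the first two fixes, a minimal Python sketch of the general idea: route rewards by an explicit, fixed team ID instead of by whatever order a lookup happens to return, and reset every piece of per-episode state in one place. `Boxer`, `boxer_for_team`, `award_hit`, `reset_episode`, and the fields `pending_action` and `health` are hypothetical stand-ins, not the author's code or the ML-Agents API, and the sketch assumes exactly two teams with IDs 0 and 1.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Boxer:
    """Hypothetical per-agent state with an explicit, fixed team ID."""
    team_id: int
    cumulative_reward: float = 0.0
    pending_action: Optional[str] = None  # e.g. input still buffered when the episode ends
    health: float = 100.0                 # example of gameplay state that must be reset


def boxer_for_team(boxers: List[Boxer], team_id: int) -> Boxer:
    """Find an agent by team ID; never rely on the order of the list."""
    matches = [b for b in boxers if b.team_id == team_id]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one boxer for team {team_id}, found {len(matches)}")
    return matches[0]


def award_hit(boxers: List[Boxer], attacker_team: int, amount: float = 0.07) -> None:
    """Zero-sum reward for one hit, assuming two teams with IDs 0 and 1."""
    attacker = boxer_for_team(boxers, attacker_team)
    defender = boxer_for_team(boxers, 1 - attacker_team)
    attacker.cumulative_reward += amount
    defender.cumulative_reward -= amount


def reset_episode(boxers: List[Boxer]) -> None:
    """Restore all per-episode state so nothing leaks into the next episode."""
    for boxer in boxers:
        boxer.cumulative_reward = 0.0
        boxer.pending_action = None
        boxer.health = 100.0


if __name__ == "__main__":
    boxers = [Boxer(team_id=0), Boxer(team_id=1)]
    random.shuffle(boxers)  # simulate a lookup that returns agents in arbitrary order
    award_hit(boxers, attacker_team=0)
    print(boxer_for_team(boxers, 0).cumulative_reward, boxer_for_team(boxers, 1).cumulative_reward)
    reset_episode(boxers)
```

    Shuffling the list simulates an unordered lookup; because rewards are looked up by `team_id`, team 0 still receives +0.07 and team 1 receives -0.07 regardless of list order.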
     