ELO calculation in an ML-Agents self-play training process

Discussion in 'ML-Agents' started by Streiicher, Dec 16, 2020.

  1. Streiicher

    Streiicher

    Joined:
    Sep 26, 2020
    Posts:
    2
    I'm looking for some help understanding the ELO results in the training process of the following game I created:

    The game is a symmetric two-player 17+4 variant (a Blackjack variant with 32 cards, card values that differ from Blackjack, and no dealer).

    In the first example the values of the 8 cards in each suit are 1, 2, 3, 4, 5, 6, 7, 8. This seems to work well: PPO with self-play delivers these graphs for the mean reward and ELO:

    [Attached images p2.png, p1.png: mean reward and ELO curves]

    If I play against the resulting model, it plays very well. I couldn't detect any mistakes made by the model.

    In a second example the values of the 8 cards in each suit are 2, 3, 4, 7, 8, 9, 10, 11. The same PPO / self-play training now delivers these graphs for the mean reward and ELO:

    [Attached images p4.png, p3.png: mean reward and ELO curves]

    There is a small bias in the cumulative reward and, after roughly 1 million steps, a decrease in ELO.
    But the code has been reviewed intensively, so I am quite confident that the bias isn't caused by my code. And the trained model again plays very well against a human player, so the decreasing ELO surprises me.

    My questions: Do you have an idea what could cause such effects? Where can I find really detailed documentation of the training process and the ELO calculation?

    Thanks a lot!
     
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @Streiicher,
    You can find the documentation for self-play here. I will reach out to someone on our research team to see if they can answer your questions.
    Cheers,
    Chris
     
  3. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Hi @Streiicher

    It looks like the final rewards are negative. The ELO calculation assumes the final reward determines the winner, i.e. a positive reward indicates a win, a negative reward a loss, and zero a draw. So what appears to be happening is that the agent is always 'losing'.

    If it doesn't seem possible to specify a reward function that satisfies this for your game, please let me know and we can try to help.
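
    For reference, the self-play ELO is, as far as I know, the standard Elo rating update, with the sign of the final reward mapped to a game result. Here is a minimal Python sketch of that logic (the function names and the K-factor of 16 are illustrative assumptions, not the exact ML-Agents internals):

        def expected_score(rating_a, rating_b):
            """Expected score of player A against player B under the Elo model."""
            return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

        def elo_update(rating_a, rating_b, final_reward, k=16.0):
            """Update both ratings from player A's final episode reward.

            The sign of the final reward decides the result, as described
            above: positive = win, negative = loss, zero = draw.
            """
            if final_reward > 0:
                result = 1.0   # win for A
            elif final_reward < 0:
                result = 0.0   # loss for A
            else:
                result = 0.5   # draw
            change = k * (result - expected_score(rating_a, rating_b))
            return rating_a + change, rating_b - change

        # Example: the learning agent (1200) beats an older snapshot (1250).
        new_a, new_b = elo_update(1200.0, 1250.0, final_reward=1.0)
        print(round(new_a, 1), round(new_b, 1))  # -> 1209.1 1240.9

    This is also why a run whose final rewards come out negative reads as a string of losses and steadily drives the ELO down.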
     
  4. Streiicher

    Streiicher

    Joined:
    Sep 26, 2020
    Posts:
    2
    Hi @andrewcoh_unity, thx for your reply. The reward structure is: 1 for a win, -1 for a loss of the agent, and 0 for a draw. In the documentation Christopher linked earlier I can't find anything about the ELO calculation. Can you give me a hint where to look this up?
    Best, Martin
     
  5. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162