# ELO calculation in an ML-Agents self play training process

Discussion in 'ML-Agents' started by Streiicher, Dec 16, 2020.

1. ### Streiicher

Joined:
Sep 26, 2020
Posts:
2
I am looking for some help understanding the ELO results in the training process of the following game I created:

The game is a symmetric 17+4 variant for 2 players (a Blackjack variant with 32 cards, card values that differ from Blackjack, and no dealer).

In the first example the values of the 8 cards in every suit are 1,2,3,4,5,6,7,8. This seems to work well: PPO with self play delivers these graphs for the mean rewards and elo:

If I play against the produced brain model, the model plays very well. I couldn't detect any mistakes made by the model.

In a second example the values of the 8 cards in every suit are 2,3,4,7,8,9,10,11. The same PPO / self play training now delivers these graphs for the mean rewards and elo:

There is a small bias in the cumulative rewards and, after roughly 1 million steps, a decrease in elo.
But: the code has been reviewed intensively, so I am quite confident that the bias isn't caused by my code. And: the trained model again plays very well versus a human player, so I am puzzled by the decreasing elo.

My questions: Do you have an idea what can cause such effects? Where can I find really detailed documentation of the training process and Elo calculation?

Thanks a lot!

2. ### Unity Technologies

Joined:
Sep 16, 2015
Posts:
735
Hi @Streiicher,
You can find the documentation for self-play here. I will reach out to someone on our research team to see if they can answer your questions.
Cheers,
Chris

3. ### Unity Technologies

Joined:
Sep 5, 2019
Posts:
162
Hi @Streiicher

It looks like the final rewards are negative. The ELO calculation assumes the final reward determines the winner, i.e., a positive reward indicates a win, a negative reward a loss, and zero a draw. So what appears to be happening is that the agent is always 'losing'.
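
To make the assumption above concrete, here is a minimal sketch of a standard Elo update driven only by the sign of the final reward. It is an illustration, not the ML-Agents source: the function names, the K-factor of 16, and the starting ratings of 1200 are all assumptions chosen for the example.

```python
# Illustrative sketch only; names and constants are hypothetical,
# not taken from the ML-Agents code base.

def result_from_final_reward(final_reward: float) -> float:
    """Map the sign of the final reward to an Elo score: win=1, draw=0.5, loss=0."""
    if final_reward > 0:
        return 1.0
    if final_reward < 0:
        return 0.0
    return 0.5


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score of `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_change(rating: float, opponent_rating: float,
               final_reward: float, k: float = 16.0) -> float:
    """Rating change after one episode (k is an assumed K-factor)."""
    result = result_from_final_reward(final_reward)
    return k * (result - expected_score(rating, opponent_rating))


if __name__ == "__main__":
    # Equal ratings: a win gains as much as a loss costs, a draw changes nothing.
    for reward in (1.0, 0.0, -1.0):
        print(reward, elo_change(1200.0, 1200.0, reward))
```

Under this scheme, an agent whose episodes always end on a negative final reward receives a negative change every time, so its rating falls steadily regardless of how well it actually plays.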

If it doesn't seem possible to specify a reward function that satisfies this for your game, please let me know and we can try to help.

4. ### Streiicher

Joined:
Sep 26, 2020
Posts:
2
Hi @andrewcoh_unity, thx for your reply. The reward structure is: 1 for a win and -1 for a loss of the agent, 0 for a draw. In the documentation Christopher sent before, I cannot find anything about the ELO calculation. Can you give me a hint about where to look this up?
Best, Martin

Joined:
Sep 5, 2019
Posts:
162