ELO calculation in an ML-Agents self play training process

Streiicher · Dec 16, 2020

I am looking for some help understanding the ELO results in the training process of the following game I created:

The game is a symmetric 17+4 variant for 2 players (that is, a Blackjack variant with 32 cards, card values that differ from Blackjack, and no dealer).

In the first example, the values of the 8 cards in every suit are 1, 2, 3, 4, 5, 6, 7, 8. This seems to work well: PPO with self-play produces these graphs for the mean reward and ELO:

If I play against the resulting brain model, the model plays very well. I couldn't detect any mistakes made by the model.

In a second example, the values of the 8 cards in every suit are 2, 3, 4, 7, 8, 9, 10, 11. The same PPO / self-play training now produces these graphs for the mean reward and ELO:

There is a small bias in the cumulative rewards and, after roughly 1 million steps, a decrease in ELO.
But the code has been reviewed thoroughly, so I am quite confident that the bias isn't caused by my code. And the trained model again plays very well against a human player, so I am puzzled by the decreasing ELO.

My questions: Do you have any idea what could cause such effects? Where can I find really detailed documentation of the training process and the ELO calculation?

Thanks a lot!

christophergoy · Dec 21, 2020

Hi @Streiicher,
You can find the documentation for self-play here. I will reach out to someone on our research team to see if they can answer your questions.
Cheers,
Chris

andrewcoh_unity · Dec 21, 2020

Hi @Streiicher

It looks like the final rewards are negative. The ELO calculation assumes that the final reward determines the winner, i.e. a positive reward indicates a win, a negative reward a loss, and zero a draw. So what appears to be happening is that the agent is always 'losing'.

If it doesn't seem possible to specify a reward function that satisfies this for your game, please let me know and we can try to help.
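A minimal sketch of the convention described above, assuming the standard logistic ELO formula on a 400-point scale. The function names, the K-factor of 16 and the starting rating of 1200 are hypothetical choices for illustration, not taken from the ML-Agents source; only the sign-of-final-reward mapping comes from this thread.

```python
from typing import Tuple


def reward_to_result(final_reward: float) -> float:
    """Map an episode's final reward to an ELO game result.

    Convention from the thread: positive = win, negative = loss, zero = draw.
    """
    if final_reward > 0:
        return 1.0
    if final_reward < 0:
        return 0.0
    return 0.5


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard logistic ELO expectation on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, final_reward: float,
               k: float = 16.0) -> Tuple[float, float]:
    """Return the new (learner, opponent) ratings after one episode.

    k=16 is an assumed K-factor used only for illustration.
    """
    result = reward_to_result(final_reward)
    change = k * (result - expected_score(rating, opponent_rating))
    # Zero-sum update: whatever the learner gains, the opponent loses.
    return rating + change, opponent_rating - change
```

For example, `elo_update(1200.0, 1200.0, -1.0)` returns `(1192.0, 1208.0)`. Only the sign of the last reward matters, so an episode that the agent actually won but that ends on a negative reward is counted as a loss.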

Streiicher · Dec 26, 2020

Hi @andrewcoh_unity, thanks for your reply. The reward structure is: +1 for a win and -1 for a loss of the agent, 0 for a draw. In the documentation Christopher sent, I cannot find anything about the ELO calculation. Can you give me a hint where to look this up?
Best, Martin
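With +1/-1/0 rewards, the win/loss/draw mapping matches the convention described earlier, so what remains is the balance of outcomes. The sketch below is hypothetical and makes a strong simplifying assumption: every opponent snapshot has the same rating as the learner, so the expected score is always 0.5. Under that assumption each update is K·(result − 0.5), which for these rewards equals (K/2)·reward. The total ELO change is then proportional to the cumulative reward, so a small negative bias in the rewards turns into a steady ELO decline. K = 16 and the starting rating of 1200 are assumptions.

```python
from typing import Iterable


def elo_drift_vs_equal_opponent(final_rewards: Iterable[float],
                                start_elo: float = 1200.0,
                                k: float = 16.0) -> float:
    """Track a learner's rating when every opponent is rated equal to it.

    Hypothetical helper. Assumes rewards of +1 (win), -1 (loss), 0 (draw)
    and an expected score of 0.5 in every episode (equal ratings).
    """
    elo = start_elo
    for reward in final_rewards:
        result = 1.0 if reward > 0 else 0.0 if reward < 0 else 0.5
        # Expected score is 0.5 for equal ratings, so the change is k * reward / 2.
        elo += k * (result - 0.5)
    return elo
```

For instance, 480 wins and 520 losses (a mean reward of only -0.04) give `elo_drift_vs_equal_opponent([1.0] * 480 + [-1.0] * 520) == 880.0`, a drop of 320 points under these assumptions.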

andrewcoh_unity · Jan 4, 2021

Hi @Streiicher

Here is the documentation https://github.com/Unity-Technologi...-Configuration-File.md#note-on-reward-signals
