
Self-Play resulting in Negative ELO with mlagents 1.0.0

Discussion in 'ML-Agents' started by Evercloud, May 11, 2020.

  1. Evercloud

    Evercloud

    Joined:
    Apr 29, 2013
    Posts:
    15
    Hey!

    I kindly ask for some assistance.
    I am training a simple soccer agent AI with PPO and Self Play.
    The orange curve in the attached picture (although almost flat) is an example of a typical ELO trend in my project with 0.14.1: it starts at 1200, it can dip at the beginning, but then it grows.
    When I switched to Unity 2019.3.12 and mlagents 1.0.0, I noticed that I can't find a way to make ELO grow anymore. I even wondered whether it is normal for ELO to go negative... I guess that would require each model iteration to be steadily worse than the previous ones, and that explanation is hard to support, since training works and the NN learns as expected (well... kind of :D ).
    At first I thought I had made a mistake when migrating from 0.14.1 to 0.16, but after many attempts I still can't make it grow.
    Please advise :)
     

    Attached Files:

    • ELO.PNG (plot of the ELO trend described above)
    Last edited: May 11, 2020
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Can you share your reward function? This could happen if agents are always seeing a negative reward at the end of an episode.
     
  3. bobben

    bobben

    Joined:
    Jan 22, 2013
    Posts:
    12
    Try a curriculum on the m_BallTouch parameter, e.g.:
    --curriculum=config/curricula/soccer.yaml
    and it's done.
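
    For reference, a minimal curriculum file in the pre-1.0 style (the format the --curriculum flag expects) might look like the sketch below; the thresholds and lesson values are illustrative, not the shipped example's exact numbers.

        # config/curricula/soccer.yaml -- illustrative values
        SoccerTwos:
          measure: progress            # advance lessons by training progress
          thresholds: [0.05, 0.1]      # progress points where the next lesson starts
          min_lesson_length: 100       # minimum episodes before a lesson can advance
          signal_smoothing: true
          parameters:
            ball_touch: [1.0, 0.5, 0.0]  # ball-touch reward, annealed away over lessons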
     
  4. Evercloud

    Evercloud

    Joined:
    Apr 29, 2013
    Posts:
    15
    @andrewcoh_unity ,

    I am still trying many different reward functions, but the result is always the same: unless I make sure the agents get a positive reward, the ELO decreases. Thanks to your hint, I am now testing positive-only rewards to see whether that actually makes a difference.
    I am no expert (at all), but if the average reward increases (even while still negative), shouldn't the ELO increase too?

    @bobben I was thinking about a curriculum, too! I will try that, thanks! :) The point is that the NN learns regardless of the ELO value, so it gets a little confusing.
     
    Last edited: May 12, 2020
  5. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    The ELO is calculated using the final reward. If an agent wins, the final reward must be positive. If an agent loses, the final reward must be negative. Otherwise, 0 indicates a draw.
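
    For anyone curious, the update behind this is the standard Elo formula driven by the sign of that final reward. The sketch below is illustrative, not the trainer's actual code; the K-factor and the reward-to-result mapping are assumptions based on the rule above.

        def elo_update(rating_a, rating_b, final_reward_a, k=16.0):
            """Elo update where the match result is read off the sign of
            agent A's very last reward: >0 win, <0 loss, 0 draw."""
            if final_reward_a > 0:
                result = 1.0   # win
            elif final_reward_a < 0:
                result = 0.0   # loss
            else:
                result = 0.5   # draw
            # Expected score for A against B under the Elo model.
            expected = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
            delta = k * (result - expected)
            return rating_a + delta, rating_b - delta

    This is also why the curve can only fall when the final reward is negative every episode (e.g. a per-step time penalty that is never offset by a terminal win bonus): every match is recorded as a loss against the snapshot opponent, even while the policy is genuinely improving.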
     
  6. BanJaxe

    BanJaxe

    Joined:
    Nov 20, 2017
    Posts:
    47
    Does "final reward" mean the very last reward the agent receives or the final cumulative reward over the entire episode?
     
  7. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    The very last reward the agent receives. Is this too restrictive? It is non-trivial to use accumulated reward when competing with many agents, but if there is a clear use case for this, we'd have to consider modifying the existing code.
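
    To make the distinction concrete, here is a hypothetical episode where the two readings disagree (illustrative numbers only, not from any real run):

        # Per-step time penalty, then a win bonus set on the final step.
        rewards = [-0.003] * 999 + [1.0]

        last_reward = rewards[-1]   #  1.0   -> Elo records a WIN
        cumulative = sum(rewards)   # -1.997 -> cumulative return is negative

    Conversely, if an episode ends on a penalty step (say, a timeout where no explicit win/loss reward is set), Elo records a loss even if the cumulative return happens to be positive.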
     
  8. BanJaxe

    BanJaxe

    Joined:
    Nov 20, 2017
    Posts:
    47
    No, it's not a problem, I was just clarifying. Thanks.
     
  9. amplez

    amplez

    Joined:
    Aug 19, 2018
    Posts:
    5
    In an environment where multiple agents compete for limited resources, accumulated reward would better represent their skill.
    It would also be easier to integrate competitive training into existing environments that already use accumulated rewards.
     
  10. KaushalAgrawal

    KaushalAgrawal

    Joined:
    Dec 18, 2019
    Posts:
    8
    Hi, I am making a 4-player turn-based card game, teams of 2 vs 2. During training, my ELO increases while the mean group reward is positive, but at the 200000-step mark, when there is a team swap, it starts decreasing rapidly with a negative mean group reward. Is this normal?