
Constant ELO during self-play training

Discussion in 'ML-Agents' started by niels_modlai, Mar 6, 2020.

  1. niels_modlai

    Joined: Oct 8, 2019
    Posts: 5
    The ELO doesn't increase from the initial value of 1200 when I run self-play. What could be the reason?
    Code (text):
    INFO:mlagents.trainers: ppo: My Behavior: Step: 5000. Time Elapsed: 184.189 s Mean Reward: -3.635. Std of Reward: 1.030. Training.
    INFO:mlagents.trainers: Learning brain My Behavior?team=1 ELO: 1200.000
    Mean Opponent ELO: 1200.000 Std Opponent ELO: 0.000
    INFO:mlagents.trainers: ppo: My Behavior: Step: 10000. Time Elapsed: 349.838 s Mean Reward: -6.406. Std of Reward: 6.074. Training.
    INFO:mlagents.trainers: Learning brain My Behavior?team=1 ELO: 1200.000
    Mean Opponent ELO: 1200.000 Std Opponent ELO: 0.000
    INFO:mlagents.trainers: ppo: My Behavior: Step: 15000. Time Elapsed: 519.350 s Mean Reward: -2.528. Std of Reward: 2.163. Training.
    INFO:mlagents.trainers: Learning brain My Behavior?team=1 ELO: 1200.000
    Mean Opponent ELO: 1200.000 Std Opponent ELO: 0.000
    My config file:
    Code (YAML):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: constant
        max_steps: 5.0e5
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 5000
        use_recurrent: false
        vis_encode_type: simple
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
        self_play:
            window: 10
            play_against_current_self_ratio: 0.5
            save_steps: 5000
            swap_steps: 5000
    My behavior settings: [screenshot attached in the original post]

  2. mbaske

    Joined: Dec 31, 2017
    Posts: 473
    Same here, ELO stays constant at 1200.

    Code (text):
    (ml-agents14_1) C:\Users\mathias\Documents\Unity Projects\ml-agents-14.1\Project>mlagents-learn config/trainer_config.yaml --train --run-id=football_sp_2x256 --keep-checkpoints=100 --base-port=5006 --env=build/football.exe --no-graphics --num-envs=8 --time-scale=50  --load
    WARNING:tensorflow:From c:\users\mathias\anaconda3\envs\ml-agents14_1\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term

    [ml-agents ASCII-art banner]

    Version information:
      ml-agents: 0.14.1,
      ml-agents-envs: 0.14.1,
      Communicator API: API-14,
      TensorFlow: 2.0.1
    [... TensorFlow warning repeats for every environment instance]
    INFO:mlagents_envs:Connected new brain:
    Football?team=0
    INFO:mlagents_envs:Connected new brain:
    Football?team=1
    [... repeats for every environment instance]
    INFO:mlagents.trainers:Hyperparameters for the GhostTrainer of brain Football:
            trainer:        ppo
            batch_size:     2048
            beta:   0.005
            buffer_size:    20480
            epsilon:        0.2
            hidden_units:   256
            lambd:  0.95
            learning_rate:  0.0003
            learning_rate_schedule: constant
            max_steps:      10.0e7
            memory_size:    256
            normalize:      False
            num_epoch:      3
            num_layers:     2
            time_horizon:   1000
            sequence_length:        64
            summary_freq:   10000
            use_recurrent:  False
            vis_encode_type:        simple
            reward_signals:
              extrinsic:
                strength:   1.0
                gamma:      0.99
            summary_path:   football_sp_2x256_Football
            model_path:     ./models/football_sp_2x256/Football
            keep_checkpoints:       100
            self_play:
              window:       10
              play_against_current_self_ratio:      0.5
              save_steps:   50000
              swap_steps:   50000
    2020-03-07 09:01:21.638211: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    INFO:mlagents.trainers:Loading Model for brain Football?team=1
    INFO:mlagents.trainers:Loading Model for brain Football?team=0
    INFO:mlagents.trainers: football_sp_2x256: Football: Step: 2660000. No episode was completed since last summary. Training.
    INFO:mlagents.trainers: Learning brain Football?team=1 ELO: 1200.000
    ...
     


  3. niels_modlai

    Joined: Oct 8, 2019
    Posts: 5
    I found my issue while debugging trainers/ghost/trainer.py:_process_trajectory(). I wasn't careful enough with my reward signal, so a winner was never detected. Maybe it's the same issue for you, mbaske?
     
  4. mbaske

    Joined: Dec 31, 2017
    Posts: 473
    Did you actually send a reward before? I'm setting +1 (win) and -1 (lose), but I'm not seeing any ELO change.
     
  5. WalkLearner

    Joined: Mar 12, 2020
    Posts: 10
    I don't know if you have already solved the issue. For ELO to work, the last reward each team receives has to encode the result: +1 for a win, -1 for a loss, and 0 to both teams for a draw.
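    As a rough sketch, something like the controller below works for me. This assumes the 0.14-era C# Agent API (SetReward() and Done()); MatchController, blueAgent, and purpleAgent are just placeholder names for your own setup:
    Code (CSharp):
    using MLAgents;
    using UnityEngine;

    public class MatchController : MonoBehaviour
    {
        // Placeholder references to the two competing agents.
        public Agent blueAgent;
        public Agent purpleAgent;

        // Call this from your game logic once the match is decided.
        // result: +1 = blue wins, -1 = purple wins, 0 = draw.
        public void EndMatch(float result)
        {
            blueAgent.SetReward(result);    // winner gets +1, loser -1, both 0 on a draw
            purpleAgent.SetReward(-result); // mirrored final reward for the other team
            blueAgent.Done();               // end the episode so the trainer records a result
            purpleAgent.Done();
        }
    }
    The important part is that episodes actually finish with that decisive final reward. If they never complete (see the "No episode was completed since last summary" line in the log above), the ghost trainer has no result to score and ELO stays at 1200.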