Question Self-play training doesn't work

Discussion in 'ML-Agents' started by Yuulis04, Apr 3, 2022.

  1. Yuulis04

    Joined:
    Apr 4, 2021
    Posts:
    28
    Hello. I want to train my agents to play Reversi using self-play, but judging from the logs, self-play learning does not seem to be working.
    Is there anything I am missing?

    My environment :
    Unity 2021.2.16f1
    ML-Agents Release 17

    Logs :
    Code (text):
    Version information:
      ml-agents: 0.26.0,
      ml-agents-envs: 0.26.0,
      Communicator API: 1.5.0,
      PyTorch: 1.7.1+cu110
    [INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
    [INFO] Connected new brain: ReversiAgent?team=0
    [INFO] Hyperparameters for behavior name ReversiAgent:
            trainer_type:   ppo
            hyperparameters:
              batch_size:   64
              buffer_size:  2048
              learning_rate:        0.0001
              beta: 0.0001
              epsilon:      0.2
              lambd:        0.95
              num_epoch:    3
              learning_rate_schedule:       constant
            network_settings:
              normalize:    True
              hidden_units: 512
              num_layers:   3
              vis_encode_type:      simple
              memory:       None
              goal_conditioning_type:       hyper
            reward_signals:
              extrinsic:
                gamma:      0.995
                strength:   1.0
                network_settings:
                  normalize:        False
                  hidden_units:     128
                  num_layers:       2
                  vis_encode_type:  simple
                  memory:   None
                  goal_conditioning_type:   hyper
            init_path:      None
            keep_checkpoints:       200
            checkpoint_interval:    500000
            max_steps:      1000000
            time_horizon:   2048
            summary_freq:   10000
            threaded:       False
            self_play:
              save_steps:   20000
              team_change:  100000
              swap_steps:   10000
              window:       30
              play_against_latest_model_ratio:      0.5
              initial_elo:  1200.0
            behavioral_cloning:     None
    [INFO] Connected new brain: ReversiAgent?team=1
    [INFO] ReversiAgent. Step: 10000. Time Elapsed: 165.283 s. Mean Reward: 0.000. Std of Reward: 0.035. Training.
    [INFO] ReversiAgent. Step: 20000. Time Elapsed: 308.563 s. Mean Reward: -0.000. Std of Reward: 0.033. Training.
    [INFO] ReversiAgent. Step: 30000. Time Elapsed: 452.128 s. Mean Reward: 0.001. Std of Reward: 0.035. Training.
    ...
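    For context, my understanding is that in a symmetric zero-sum game like this, the reported mean reward should hover near zero anyway (one team's win is the other's loss), and self-play progress is tracked through the ELO rating instead. As a rough illustration of how such an ELO update behaves (my own sketch, not ML-Agents' actual code):

    ```python
    # Rough sketch of a standard ELO rating update, as used to track
    # self-play progress. (Illustrative only; not ML-Agents' implementation.)

    def elo_expected(rating_a: float, rating_b: float) -> float:
        """Expected score of player A against player B."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
        """Updated ratings after one game; score_a is 1 (win), 0.5 (draw), 0 (loss)."""
        delta = k * (score_a - elo_expected(rating_a, rating_b))
        return rating_a + delta, rating_b - delta

    # Two fresh snapshots at the same rating are expected to score 0.5:
    print(elo_expected(1200.0, 1200.0))         # 0.5
    # After a win, the winner's rating rises and the loser's falls:
    print(elo_update(1200.0, 1200.0, 1.0))      # (1216.0, 1184.0)
    ```

    With initial_elo: 1200.0 in the config, both teams start at the same rating, so any sustained climb above 1200 would indicate the learning side is beating older snapshots.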
    config file :
    Code (yaml):
    behaviors:
      ReversiAgent:
        trainer_type: ppo

        hyperparameters:
          batch_size: 64
          buffer_size: 2048
          learning_rate: 1e-4
          beta: 1e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: constant

        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple

        reward_signals:
          extrinsic:
            gamma: 0.995
            strength: 1.0

        keep_checkpoints: 200
        max_steps: 1000000
        time_horizon: 2048
        summary_freq: 10000

        self_play:
          window: 30
          play_against_latest_model_ratio: 0.5
          save_steps: 20000
          swap_steps: 10000
          team_change: 100000
    Agent's behavior:
    [attached image: upload_2022-4-3_17-49-57.png]
  2. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Bump, I have the same problem.