
Question Self-play training doesn't work

Discussion in 'ML-Agents' started by Yuulis04, Apr 3, 2022.

  1. Yuulis04

    Joined:
    Apr 4, 2021
    Posts:
    28
    Hello. I wanted my agents to learn the game of Reversi, so I used self-play training. However, judging from the logs, self-play does not seem to be working.
    Is there anything I am missing?

    My environment:
    Unity 2021.2.16f1
    ML-Agents Release 17

    Logs :
    Code (text):
    Version information:
      ml-agents: 0.26.0,
      ml-agents-envs: 0.26.0,
      Communicator API: 1.5.0,
      PyTorch: 1.7.1+cu110
    [INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
    [INFO] Connected new brain: ReversiAgent?team=0
    [INFO] Hyperparameters for behavior name ReversiAgent:
            trainer_type:   ppo
            hyperparameters:
              batch_size:   64
              buffer_size:  2048
              learning_rate:        0.0001
              beta: 0.0001
              epsilon:      0.2
              lambd:        0.95
              num_epoch:    3
              learning_rate_schedule:       constant
            network_settings:
              normalize:    True
              hidden_units: 512
              num_layers:   3
              vis_encode_type:      simple
              memory:       None
              goal_conditioning_type:       hyper
            reward_signals:
              extrinsic:
                gamma:      0.995
                strength:   1.0
                network_settings:
                  normalize:        False
                  hidden_units:     128
                  num_layers:       2
                  vis_encode_type:  simple
                  memory:   None
                  goal_conditioning_type:   hyper
            init_path:      None
            keep_checkpoints:       200
            checkpoint_interval:    500000
            max_steps:      1000000
            time_horizon:   2048
            summary_freq:   10000
            threaded:       False
            self_play:
              save_steps:   20000
              team_change:  100000
              swap_steps:   10000
              window:       30
              play_against_latest_model_ratio:      0.5
              initial_elo:  1200.0
            behavioral_cloning:     None
    [INFO] Connected new brain: ReversiAgent?team=1
    [INFO] ReversiAgent. Step: 10000. Time Elapsed: 165.283 s. Mean Reward: 0.000. Std of Reward: 0.035. Training.
    [INFO] ReversiAgent. Step: 20000. Time Elapsed: 308.563 s. Mean Reward: -0.000. Std of Reward: 0.033. Training.
    [INFO] ReversiAgent. Step: 30000. Time Elapsed: 452.128 s. Mean Reward: 0.001. Std of Reward: 0.035. Training.
    ...
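    A note on reading these numbers: a mean reward pinned near zero is not, by itself, proof that self-play is broken. In a symmetric zero-sum game, the two teams' rewards cancel each other out, so the mean across both teams stays near zero even while training progresses. A minimal illustration (not the poster's actual reward code, just the usual +1/-1/0 convention):

    Code (python):
    # Illustration only: the usual zero-sum reward convention for two-team
    # self-play (+1 to the winner, -1 to the loser, 0 each for a draw).

    def zero_sum_rewards(winner):
        """Return (team0_reward, team1_reward) for one finished game.

        winner: 0, 1, or None for a draw.
        """
        if winner is None:
            return (0.0, 0.0)
        return (1.0, -1.0) if winner == 0 else (-1.0, 1.0)

    # Whatever the outcomes, the mean over BOTH teams is always zero,
    # which is consistent with the "Mean Reward: 0.000" lines in the log.
    outcomes = [0, 1, 1, None, 0, 1]
    rewards = [r for w in outcomes for r in zero_sum_rewards(w)]
    print(sum(rewards) / len(rewards))  # -> 0.0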
    Config file:
    Code (yaml):
    behaviors:
      ReversiAgent:
        trainer_type: ppo

        hyperparameters:
          batch_size: 64
          buffer_size: 2048
          learning_rate: 1e-4
          beta: 1e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: constant

        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple

        reward_signals:
          extrinsic:
            gamma: 0.995
            strength: 1.0

        keep_checkpoints: 200
        max_steps: 1000000
        time_horizon: 2048
        summary_freq: 10000

        self_play:
          window: 30
          play_against_latest_model_ratio: 0.5
          save_steps: 20000
          swap_steps: 10000
          team_change: 100000
    Agent's behavior: (screenshot: upload_2022-4-3_17-49-57.png)
     
  2. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Bump. I have the same problem.