Question Self-play training doesn't work

Yuulis04 · Apr 3, 2022

Hello. I want my agents to learn the game of Reversi, so I set up self-play training. However, judging from the logs, self-play does not seem to be working: the mean reward stays at about 0.
Is there anything I am missing?

My environment:
Unity 2021.2.16f1
ML-Agents Release 17

Logs:

Code (boo):

Version information:
  ml-agents: 0.26.0,
  ml-agents-envs: 0.26.0,
  Communicator API: 1.5.0,
  PyTorch: 1.7.1+cu110
[INFO] Connected to Unity environment with package version 2.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: ReversiAgent?team=0
[INFO] Hyperparameters for behavior name ReversiAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 0.0001
      beta: 0.0001
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: True
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
        network_settings:
          normalize: False
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
          memory: None
          goal_conditioning_type: hyper
    init_path: None
    keep_checkpoints: 200
    checkpoint_interval: 500000
    max_steps: 1000000
    time_horizon: 2048
    summary_freq: 10000
    threaded: False
    self_play:
      save_steps: 20000
      team_change: 100000
      swap_steps: 10000
      window: 30
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
    behavioral_cloning: None
[INFO] Connected new brain: ReversiAgent?team=1
[INFO] ReversiAgent. Step: 10000. Time Elapsed: 165.283 s. Mean Reward: 0.000. Std of Reward: 0.035. Training.
[INFO] ReversiAgent. Step: 20000. Time Elapsed: 308.563 s. Mean Reward: -0.000. Std of Reward: 0.033. Training.
[INFO] ReversiAgent. Step: 30000. Time Elapsed: 452.128 s. Mean Reward: 0.001. Std of Reward: 0.035. Training.
...

Config file:

Code (yaml):

behaviors:
  ReversiAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 1e-4
      beta: 1e-4
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    keep_checkpoints: 200
    max_steps: 1000000
    time_horizon: 2048
    summary_freq: 10000
    self_play:
      window: 30
      play_against_latest_model_ratio: 0.5
      save_steps: 20000
      swap_steps: 10000
      team_change: 100000
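
For reference, here is a rough sketch of what the self_play values above imply over the run. It assumes that save_steps and team_change are both counted in trainer steps, as the ML-Agents configuration documentation describes them, and it ignores any snapshot taken at step 0. The helper `self_play_schedule` is hypothetical and only does the arithmetic; it is not part of ML-Agents.

```python
# Values copied from the config above.
config = {
    "max_steps": 1_000_000,
    "summary_freq": 10_000,
    "save_steps": 20_000,
    "team_change": 100_000,
    "window": 30,
}


def self_play_schedule(cfg, steps_done):
    """Rough self-play counts after `steps_done` trainer steps (hypothetical helper)."""
    snapshots = steps_done // cfg["save_steps"]
    return {
        # Assumption: one opponent snapshot is saved every `save_steps` trainer steps.
        "snapshots_saved": snapshots,
        # Assumption: the learning team switches every `team_change` trainer steps.
        "team_changes": steps_done // cfg["team_change"],
        # True once at least `window` snapshots exist to sample opponents from.
        "window_full": snapshots >= cfg["window"],
    }


after_log = self_play_schedule(config, 30_000)  # last step shown in the log
full_run = self_play_schedule(config, config["max_steps"])
print(after_log)
print(full_run)
```

Under these assumptions, the three summary lines shown in the log cover 3% of max_steps, which is before the first team change and with at most one saved snapshot to play against.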

Agent's behavior:

beinzheans · Oct 21, 2022

bump, I have the same problem
