
Question Why is Elo not displaying while using self_play?

Discussion in 'ML-Agents' started by beinzheans, Oct 22, 2022.

beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Where can I find the Elo? I've checked TensorBoard dev and my training Anaconda Prompt (with debug on), but I can't see it anywhere.


    Hyperparameters:

    Code (YAML):
    behaviors:
      CarAgentFollow:
        trainer_type: ppo
        hyperparameters:
          # Hyperparameters common to PPO and SAC
          batch_size: 2048
          buffer_size: 10240
          learning_rate: 8.0e-5
          learning_rate_schedule: constant
          # PPO-specific hyperparameters
          beta: 3e-4
          beta_schedule: constant
          epsilon: 0.2
          epsilon_schedule: constant
          lambd: 0.9
          num_epoch: 10
        # Configuration of the neural network (common to PPO/SAC)
        network_settings:
          vis_encode_type: simple
          normalize: true
          hidden_units: 256
          num_layers: 2

        reward_signals:
          # environment reward (default)
          extrinsic:
            strength: 1.0
            gamma: 0.99

        # Trainer configurations common to all trainers
        max_steps: 1.0e7
        time_horizon: 512
        summary_freq: 50000
        keep_checkpoints: 5
        checkpoint_interval: 100000
        threaded: false
        init_path: null
        self_play:
          save_steps: 100000
          team_change: 500000
          swap_steps: 50000
          window: 15
          play_against_latest_model_ratio: 0.5
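
    For context on the self_play numbers above (this is just my reading of the docs, so treat the exact semantics as assumptions): save_steps is how often a snapshot of the current policy is stored, team_change is how often the learning team switches, swap_steps controls how often the opponent's snapshot is swapped, and window is how many past snapshots the opponent can be drawn from. A quick sanity check of that schedule:

    Code (Python):
    # Rough schedule check for the self_play settings above
    # (assumption: my reading of the save_steps / team_change / swap_steps semantics).
    save_steps = 100_000    # steps between saving a policy snapshot
    team_change = 500_000   # steps between switching which team is learning
    swap_steps = 50_000     # steps between swapping the opponent's snapshot
    window = 15             # how many recent snapshots opponents are drawn from

    print("snapshots per learning phase:", team_change // save_steps)      # 5
    print("opponent swaps per learning phase:", team_change // swap_steps)  # 10

    That roughly matches the log below: "Swapping snapshot ..." shows up about every 50-55k steps, and "Learning team 0 swapped" appears just after step 500000.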

    Anaconda Prompt (sample)


    2022-10-22 20:16:40 INFO [environment.py:297] Connected new brain: CarAgentFollow?team=0
    2022-10-22 20:16:40 INFO [environment.py:297] Connected new brain: CarAgentFollow?team=2
    2022-10-22 20:17:11 INFO [stats.py:203] Hyperparameters for behavior name CarAgentFollow:
    trainer_type: ppo
    hyperparameters:
    batch_size: 2048
    buffer_size: 10240
    learning_rate: 8e-05
    beta: 0.0003
    epsilon: 0.2
    lambd: 0.9
    num_epoch: 10
    learning_rate_schedule: constant
    beta_schedule: constant
    epsilon_schedule: constant
    network_settings:
    normalize: True
    hidden_units: 256
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
    reward_signals:
    extrinsic:
    gamma: 0.99
    strength: 1.0
    network_settings:
    normalize: False
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
    init_path: D:/results\AILearnDrive_18_10_22_1\CarAgentFollow\checkpoint.pt
    keep_checkpoints: 5
    checkpoint_interval: 100000
    max_steps: 10000000
    time_horizon: 512
    summary_freq: 50000
    threaded: False
    self_play:
    save_steps: 100000
    team_change: 500000
    swap_steps: 50000
    window: 15
    play_against_latest_model_ratio: 0.5
    initial_elo: 1200.0
    behavioral_cloning: None
    2022-10-22 20:17:11 INFO [torch_model_saver.py:73] Initializing from D:/results\AILearnDrive_18_10_22_1\CarAgentFollow\checkpoint.pt.
    2022-10-22 20:17:11 WARNING [torch_model_saver.py:126] Failed to load for module Optimizer:value_optimizer. Initializing
    2022-10-22 20:17:11 DEBUG [torch_model_saver.py:127] Module loading error : loaded state dict contains a parameter group that doesn't match the size of optimizer's group
    2022-10-22 20:17:11 WARNING [torch_model_saver.py:110] Did not expect these keys ['value_heads.value_heads.gail.weight', 'value_heads.value_heads.gail.bias'] in checkpoint. Ignoring.
    2022-10-22 20:17:11 INFO [torch_model_saver.py:131] Starting training from step 0 and saving to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow.
    2022-10-22 20:20:43 INFO [stats.py:197] CarAgentFollow. Step: 50000. Time Elapsed: 257.431 s. Mean Reward: -0.200. Std of Reward: 0.980. Training.
    2022-10-22 20:20:43 DEBUG [trainer.py:436] Step 50240: Swapping snapshot current to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:24:11 INFO [stats.py:197] CarAgentFollow. Step: 100000. Time Elapsed: 465.272 s. Mean Reward: -0.310. Std of Reward: 0.951. Training.
    2022-10-22 20:24:11 DEBUG [model_serialization.py:161] Converting to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-99536.onnx
    2022-10-22 20:24:11 INFO [model_serialization.py:173] Exported D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-99536.onnx
    2022-10-22 20:24:47 DEBUG [trainer.py:436] Step 105120: Swapping snapshot current to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:27:39 INFO [stats.py:197] CarAgentFollow. Step: 150000. Time Elapsed: 673.803 s. Mean Reward: -0.429. Std of Reward: 0.904. Training.
    2022-10-22 20:28:16 DEBUG [trainer.py:436] Step 155360: Swapping snapshot 1 to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:31:27 INFO [stats.py:197] CarAgentFollow. Step: 200000. Time Elapsed: 901.962 s. Mean Reward: -0.241. Std of Reward: 0.970. Training.
    2022-10-22 20:31:27 DEBUG [model_serialization.py:161] Converting to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-199536.onnx
    2022-10-22 20:31:28 INFO [model_serialization.py:173] Exported D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-199536.onnx
    2022-10-22 20:32:18 DEBUG [trainer.py:436] Step 210240: Swapping snapshot 1 to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:34:56 INFO [stats.py:197] CarAgentFollow. Step: 250000. Time Elapsed: 1110.840 s. Mean Reward: -0.143. Std of Reward: 0.990. Training.
    2022-10-22 20:36:01 DEBUG [trainer.py:436] Step 265120: Swapping snapshot current to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:38:24 INFO [stats.py:197] CarAgentFollow. Step: 300000. Time Elapsed: 1318.781 s. Mean Reward: -0.241. Std of Reward: 0.970. Training.
    2022-10-22 20:38:24 DEBUG [model_serialization.py:161] Converting to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-299536.onnx
    2022-10-22 20:38:24 INFO [model_serialization.py:173] Exported D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-299536.onnx
    2022-10-22 20:39:52 DEBUG [trainer.py:436] Step 315360: Swapping snapshot current to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:42:14 INFO [stats.py:197] CarAgentFollow. Step: 350000. Time Elapsed: 1548.298 s. Mean Reward: 0.143. Std of Reward: 0.990. Training.
    2022-10-22 20:43:33 DEBUG [trainer.py:436] Step 370240: Swapping snapshot 3 to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:45:42 INFO [stats.py:197] CarAgentFollow. Step: 400000. Time Elapsed: 1756.623 s. Mean Reward: -0.241. Std of Reward: 0.970. Training.
    2022-10-22 20:45:42 DEBUG [model_serialization.py:161] Converting to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-399536.onnx
    2022-10-22 20:45:42 INFO [model_serialization.py:173] Exported D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-399536.onnx
    2022-10-22 20:47:37 DEBUG [trainer.py:436] Step 425120: Swapping snapshot current to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:49:11 INFO [stats.py:197] CarAgentFollow. Step: 450000. Time Elapsed: 1965.482 s. Mean Reward: -0.238. Std of Reward: 0.971. Training.
    2022-10-22 20:51:05 DEBUG [trainer.py:436] Step 475360: Swapping snapshot 4 to id CarAgentFollow?team=0 with team 2 learning
    2022-10-22 20:52:58 INFO [stats.py:197] CarAgentFollow. Step: 500000. Time Elapsed: 2192.958 s. Mean Reward: -0.172. Std of Reward: 0.985. Training.
    2022-10-22 20:52:58 DEBUG [model_serialization.py:161] Converting to D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-499536.onnx
    2022-10-22 20:52:59 INFO [model_serialization.py:173] Exported D:/results\AILearnDrive_22_10_22_2\CarAgentFollow\CarAgentFollow-499536.onnx
    2022-10-22 20:53:13 DEBUG [controller.py:72] Learning team 0 swapped on step 505120
    2022-10-22 20:53:13 DEBUG [trainer.py:436] Step 505120: Swapping snapshot current to id CarAgentFollow?team=2 with team 0 learning
    2022-10-22 20:56:23 INFO [stats.py:197] CarAgentFollow. Step: 550000. Time Elapsed: 2397.757 s. Mean Reward: -0.143. Std of Reward: 1.283. Training.



    As you can see from the terminal, self_play does seem to be working (the "Swapping snapshot ..." lines), but the Elo isn't showing up anywhere, neither in TensorBoard nor in the prompt itself. Can anyone tell me why? Thanks in advance.
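
    For reference, my (possibly wrong) understanding is that the self-play Elo is an ordinary Elo rating that only gets updated when an episode ends with a result, with win/loss/draw inferred from the learning agent's final reward. A minimal sketch of the standard Elo update; the K-factor here is a placeholder, not necessarily what ML-Agents uses:

    Code (Python):
    def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 16.0):
        """Standard Elo update. score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
        The K-factor of 16 is a placeholder, not necessarily ML-Agents' value."""
        expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
        delta = k * (score_a - expected_a)
        return rating_a + delta, rating_b - delta

    # Starting from the initial_elo of 1200.0 shown in the log:
    print(elo_update(1200.0, 1200.0, score_a=1.0))  # learning team wins -> (1208.0, 1192.0)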


    EDIT: Here is also the TensorBoard screenshot: Screenshot 2022-10-23 062147.png

    There should be another drop-down section labeled "Self-play", but as you can see, there's nothing; only the three standard sections are there.
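
    A small sketch for checking the raw event files (assuming TensorBoard's Python API is installed in the same environment as the trainer): it lists every scalar tag that was actually written for the run, so it should show whether any self-play/ELO tag exists at all. The path is the results folder from the log above; adjust it to your own run:

    Code (Python):
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    # Behavior's summary folder inside the results directory
    # (path taken from the log above; adjust to your own run).
    run_dir = r"D:\results\AILearnDrive_22_10_22_2\CarAgentFollow"

    acc = EventAccumulator(run_dir)
    acc.Reload()

    # Print every scalar tag that was written, e.g. "Environment/Cumulative Reward".
    # If no self-play / ELO tag is listed here, TensorBoard has nothing to display.
    for tag in acc.Tags().get("scalars", []):
        print(tag)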
     
    Last edited: Oct 22, 2022