
Question: Why is the performance of inference so bad?

Discussion in 'ML-Agents' started by Thorce, Jul 18, 2023.

  1. Thorce

    Thorce

    Joined:
    Jul 3, 2019
    Posts:
    38
    I have a setup where I train 2 Agents (A and B) adversarially. While A trains, B is in inference mode. While B trains, A is in inference mode. I train both of them for X steps and then alternate. A requests a lot more decisions than B (~500 : 1).

    What I noticed is that the performance when training A is way better than when I train B with A in inference mode.

    I did some deep profiling and saw that when training A, a decision request takes about 50 ms. When training B, though, the decision requests from A take at least 200+ ms.

    Can anybody explain why the performance in inference mode is so bad?
     
  2. Thorce

    Thorce

    Joined:
    Jul 3, 2019
    Posts:
    38
    Does anyone have any input on this?
     
  3. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
    If you use visual input it gets very slow (waiting for GPU synchronization). Also, if B has a much bigger NN then of course it will run slower. Use self-play for the scenario you describe.
     
  4. Thorce

    Thorce

    Joined:
    Jul 3, 2019
    Posts:
    38
    Thanks for the input!
    I only use vector observations though. What do you mean by "NN"? In my case Agent A is my player and Agent B is an upgrade generator. A has 2 layers with 1024 hidden units and B has 2 layers with 128 hidden units.

    My problem is that I have a way lower framerate when Agent A runs in inference than when it is actively learning. In my case this greatly impacts the training time of B (since for that, Agent A has to run in inference).
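
    For context, the relevant part of my trainer config looks roughly like this (just a sketch; the layer counts and hidden units are the real values, while the behavior names and trainer type are placeholders):

    Code (YAML):
    behaviors:
      AgentA:                    # player
        trainer_type: ppo        # placeholder
        network_settings:
          hidden_units: 1024
          num_layers: 2
      AgentB:                    # upgrade generator
        trainer_type: ppo        # placeholder
        network_settings:
          hidden_units: 128
          num_layers: 2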
     
  5. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
    "My problem is that I have a way lower framerate when Agent A runs in inference than when it is actively learning" sorry idk what is the issue there.
    For me Burst inference produced performance problems. "A" needs more time bc in the forward pass (multiplying out all matrixes/weights of the neurons) has more neurons so it of course needs longer.

    Like I said, I would try to use self-play (there is an example in the football players); it is specifically made to do what you want. I have never set it up myself, so unfortunately I can't help you with the details.
     
  6. Thorce

    Thorce

    Joined:
    Jul 3, 2019
    Posts:
    38

    You said that A has more neurons so it takes longer to calculate the matrices, but doesn't it have to do this while learning too? Shouldn't the performance of A be better when running in inference than when training?

    Indeed, my setup is quite similar to self-play, but self-play only works when training a single agent, i.e. you let A play against itself. In my case A plays against B and vice versa, so I had to implement my own version of self-play.
     
  7. kokimitsunami

    kokimitsunami

    Joined:
    Sep 2, 2021
    Posts:
    25
    Hi @Thorce

    I believe that self-play works even when training multiple agents. I've done it myself. You can do it by specifying training settings for both agents in one training YAML file like below:

    Code (YAML):
    behaviors:
      agentA:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          ...
        network_settings:
          normalize: false
          hidden_units: 1024
          num_layers: 2
        ...
        self_play:
          save_steps: 250000
          team_change: 500000
          swap_steps: 100000
          window: 50
          play_against_latest_model_ratio: 0.5
          initial_elo: 1200.0
      agentB:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          ...
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        ...
        self_play:
          save_steps: 250000
          team_change: 500000
          swap_steps: 100000
          window: 50
          play_against_latest_model_ratio: 0.5
          initial_elo: 1200.0
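
    One thing to note (agentA / agentB above are placeholder names): the keys under behaviors have to match the Behavior Name set on each agent's Behavior Parameters component in the editor, otherwise these settings (including the self_play block) won't be applied to your agents:

    Code (YAML):
    behaviors:
      agentA:   # = Behavior Name of your player agent
        ...
      agentB:   # = Behavior Name of your upgrade generator agent
        ...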
    I hope this helps.
     
  8. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    177
    Code (YAML):
    behaviors:
      Goalie:
        trainer_type: poca
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: constant
        network_settings:
          normalize: false
          hidden_units: 512
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 30000000
        time_horizon: 1000
        summary_freq: 10000
        self_play:
          save_steps: 50000
          team_change: 200000
          swap_steps: 1000
          window: 10
          play_against_latest_model_ratio: 0.5
          initial_elo: 1200.0
      Striker:
        trainer_type: poca
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: constant
        network_settings:
          normalize: false
          hidden_units: 512
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 30000000
        time_horizon: 1000
        summary_freq: 10000
        self_play:
          save_steps: 50000
          team_change: 200000
          swap_steps: 4000
          window: 10
          play_against_latest_model_ratio: 0.5
          initial_elo: 1200.0
    Sorry, I don't know why training is faster than inference for you.
    @kokimitsunami gave you a nice example of how to set it up with two agents; you can also look at the YAML of the 2 strikers vs 1 goalie soccer example pasted above.
     
  10. Thorce

    Thorce

    Joined:
    Jul 3, 2019
    Posts:
    38
    Maybe someone from the Unity ML-Agents team could chime in here regarding the performance problem?
     