
Agent training freezes at buffer_size steps

Discussion in 'ML-Agents' started by beinzheans, Aug 29, 2022.

  1. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    I'm training a self-driving AI car, but training stalls every time a multiple of buffer_size steps is reached (i.e. if buffer_size is 1000, training freezes at 1000 steps, 2000 steps, 3000 steps, ...). After 5 to 8 minutes, training resumes normally until the next buffer_size multiple is reached, and the process repeats.

    To my understanding, buffer_size is the point at which the model is updated. Does this mean my model is simply too large (a 20x20 grid sensor with 3 tags and 2 stacks, plus some raycasts), or can I improve things through my hyperparameters?

    Here are my hyperparameters:


    Code (YAML):
    behaviors:
      CarAgentFollow:
        trainer_type: ppo
        hyperparameters:
          # Hyperparameters common to PPO and SAC
          batch_size: 4096
          buffer_size: 65536
          learning_rate: 3.0e-4
          learning_rate_schedule: linear
          # PPO-specific hyperparameters
          beta: 5.0e-3
          beta_schedule: linear
          epsilon: 0.2
          epsilon_schedule: linear
          lambd: 0.9
          num_epoch: 13
        # Configuration of the neural network (common to PPO/SAC)
        network_settings:
          vis_encode_type: simple
          normalize: true
          hidden_units: 128
          num_layers: 2

        # Trainer configurations common to all trainers
        max_steps: 3.5e6
        time_horizon: 512
        summary_freq: 10000
        keep_checkpoints: 5
        checkpoint_interval: 40000
        threaded: true
        init_path: null

        reward_signals:
          # environment reward (default)
          extrinsic:
            strength: 1.0
            gamma: 0.99

          # curiosity module
          curiosity:
            strength: 0.01
            gamma: 0.99
            learning_rate: 3.0e-4

    environment_parameters:
      levels:
        curriculum:
          - name: ObstaclesDodge_Easy
            completion_criteria:
              measure: reward
              behavior: CarAgentFollow
              signal_smoothing: true
              threshold: 4.75
              min_lesson_length: 100
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 1
                max_value: 2
          - name: ObstaclesDodge_Medium
            completion_criteria:
              measure: reward
              behavior: CarAgentFollow
              signal_smoothing: true
              threshold: 4.65
              min_lesson_length: 100
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 2
                max_value: 3
          - name: ObstaclesDodge_Hard
            completion_criteria:
              measure: reward
              behavior: CarAgentFollow
              signal_smoothing: true
              threshold: 4.55
              min_lesson_length: 100
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 3
                max_value: 5
          - name: ObstaclesDodge_Expert
            value:
              sampler_type: uniform
              sampler_parameters:
                min_value: 5
                max_value: 8
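
    A rough sketch of how much optimizer work one policy update does with these settings (assuming standard PPO minibatching; the numbers are taken from the config above). This is the burst of computation that happens every time the buffer fills:

    ```python
    # Work done by one PPO policy update, using the values from the config above.
    buffer_size = 65536  # experiences collected before each policy update
    batch_size = 4096    # experiences per gradient step (one minibatch)
    num_epoch = 13       # full passes over the buffer per update

    minibatches_per_epoch = buffer_size // batch_size           # 16 minibatches
    gradient_steps_per_update = num_epoch * minibatches_per_epoch

    # 13 epochs x 16 minibatches = 208 back-to-back gradient steps,
    # which would appear as a pause every buffer_size environment steps.
    print(gradient_steps_per_update)
    ```

    If that estimate is right, lowering num_epoch or the buffer_size-to-batch_size ratio should shrink the burst of gradient steps and shorten the freeze, at the cost of less sample reuse per update.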

    Any help will be greatly appreciated :)
     
  2. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Also note that GAIL and BC make a big difference in the freeze duration:

    without: ~5 mins
    with: ~8 mins
     
  3. MidnightGameDeveloper

    Joined:
    Apr 26, 2014
    Posts:
    122
    I'm seeing similar behaviour when training the example environments: the editor/environment freezes every few seconds. Is this normal?
    I don't remember this happening when I used ML-Agents two years ago; back then it ran smoothly through training on the same machine.
     
  4. MidnightGameDeveloper

    Joined:
    Apr 26, 2014
    Posts:
    122
    Have you found a solution to your problem yet? I have the same issue: training freezes when buffer_size is reached.
     
  5. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Sorry, but I haven't... I might make the agent computationally simpler (raycasts instead of grid sensors) and see how it goes.
     
  6. beinzheans

    Joined:
    Dec 16, 2021
    Posts:
    12
    Hey, just wondering if you have a large buffer_size-to-batch_size ratio? It seemed to help when I lowered my buffer_size and increased my batch_size within a reasonable range. (Note: I also removed the grid sensors and used raycasts instead, so the less complex agent may also have cut down the freeze time.)
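
    To make the ratio point concrete, a small sketch (the "before" numbers are from the config earlier in the thread; the "after" numbers are hypothetical, chosen only to illustrate the direction of the change):

    ```python
    def gradient_steps_per_update(buffer_size, batch_size, num_epoch):
        # One PPO update runs num_epoch passes over the buffer,
        # split into buffer_size // batch_size minibatches per pass.
        return num_epoch * (buffer_size // batch_size)

    # Before: ratio of 16 (values from the posted config)
    before = gradient_steps_per_update(65536, 4096, num_epoch=13)
    # After (hypothetical): smaller buffer, larger batch -> ratio of 2
    after = gradient_steps_per_update(16384, 8192, num_epoch=13)

    print(before, after)
    ```

    A smaller burst of gradient steps per update would plausibly explain the shorter freezes, independent of the sensor change.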