
Bug: Random errors with increasing buffer_size > 10240

Discussion in 'ML-Agents' started by An-u-rag, Mar 26, 2024.

  1. An-u-rag

    Hi, I am experiencing random errors a few steps into training, specifically with PPO.

    A question I have is: how do I calculate the memory footprint of my experience buffer? I keep getting errors like "sequence_length invalid", "broken pipe", or "UnityEnvironment worker 0: environment raised an unexpected exception." whenever I increase buffer_size to 8192 or above. I know a larger buffer means more memory consumption (RAM? VRAM?), but this is a relatively small buffer and I should not be getting these errors. I will post the error logs below, but first I want to clarify the memory calculation.

    Mem(experience buffer) = (Mem(observations) + Mem(actions) + Mem(rewards)) * buffer_size

    Is this correct?

    In my scenario, I just want a car to put a ball in the goal.

    The car has 2 continuous actions:
    1. Throttle (forward/backward acceleration) - float
    2. Steering direction - float
    Mem(actions) = 4 + 4 = 8 bytes

    Observations:
    1. 32 x 32 grayscale FPS visual observation
    2. Single raycast distance - float
    3. Current steering direction - float
    4. Current throttle - float
    Mem(observations) = (32 x 32) + 4 + 4 + 4 = 1024 + 4 + 4 + 4 = 1036 bytes (assuming 1 byte per pixel)

    Rewards:
    1. Discrete reward when the car makes contact with the ball
    2. Discrete reward when the ball makes contact with the goal
    3. Inverse squared distance from car to ball (cut off after contact with the ball)
    4. Inverse squared distance from ball to goal (starts after contact with the ball)
    Mem(rewards) = 4 bytes (they are all summed into a single float)

    Plugging these into the equation:
    Mem(experience buffer) = (8 + 1036 + 4) * 8192 = 8,585,216 bytes ≈ 8.6 MB
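
    As a sanity check, here is the same arithmetic as a small script. One assumption of mine: ML-Agents most likely stores observations as float32 internally, which would make each pixel 4 bytes rather than 1 (and PPO also buffers per-step extras such as log-probabilities and value estimates), so treat the hand calculation above as a lower bound:

    Code (Python):
    # Back-of-envelope estimate of the experience buffer footprint.
    BYTES_PER_FLOAT = 4

    actions    = 2 * BYTES_PER_FLOAT        # throttle + steering
    vector_obs = 3 * BYTES_PER_FLOAT        # raycast + steering + throttle
    visual_obs = 32 * 32 * BYTES_PER_FLOAT  # 32x32 grayscale, assuming float32 pixels
    reward     = 1 * BYTES_PER_FLOAT

    per_step    = actions + vector_obs + visual_obs + reward  # 4120 bytes
    buffer_size = 8192

    print(f"{per_step * buffer_size / 2**20:.1f} MiB")  # ~32.2 MiB -- still tiny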

    If this is right, my 16 GB of RAM and my Nvidia 3070 Ti with 8 GB of dedicated VRAM should handle it without any problem. I mention both because I still don't understand how to properly utilize the GPU during mlagents training, given how thin the documentation on this subject is. The only thing I am doing to use my GPU right now is adding --torch-device=cuda to my mlagents-learn command. I have of course installed a CUDA build of PyTorch and made sure to get the corresponding CUDA toolkit version. I have no idea where this experience buffer is actually stored; Task Manager was not much help either.
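
    As a first sanity check on the GPU side, this at least confirms whether PyTorch can see the card at all (my understanding, which I have not verified against the ML-Agents source, is that the experience buffer itself lives in ordinary process RAM and only the network update runs on the GPU):

    Code (Python):
    import torch

    # If this prints False, --torch-device=cuda cannot work, no matter
    # which CUDA toolkit is installed system-wide.
    print(torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # expect the 3070 Ti here
        print(torch.version.cuda)             # CUDA version PyTorch was built against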

    I would really appreciate it if someone could clarify these for me.


    Error logs from my latest run (batch_size 1024, buffer_size 10240):
    Code (text):
    (mlagents) C:\Users\Anurag\ml-agents-latest_release>mlagents-learn config/Car2Ball_visual_curiosity_config_v3.yaml --run-id=test3_1024_10240 --torch-device=cuda --resume

    [ML-Agents ASCII banner]

    Version information:
      ml-agents: 1.0.0,
      ml-agents-envs: 1.0.0,
      Communicator API: 1.5.0,
      PyTorch: 1.13.1+cu117
    [WARNING] Training status file not found. Not all functions will resume properly.
    [INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
    [INFO] Connected to Unity environment with package version 3.0.0-exp.1 and communication version 1.5.0
    [INFO] Connected new brain: Car2Ball?team=0
    [INFO] Hyperparameters for behavior name Car2Ball:
            trainer_type:   ppo
            hyperparameters:
              batch_size:   1024
              buffer_size:  10240
              learning_rate:        0.0003
              beta: 0.005
              epsilon:      0.2
              lambd:        0.95
              num_epoch:    3
              shared_critic:        True
              learning_rate_schedule:       linear
              beta_schedule:        constant
              epsilon_schedule:     linear
            checkpoint_interval:    500000
            network_settings:
              normalize:    False
              hidden_units: 128
              num_layers:   2
              vis_encode_type:      simple
              memory:       None
              goal_conditioning_type:       hyper
              deterministic:        False
            reward_signals:
              extrinsic:
                gamma:      0.99
                strength:   1.0
                network_settings:
                  normalize:        False
                  hidden_units:     128
                  num_layers:       2
                  vis_encode_type:  simple
                  memory:   None
                  goal_conditioning_type:   hyper
                  deterministic:    False
              curiosity:
                gamma:      0.99
                strength:   0.02
                network_settings:
                  normalize:        False
                  hidden_units:     128
                  num_layers:       2
                  vis_encode_type:  simple
                  memory:   None
                  goal_conditioning_type:   hyper
                  deterministic:    False
                learning_rate:      0.003
                encoding_size:      None
            init_path:      None
            keep_checkpoints:       5
            even_checkpoints:       False
            max_steps:      30000000
            time_horizon:   128
            summary_freq:   50000
            threaded:       True
            self_play:      None
            behavioral_cloning:     None
    [INFO] Resuming from results\test3_1024_10240\Car2Ball.
    [INFO] Resuming training from step 499978.
    [INFO] Car2Ball. Step: 500000. Time Elapsed: 6.339 s. No episode was completed since last summary. Training.
    [INFO] Exported results\test3_1024_10240\Car2Ball\Car2Ball-499978.onnx
    [INFO] Car2Ball. Step: 550000. Time Elapsed: 46.842 s. Mean Reward: 166.833. Std of Reward: 85.181. Training.
    [INFO] Car2Ball. Step: 600000. Time Elapsed: 89.331 s. Mean Reward: 154.484. Std of Reward: 67.336. Training.
    [INFO] Car2Ball. Step: 650000. Time Elapsed: 131.203 s. Mean Reward: 140.996. Std of Reward: 85.288. Training.
    [INFO] Car2Ball. Step: 700000. Time Elapsed: 173.653 s. Mean Reward: 152.901. Std of Reward: 66.126. Training.
    [INFO] Car2Ball. Step: 750000. Time Elapsed: 217.055 s. Mean Reward: 146.363. Std of Reward: 74.872. Training.
    [INFO] Car2Ball. Step: 800000. Time Elapsed: 256.871 s. Mean Reward: 148.012. Std of Reward: 72.254. Training.
    [INFO] Car2Ball. Step: 850000. Time Elapsed: 298.509 s. Mean Reward: 152.311. Std of Reward: 93.526. Training.
    [INFO] Car2Ball. Step: 900000. Time Elapsed: 340.512 s. Mean Reward: 147.693. Std of Reward: 92.437. Training.
    [INFO] Car2Ball. Step: 950000. Time Elapsed: 382.421 s. Mean Reward: 152.774. Std of Reward: 66.762. Training.
    Exception in thread Thread-2 (trainer_update_func):
    Traceback (most recent call last):
    [ERROR] UnityEnvironment worker 0: environment raised an unexpected exception.
    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
        nread, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 175, in worker
        req: EnvironmentRequest = parent_conn.recv()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 321, in _recv_bytes
        raise EOFError
    EOFError
    Process Process-1:
    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
        nread, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 175, in worker
        req: EnvironmentRequest = parent_conn.recv()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 321, in _recv_bytes
        raise EOFError
    EOFError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\process.py", line 314, in _bootstrap
        self.run()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 235, in worker
        _send_response(EnvironmentCommand.ENV_EXITED, ex)
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 150, in _send_response
        parent_conn.send(EnvironmentResponse(cmd_name, worker_id, payload))
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 206, in send
        self._send_bytes(_ForkingPickler.dumps(obj))
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 280, in _send_bytes
        ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
    BrokenPipeError: [WinError 232] The pipe is being closed
    Sometimes I get a different error like this:

    Code (text):
    Error message:
    [INFO] Car2Ball. Step: 600000. Time Elapsed: 720.614 s. Mean Reward: 101.428. Std of Reward: 75.614. Training.
    [INFO] Car2Ball. Step: 650000. Time Elapsed: 797.231 s. Mean Reward: 117.052. Std of Reward: 79.043. Training.
    [INFO] Car2Ball. Step: 700000. Time Elapsed: 845.481 s. Mean Reward: 110.748. Std of Reward: 77.274. Training.
    [INFO] Car2Ball. Step: 750000. Time Elapsed: 910.442 s. Mean Reward: 118.675. Std of Reward: 74.930. Training.
    [INFO] Car2Ball. Step: 800000. Time Elapsed: 949.560 s. Mean Reward: 123.312. Std of Reward: 75.215. Training.
    Exception in thread Thread-2 (trainer_update_func):
    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 297, in trainer_update_func
        trainer.advance()
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 302, in advance
        if self._update_policy():
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\on_policy_trainer.py", line 97, in _update_policy
        self.update_buffer.shuffle(sequence_length=self.policy.sequence_length)
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\buffer.py", line 403, in shuffle
        tmp += buffer_field[i * sequence_length : (i + 1) * sequence_length]
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\buffer.py", line 113, in __getitem__
        return AgentBufferField(return_data)
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\buffer.py", line 105, in __init__
        super().__init__(*args, **kwargs)
    TypeError: <cell at 0x000002593A5335B0: type object at 0x0000025931A9E990> argument after * must be an iterable, not method-wrapper
    I don't have the output of the third kind of error saved, but it points to a similar problem, specifically at this line:
    tmp += buffer_field[i * sequence_length : (i + 1) * sequence_length]
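
    From reading buffer.py, my understanding (possibly wrong) is that shuffle slices the buffer into sequence_length-sized chunks and shuffles the chunk order. A standalone sketch of that logic, my own reconstruction rather than the library code:

    Code (Python):
    import random

    def shuffle_in_sequences(field, sequence_length):
        # Split the field into whole sequences, then shuffle sequence order,
        # mirroring the slicing loop from the traceback above.
        assert len(field) % sequence_length == 0, "buffer must hold whole sequences"
        order = list(range(len(field) // sequence_length))
        random.shuffle(order)
        out = []
        for i in order:
            out += field[i * sequence_length : (i + 1) * sequence_length]
        return out

    # With memory: None, sequence_length should be 1 and the slicing is
    # trivial, which makes me suspect a race on the shared buffer under
    # threaded: true rather than a size problem -- a guess, not a diagnosis.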

    I believe all of these have something to do with increasing buffer_size.
    Again, I would really appreciate any thoughts or help on this.

    python: 3.10.13
    mlagents: latest release (release 21)
    Unity: 2022.3.20f1

    Thanks
    Anurag
     
    Last edited: Mar 26, 2024
  2. SevickV

    Currently training with a 100k buffer and no issues (25 environments, time_horizon of 512 steps, batch_size of 10k).

    *I should be having no problem with my 16 GB RAM
    There is only one way to know: check whether you have any free memory left when it crashes.
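
    For example, something like this running in a second terminal while training (a rough sketch; it assumes psutil is installed):

    Code (Python):
    import time
    import psutil  # pip install psutil

    # Print free system RAM once a second; if "available" bottoms out
    # right before the crash, memory is the culprit.
    while True:
        mem = psutil.virtual_memory()
        print(f"available: {mem.available / 1e9:.2f} GB ({mem.percent}% used)")
        time.sleep(1.0)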
     
  3. An-u-rag

    I just upgraded to 2x16 GB RAM sticks and I still get this problem. It is hard to debug because the error message is just:
    Code (text):
    Exception in thread Thread-2 (trainer_update_func):
    Traceback (most recent call last):
    [ERROR] UnityEnvironment worker 0: environment raised an unexpected exception.
    Traceback (most recent call last):
      File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
        nread, err = ov.GetOverlappedResult(True)
    BrokenPipeError: [WinError 109] The pipe has been ended
    What am I supposed to gather from this?
     
  4. An-u-rag

    Here is the config YAML that I tried. It runs longer than the 10240 case but still ends with the same error.


    Code (YAML):
    behaviors:
      Car2Ball:
        trainer_type: ppo
        hyperparameters:
          batch_size: 10000
          buffer_size: 100000
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
          shared_critic: true
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
          curiosity:
            strength: 0.02
            gamma: 0.99
            learning_rate: 3.0e-3
        max_steps: 30000000
        time_horizon: 512
        summary_freq: 50000
        threaded: true
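
    Since the forum flattens indentation, here is a quick parse check to confirm ML-Agents would see the values I intended (the path is a placeholder for wherever the file is saved):

    Code (Python):
    import yaml  # PyYAML, which ML-Agents also uses to parse configs

    # Placeholder path -- substitute the actual config file name.
    with open("config/Car2Ball_100k.yaml") as f:
        cfg = yaml.safe_load(f)

    hp = cfg["behaviors"]["Car2Ball"]["hyperparameters"]
    print(hp["batch_size"], hp["buffer_size"])  # expect: 10000 100000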