
SAC, very slow training and freezing environment (No issues with PPO)

Discussion in 'ML-Agents' started by digi170, Jan 7, 2021.

  1. digi170

    digi170

    Joined:
    Nov 7, 2019
    Posts:
    9
    I have an environment with 6 identical agents which interact with each other. (This problem also occurs when I only have 1 agent.) Training with PPO and the below config file works as expected, performing around 30k steps in less than a minute.
    Code (CSharp):
    behaviors:
      CarAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          beta: 0.005
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.995
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 30000000
        time_horizon: 1000
        summary_freq: 30000
        threaded: true
    Using the exact same environment with the below config file, which uses SAC and its relevant hyperparameters, freezes the environment and only manages to compute 30k steps in around 45 minutes to 1.5 hours.

    Code (CSharp):
    behaviors:
      CarAgent:
        trainer_type: sac
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 0.0003
          learning_rate_schedule: linear
          buffer_init_steps: 0
          tau: 0.005
          steps_per_update: 20.0
          save_replay_buffer: false
          init_entcoef: 0.5
          reward_signal_steps_per_update: 10.0
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.995
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 30000000
        time_horizon: 1000
        summary_freq: 30000
        threaded: true
    I also tried the suggestions posted in https://forum.unity.com/threads/sac-long-train-time-on-12-cores-amd-5900x.1026391/: running with --tensorflow, changing cpu_utils.py, and increasing the batch_size and steps_per_update, with no improvement.

    I am also trying to create my own RL training setup using the Python API, and again training time is very slow compared to PPO.

    I am using the following command to start training from the CMD

    Code (CSharp):
    mlagents-learn config_sac.yaml --run-id=SAC --env="....\FinalEnv\Build" --time-scale=10 --quality-level=0 --width=640 --height=640 --force
    The below are a few more details.
    Version information:
    ml-agents: 0.23.0,
    ml-agents-envs: 0.23.0,
    Communicator API: 1.3.0,
    PyTorch: 1.7.0+cu110

    I found this issue as well, but I'm not sure what they mean by the inference configuration: https://stackoverflow.com/questions/55324790/unity-ml-agents-running-very-slowly (just including this for completeness, since it may be a possible solution).

    Any help would be appreciated.
    Thank you in advance.
     
  2. TreyK-47

    TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,820
    Let me bounce this off of the team for some guidance.
     
  3. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Hi @digi170

    One thing that stands out to me is that your batch size is pretty large and your buffer size is pretty small for SAC. Can you try a batch size of 256 and a buffer size of 1000000? I'd be surprised if this was the cause of such a slowdown, but it should help a bit. I assume you are also running with time-scale=10 with PPO?
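
    For reference, a minimal sketch of what that change might look like in the hyperparameters block, assuming everything else stays as in the SAC config posted above:

    Code (CSharp):
    behaviors:
      CarAgent:
        trainer_type: sac
        hyperparameters:
          batch_size: 256        # reduced from 2048
          buffer_size: 1000000   # increased from 20480; SAC typically uses a large replay buffer
          # remaining hyperparameters unchanged from the SAC config above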

    The Stack Overflow thread you shared describes a problem we had to deal with when porting the codebase to PyTorch, but we have essentially addressed it to the best of our ability.
     
  4. digi170

    digi170

    Joined:
    Nov 7, 2019
    Posts:
    9
    @andrewcoh_unity Thank you for your reply and suggestions.

    The speed has increased drastically, completing 30,000 steps in around 6 minutes, but the Unity environment is still lagging ("Not responding" message on Windows), and this seems to occur mostly just before outputting the summary, i.e. in this case every 30k steps. I have also tried a linear (rather than constant) learning_rate_schedule, a batch size of 128 and a buffer size of 4000000 with TensorFlow (--tensorflow appended to the command), which further improved training time to around 2 minutes per 30k steps.
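
    For reference, a sketch of roughly what the hyperparameters block looked like for that faster run (only the values mentioned above changed; everything else is assumed unchanged from my original SAC config):

    Code (CSharp):
    behaviors:
      CarAgent:
        trainer_type: sac
        hyperparameters:
          batch_size: 128
          buffer_size: 4000000
          learning_rate: 0.0003
          learning_rate_schedule: linear
          # remaining settings as in the original SAC config above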

    Yes, I am also using a time-scale of 10 with PPO.

    Is there any environment property that is specific to SAC or should an environment that works with PPO also work with SAC without any changes?

    Furthermore, does reducing the batch size, increasing the buffer size, and changing the learning_rate_schedule affect how the agents are trained? I am asking because I want to compare against other models which I trained with the originally defined hyperparameters, and I'm not sure if I can still compare them now that these hyperparameters have changed.

    Thanks again for your help.
     
  5. digi170

    digi170

    Joined:
    Nov 7, 2019
    Posts:
    9
    Further to my comment about the Unity environment freezing when the summary frequency is reached, I also noted that pressing Enter in the command prompt while the environment is frozen "unfreezes" the environment.

    Thanks again.
     
  6. digi170

    digi170

    Joined:
    Nov 7, 2019
    Posts:
    9
    @TreyK-47 @andrewcoh_unity

    I have another question regarding the batch size and buffer size. I used the above-mentioned batch_size and buffer_size (128 and 4000000) on 4 environments running 5 agents each (using SAC). Is this combination acceptable, especially since the maximum suggested buffer size is 1000000?

    Thank you for any help
     
  7. Roboserg

    Roboserg

    Joined:
    Jun 3, 2018
    Posts:
    83
    I have the same problem. 9 months ago SAC was outperforming PPO for me on a 4 core CPU. Now I have a 12 core CPU and SAC is very slow + freezing the environment.
     
  8. TreyK-47

    TreyK-47

    Unity Technologies

    Joined:
    Oct 22, 2019
    Posts:
    1,820
    I'll ping the team for an update for y'all.
     
  9. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    Hey Roboserg and others, we do want to get to the bottom of the SAC slowdown issue. It's most likely related to our recent switch from TensorFlow to PyTorch. What's the magnitude of the slowdown between now and 9 months ago, and what version of PyTorch are you running? In our internal testing we saw no more than 20% slowdown between TF and PyTorch on 8-core machines, and a speedup when using certain network architectures (CNNs, LSTMs).

    There are a couple of things that are known to speed up PyTorch execution. First, you can try disabling threading in the YAML (threaded: false). Second, the GPU/CUDA version of PyTorch is much better for parallelization, because of how it internally handles multiple network inferences and backprop (I do realize this isn't an option for everybody). Third, we cap the number of threads PyTorch uses to 4, as we found that PyTorch could interfere with running Unity at higher thread counts. You can change this in ml-agents/mlagents/torch_utils/torch.py if you'd like to experiment. We'll make this an advanced option in the near future.
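
    A minimal sketch of the first two suggestions, assuming the config layout used earlier in this thread (the torch_settings block for device selection may not be available in every ML-Agents release, so treat it as illustrative):

    Code (CSharp):
    behaviors:
      CarAgent:
        # only the relevant key is shown; merge into your full behavior config
        threaded: false       # disable the trainer's background thread
    torch_settings:
      device: cuda            # use the CUDA build of PyTorch if a GPU is available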

    Of course the ultimate workaround is to use R10 of the Python code (with --tensorflow) or earlier. You'll be able to train environments created with the latest Unity package, but without the new Hybrid Actions feature.
     
  10. Radixo

    Radixo

    Joined:
    Oct 20, 2018
    Posts:
    2
    Hey all.

    Is there any update from the dev team regarding fixing this issue?
    I have followed this thread closely, as while running experiments for my dissertation I experienced exactly the same issues. After testing and applying all the proposed solutions, the training time for SAC decreased only by around 5%, while PPO was completing the same scenario 6-10 times faster.
     
  11. zukini_kun

    zukini_kun

    Joined:
    Jul 28, 2021
    Posts:
    4
    I am also having this issue running PyTorch 1.7.1, CUDA 1.10, mlagents 0.22.0. Is there any update from the dev team in another thread, has this been fixed in a more recent version of mlagents, or did you find any workaround?
     
  12. zukini_kun

    zukini_kun

    Joined:
    Jul 28, 2021
    Posts:
    4
    I am experiencing a similar issue running PyTorch 1.7.1, CUDA 1.10, mlagents 0.22.0.
    The main problem I am having is that the training as well as the Unity environment freeze very often, with short periods of smooth training operation in between.
    Is there any update from the dev team in another thread, or has this been fixed in a more recent version of mlagents?

    This is my training configuration file:
    behaviors:
      xyz:
        trainer_type: sac
        hyperparameters:
          batch_size: 128
          buffer_size: 124000
          learning_rate: 3.0e-4
          learning_rate_schedule: constant
          buffer_init_steps: 0
          init_entcoef: 0.6
          tau: 5e-3
          steps_per_update: 1
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 10000000
        time_horizon: 64
        summary_freq: 50000

    I am using 25 agents
     
  13. MidnightGameDeveloper

    MidnightGameDeveloper

    Joined:
    Apr 26, 2014
    Posts:
    123
    I have the same problem with freezing environments whenever the buffer_size steps are reached. I am using POCA to train the example environment DodgeBall, and the environment freezes for quite some time whenever the buffer_size is reached (GPU usage goes up then, probably updating the models?!). Is this behaviour also because of the switch from TensorFlow to PyTorch that you mentioned in your post?
    When I last used ML-Agents 2 years ago these freezes didn't happen during training. Back then I was using PPO, which might be another reason.


     
  14. miguelalonsojr

    miguelalonsojr

    Unity Technologies

    Joined:
    Jan 13, 2021
    Posts:
    8
  15. carlosm

    carlosm

    Joined:
    Sep 17, 2015
    Posts:
    7
    Any fixes for this? If I --resume the training, it's fast again. Any idea how to fix it? It's a discrete environment (size=400) with a buffer size of 2048.
     
  16. OmarVector

    OmarVector

    Joined:
    Apr 18, 2018
    Posts:
    130
    I have this issue on a Mac M1 Max too; it takes forever during policy updates. I set the buffer to 80000 and the batch size to 8000, and training is very smooth until the policy starts to update, then all environments freeze. On Windows the freeze time is a few seconds; on Mac I'm around 10 minutes in now and it's still frozen.
     
  17. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Maybe you have a GPU on your Windows PC, and that is being used to accelerate the policy update phase? Mind you, 10 minutes sounds a bit long...
     
  18. NanushTol

    NanushTol

    Joined:
    Jan 9, 2018
    Posts:
    131
    I'm trying SAC and it's so slow it's practically unusable in my case; PPO works fine.
    The PPO training for my agents takes around 40-50 hours, and I'm trying to improve the training time by using SAC.
    Any ideas?

    Windows PC - Lenovo Legion
    Windows 11 Home
    32 GB RAM
    Intel Core i7-9750H CPU @ 2.60GHz 2.59 GHz
    Nvidia RTX 2070i
    Unity ml agents package version 2.0.1
    Version information:
    ml-agents: 0.30.0,
    ml-agents-envs: 0.30.0,
    Communicator API: 1.5.0,
    PyTorch: 1.7.1+cu110

    I'm working with 3 standalone environments each with 10 agents.
    Episode Length of 3500 steps.
    The environments run with the --no-graphics flag.
    Any way of improving this?

    This is a part of my console output:
    you can see that when it reached 500K it started to freeze/slow down to about an hour and a half for each 5000 steps, which previously took a few seconds.

    Code (CSharp):
    [INFO] Driver. Step: 430000. Time Elapsed: 481.979 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
    [INFO] Driver. Step: 435000. Time Elapsed: 486.224 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 440000. Time Elapsed: 490.896 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 445000. Time Elapsed: 495.746 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 450000. Time Elapsed: 500.527 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 455000. Time Elapsed: 503.927 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 460000. Time Elapsed: 509.504 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
    [INFO] Driver. Step: 465000. Time Elapsed: 513.474 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 470000. Time Elapsed: 516.985 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
    [INFO] Driver. Step: 475000. Time Elapsed: 521.133 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 480000. Time Elapsed: 525.139 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 485000. Time Elapsed: 530.389 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 490000. Time Elapsed: 533.989 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 495000. Time Elapsed: 538.515 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 500000. Time Elapsed: 542.526 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 505000. Time Elapsed: 6219.419 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 510000. Time Elapsed: 12409.326 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 515000. Time Elapsed: 18577.535 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 520000. Time Elapsed: 23261.468 s. No episode was completed since last summary. Training.
    [INFO] Driver. Step: 525000. Time Elapsed: 29433.116 s. No episode was completed since last summary. Training.
    This is my config yaml:

    Code (CSharp):
    default_settings: null
    behaviors:
      Driver:
        trainer_type: sac
        hyperparameters:
          batch_size: 1024
          buffer_size: 1000000
          learning_rate: 0.0003
          learning_rate_schedule: constant
          # SAC-specific hyperparameters
          buffer_init_steps: 500000
          tau: 0.005
          steps_per_update: 10.0
          save_replay_buffer: true
          init_entcoef: 0.75
          reward_signal_steps_per_update: 10.0
        # Configuration of the neural network
        network_settings:
          normalize: false
          hidden_units: 400
          num_layers: 2
          vis_encode_type: simple
          memory: null
          goal_conditioning_type: none
          deterministic: false
        behavioral_cloning:
          demo_path: Demos/
          strength: 0.00028
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          gail:
            strength: 0.00055
            gamma: 0.99
            demo_path: Demos/
            use_actions: true
            use_vail: false
            network_settings:
              normalize: false
              hidden_units: 256
              num_layers: 2
        # Trainer configurations
        init_path: null
        keep_checkpoints: 50
        checkpoint_interval: 1000000
        max_steps: 50000000
        time_horizon: 128
        summary_freq: 5000
        threaded: false
        self_play: null
    env_settings:
      env_path: null
      env_args: null
      base_port: 5005
      num_envs: 1
      num_areas: 1
      seed: -1
      max_lifetime_restarts: 10
      restarts_rate_limit_n: 1
      restarts_rate_limit_period_s: 60
    engine_settings:
      width: 84
      height: 84
      quality_level: 0
      time_scale: 10
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    environment_parameters:
    checkpoint_settings:
      run_id: sac
      initialize_from: null
      load_model: false
      resume: false
      force: true
      train_model: false
      inference: false
      results_dir: results
    torch_settings:
      device: null
    debug: false
     
    Last edited: Jan 24, 2023
  19. OmarVector

    OmarVector

    Joined:
    Apr 18, 2018
    Posts:
    130
    I've fixed it by re-installing TensorFlow for Metal, because it seems it was not installed correctly.
     
    hughperkins likes this.
  20. byouloh

    byouloh

    Joined:
    Sep 12, 2022
    Posts:
    2
    The same issue for me. Following your advice, I have tried several tests. My final test is:

    Code (CSharp):
    trainer_type: sac
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 2048
      buffer_size: 204800
      buffer_init_steps: 204800
    Before, when buffer_size was not equal to buffer_init_steps (buffer_init_steps = 0, 1000, or 10000), it froze after about 30 minutes; but with buffer_init_steps = buffer_size = 204800, it kept training without freezing for 8 hours and only froze after those 8 hours.

    I found 3 functions related to the buffer_init_steps parameter in ml-agents\mlagents\trainers\trainer\off_policy_trainer.py:

    def _is_ready_update(self) -> bool:
    def _update_policy(self) -> bool:
    def _update_reward_signals(self) -> None:

    Looking into the first function, def _is_ready_update(self) -> bool:

    def _is_ready_update(self) -> bool:
        """
        Returns whether or not the trainer has enough elements to run update model
        :return: A boolean corresponding to whether or not _update_policy() can be run
        """
        return (
            self.update_buffer.num_experiences >= self.hyperparameters.batch_size
            and self._step >= self.hyperparameters.buffer_init_steps
        )

    Finally, I think the freezing occurs when updating the policy, and with my settings _is_ready_update is always false.

    So I think my SAC training is effectively doing nothing. Please give me your advice.
     
    Last edited: Jan 13, 2024