SAC or PPO

Discussion in 'ML-Agents' started by mateolopezareal, Jun 25, 2020.

  1. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Can I train my agent with SAC just by changing the trainer param to sac and adding these lines:
    buffer_init_steps: 0
    tau: 0.005
    steps_per_update: 10.0
    save_replay_buffer: false
    init_entcoef: 0.5
    reward_signal_steps_per_update: 10.0
    ?
    Or do I need to change something in Unity or in the terminal?
     
  2. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    You don't need to change anything in Unity, just the trainer config. You might also have to remove some of the PPO-specific hyperparameters, though (the script will error out and tell you which ones).
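A minimal sketch of that cleanup in Python, assuming the flat trainer-config layout used in this thread. The helper `split_ppo_keys` is hypothetical and not part of ML-Agents. The PPO-only key list (`beta`, `epsilon`, `lambd`, `num_epoch`) follows the ML-Agents training configuration docs, and the PPO values shown are illustrative only.

```python
# Hypothetical helper, not part of ML-Agents: separates PPO-only
# hyperparameters from a flat trainer config before switching to SAC.
PPO_ONLY_KEYS = {"beta", "epsilon", "lambd", "num_epoch"}


def split_ppo_keys(config):
    """Return (config without PPO-only keys, sorted list of removed keys)."""
    kept = {k: v for k, v in config.items() if k not in PPO_ONLY_KEYS}
    removed = sorted(k for k in config if k in PPO_ONLY_KEYS)
    return kept, removed


# SAC values are the ones from the question; PPO values are placeholders.
config = {
    "trainer": "sac",
    "beta": 5.0e-3,      # PPO-only (illustrative value)
    "epsilon": 0.2,      # PPO-only (illustrative value)
    "num_epoch": 3,      # PPO-only (illustrative value)
    "buffer_init_steps": 0,
    "tau": 0.005,
    "steps_per_update": 10.0,
    "save_replay_buffer": False,
    "init_entcoef": 0.5,
    "reward_signal_steps_per_update": 10.0,
}

sac_config, removed = split_ppo_keys(config)
print("removed PPO-only keys:", removed)
```

Whether leftover PPO keys cause an error or are silently ignored may depend on the ML-Agents version, so treat this as a tidy-up step rather than a requirement.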
     
  3. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    I did not have to. I have both the PPO and SAC parameters in the file and it looks like it's training. However, it took almost an hour to run 300000 steps, which is quite a lot. What should I change?
     
  4. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
    How long does PPO take for the same environment (300k steps)? SAC is generally slower than PPO.
     
  5. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    10 mins
     
  6. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    Generally SAC speed is controlled by the network size (num_layers and hidden_units), and steps_per_update. Decreasing network size and increasing steps_per_update will speed up training. But SAC typically also takes fewer steps to achieve the same reward as PPO, so you might not need to run it as long in the first place.
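To make the trade-off concrete, here is a rough Python sketch. It assumes `steps_per_update` is the average number of agent steps taken per policy update, as the ML-Agents docs describe it. The function names are illustrative, and the timings are the approximate figures quoted in this thread (about 60 minutes for SAC, 10 minutes for PPO).

```python
def policy_updates(total_steps, steps_per_update):
    """Approximate number of policy updates performed over a run."""
    return total_steps / steps_per_update


def steps_per_second(total_steps, minutes):
    """Average environment-step throughput over a run."""
    return total_steps / (minutes * 60)


total = 300_000

# Doubling steps_per_update halves the number of (expensive) updates.
updates_at_10 = policy_updates(total, 10)
updates_at_20 = policy_updates(total, 20)

# Throughput implied by the timings reported above.
sac_rate = steps_per_second(total, 60)
ppo_rate = steps_per_second(total, 10)

print(updates_at_10, updates_at_20)
print(round(sac_rate, 1), round(ppo_rate, 1))
```

Fewer updates per step means faster wall-clock progress through the step budget, at the cost of learning less from each step.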
     
  7. mateolopezareal

    mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    54
    Thanks a lot. How much can I decrease the network size? I have 512 hidden units and 2 layers. And how much should I increase steps_per_update? I have it at 10.
     
  8. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    Hi @mateolopezareal, I'd suggest running it until the reward reaches your target before making changes. Increasing steps_per_update will get through the step count faster, but it decreases sample efficiency (i.e. it takes more steps to reach the same reward).

    Network size is trial and error - if it works with 512 and 2 layers, try 256 and 2 layers. If the Q loss and policy loss in the plot keep rising, you'll have to increase the network size.
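For a sense of how much smaller the 256-unit network is, the sketch below counts the weights and biases of a single plain fully connected network. It is only a rough illustration: the input size (32) and output size (4) are made-up assumptions, `mlp_param_count` is a hypothetical helper, and SAC in ML-Agents trains several networks (policy and Q networks), not just one.

```python
def mlp_param_count(input_size, hidden_units, num_layers, output_size):
    """Count weights and biases of a plain fully connected network."""
    sizes = [input_size] + [hidden_units] * num_layers + [output_size]
    # Each layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


# Assumed observation/action sizes, for illustration only.
INPUT_SIZE = 32
OUTPUT_SIZE = 4

large = mlp_param_count(INPUT_SIZE, 512, 2, OUTPUT_SIZE)
small = mlp_param_count(INPUT_SIZE, 256, 2, OUTPUT_SIZE)

print(large, small, round(large / small, 2))
```

Under these assumptions, halving hidden_units cuts the parameter count by a bit less than a factor of four, because the hidden-to-hidden layer dominates and scales with the square of the width.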
     
  9. unity_-DoCqyPS6-iU3A

    unity_-DoCqyPS6-iU3A

    Joined:
    Aug 18, 2018
    Posts:
    26