SAC or PPO

Discussion in 'ML-Agents' started by mateolopezareal, Jun 25, 2020.

  1. mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    47
    Can I just train my agent by changing the trainer param to sac and adding these lines:
    buffer_init_steps: 0
    tau: 0.005
    steps_per_update: 10.0
    save_replay_buffer: false
    init_entcoef: 0.5
    reward_signal_steps_per_update: 10.0
    Or do I need to change something in Unity or in the terminal?
     
  2. ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    48
    You don't need to change anything in Unity, just the trainer config. You might also have to remove some of the PPO-specific hyperparameters, though (the script will error out and tell you which ones).
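
    For reference, a full SAC entry in the old flat trainer_config.yaml format (which this thread appears to use) might look like the sketch below. MyBehavior is a placeholder for your behavior name, and the values simply mirror the first post plus common defaults rather than tuned settings; PPO-only keys such as beta, epsilon, lambd, and num_epoch are the ones the script would complain about if left in:

    MyBehavior:
        trainer: sac
        batch_size: 128
        buffer_size: 50000
        buffer_init_steps: 0
        tau: 0.005
        steps_per_update: 10.0
        save_replay_buffer: false
        init_entcoef: 0.5
        reward_signal_steps_per_update: 10.0
        hidden_units: 128
        num_layers: 2
        learning_rate: 3.0e-4
        max_steps: 5.0e5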
     
  3. mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    47
    I did not have to. I have both the PPO and SAC parameters in the file and it looks like it's training. However, it took almost an hour to do 300,000 steps, which is quite a lot. What should I change?
     
  4. BotAcademy

    Joined:
    May 15, 2020
    Posts:
    14
    How long does PPO take for the same environment (300k steps)? SAC is generally slower than PPO.
     
  5. mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    47
    10 mins
     
  6. ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    48
    Generally SAC speed is controlled by the network size (num_layers and hidden_units), and steps_per_update. Decreasing network size and increasing steps_per_update will speed up training. But SAC typically also takes fewer steps to achieve the same reward as PPO, so you might not need to run it as long in the first place.
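
    Concretely, those knobs are just entries in the same config; an illustrative (not recommended) adjustment could be:

    hidden_units: 256        # a smaller network than 512 trains faster per update
    num_layers: 2
    steps_per_update: 20.0   # higher than 10 means fewer model updates per env step: faster wall-clock, lower sample efficiency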
     
  7. mateolopezareal

    Joined:
    Jun 4, 2020
    Posts:
    47
    Thanks a lot. How much can I decrease the network size (I have 512 hidden units and 2 layers), and how much should I increase steps_per_update (I have it at 10)?
     
  8. ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    48
    Hi @mateolopezareal, I'd suggest letting it run until it reaches your desired reward before making changes. Increasing steps_per_update will speed up the step count, but decrease sample efficiency (i.e., take more steps to reach the same reward).

    Network size is trial and error - if it works with 512 and 2 layers, try 256 and 2 layers. If the Q loss and policy loss in the plot keep rising, you'll have to increase the network size.
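
    To actually see those curves, TensorBoard is the usual route; assuming a 2020-era ML-Agents release, the stats are written under ./summaries (newer releases use ./results), so something like:

    tensorboard --logdir summaries

    then open http://localhost:6006 and look under the Losses section while training runs.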
     