
Decision Requester, policy update and time step questions

Discussion in 'ML-Agents' started by Jean-Diddy, Jun 29, 2020 at 12:27 PM.

  1. Jean-Diddy

    Jean-Diddy

    Joined:
    Monday
    Posts:
    3
    Hello,
    I have a few questions about the Decision Requester. When I increase the decision period in the RollerBall tutorial, the game speeds up. Why is that happening?
    I also wanted to know if a policy update for a PPO trainer is what we call a decision taken by the agent.
    If we set the Decision Requester period to 1, is a decision taken at each step? Does that mean the PPO policy is updated at each step?
    Or is the PPO policy updated only at the end of an episode?
    Another question: what is the duration of one step? Does it depend on your CPU / GPU?
    Thank you for your help !
     
  2. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    48
    Hi @Jean-Diddy, the Unity engine has two types of "steps" that happen regularly when running: Update and FixedUpdate. ML-Agents updates its agents on the FixedUpdate step. The decision period represents how many FixedUpdates occur before the agent queries its neural network for a new action. So increasing the period means fewer evaluations of the neural network, and hence a speedup. And yes, how long one of these decisions takes depends on your CPU/GPU.

    The PPO policy is updated every time the agent takes buffer_size decisions (typically in the thousands to tens of thousands).
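    To make the two mechanisms above concrete, here is a toy Python sketch of the stepping loop being described. It is purely illustrative (the names decision_period and buffer_size mirror the config options; run, total_fixed_updates, and the loop body are made up and are not the actual ML-Agents source):

    ```python
    # Toy model of the stepping loop described above:
    # - the agent only makes a decision every `decision_period` FixedUpdates;
    # - PPO updates its policy (and wipes the buffer) every `buffer_size` decisions.

    def run(decision_period=5, buffer_size=10240, total_fixed_updates=100_000):
        buffer = []          # experiences collected since the last policy update
        decisions = 0
        policy_updates = 0

        for fixed_update in range(total_fixed_updates):
            # Between decisions, the agent simply repeats its last action.
            if fixed_update % decision_period == 0:
                decisions += 1
                buffer.append("experience")  # observation/action/reward tuple

                # PPO: once the buffer is full, update the policy, then wipe it.
                if len(buffer) >= buffer_size:
                    policy_updates += 1
                    buffer.clear()

        return decisions, policy_updates

    print(run())
    ```

    The point of the sketch: a larger decision_period means fewer network evaluations per second of game time (hence the speedup), and policy updates are counted in decisions, not in FixedUpdates.
    
    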
     
  3. Jean-Diddy

    Jean-Diddy

    Joined:
    Monday
    Posts:
    3
    Thank you very much @ervteng_unity.

    If I understood well, by increasing the period of the Decision Requester, the agent will accumulate more rewards (or data) before choosing a new action. But does that mean that taking a decision, i.e. querying the network for a new action, costs a fairly significant amount of time?

    About the buffer size, I thought it was like a bag that the agent fills with observations until their number reaches the buffer_size value. However, if the buffer size has an influence on when the agent takes a decision, why doesn't it impact the speed of the game?
     
  4. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    48
    The buffer size does impact the speed of the game! Sometimes you can see the game stutter a bit when the buffer is full.

    A bit of context: the buffer functions differently in PPO and SAC. In SAC it functions as you described, as a "bag" of experiences that the trainer draws from. In PPO, once the buffer is full, those experiences are used to update the policy, then the buffer is wiped and the process starts over.
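    The contrast between the two buffer behaviors can be sketched like this (a minimal illustration, not the actual ML-Agents implementation; the class names are made up):

    ```python
    from collections import deque

    class PPOBuffer:
        """On-policy: fill up, train on everything at once, then wipe."""
        def __init__(self, buffer_size):
            self.buffer_size = buffer_size
            self.experiences = []

        def add(self, exp):
            self.experiences.append(exp)
            if len(self.experiences) >= self.buffer_size:
                batch = self.experiences   # train on the whole buffer...
                self.experiences = []      # ...then start over empty
                return batch               # signals "an update happened here"
            return None                    # still collecting

    class SACBuffer:
        """Off-policy: a rolling bag; the oldest experiences fall out when full."""
        def __init__(self, buffer_size):
            # deque with maxlen silently drops the oldest item when full
            self.experiences = deque(maxlen=buffer_size)

        def add(self, exp):
            self.experiences.append(exp)
    ```

    In the SAC case the trainer samples from the bag repeatedly and adding new experiences never wipes it, which is why updates can happen long before (and long after) the buffer fills; in the PPO case the visible pause tends to coincide with the buffer-full moment, matching the stutter mentioned above.
    
    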
     
  5. Jean-Diddy

    Jean-Diddy

    Joined:
    Monday
    Posts:
    3
    Thank you @ervteng_unity, I didn't pay attention to this game stuttering.

    Another question came to mind: is there a way to increase the Decision Requester period without speeding up the game? Or is that just impossible? Because if I want a game that lasts 1 minute and I increase the decision period, it will change my game's duration, and I don't want that.
     
    Last edited: Jul 2, 2020 at 3:18 PM
  6. seboz123

    seboz123

    Joined:
    Mar 7, 2020
    Posts:
    14
    So just for clarification: in PPO, agents collect observations into the buffer until it is full, then sample batch_size experiences, update the model with them, and then clear the buffer? Is just one batch_size used for an update? Or does PPO use epochs*batch_size for an update?

    Also, I don't get why the game stutters then: is it because it is performing the update steps of the policy?

    And for SAC: when the agent collects observations and the buffer is full, does it just drop the old ones?
    Thanks!
     