Buffer Size Parameter - Clarification

Discussion in 'ML-Agents' started by BotAcademy, Jul 18, 2020.

  1. BotAcademy

    Joined: May 15, 2020
    Posts: 32
    Hey!

    I was just wondering whether I understood the buffer_size parameter correctly, since the documentation confused me a bit.

    Documentation

    (default = 10240 for PPO and 50000 for SAC) Number of experiences to collect before updating the policy model. Corresponds to how many experiences should be collected before we do any learning or updating of the model. This should be multiple times larger than batch_size. Typically a larger buffer_size corresponds to more stable training updates. In SAC, the max size of the experience buffer - on the order of thousands of times longer than your episodes, so that SAC can learn from old as well as new experiences.

    Typical range: PPO: 2048 - 409600; SAC: 50000 - 1000000

    My understanding is the following:
    PPO: A policy update occurs each time the number of experiences defined by this parameter has been collected. If we set this value to 3k, it will take 3k agent steps to collect 3k experiences. Those experiences are then split into batches of our defined batch_size and fed through the neural network to perform the weight updates. An update therefore happens every 3k agent steps.
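
    A minimal sketch of how I picture that cadence, with hypothetical stand-in functions (not the actual ML-Agents trainer code):

        import random

        def collect_experience(step):
            # hypothetical stand-in for one agent step in the environment
            return {"obs": step, "action": 0, "reward": 0.0}

        def update_policy(minibatch):
            # hypothetical stand-in for one gradient update on a minibatch
            pass

        BUFFER_SIZE = 3000  # buffer_size, the 3k from the example above
        BATCH_SIZE = 500    # batch_size, several times smaller

        buffer = []
        for step in range(30_000):
            buffer.append(collect_experience(step))
            if len(buffer) == BUFFER_SIZE:  # every 3k agent steps...
                random.shuffle(buffer)
                for i in range(0, BUFFER_SIZE, BATCH_SIZE):
                    update_policy(buffer[i:i + BATCH_SIZE])  # ...one minibatch update
                # (the real trainer makes num_epoch passes over the buffer;
                # a single pass is shown here for brevity)
                buffer.clear()  # PPO is on-policy: old experiences are discarded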

    SAC: Defines the maximum size of the experience replay buffer, not the update frequency, which is specified in a separate parameter named steps_per_update.
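
    And a matching sketch for SAC, where buffer_size only caps the replay buffer and the update frequency comes from steps_per_update (again hypothetical stand-ins; the real trainer treats steps_per_update as an average ratio rather than a strict modulo):

        import random
        from collections import deque

        def collect_experience(step):
            # hypothetical stand-in for one agent step in the environment
            return {"obs": step, "action": 0, "reward": 0.0}

        def update_policy(minibatch):
            # hypothetical stand-in for one gradient update on a minibatch
            pass

        BUFFER_SIZE = 50_000   # buffer_size: max replay buffer size (SAC default)
        BATCH_SIZE = 128
        STEPS_PER_UPDATE = 1   # the update frequency lives here, not in buffer_size

        replay_buffer = deque(maxlen=BUFFER_SIZE)  # oldest experiences drop out when full
        for step in range(100_000):
            replay_buffer.append(collect_experience(step))
            if step % STEPS_PER_UPDATE == 0 and len(replay_buffer) >= BATCH_SIZE:
                # sampling uniformly from the whole buffer mixes old and new experiences
                update_policy(random.sample(replay_buffer, BATCH_SIZE))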

    If my understanding is correct (please let me know if I got something wrong), I'd rephrase the documentation text to something like this:

    Updated Documentation

    (default = 10240 for PPO and 50000 for SAC) - different behavior for PPO and SAC!
    PPO: Number of experiences to collect before updating the policy model. Corresponds to how many experiences should be collected before we do any learning or updating of the model. This should be multiple times larger than batch_size. Typically a larger buffer_size corresponds to more stable training updates.
    SAC: max size of the experience buffer - on the order of thousands of times greater than the episode length so that SAC can learn from old as well as new experiences.

    Typical range: PPO: 2048 - 409600; SAC: 50000 - 1000000


    If my understanding is correct, please let me know whether you prefer the updated text, so that I can make a pull request on GitHub.
     
  2. vincentpierre

    Unity Technologies
    Joined: May 5, 2017
    Posts: 160
    Hi,
    Your understanding is correct, and I think this change would clarify our documentation. Do you want to make a PR, or should I take care of it?
     
  3. BotAcademy

    Joined: May 15, 2020
    Posts: 32
    Okay :) I can make a PR - will post the PR link here within the next hour
     
  4. BotAcademy

    Joined: May 15, 2020
    Posts: 32