Question The docs say that decreasing "num_epoch" makes the training more stable

Discussion in 'ML-Agents' started by hamik166, Oct 1, 2020.

  1. hamik166

    hamik166

    Joined:
    Oct 19, 2019
    Posts:
    31
    Number of passes to make through the experience buffer when performing gradient descent optimization. The larger the batch_size, the larger it is acceptable to make this. Decreasing this will ensure more stable updates, at the cost of slower learning.

    This is from the docs: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md

    But I do not understand why. Or is it a mistake? Why would more passes make learning faster but less stable? I don't understand how it works.

    Could someone explain it to me?
     
  2. henrypeteet

    henrypeteet

    Unity Technologies

    Joined:
    Aug 19, 2020
    Posts:
    37
    Thanks for posting. I have included an explanation below of why increasing num_epoch can change stability and speed. Please let me know if anything doesn't make sense.

    Why it becomes less stable when reusing the same data over and over:
    Learning from the same data more than once increases the likelihood of overfitting. The less diverse the data is, the worse this becomes (hence the comment about larger buffer sizes being more forgiving, since a larger number of examples is more likely to be diverse).

    Why it is faster:
    Gathering data (running a game + inference) takes a large amount of time, and commonly takes more time than learning if you have a complex game. This means that reusing the same data multiple times can save on the cost of simulating more rounds.
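
    To make the reuse concrete, here is a minimal sketch in Python of a PPO-style update loop; buffer (a list of transitions) and policy.update (one gradient step) are hypothetical names for illustration, not the actual ML-Agents trainer API:

        import random

        def train_step(buffer, policy, num_epoch=3, batch_size=1024):
            # One update cycle: the same gathered experience is reused
            # num_epoch times before any new data is collected.
            for _ in range(num_epoch):
                random.shuffle(buffer)  # new minibatch order each pass
                for start in range(0, len(buffer), batch_size):
                    minibatch = buffer[start:start + batch_size]
                    # Each extra pass extracts more learning from the same
                    # (expensive to gather) data, but repeating identical
                    # samples raises the risk of overfitting.
                    policy.update(minibatch)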
     
  3. Sab_Rango

    Sab_Rango

    Joined:
    Aug 30, 2019
    Posts:
    121

    Hey! I want to learn the working principles of ML-Agents' brains and create my own brain system to get the best results. How can I start digging around and playing with the brain system in Python?

    Should I learn deep learning first?
     
  4. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    The best way to start would be to experiment with ML-Agents.

    If you are interested in the technical details of reinforcement learning, there are lots of good tutorials out there, e.g. https://spinningup.openai.com/en/latest/
     
  5. blockimperium

    blockimperium

    Joined:
    Jan 21, 2008
    Posts:
    452

    I totally understand the argument for overfitting, but if you decrease the number of epochs and your curiosity doesn't allow you to explore the space enough, you'll never get to a global minimum. How can one account for that use case while lowering the epochs? I don't see anything in the docs for implementing knockout and the like, or for managing the experiences.
     
  6. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    On the contrary, if you're not exploring well, you'll keep seeing the same data in your buffer, so even with a low number of epochs you'll overfit. The opposite case is possible, though: you explore far too much and your epochs are too low to learn anything. Think of it in the same way as having a very low learning rate. Given that your environment is finite, it's likely to eventually learn, but after a very long time.
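
    For reference, these tradeoffs all live in a handful of knobs in the trainer config. A rough sketch with illustrative values (not recommendations), written here as a Python dict whose keys mirror the hyperparameters section of the config file docs linked above:

        # Illustrative values only. A larger, more diverse buffer tolerates
        # a higher num_epoch; a num_epoch that is too low behaves a bit
        # like a low learning rate: stable updates, but slow learning.
        hyperparameters = {
            "buffer_size": 10240,
            "batch_size": 1024,
            "num_epoch": 3,
            "learning_rate": 3.0e-4,
        }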
     
  7. blockimperium

    blockimperium

    Joined:
    Jan 21, 2008
    Posts:
    452
    What I really want to accomplish is to enable enough variety that the network doesn't overfit to the environment and can still generalize. I know that was an issue with DeepRacer and the like: train them long enough and they just end up learning the track and can't generalize to other track scenarios.

    For the most part I wanted to use weight decay, level generation for data augmentation, and dropout (just noticed I called it knockout earlier lol), then use early stopping if it looks like things are getting worse in TensorBoard. With checkpointing, I just assumed that combining those approaches would keep me in the "sweet spot" where I'm not underfitting and unable to generalize because I haven't seen enough, and dropout is lobotomizing the network enough that it never really "memorizes" experiences.
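
    Something like this minimal PyTorch sketch is what I have in mind for the dropout + weight decay part (a generic network for illustration, not how the ML-Agents trainer is actually structured):

        import torch
        import torch.nn as nn

        class PolicyNet(nn.Module):
            def __init__(self, obs_size, act_size, hidden=128, p_drop=0.2):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_size, hidden),
                    nn.ReLU(),
                    # Dropout makes it harder for the network to simply
                    # memorize individual experiences.
                    nn.Dropout(p=p_drop),
                    nn.Linear(hidden, act_size),
                )

            def forward(self, obs):
                return self.net(obs)

        policy = PolicyNet(obs_size=64, act_size=4)
        # weight_decay applies L2 regularization on every update step.
        optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4,
                                     weight_decay=1e-5)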