
Understanding Normalization

Discussion in 'ML-Agents' started by chrisk, May 5, 2021.

  1. chrisk

    chrisk

    Joined:
    Jan 23, 2009
    Posts:
    704
    Hi, I just discovered ML-Agents and I'm having quite a lot of fun with it!
    I have some background working with NNs, and until now they weren't easy to deal with. Many thanks for making a wonderful toy. ^^

    I'm pretty sure there will be lots more questions but right now I'm a bit puzzled about normalization.

    Generally, normalization helps to get rid of noise and thus helps the NN converge faster, but in my case it actually takes longer. I ran two sessions that were exactly the same except for normalized vs. unnormalized observations, and with normalized observations the agent takes longer to learn to go to the target location.

    I guess I'm just trying to understand what's going on.

    For example, my positions are between [-20, 20] and I normalize them to [-1, 1] before feeding them into the observations. The positions are evenly spread.
    There is also a configuration option, network_settings -> normalize. I tried setting it to both true and false and it doesn't seem to make much difference.

    My questions are:

    1. Why does normalizing the positions for the observations make it take longer to converge? Is it OK not to normalize them if that converges faster?

    2. If the input positions are not normalized, how does that affect the output? I.e., do the outputs come out unnormalized until it eventually converges? Right now ML-Agents doesn't seem to provide fine-grained control (sigmoid and such) over inputs/outputs, and that would help me understand what's going on.

    3. What does the normalize configuration option do, and how do I use it properly? Should I set it to true or false depending on whether my observation inputs are already normalized? Or is it an option to let the ML engine do the normalization internally? It wasn't clear in the docs.

    4. The normalize option can appear in two different places: network_settings -> normalize and reward_signals -> extrinsic -> network_settings -> normalize. What happens if they don't match? Does one override the other?

    5. Right now, I'm mixing continuous and discrete inputs/outputs at the same time. An NN solver usually deals with only one type of data, and I would like to understand how Unity handles two different types of data at the same time. Do you recommend separating the two data types into separate networks?

    Thanks.

    PS: I'm using the latest ML-Agents (2.0-pre, from the main GitHub branch) on 2020.3 LTS, and my configuration is below.

    behaviors:
      Hero:
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 2048
          learning_rate: 0.0003
          beta: 0.01
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
            network_settings:
              normalize: false
        keep_checkpoints: 5
        max_steps: 500000
        time_horizon: 128
        summary_freq: 1000
     
  2. Neohun

    Neohun

    Joined:
    Oct 30, 2018
    Posts:
    77
    Hi, first of all, you're asking a lot of questions at the same time, which suggests you're really confused. Are you sure you have checked the ML-Agents tutorials and example docs carefully?
    But I'll answer some of your questions:

    Normalized inputs are best practice when feeding into the CollectObservations() func. If you have already normalized them, you shouldn't normalize them again via the hyperparameters, since they're already normalized. You can also use localPosition so the observations aren't affected by world-space position changes when you move or rotate your environment. But let's say it's a grid-based position, like a 20x20 grid, and you want to go to row 5, column 10: then there's no need for normalization, since the possible move positions are discrete, and it can even be harmful, as mentioned in the docs:
    Quote: "Whether normalization is applied to the vector observation inputs. This normalization is based on the running average and variance of the vector observation. Normalization can be helpful in cases with complex continuous control problems, but may be harmful with simpler discrete control problems."

    Also, since you can control the normalization manually, there's no need to enable normalization via the hyperparameters unless there's no other option.
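    For example, manual normalization inside CollectObservations() could look roughly like this (a sketch only: the [-20, 20] play-area range is taken from the original post, and the class and field names are made up):

    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class HeroAgent : Agent
    {
        // Half-extent of the play area; the [-20, 20] range comes from the post above.
        const float PlayAreaHalfSize = 20f;

        public override void CollectObservations(VectorSensor sensor)
        {
            // localPosition keeps the observation independent of where the
            // environment sits (or how it is rotated) in world space.
            Vector3 p = transform.localPosition;

            // Scale each component from [-20, 20] down to [-1, 1] before adding it.
            sensor.AddObservation(p.x / PlayAreaHalfSize);
            sensor.AddObservation(p.z / PlayAreaHalfSize);
        }
    }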

    If your input range doesn't change constantly, it won't affect the output, since the (continuous) output already comes out normalized to [-1f, 1f].

    I answered this above: if you have already normalized the values while adding them to the vector observations, you shouldn't set it to true, because it will take the running average of the inputs and normalize the observations based on that average.
    I didn't see anything in the docs saying that the rewards are normalized; I think you're mistaken, check again. You should just normalize them manually as best practice: over each episode the rewards should stay within [-1f, 1f] or [0f, 1f], according to your needs.
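    A rough sketch of that kind of reward scale (the event methods and values here are made up for illustration, not taken from the original project):

    // Inside the Agent subclass: keep the total reward per episode roughly
    // within [-1, 1] instead of using large raw values.
    void OnReachedTarget()
    {
        SetReward(1f);      // terminal success
        EndEpisode();
    }

    void OnFellOffPlatform()
    {
        SetReward(-1f);     // terminal failure
        EndEpisode();
    }

    void OnEveryDecisionStep()
    {
        // Small time penalty, scaled so it sums to about -1 over a full episode.
        AddReward(-1f / MaxStep);
    }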

    I don't know what happens on the Python side; this is a question for the ML-Agents team.
     
    Last edited: May 5, 2021
  3. chrisk

    chrisk

    Joined:
    Jan 23, 2009
    Posts:
    704
    Hi, Neohun, thanks for the answers. I learned something new and I appreciate it. I've read most of the documentation, and it seems pretty sparse when it comes to understanding how the internals work. I guess that's inherent to AI, but implementation details could save us time tuning hyperparameters. In general, though, the doc is much better than most of the other Unity docs. ^^ However, I still don't understand why normalized positions slow down the optimization process, despite the explanations in the doc.
     
  4. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi, @chrisk,
    Can you show us the code for how you are normalizing locations? It can be a bit tricky, but it should help with training if it's done correctly.
     
  5. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    To answer questions 4 and 5: if you set the normalize option for the reward signal's network and/or the policy network itself, it means the values coming into that network will be normalized based on what it has already seen, using a running mean and variance.
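    Conceptually, that running normalization does something like the following (a simplified sketch of the idea only; the actual implementation lives in the Python trainer, not in C#):

    using UnityEngine;

    // Simplified running normalizer: it tracks the mean and variance of all
    // values seen so far and rescales each new value accordingly.
    public class RunningNormalizer
    {
        float mean;
        float varianceSum;   // sum of squared deviations from the running mean
        int count;

        public float Normalize(float value)
        {
            count++;
            float delta = value - mean;
            mean += delta / count;                    // running mean update
            varianceSum += delta * (value - mean);    // running variance accumulator

            float std = Mathf.Sqrt(varianceSum / Mathf.Max(1, count)) + 1e-8f;
            return (value - mean) / std;
        }
    }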

    If you are using hybrid actions, it is actually using two networks behind the scenes. There are other forum posts asking about the paper this was based on, if you search. If you cannot find it, please let me know and I will try to help.
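    On the C# side a hybrid agent just reads both parts of the same ActionBuffers. A minimal sketch (the action meanings below are made up for illustration):

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class HybridHeroAgent : Agent
    {
        public override void OnActionReceived(ActionBuffers actions)
        {
            // Continuous branch: e.g. movement on two axes, already in [-1, 1].
            float moveX = actions.ContinuousActions[0];
            float moveZ = actions.ContinuousActions[1];

            // Discrete branch: e.g. 0 = do nothing, 1 = jump, 2 = attack.
            int ability = actions.DiscreteActions[0];

            // ...apply the movement and the chosen ability to the character here.
        }
    }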