Question Observation normalisation - relative or not?

Discussion in 'ML-Agents' started by StewedHarry, Jul 28, 2020.

  1. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    In the documentation it says that agent observations should be normalised to a range between 0 and 1. I am creating a scenario in which the size of the map can expand and contract. One of my agent's observations is the distance to some point on the map. Should I normalise this so that 1 is the largest distance on any map, or is it fine to have 1 as the largest distance relative to the size of the current map?
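
    In code terms, the two options would look roughly like this (currentMapMax and globalMax are placeholder fields of mine, not anything from the ML-Agents API; in practice I'd pick one of the two):

    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class MapAgent : Agent
    {
        public Transform target;
        public float currentMapMax; // largest possible distance on the current map (placeholder)
        public float globalMax;     // largest possible distance on any map (placeholder)

        public override void CollectObservations(VectorSensor sensor)
        {
            float d = Vector3.Distance(transform.position, target.position);

            // Option A: relative - 1 means the far side of THIS map
            sensor.AddObservation(d / currentMapMax);

            // Option B: global - 1 means the far side of the largest possible map
            sensor.AddObservation(d / globalMax);
        }
    }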
     
  2. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    Just to clarify, 1 would be the furthest point in the map, and the distance to some target would be somewhere between 0 and 1. The question is whether 1 should always be the furthest point in any map, or just for the particular map the agent is inhabiting. Will the observation and inference still be relevant on other maps despite the relative change in the max distance?
     
  3. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    the agent will learn the relationship between distance and other features (speed, time, etc., depending on your code).
    Varying inputs can make it harder for the agent to understand those relationships.
     
  4. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    Does that mean having 1 as the relative max is a bad idea, and that a global max (set at 1) should be used?
     
  5. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    i'm no expert in the field, but here's my reasoning:

    imagine using position as a gps, to give the agent a sense of its position in the space. on a 2d map, you make it so that the input range goes from -1 to 1, the center being (0, 0), the upper left corner (-1, -1), the lower right (1, 1). The agent will learn its relative position in the env, no matter the size of the map.
    if i tell you, as a person, that your coordinates in a room are (0, 0.5), you will have no idea about the size of the room, but you will know that you are midway between the two side walls, and 3/4 of the way from one end wall and 1/4 from the other.
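
    in code it could be something like this (mapCenter and mapHalfSize are just placeholders for values you'd know for each map, not anything from the ML-Agents API):

    using UnityEngine;

    public static class GpsUtil
    {
        // map a world position to [-1, 1] "gps" coordinates, (0, 0) being the map center.
        // mapCenter and mapHalfSize are assumed known for the current map (placeholders).
        public static Vector2 ToGps(Vector3 pos, Vector3 mapCenter, float mapHalfSize)
        {
            return new Vector2(
                Mathf.Clamp((pos.x - mapCenter.x) / mapHalfSize, -1f, 1f),
                Mathf.Clamp((pos.z - mapCenter.z) / mapHalfSize, -1f, 1f));
        }
    }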

    But you are talking about a precise distance from an object/area. i suppose the agent will use this info to determine stuff like how much time it takes to get there, how many actions it can take while covering that distance, how much food it needs to get there alive (just randomly guessing at your code).
    What if i tell you to learn to do something in 1 minute, but a 'minute' does not always last the same amount of time?

    So yes, i think a global max can lead to a more stable comprehension of the environment.
    Of course i might be wrong, as i said i'm no expert after all :)
     
  6. StewedHarry

    Joined:
    Jan 20, 2020
    Posts:
    45
    I would have thought that, if the maps were randomised during training and the agent couldn't 'overfit' to one particular map size, it would learn that the sizes were relative, based on its other observations about the world.
     
  7. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    it could easily be as you say. it's an interesting topic, and a more professional answer is needed.
     
    StewedHarry likes this.
  8. mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    You could also observe different metrics in order to give the agent additional information. If it needs to know its relative position on a map, as well as absolute distances to detectable objects or specific points on the map, why not observe both? Relative agent coordinates would scale with the map size, assuming upper left corner (-1, -1), lower right (1, 1). But distance_to_point/detection_radius would always be a fraction between 0 and 1 that's independent of the map size.
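
    A rough sketch of what I mean (mapCenter, mapHalfSize and detectionRadius are placeholder fields; the detection radius is a fixed sensing range you choose, kept the same on every map):

    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class DualObsAgent : Agent
    {
        public Transform target;
        public Vector3 mapCenter;     // placeholder: center of the current map
        public float mapHalfSize;     // placeholder: half the current map's extent
        public float detectionRadius; // placeholder: fixed range, identical on every map

        public override void CollectObservations(VectorSensor sensor)
        {
            // Relative coordinates: scale with the map, [-1, 1] per axis.
            sensor.AddObservation((transform.position.x - mapCenter.x) / mapHalfSize);
            sensor.AddObservation((transform.position.z - mapCenter.z) / mapHalfSize);

            // Absolute distance over a fixed radius: independent of map size, [0, 1].
            float d = Vector3.Distance(transform.position, target.position);
            sensor.AddObservation(Mathf.Clamp01(d / detectionRadius));
        }
    }

    Clamping to 1 just means everything beyond the detection radius looks the same, regardless of which map the agent is on.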