Question Observation normalisation - relative or not?

Discussion in 'ML-Agents' started by StewedHarry, Jul 28, 2020.

1. StewedHarry

Joined:
Jan 20, 2020
Posts:
45
In the documentation it says that agent observations should be normalised to a range between 0 and 1. I am creating a scenario in which the size of the map can expand and contract. One of my agent's observations is the distance to some point on the map. Should I normalise this so that 1 is the largest that any map can be, or is it fine to have 1 as the largest value relative to the size of the current map?

2. StewedHarry

Joined:
Jan 20, 2020
Posts:
45
Just to clarify: 1 would be the furthest point in the map, and the distance to some target would be somewhere between 0 and 1. The question is whether 1 should always be the furthest point in any map, or just in the particular map the agent is inhabiting. Will the observations and inference carry over to other maps despite the relative change in the max distance?
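To make the two options concrete, here's a minimal sketch of both normalisation schemes (the function names and the 200-unit global max are just illustrative, not from any ML-Agents API):

```python
# Hypothetical helpers contrasting the two normalisation options.

def normalise_relative(distance, current_map_max):
    """1.0 means the furthest point of the *current* map."""
    return distance / current_map_max

def normalise_global(distance, global_max):
    """1.0 means the furthest point of the *largest possible* map."""
    return distance / global_max

# The same physical 50-unit distance reads differently under each scheme:
print(normalise_relative(50.0, 100.0))  # 0.5 on a 100-unit map
print(normalise_global(50.0, 200.0))    # 0.25 if the largest map is 200 units
```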

3. m4l4

Joined:
Jul 28, 2020
Posts:
81
The agent will learn the relationships between distance and other features (speed, time, etc., depending on your code).
Varying inputs can make it harder for the agent to understand those relationships.

4. StewedHarry

Joined:
Jan 20, 2020
Posts:
45
Does that mean having 1 as the relative max is a bad idea, and that a global max (set at 1) should be used?

5. m4l4

Joined:
Jul 28, 2020
Posts:
81
I'm no expert in the field, but here's my reasoning:

Imagine using position like GPS, to give the agent a sense of where it is in space. On a 2D map, you make it so that the input range goes from -1 to 1: the centre is (0, 0), the upper-left corner (-1, -1), the lower-right (1, 1). The agent will learn its position relative to the env, no matter the size of the map.
If I tell you, as a person, that your coordinates in a room are (0, 0.5), you will have no idea about the size of the room, but you will know that you are halfway between the north and south walls, 3/4 of the way from the left wall and 1/4 from the right one.
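The relative-coordinates idea above can be sketched like this (the function name and the map dimensions are just illustrative):

```python
def to_relative(x, y, width, height):
    # Rescale [0, width] x [0, height] to [-1, 1] x [-1, 1], centre at (0, 0).
    return (2.0 * x / width - 1.0, 2.0 * y / height - 1.0)

# A point 3/4 of the way across and halfway down reads the same on any map size:
print(to_relative(75.0, 50.0, 100.0, 100.0))    # (0.5, 0.0)
print(to_relative(150.0, 100.0, 200.0, 200.0))  # (0.5, 0.0)
```

Because the output is scale-free, the agent sees identical observations for the "same" relative position on a small or large map, which is exactly why this works regardless of map size.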

But you are talking about a precise distance to an object/area. I suppose the agent will use this info to determine things like how much time it takes to get there, how many actions it can take while covering that distance, or how much food it needs to get there alive (just guessing at your code).
What if I tell you to learn to do something in 1 minute, but 1 minute does not always correspond to 1 minute?

So yes, I think a global max can lead to a more stable comprehension of the environment.
Of course I might be wrong; as I said, I'm no expert after all.

6. StewedHarry

Joined:
Jan 20, 2020
Posts:
45
I would have thought that if the maps were randomised during training and the agent couldn't 'overfit' to one particular map size, it would learn that the sizes were relative, based on its other observations about the world.

7. m4l4

Joined:
Jul 28, 2020
Posts:
81
It could easily be as you say. It's an interesting topic; a more professional answer is needed.
