
Question How to choose observations?

Discussion in 'ML-Agents' started by jatinpawar2805, Jul 21, 2020.

  1. jatinpawar2805

    jatinpawar2805

    Joined:
    Mar 4, 2019
    Posts:
    4
    Hi, community

    I am just getting started with ML-Agents. As of now, I am trying to build a simple "Food Collector" with the sole aim of training an agent to collect food randomly spawned in an area.

    As far as I have understood, for training a model we need the following things:
    1) Decide goal - Collect food spawned randomly over an area
2) Actions the agent can take - 1) can move forward only 2) can rotate clockwise and anticlockwise.
3) Reward - the agent gets 1 point for collecting (colliding with) food, and 0.1 points are taken away if it falls off the platform
    4) Observations - This is where I think I am going wrong.

    I tried taking the following sets of observations:
1) Agent.localPosition and Food.localPosition
2) Agent.localPosition, Food.localPosition and Agent.localEulerAngles
3) Agent.localPosition, Food.localPosition, Agent.localEulerAngles and AgentRigidbody.velocity

For each set of observations mentioned above, I trained a model for 500,000 steps with a Decision Requester set to a decision period of 5. But none of the given sets of observations gave me the desired results.

What can/should I do? Where could I possibly be going wrong? What should the approach be to building a simple Food Collector?

PS: I am attaching images of my Heuristic(), OnActionReceived(), CollectObservations() and OnCollisionEnter() for your reference.

    Looking forward to community response.
Attached Files:
2_CollectObservation.png, 3_OnActionRecieved.png, 4_onCollisionEnter.png

  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Observations should be transformed to the agent's local space. For the direction towards the food, that would be agent_transform.InverseTransformVector(food_transform.position - agent_transform.position), for the velocity it's agent_transform.InverseTransformVector(agent_rb.velocity). This way, the agent doesn't need to observe its own position.
Personally, I prefer normalizing my observations via code and setting the normalize config parameter to false. AFAIK, enabling the normalize config option will cause the python code to home in on the maximum observation values it receives over time. If you already know the maxima, it's more straightforward to apply them right away: agent_transform.InverseTransformVector(food_transform.position - agent_transform.position) / max_food_distance and agent_transform.InverseTransformVector(agent_rb.velocity) / max_velocity.
If your agent is moving in 2D space only, like on a flat surface, you can simplify this by passing angles rather than vectors. The direction then would be Vector3.SignedAngle(agent_transform.forward, food_transform.position - agent_transform.position, Vector3.up) / 180f. Dividing by 180f gives a normalized angle between -1 and +1.
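Put together as code, these suggestions might look like the following sketch of CollectObservations(); note that food, agentRb, maxFoodDistance and maxVelocity are assumed names and values for illustration, not code from this thread:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class FoodCollectorAgent : Agent
{
    public Transform food;              // assumed reference to the food object
    public Rigidbody agentRb;           // assumed reference to the agent's rigidbody
    public float maxFoodDistance = 10f; // assumed maximum distance in the arena
    public float maxVelocity = 5f;      // assumed speed cap

    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 toFood = food.position - transform.position;

        // Direction and distance to the food in the agent's local frame, normalized.
        sensor.AddObservation(transform.InverseTransformVector(toFood) / maxFoodDistance);

        // Agent velocity in its own local frame, normalized.
        sensor.AddObservation(transform.InverseTransformVector(agentRb.velocity) / maxVelocity);

        // 2D alternative: a single signed angle towards the food, normalized to [-1, +1].
        // sensor.AddObservation(Vector3.SignedAngle(transform.forward, toFood, Vector3.up) / 180f);
    }
}
```

This assumes the CollectObservations(VectorSensor) API from the ML-Agents 1.x packages that were current when this thread was written.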
     
  3. jatinpawar2805

    jatinpawar2805

    Joined:
    Mar 4, 2019
    Posts:
    4
    Hi mbaske,

    First thanks a lot for your quick response.

Indeed, as per my requirement, the agent was moving on a flat 2D surface.
As per your suggestion, I used Vector3.SignedAngle(agent_transform.forward, food_transform.position - agent_transform.position, Vector3.up) / 180f and agent_transform.InverseTransformVector(agent_rb.velocity) for observations. I didn't know max_velocity, so I didn't normalise the velocity.

First I trained without normalizing the SignedAngle and without normalizing the velocity, with the normalize config parameter set to false. This didn't give me the desired results.

Then I used the normalized SignedAngle and the non-normalized velocity, with the normalize config parameter set to false. Doing this, I got the desired results.

    So thank you so much for helping me out. I would like to ask a few more things so as to deepen my understanding.

1) What is the importance of normalizing? How can I find out the maximum velocity for my agent? Will normalizing the velocity give me better results?
2) Is there any difference between agent.localPosition and agent_transform.InverseTransformVector? Why did using InverseTransformVector work while using agent.localPosition didn't? I am attaching an image of my environment below. It is basically an empty GameObject named "FoodCollectorARea" which contains the platform, the agent and the food.
3) Beginner question: intuitively, agent.localPosition and food.localPosition should be the observations. How did you come up with the suggested observations?
     

    Attached Files:

    Last edited: Jul 22, 2020
  4. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
Hey, no problem. Yes, normalization will generally give you better results. Normalizing observations makes sure that all input values for the neural network are on the same scale, usually between 0 and +1 or between -1 and +1. The network of course doesn't know what those values represent. It's just doing matrix multiplications, and large input value changes are likely to affect the output more strongly than small ones. Which can be a problem for learning, if the value ranges don't reflect the real-world importance of what is being observed.
For instance, let's say your agent is observing distances between 0m and 100m and velocities between 0m/s and 10m/s. Without normalization, a distance change of 10 meters will be 10 times more important to the network than a velocity change of 1m/s. Dividing the distances by 100 and the velocities by 10 first ensures that inputs (and learning) aren't biased towards distance changes at the expense of velocity changes.
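The arithmetic can be made concrete with the hypothetical maxima from this example; after dividing by the maxima, a 10 m distance change and a 1 m/s velocity change shift the network input by exactly the same amount:

```csharp
using System;

class NormalizationExample
{
    const float MaxDistance = 100f; // meters (hypothetical maximum)
    const float MaxSpeed = 10f;     // m/s (hypothetical maximum)

    static void Main()
    {
        // Both changes now contribute equally to the network input.
        float distanceDelta = 10f / MaxDistance; // a 10 m change
        float speedDelta = 1f / MaxSpeed;        // a 1 m/s change
        Console.WriteLine($"{distanceDelta} {speedDelta}"); // prints "0.1 0.1"
    }
}
```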

    You could just limit the maximum speed of your agent or perhaps log the velocities you're getting in heuristic mode.
Yeah, that should work, but it's more difficult to learn. Observing in agent space generally means there's less variation in the input data: a relative direction like "10 meters to the right" always means the same thing, regardless of where exactly the agent and the food are located on the plane or how they are rotated.
    If the agent is just given positions on the other hand (global or local), it needs to infer its own first person perspective towards the food from a lot of different possible third person observations. agent_transform.InverseTransformVector(food_transform.position - agent_transform.position) takes care of that, therefore making the agent's job much easier.
     
    kokimitsunami and ailuropoda0 like this.
  5. jatinpawar2805

    jatinpawar2805

    Joined:
    Mar 4, 2019
    Posts:
    4
    Thanks for the insights, buddy.

    This means normalization indeed plays a big role just like your ability to help out people plays a big role. :)

I just checked your profile on the Unity Forum and got to know that you are doing a really good job helping the community with ML-Agents. This shows the depth of your knowledge of the field. I have just started to learn and am really curious whether you have written any blogs sharing your knowledge. I would love to see your projects and blogs, if any, so that I can learn from them. It would also be really great if you could point me towards some good resources for learning and strengthening my knowledge of ML-Agents.

I started playing around with ML-Agents a week ago. My aim is to train an agent to shoot a basketball. I did make a simple environment to train in, but it failed due to my lack of knowledge about ML-Agents. So I decided to do small projects first and then proceed with the basketball stuff. "Food Collector" was one of those small projects.

1) While playing around with the Behaviour Parameters, I noticed that we can choose a discrete action space (for discrete values like 1, 2, 3, etc.) or a continuous one (for values like 0.1, 0.2, etc.).
What if my agent has to choose from a given range like 1 to 100? For example, the basketball-shooting agent needs to decide a direction to throw and also the force with which to throw the ball. Both the direction and the force will depend on how far the basket is from the player. So how can I set up a suitable action space for the force range so that the agent can choose values from the pre-defined range and shoot the ball?

2) Is it possible to re-train a trained model to learn additional things using its previous learnings? For example, in the Food Collector game, we have already taught the agent to collect food; now we want to further train the model to collect food kept at the edges of the platform, or to eat different kinds of food depending on certain situations?

    A big thank you for being so generous and helping us all out...... :)
     
  6. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
Thanks for the kind words. I haven't written any ML-Agents tutorials myself yet - there are quite a few of them out there already, covering the basics. The official docs are where I've learnt the most: https://github.com/Unity-Technologies/ml-agents/blob/release_4_docs/docs/Readme.md - Medium is also a good resource: https://medium.com/search?q=ml-agents - I might give making tutorials a try if I feel I have something specific to contribute that hasn't really been explored before. I do however have a YouTube channel, which is mainly meant to be a portfolio for a couple of projects I've worked on: https://www.youtube.com/channel/UCqMSNJyrG5zWrjl-_hYdF0g/videos - some of the source code for them is on GitHub, but not everything is up to date with newer ML-Agents releases: https://github.com/mbaske
    Yeah, the example projects are a great place to start, just what I did.
    I think the actions would need to be continuous, in order to cover the whole range of directions and throwing forces. You could set a threshold for triggering throws: action values below 0 wouldn't do anything, above 0 would cause the agent to throw with max_force * action value. It'll probably speed up learning if you assign rewards depending on how close the ball gets to the basket, at least at first, rather than only rewarding the agent when the ball ends up in the basket - which would take a whole lot of trial and error.
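As a rough sketch of the threshold idea using the float[] action API from the ML-Agents releases current at the time; ball, maxForce and the action-index layout are assumptions for illustration, not code from this thread:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class ShooterAgent : Agent
{
    public Rigidbody ball;       // assumed reference to the ball's rigidbody
    public float maxForce = 20f; // assumed maximum throw force

    public override void OnActionReceived(float[] vectorAction)
    {
        // Continuous actions arrive in [-1, 1].
        float aim = vectorAction[0];     // mapped to a horizontal throw direction
        float trigger = vectorAction[1]; // below 0: do nothing, above 0: throw

        if (trigger > 0f)
        {
            // Rotate the agent's forward vector by up to +/-90 degrees for aiming,
            // and scale the positive trigger value into a throw force.
            Vector3 direction = Quaternion.AngleAxis(aim * 90f, Vector3.up) * transform.forward;
            ball.AddForce(direction * (maxForce * trigger), ForceMode.Impulse);
        }
    }
}
```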
    Only if you don't change the action and observation space size and type. You can always pause and resume training in order to tweak things, but once you've started with specific inputs/outputs, you'll have to stick with them. I sometimes start out with some neutral or placeholder observation values, if I know I'll have different training phases. For instance, a walking agent might first learn how to coordinate its legs, with the target direction always being forward. After it can move itself forward well enough, I would then start randomizing the direction so it can learn turns.
     
  7. jatinpawar2805

    jatinpawar2805

    Joined:
    Mar 4, 2019
    Posts:
    4
    Thank you very much for the head start.
And yes, from now on I will be referring to the ML-Agents docs and will reach out to you if I am stuck anywhere. I checked out your work on YouTube, and man, it is inspiring. I really love the drone agent as well as the hoverbike agent. The drone agent is on my to-do list. Hope to bump into you soon in search of answers to different questions. :p

Feel free not to answer the questions below if you don't feel right answering them here.
Well, just out of curiosity, what is your name? What do you do?
     
  8. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Thanks!
    It's Mathias (that's what the m stands for ;)) Currently learning some Unity stuff after my previous career as a freelance Flash developer.
     
    ProGameDevUser likes this.