Question Agent interaction with objects in their environment

Discussion in 'ML-Agents' started by sunnyCallum, Dec 13, 2023.

  1. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    Hi there!

    I am a third-year Computer Science student using Unity's ML-Agents package for my final-year project. Counterintuitively, I have never studied machine learning (which, in hindsight, would have been helpful), but I am doing my best now.

    Essentially, I am designing an ML-Agent that resembles a "Sim"; in actuality, the agent is a 2D cat. The cat has needs (such as hunger, thirst, and boredom), and I am trying to teach it to interact with different objects (such as food bowls and scratching posts) to fulfill those needs.

    To start, if anyone would like to share any wisdom that will assist me in this topic or suggest any reading, I would greatly appreciate it. I am new to the ML-Agents package and I am learning on the job.

    Currently, I have a cat agent and a scratching post. The agent increases its fun need, and subsequently receives rewards, by standing in proximity to the scratching post, but I believe there is a better way of doing this, as I plan to add visual feedback for the user showing that the cat is interacting with its environment. Could anyone point me in the right direction as to how to handle this in the logic? I would greatly appreciate it.
     
  2. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    154
    proximity to the reward givers should be fine. you could also check facing for some extra visual polish, then just animate the cat based on whether it's close to and facing the object for the interactions.
    reward the cat the longer it stays "alive", and if any "need" reaches 0, end the episode - I imagine the tough part will be getting it to explore more than one interactive object, but it should get there eventually if you get the rewarding right.
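
    for illustration, that check could look roughly like this (just a sketch; the class and field names are placeholders you'd maintain yourself):

    Code (CSharp):
    using UnityEngine;

    // Sketch only: proximity + facing check used to decide when to play the
    // "interacting" animation. interactionRange, facingDotThreshold and
    // facingDirection are placeholder fields.
    public class CatInteractionCheck : MonoBehaviour
    {
        public float interactionRange = 1.5f;
        public float facingDotThreshold = 0.7f;         // 1 = looking straight at the object
        public Vector2 facingDirection = Vector2.right; // update this whenever the cat turns

        public bool IsInteractingWith(Transform obj)
        {
            Vector2 toObj = (Vector2)obj.position - (Vector2)transform.position;
            bool closeEnough = toObj.magnitude <= interactionRange;
            bool facing = Vector2.Dot(toObj.normalized, facingDirection.normalized) >= facingDotThreshold;
            return closeEnough && facing;
        }
    }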
     
  3. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    Hey there, thank you for your reply, I will take that on board. I was wondering if you could help me identify what could be causing my training to break. Below is a video of what is happening.

    Could this be an issue with my rewarding? At one point my training was working fine, but then out of nowhere it broke and I don't understand the issue. It's almost as if the agent is intentionally running out of bounds?

    This is how I have my rewarding set up:

    Code (CSharp):
    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        Vector2 controlSignal = Vector2.zero; // The control signal for the agent
        float distanceToTarget = Vector2.Distance(this.transform.localPosition, Target.localPosition); // The distance to the target

        // Get the action from the action buffer
        controlSignal.x = actionBuffers.ContinuousActions[0]; // Upwards force
        controlSignal.y = actionBuffers.ContinuousActions[1]; // Sideways force

        // Check if the agent is outside of the training area
        if (!trainingArea.bounds.Contains(this.transform.localPosition))
        {
            // Punish the agent
            AddReward(-1f);
            EndEpisode();
        }

        // Reward the agent based on the fun need
        // This is to incentivise the agent to keep the fun need high in order to gain more rewards
        AddReward(funNeed / 100f);

        // Reward the agent based on the distance to the target
        // This is to incentivise the agent to get closer to the target
        AddReward(1f / distanceToTarget);

        // Check if the agent is within the target's collider
        //TODO: Update this so the agent "uses" the object instead of sitting inside of it
        if (targetCollider.bounds.Contains(this.transform.localPosition))
        {
            funNeed += regenRate * Time.deltaTime; // Regenerate the fun need
            AddReward(0.1f); // Reward the agent
            funNeed = Mathf.Clamp(funNeed, 0f, 100f); // Clamp the fun need
        }

        // When the fun need is satisfied, end the episode
        if (funNeed >= 100f)
        {
            AddReward(2f);
            EndEpisode();
        }
    }

     
  4. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    154
    impossible to tell what is happening at that speed, try recording at a normal time scale
    what do the episode rewards look like?
    what observations are you giving?
     
  5. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    I tinkered with the time scale and it doesn't seem any different no matter the value I use. I recorded a snippet of the original video I posted frame by frame, and from what I can see, the episode is resetting as soon as it starts.



    The rewards I am using are inside the original code block I posted. Here are my observations.

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);

        // Agent velocity
        sensor.AddObservation(rBody.velocity.x); // Upwards velocity
        sensor.AddObservation(rBody.velocity.y); // Sideways velocity

        // Fun need
        sensor.AddObservation(funNeed / 100f); // Normalise to range of 0, 1
    }
     
  6. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    154
    yeah, I meant the rewards it actually gets, not what it can get (i.e. what the reward summary looks like)
    to adjust the time scale, you can set it from the command line when you launch training (e.g. the --time-scale option on mlagents-learn)

    it looks like the bounds check is wrong - I've never used it personally, so my guess is it's either because you're passing the local position instead of the world position, or the bounds have negative extents
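
    for example, something roughly like this for a world-space check (untested sketch, assuming trainingArea is the Collider2D on the area object):

    Code (CSharp):
    // Sketch only: Collider2D.bounds is in world space, so test the agent's world
    // position rather than its localPosition. Flattening z avoids the 2D collider's
    // zero-thickness bounds rejecting the point.
    Vector3 worldPos = transform.position;
    worldPos.z = trainingArea.bounds.center.z;
    if (!trainingArea.bounds.Contains(worldPos))
    {
        AddReward(-1f);
        EndEpisode();
    }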

    definitely always test in heuristic mode first to ensure your agent works as expected
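
    a minimal Heuristic override for testing by hand could look roughly like this (sketch; assumes the two continuous actions are horizontal/vertical movement):

    Code (CSharp):
    // Sketch only: inside the Agent subclass, with Behavior Type set to Heuristic Only.
    // Drives the two continuous actions from the keyboard.
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxisRaw("Horizontal");
        continuousActions[1] = Input.GetAxisRaw("Vertical");
    }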
     
  7. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    I will have another look today if I get the chance and report back to you. I've used heuristic mode, and the same thing that happens in the videos also happens in heuristic mode, so I assume it is something to do with the training area. I will set up some debug statements and see what I can observe from that.

    Could you please elaborate on this? I don't quite understand.
     
  8. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    I experimented with my agent, adding debug statements and changing the logic, and it appears the agent is being forced towards the edge of the training area. I removed the Box Collider 2D from the agent and the pushing stopped, though now it doesn't move at all, so I am pretty confused.
     
  9. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    154
    when the agent trains it will give a summary in the command line (normally after 1k steps or so)

    does your training area have a box collider on it, then? I would assume that is what is pushing the agent away; you should set it to be a trigger if so.
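
    with the trigger approach, something roughly like this on the agent could replace the per-step bounds check (sketch; the "TrainingArea" tag is just an assumed example):

    Code (CSharp):
    // Sketch only: on the agent, with the training area's collider set to Is Trigger.
    // Fires when the agent's collider leaves the trigger area.
    private void OnTriggerExit2D(Collider2D other)
    {
        if (other.CompareTag("TrainingArea")) // assumed tag on the training area object
        {
            AddReward(-1f);
            EndEpisode();
        }
    }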

    how is your agent moved?
    I see you are passing the agent input to the control signal variable, but what do you actually do with it?
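
    normally the control signal has to be applied to something each step, e.g. roughly like this (sketch; rBody and moveSpeed are assumed fields on the agent):

    Code (CSharp):
    // Sketch only: the control signal does nothing by itself; it has to be applied,
    // e.g. as a force on the agent's Rigidbody2D (rBody and moveSpeed are assumed fields).
    controlSignal.x = actionBuffers.ContinuousActions[0];
    controlSignal.y = actionBuffers.ContinuousActions[1];
    rBody.AddForce(controlSignal * moveSpeed);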
     
  10. sunnyCallum

    sunnyCallum

    Joined:
    Nov 6, 2021
    Posts:
    8
    Hi there, my apologies for the slow reply, I have been busy with other university projects.

    The training area does have a box collider on it. I will perhaps create a debugging script for the agent's Rigidbody/Collider to test how the training area interacts with the agent.

    I'm not quite sure how to answer this question. Does the agent not move based on what is passed from the continuous actions to the control signal?
     
  11. Energymover

    Energymover

    Joined:
    Mar 28, 2023
    Posts:
    33
  12. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    186
    hello!

    1. ML is very difficult to use for even the simplest tasks because it cheats (it misuses the reward function and exploits bugs) and it copes badly with any error in the code (training breaks, bad training results).

    As far as I can see, your cat problem should not be using ML; a state machine seems like the better choice.
    If you want to optimise the overall needs of the cat then go ahead, but:

    The less you let ML do the work, the better!
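
    roughly what I mean, as a tiny sketch (the enum, method and thresholds are just placeholders):

    Code (CSharp):
    // Sketch only: a minimal needs-driven state machine instead of a trained policy.
    public enum CatState { Idle, SeekFood, SeekPlay }

    CatState PickState(float hunger, float fun)
    {
        if (hunger < 30f) return CatState.SeekFood; // thresholds are arbitrary
        if (fun < 30f) return CatState.SeekPlay;
        return CatState.Idle;
    }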

    2. When should you use ML, then?
    For problems that are nearly impossible to solve with traditional methods, like making a 3D cat move with torques/forces.
    Basically anything that involves complex physics-based movement; combine it with regular robotics tools like PID controllers.

    For reading, there is a ton of literature. Use e.g. Google Scholar and search for ML-Agents and you'll find many papers explaining both ML-Agents and PPO.
    I find ChatGPT 4 very good for learning.
     
  13. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    186
    Seems like problems with Unity's physics system. I have no experience with 2D though. In 3D it is a nightmare to set up a ragdoll (all scales set to 1, a specific hierarchy, colliders pushing each other apart - horrible).
     
  14. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    154
    agreed but it's good to do simple tasks to begin with so you can see what works and what doesn't - trying to train an agent on tasks that are not really possible to do yourself (via heuristics) leads to way more troubleshooting until you understand how MLA should work but yes it serves little practical purpose other than learning to use MLA for such tasks.

    nah, if you place a collider inside another collider, the two colliders will apply force to push themselves apart (assuming they are able to take forces - i.e. not static etc), this is true for 2D and 3D.

    ragdolls are very simple in unity as long as you have the normal workflow for your character(s), if you don't you can just use character joints and create your own ragdoll, it's still very simple given how flexible it is
     
  15. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    186
    Haha don't get me started o_O

    I needed at least 30 hours of trial and error until my custom ragdoll with config joints was working properly (multiple colliders per object). It just doesn't work out of the box the way one would expect.
    E.g. if you have an arm where the upper arm has x and y rotation restricted and z free, and you then set the forearm to x movement only, it goes crazy. I still have no clue why fixing z solves the problem, or what causes it.

    I also assumed that sibling objects wouldn't collide with each other (if no collision is enabled on the parent), but they do push each other apart.

    Fixed joints aren't actually fixed but extremely wobbly, so you have to crank up some properties on the physics solver (also not God-given knowledge).

    If you have a scale != 1, the transforms get skewed; applying config joints to already-rotated parts is also a very bad idea.

    I also had to write a custom script for deactivating collisions between siblings.

    And I spent too much time figuring out how to set up joints so that they point in the right direction (it's also not God-given that the X axis is the main axis of the joint space, local to the object's transform).

    All in all it's a bad system with too many pitfalls that aren't explained anywhere. I have seen countless videos on the internet where people run into the same problems with erratic limbs teleporting everywhere.

    I am also not some average Joe; I have a Bachelor's degree in Games Engineering (no work experience though). Joints really killed me because of the lack of examples and documentation.
     
  16. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    186
    "agreed but it's good to do simple tasks to begin with so you can see what works and what doesn't - trying to train an agent on tasks that are not really possible to do yourself (via heuristics) leads to way more troubleshooting until you understand how MLA should work but yes it serves little practical purpose other than learning to use MLA for such tasks."
    of course, we agree :) - it was just regarding his Bachelor's thesis, so it should maybe be a somewhat more complicated use case