Is it possible to have an agent know about something, rather than be curious all the time?

Discussion in 'ML-Agents' started by Moonwilles, Mar 17, 2020.

  1. Moonwilles

    Moonwilles

    Joined:
    Apr 4, 2019
    Posts:
    10
    Hello :)

    I have been experimenting with ML-Agents and have created a 2D environment in which an agent must run around and collect food while avoiding obstacles. I have managed to train the agent and it learned very quickly. From the agent's behavior, though, it can be deduced that the agent only collects the food objects after it spots them via its perception sensors. If, for example, the agent collects 8 out of 10 food objects, it will never collect the other 2 if they happen to spawn in a tricky spot (the image I have attached is a perfect example). Instead, it will keep roaming around until it happens to find them.



    I am using both visual observations (as already mentioned) and the following vector observations:
    • The agent's normalized position;
    • The food objects' normalized X coordinates;
    • The food objects' normalized Y coordinates.
    I have tried training with curiosity both enabled and disabled, but it seems to have had no effect.

    My question is:
    Is it possible to have the agent know where something is, rather than keep searching until it finds it? I guess some might say that this type of behavior can be hard-coded, which is something I'm aware of, but I'd still like to know if it can be done.

    Should you require more information, I'd be more than happy to provide it. Thank you.
     
  2. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Just to clarify - are you using RayPerceptionSensors, or something like a CameraSensor or RenderTexture sensor? Generally we don't consider RayPerceptionSensors "visual", since they only generate 1-D data.

    You say you're already giving the object's normalized coordinates - are you doing this for all the objects, or (for example) only the closest one? If you're doing it for all of them, you might also need to add a "boolean" observation to mark which ones have already been collected.

    What does your reward look like? It might help to give a small negative reward at each step (on the order of 1/agent.maxSteps) so that the agent has to collect the food quickly to avoid accumulating penalties.
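
    Roughly something like this (just a sketch - the penalty value is a placeholder, and the AgentAction signature here is the 0.13-era one, so adjust it to your version):
    Code (CSharp):
    // Rough sketch (not from this thread): apply a small time penalty every step so
    // the agent is pushed to finish quickly. "stepPenalty" is a placeholder value -
    // something around 1 / (max steps per episode).
    private const float stepPenalty = 1f / 3000f;

    public override void AgentAction(float[] vectorAction)
    {
        AddReward(-stepPenalty);   // small constant cost per step

        // ... movement code using vectorAction goes here ...
    }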

    You could also try rendering the whole maze (if you're not already), or creating a sensor that produces a "heatmap" of the objects, something like 1/(1 + distance) to the nearest object, where distance is the Euclidean, Manhattan, or graph distance (the last of these accounts for walls). The agent should be able to learn quickly to move in the direction of increasing "heat" to get the reward.
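
    If building a full heatmap sensor is too much, even a single scalar version of the idea can go into your vector observations. A rough sketch, assuming a `foods` list of the remaining food Transforms and plain Euclidean distance (so it won't account for walls):
    Code (CSharp):
    // Inside CollectObservations(): observe 1 / (1 + distance) to the nearest remaining food.
    // "foods" is an assumed List<Transform> of food objects that haven't been collected yet.
    float nearestDistance = float.MaxValue;
    foreach (Transform food in foods)
    {
        float d = Vector2.Distance(transform.position, food.position);
        if (d < nearestDistance)
            nearestDistance = d;
    }
    AddVectorObs(1f / (1f + nearestDistance));   // "heat" increases as the agent gets closer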

    Some more discussion here of how to specify a general "visual" observation without a camera: https://forum.unity.com/threads/can...for-a-visual-observation.832906/#post-5568580
     
  3. Moonwilles

    Moonwilles

    Joined:
    Apr 4, 2019
    Posts:
    10
    Hello. Many thanks for your reply, it is much appreciated.

    Yes, I was using just RayPerceptionSensors.


    I have created two environments: one with multiple food objects and another with just one. With multiple objects, I was adding each of them to a list, which leads to another issue. It might sound stupid, but I don't know how to add vector observations of objects in a list. Can you please explain how to do it?
    With one object, I was able to just add its normalized coordinates as observations.

    For the record, this is how I am doing it:
    Code (CSharp):
    public override void CollectObservations()
    {
        AddVectorObs(transform.position.normalized);
        AddVectorObs(tankArea.instantiatedFood.transform.position.normalized.x);
        AddVectorObs(tankArea.instantiatedFood.transform.position.normalized.y);
    }
    I have put the single object environment away for now, and will be using the one with multiple food objects.


    - I am giving the agent -0.1 in an OnCollisionStay2D like so:

    Code (CSharp):
    private void OnCollisionStay2D(Collision2D collision)
    {
        if (collision.transform.CompareTag("Obstacle"))
        {
            //Debug.LogWarning("Colliding with wall!");
            AddReward(-0.1f);
        }
    }
    This is to discourage the agent from colliding with obstacles.

    - A penalty of 0.1 per step, like you mentioned.
    - A reward of +2 each time it collects a food object.
    - A reward of +3 if it collects them all.

    After reading your answer, I have tried adding a CameraSensorComponent to each area like so (I put the screenshots inside a spoiler to try and keep the post clean):


    Unfortunately, it seems like this did not affect the agent at all. It still couldn't notice the "tricky" ones, like those near the corners hidden behind walls. Did I do it properly?
    It is worth mentioning that this time, the positions of the objects were not given, but then again, when the agent was training with a single object, it still failed to find it despite being given its coordinates.


    What do you mean by "render the whole maze"? Can this be done using the steps in this link:
    https://docs.unity3d.com/Manual/class-RenderTexture.html


    I must admit I am a little bit of a beginner, and this is too complicated for me. I have no idea how I could implement something like this.


    To be honest with you, I am trying to develop a small prototype in which a user can click somewhere to create a waypoint which the agent will then collect, similar to how movement works in an RTS game. Again, I am aware this could be hard-coded using path-finding algorithms, but I would still very much like to do it this way.
     
  4. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    OK, I think I understand part of the problem now. Because you're using
    transform.position.normalized.x
    (and .y), you're basically taking the position vector, normalizing it (so that it has length 1.0), and then taking the coordinates of that unit vector. That means that all of the points on the yellow line here will have the same coordinates (assuming the center of the maze is roughly 0, 0).

    upload_2020-3-18_17-59-16.png

    Similarly, your agent doesn't really know where it is, just which direction it is in from the origin.

    When we talk about normalizing the observations, that means scaling them so that they're in the range [-1, 1] or [0, 1], which is a different operation. There's a formula in that doc link, but basically you'll want to do something like

    var observation_x = (food.position.x - minXValue)/(maxXValue - minXValue);

    and similar for the y value. You'll have to determine minXValue and maxXValue from the size and position of your walls; I'm not sure if there's an easy way to do this in Unity, but if you get stuck, let me know and I'll try to find a way.
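
    One thing that might work is to grow a Bounds around the wall colliders (untested sketch - `wallsRoot` is a placeholder for whatever parent object holds your boundary walls):
    Code (CSharp):
    // Sketch: derive the min/max world coordinates of the area from its wall colliders.
    Bounds areaBounds = new Bounds(wallsRoot.position, Vector3.zero);
    foreach (Collider2D col in wallsRoot.GetComponentsInChildren<Collider2D>())
    {
        areaBounds.Encapsulate(col.bounds);
    }
    float minXValue = areaBounds.min.x;
    float maxXValue = areaBounds.max.x;
    float minYValue = areaBounds.min.y;
    float maxYValue = areaBounds.max.y;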

    Regarding observations for a list of objects - this is a bit tricky, because the observation needs to be a constant size. Let me get back to you on that soon.
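
    In the meantime, a common workaround is to reserve a fixed number of slots and pad the unused ones, with a flag per slot. A sketch (the `maxFood`/`foods` names and the NormalizeX/NormalizeY helpers are placeholders for your own code):
    Code (CSharp):
    // Sketch: fixed-size observation for a variable-length food list.
    const int maxFood = 10;
    for (int i = 0; i < maxFood; i++)
    {
        if (i < foods.Count && foods[i] != null)
        {
            AddVectorObs(1f);                                // slot is in use
            AddVectorObs(NormalizeX(foods[i].position.x));   // your min/max normalization
            AddVectorObs(NormalizeY(foods[i].position.y));
        }
        else
        {
            AddVectorObs(0f);   // empty / already-collected slot
            AddVectorObs(0f);
            AddVectorObs(0f);
        }
    }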
     
  5. Moonwilles

    Moonwilles

    Joined:
    Apr 4, 2019
    Posts:
    10
    Hello, thanks again for your reply.

    I have decided to switch back to the environment with one food object because it's easier to add its coordinates as observations. I have removed the CameraSensorComponent because it didn't seem like it was improving the training.

    Thanks for pointing out my normalization mistake. I think I'm doing it correctly now.

    I've reasoned it out like this:
    The minimum and maximum Y values that can be traversed by the agent are -7.5 and 7.5 respectively. The minimum and maximum X values are -10.5 and 10.5 respectively.

    I have declared 4 variables for these values:
    Code (CSharp):
    float minPositionX, maxPositionX;
    float minPositionY, maxPositionY;


    Since I have multiple duplicates of the learning area, I figured I couldn't simply use those values on their own, because they would be incorrect for the areas that are not centered at 0, 0. Since the different TankLearningArea duplicates are in different positions, I am calculating the minimum and maximum values by offsetting them by the TankLearningArea's position, like so:
    Code (CSharp):
    void Start()
    {
        minPositionX = tankArea.transform.position.x - 10.5f;
        maxPositionX = tankArea.transform.position.x + 10.5f;

        minPositionY = tankArea.transform.position.y - 7.5f;
        maxPositionY = tankArea.transform.position.y + 7.5f;

        //Debug.LogWarning((transform.position.x - minPositionX) / (maxPositionX - minPositionX));

        tankBody = gameObject.transform;
        tankRigidbody2D = GetComponent<Rigidbody2D>();
        Physics2D.IgnoreLayerCollision(0, 8);
    }


    Then, I added the observations:
    Code (CSharp):
    public override void CollectObservations()
    {
        AddVectorObs((gameObject.transform.position.x - minPositionX) / (maxPositionX - minPositionX));
        AddVectorObs((gameObject.transform.position.y - minPositionY) / (maxPositionY - minPositionY));

        AddVectorObs((tankArea.instantiatedFood.transform.position.x - minPositionX) / (maxPositionX - minPositionX));
        AddVectorObs((tankArea.instantiatedFood.transform.position.y - minPositionY) / (maxPositionY - minPositionY));

        AddVectorObs(collected);
    }
    Unfortunately, this too didn't improve the training. The agent only reacts to the food when it spots it via the RayPerceptionSensors; otherwise it simply spins around in circles. It seems as if the agent has no idea where anything is despite being given the observations. I even tried removing the obstacles and leaving just the border, but nothing changed. I set curiosity back to true as a last resort, but it didn't change anything either.
    My goal, if it can even be done, is to get the agent to reach the target using just vector observations.

    In case you want to give it a look, here is a link to my project (tried uploading it here but it's too large):
    https://we.tl/t-czvsSG9rKO

    I am using ML-Agents 0.13.1 and Unity 2019.3.0f6.
    The trainer_config.yaml settings are the same as the ones for the Pyramids example.
     
  6. Moonwilles

    Moonwilles

    Joined:
    Apr 4, 2019
    Posts:
    10
    Hello again.

    I have updated my ML-Agents to 0.15.0 (from 0.13.1) and thought I'd follow this tutorial: https://github.com/Unity-Technologies/ml-agents/blob/0.15.0/docs/Learning-Environment-Create-New.md

    To accelerate the training, I created an additional 19 duplicates of the area.
    With 20 areas training simultaneously, the agent was able to learn in very little time. What amazed me, though, was that the sphere could go to the target using just vector observations. I also wrote a simple script that lets me change the target location to wherever I click on the floor. It worked wonderfully: as soon as I changed the target's position, the sphere would immediately stop rolling towards the old target and start rolling towards the new one. This is the closest to what I wanted to achieve.
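
    In case it helps anyone, the click script is roughly along these lines (simplified sketch - the class and field names here aren't exactly what's in my project):
    Code (CSharp):
    using UnityEngine;

    // Sketch: move the target to wherever the user clicks on the floor.
    // "target" is the target Transform; the floor needs a collider for the raycast to hit.
    public class ClickToPlaceTarget : MonoBehaviour
    {
        public Transform target;

        void Update()
        {
            if (Input.GetMouseButtonDown(0))
            {
                Ray ray = Camera.main.ScreenPointToRay(Input.mousePosition);
                if (Physics.Raycast(ray, out RaycastHit hit))
                {
                    // Keep the target's current height, only change its x/z position.
                    target.position = new Vector3(hit.point.x, target.position.y, hit.point.z);
                }
            }
        }
    }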

    I tried creating another version of this small example by replacing the sphere with a cube and adding 4 small cubes to act as obstacles. I initially had an issue with the cube: with the exact same code used in the sphere example, the cube would not move up, down, left, or right - it would only move diagonally. I used
    rBody.AddTorque(controlSignal * speed)
    and it seems to have solved the problem.
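
    For context, the action code now looks roughly like this (a sketch based on the tutorial's RollerAgent, with AddForce swapped for AddTorque; `rBody` and `speed` come from that tutorial):
    Code (CSharp):
    public override void AgentAction(float[] vectorAction)
    {
        // Map the two continuous actions to a control signal, as in the tutorial.
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = vectorAction[0];
        controlSignal.z = vectorAction[1];

        // Torque instead of force, so the cube tips over its edges rather than sliding.
        rBody.AddTorque(controlSignal * speed);
    }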

    My intention was to use just vector observations. I also never managed to add a list of items to an agent's observations, so I added them individually. This is what I've added:
    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);
        sensor.AddObservation(this.mBounds.size);

        // Agent velocity
        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);

        sensor.AddObservation(obstacles[0].transform.localPosition);
        sensor.AddObservation(mesh0.bounds.size);

        sensor.AddObservation(obstacles[1].transform.localPosition);
        sensor.AddObservation(mesh1.bounds.size);

        sensor.AddObservation(obstacles[2].transform.localPosition);
        sensor.AddObservation(mesh2.bounds.size);

        sensor.AddObservation(obstacles[3].transform.localPosition);
        sensor.AddObservation(mesh3.bounds.size);

        sensor.AddObservation(taskFinished);
    }
    The reasoning behind this was that if the agent knows the obstacles' positions and sizes, it could learn how far it can move from a particular obstacle given its size. The results were actually not bad, although it still occasionally hugged the walls. Keep in mind that only vector observations were used here. To stop it from getting stuck against the walls, I added a zero-friction physics material to the colliders.

    I would like you to explain something, please. I have noticed that slight changes to the environment after training seem to negatively impact the agent. If the size of the floor is changed from 1x1 to 2x2, for example, the agent will not go to a target placed farther away than any of the positions seen during training, as if there is some sort of invisible boundary stopping it. Is this normal? What can be done to fix it?