EndEpisode() in OnCollisionEnter() triggers twice?

Roboserg · May 3, 2020

I was trying to train an agent to land a plane (video -
) but the agent completely disregards my reward penalty for contact speed with the ground. I tried to debug for many hours without luck until I started using Debug.Log() in the OnCollisionEnter(), when the agent lands, the code looks like this:

Code (CSharp):

private void OnCollisionEnter(Collision other)

{

if (_isDone) return;

_isDone = true;

var vectorToTarget = target.transform.position - transform.position;

vectorToTarget.y = 0;

var distScaled = Mathf.Clamp(vectorToTarget.magnitude / 30, 0, 1);

var distScaled2 = Mathf.Clamp(vectorToTarget.magnitude / 10, 0, 1);

var relativeVelocityScaled = Mathf.Clamp(other.relativeVelocity.magnitude / 35, 0, 1);

var relativeVelocityScaled2 = Mathf.Clamp(other.relativeVelocity.magnitude / 5, 0, 1);

var facingForwardDot = Vector3.Dot(Vector3.up, transform.up);

AddReward(4 - distScaled - distScaled2 - relativeVelocityScaled - relativeVelocityScaled2 + facingForwardDot);

Debug.Log(relativeVelocityScaled);

EndEpisode();

}

The Debug.Log(relativeVelocityScaled); happens twice in the console even with the second guard of if (_isDone) return; even though I reset the _isDone to false in the OnEpBegin()

I hope this explains why the agent doesnt learn to slow down but I don't understand why would the OnCollisionEnter run twice in the same episode? EndEpisode(); should end the current episode, but it doesnt immediately?

mbaske · May 3, 2020

Since you're resetting the _isDone flag in OnEpisodeBegin(), seeing the Log message twice should actually indicate that one episode has ended, and a new one just began, right? Regardless of that, it's weird that OnCollisionEnter() is called twice in a row. How and when are you resetting your agent? Are you maybe disabling and re-enabling a collider somewhere?

Instead of relying on collisions, you might instead penalize downwards velocity below a given height threshold. Something like this, assuming your ground normal is Vector3.up

Code (CSharp):

float distanceToGround = transform.position.y - groundLevelY;

float downwardVelocity = agentRigidBody.velocity.y;

if (distanceToGround < threshold && downwardVelocity < 0)

{

float inverseNormalizedDistance = 1f - distanceToGround / threshold;

AddReward(downwardVelocity * inverseNormalizedDistance);

}

EDIT: Probaly compare downwardVelocity < -someSmallValue, so that the agent can actually land softly and doesn't hover above the ground.

Roboserg · May 4, 2020

Thanks for the suggestions, I am yet to try your solution. I wanted to give additional proof that OnCollisionEnter() happens twice.

Here is the code for the OnCollisionEnter() and all the Debug.Logs am I doing - https://i.imgur.com/OGMh5uW.png
Debug.Break() should pause the game right at the moment of contact, before the agent is reset. So at the precise moment the agent is touching the ground.

However, as you can see in the following picture, the agent is already teleported and has its starting spawning coordinates -

No wonder I am having trouble training the agent, it gets wrong observations. It seems that
EndEpisode(); is NOT happening immediately OR the collision happens over several frames (which would be weird, since we teleport the agent, there should not be a collision at the next frame).

mbaske said: ↑

seeing the Log message twice should actually indicate that one episode has ended, and a new one just began, right?
Click to expand...

yes, the second episode starts and ends within one frame after the collision! But it doesnt happen every episode so its easy to miss when observing the environment.

mbaske said: ↑

How and when are you resetting your agent? Are you maybe disabling and re-enabling a collider somewhere?
Click to expand...

I dont touch the collider. Resetting happens like this - https://i.imgur.com/ai6BGDg.png

christophergoy · May 4, 2020

shebotnov said: ↑

I was trying to train an agent to land a plane (video -
) but the agent completely disregards my reward penalty for contact speed with the ground. I tried to debug for many hours without luck until I started using Debug.Log() in the OnCollisionEnter(), when the agent lands, the code looks like this:

Code (CSharp):

private void OnCollisionEnter(Collision other)

{

if (_isDone) return;

_isDone = true;

var vectorToTarget = target.transform.position - transform.position;

vectorToTarget.y = 0;

var distScaled = Mathf.Clamp(vectorToTarget.magnitude / 30, 0, 1);

var distScaled2 = Mathf.Clamp(vectorToTarget.magnitude / 10, 0, 1);

var relativeVelocityScaled = Mathf.Clamp(other.relativeVelocity.magnitude / 35, 0, 1);

var relativeVelocityScaled2 = Mathf.Clamp(other.relativeVelocity.magnitude / 5, 0, 1);

var facingForwardDot = Vector3.Dot(Vector3.up, transform.up);

AddReward(4 - distScaled - distScaled2 - relativeVelocityScaled - relativeVelocityScaled2 + facingForwardDot);

Debug.Log(relativeVelocityScaled);

EndEpisode();

}

The Debug.Log(relativeVelocityScaled); happens twice in the console even with the second guard of if (_isDone) return; even though I reset the _isDone to false in the OnEpBegin()

I hope this explains why the agent doesnt learn to slow down but I don't understand why would the OnCollisionEnter run twice in the same episode? EndEpisode(); should end the current episode, but it doesn't immediately?
Click to expand...

Setting a reward in a reactive way outside of where OnActionReceived may lead to unexpected behavior, especially if you are expecting to give a high reward/penalty on the last frame of an episode.

For example, we hook into the FixedUpdate loop to tick Agents. OnCollisionEnter happens after FixedUpdate according to the MonoBehaviour Timeline from the Unity docs. This means that if you are setting a large negative reward in the last frame before calling EndEpisode, the large negative reward may be associated with the next Reinforcement Learning Step, not the one that has already been processed.

For things like rewards, ending episodes, etc. please try to call those methods from within the ML-Agents API. Specifically for rewards we recommend calling them from the OnActionReceived virtual method on agent to avoid such pitfalls.

Roboserg · May 4, 2020

Thanks for the reply.

I was adding rewards in OnCollisionEnter since it is done in the examples of Unity ML agents in the FoodCollectorAgent.cs

So if that is not a recommended way, how would you recommend me setting the rewards? If I do it the FixedUpdate / OnActionReceived how do I check if the agent colliders have touched the ground? Simply setting the bool of isLanded = true in the OnCollisionEnter and addings rewards based on that bool in the FixedUpdate with terminaton of the episode?

Also I still dont understand why OnCollisionEnter would get called twice even though I have
EndEpisode() in there. Or is it because, as you say, OnCollisionEnter is called after Fixed? But shouldn't EndEpisode() still end the episode immediately?

christophergoy · May 4, 2020

Does your rigid body have more than one collider? It could be that with the jittering that happening, along with EndEpisode being called after FixedUpdate that the next frame gets another OnCollisionEnter. I don't have your project so i don't know. You could Debug.Log in these methods on one agent to see what the actual timeline is.

Instead of setting a bool, you could also accumulate reward and poll it from OnActionReceived.

And yes, our examples are not following all of our own best practices. The irony is not lost on us. We will be updating them as we go. Thank you for pointing the FoodCollector one out.

Roboserg · May 7, 2020

christophergoy said: ↑

Does your rigid body have more than one collider?
Click to expand...

Yes, I used 3x colliders and it seems to have solved the problem of double OnCollisionEnter events, but I still have problems with learning. With rewards being given only at the end of the episode SAC fails completely. PPO learns, but it is weird. Say distanceScaled is the scaled distance from the plane to the landing zone. At the end of the episode I give the reward of (1 - distanceScaled) and the PPO learns to nose dive to the landing zone. Ok, now I remove that reward and instead give a reward for soft landing with (1 - contactGroundSpeed). It works again! Ok, so I combine two rewards to (2 - distanceScaled - contactGroundSpeed) and train from scratch. Now PPO completely ignores the distanceScaled and only learns to soft land. The cumulative reward goes near 1, as it learned to land softly but completely ignores the distance. The agent does have the needed observations, please see the code below.

christophergoy said: ↑

I don't have your project so i don't know.
Click to expand...

Here is the agent code - https://hatebin.com/pigfypfuem
I could also give you my github repo for the project if you have time to look at this. I know this example is very trivial (moon landing from OpenAI) and 1 year ago I had no problems using same rewards for a 2D rocket landing with Unity ML agents. It did also work 2 days ago for my current 3D plane landing, but now 3 days in a row I cant figure the out the problem.

christophergoy said: ↑

Instead of setting a bool, you could also accumulate reward and poll it from OnActionReceived.
Click to expand...

Could you rephrase it please? Do I accumulate the reward in OnCollisionEnter and set the reward and EndEpisode in OnActionReceived? That is what I am doing now in onActionReceived:

christophergoy · May 7, 2020

The only thing that looks suspicious to me is that you're using the velocity from the 'other' collision in your collision detection method. I would expect you to be using the agent's velocity

Search Unity

Unity ID

Useful Searches

EndEpisode() in OnCollisionEnter() triggers twice?