Search Unity

EndEpisode() in OnCollisionEnter() triggers twice?

Discussion in 'ML-Agents' started by Roboserg, May 3, 2020.

  1. Roboserg

    Roboserg

    Joined:
    Jun 3, 2018
    Posts:
    83
    I was trying to train an agent to land a plane (video -
    ) but the agent completely disregards my reward penalty for contact speed with the ground. I tried to debug for many hours without luck until I started using Debug.Log() in the OnCollisionEnter(), when the agent lands, the code looks like this:

    Code (CSharp):
    1.     private void OnCollisionEnter(Collision other)
    2.     {
    3.         if (_isDone) return;
    4.         _isDone = true;
    5.        
    6.         var vectorToTarget = target.transform.position - transform.position;
    7.         vectorToTarget.y = 0;
    8.         var distScaled = Mathf.Clamp(vectorToTarget.magnitude / 30, 0, 1);
    9.         var distScaled2 = Mathf.Clamp(vectorToTarget.magnitude / 10, 0, 1);
    10.         var relativeVelocityScaled = Mathf.Clamp(other.relativeVelocity.magnitude / 35, 0, 1);
    11.         var relativeVelocityScaled2 = Mathf.Clamp(other.relativeVelocity.magnitude / 5, 0, 1);
    12.         var facingForwardDot = Vector3.Dot(Vector3.up, transform.up);
    13.         AddReward(4 - distScaled - distScaled2 - relativeVelocityScaled - relativeVelocityScaled2 + facingForwardDot);
    14.        
    15.         Debug.Log(relativeVelocityScaled);
    16.         EndEpisode();
    17.        
    18.     }
    The Debug.Log(relativeVelocityScaled); happens twice in the console even with the second guard of if (_isDone) return; even though I reset the _isDone to false in the OnEpBegin()

    I hope this explains why the agent doesnt learn to slow down but I don't understand why would the OnCollisionEnter run twice in the same episode? EndEpisode(); should end the current episode, but it doesnt immediately?
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Since you're resetting the _isDone flag in OnEpisodeBegin(), seeing the Log message twice should actually indicate that one episode has ended, and a new one just began, right? Regardless of that, it's weird that OnCollisionEnter() is called twice in a row. How and when are you resetting your agent? Are you maybe disabling and re-enabling a collider somewhere?

    Instead of relying on collisions, you might instead penalize downwards velocity below a given height threshold. Something like this, assuming your ground normal is Vector3.up

    Code (CSharp):
    1. float distanceToGround = transform.position.y - groundLevelY;
    2. float downwardVelocity = agentRigidBody.velocity.y;
    3. if (distanceToGround < threshold && downwardVelocity < 0)
    4. {
    5.     float inverseNormalizedDistance = 1f - distanceToGround / threshold;
    6.     AddReward(downwardVelocity * inverseNormalizedDistance);
    7. }
    EDIT: Probaly compare downwardVelocity < -someSmallValue, so that the agent can actually land softly and doesn't hover above the ground.
     
    Last edited: May 3, 2020
  3. Roboserg

    Roboserg

    Joined:
    Jun 3, 2018
    Posts:
    83
    Thanks for the suggestions, I am yet to try your solution. I wanted to give additional proof that OnCollisionEnter() happens twice.

    Here is the code for the OnCollisionEnter() and all the Debug.Logs am I doing - https://i.imgur.com/OGMh5uW.png
    Debug.Break() should pause the game right at the moment of contact, before the agent is reset. So at the precise moment the agent is touching the ground.

    However, as you can see in the following picture, the agent is already teleported and has its starting spawning coordinates - upload_2020-5-4_15-14-43.png

    No wonder I am having trouble training the agent, it gets wrong observations. It seems that
    EndEpisode(); is NOT happening immediately OR the collision happens over several frames (which would be weird, since we teleport the agent, there should not be a collision at the next frame).

    yes, the second episode starts and ends within one frame after the collision! But it doesnt happen every episode so its easy to miss when observing the environment.

    I dont touch the collider. Resetting happens like this - https://i.imgur.com/ai6BGDg.png
     
  4. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Setting a reward in a reactive way outside of where OnActionReceived may lead to unexpected behavior, especially if you are expecting to give a high reward/penalty on the last frame of an episode.

    For example, we hook into the FixedUpdate loop to tick Agents. OnCollisionEnter happens after FixedUpdate according to the MonoBehaviour Timeline from the Unity docs. This means that if you are setting a large negative reward in the last frame before calling EndEpisode, the large negative reward may be associated with the next Reinforcement Learning Step, not the one that has already been processed.

    For things like rewards, ending episodes, etc. please try to call those methods from within the ML-Agents API. Specifically for rewards we recommend calling them from the OnActionReceived virtual method on agent to avoid such pitfalls.
     
  5. Roboserg

    Roboserg

    Joined:
    Jun 3, 2018
    Posts:
    83
    Thanks for the reply.

    I was adding rewards in OnCollisionEnter since it is done in the examples of Unity ML agents in the FoodCollectorAgent.cs

    So if that is not a recommended way, how would you recommend me setting the rewards? If I do it the FixedUpdate / OnActionReceived how do I check if the agent colliders have touched the ground? Simply setting the bool of isLanded = true in the OnCollisionEnter and addings rewards based on that bool in the FixedUpdate with terminaton of the episode?

    Also I still dont understand why OnCollisionEnter would get called twice even though I have
    EndEpisode() in there. Or is it because, as you say, OnCollisionEnter is called after Fixed? But shouldn't EndEpisode() still end the episode immediately?
     
  6. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    Does your rigid body have more than one collider? It could be that with the jittering that happening, along with EndEpisode being called after FixedUpdate that the next frame gets another OnCollisionEnter. I don't have your project so i don't know. You could Debug.Log in these methods on one agent to see what the actual timeline is.

    Instead of setting a bool, you could also accumulate reward and poll it from OnActionReceived.

    And yes, our examples are not following all of our own best practices. The irony is not lost on us. We will be updating them as we go. Thank you for pointing the FoodCollector one out.
     
    DonPuno likes this.
  7. Roboserg

    Roboserg

    Joined:
    Jun 3, 2018
    Posts:
    83
    Yes, I used 3x colliders and it seems to have solved the problem of double OnCollisionEnter events, but I still have problems with learning. With rewards being given only at the end of the episode SAC fails completely. PPO learns, but it is weird. Say distanceScaled is the scaled distance from the plane to the landing zone. At the end of the episode I give the reward of (1 - distanceScaled) and the PPO learns to nose dive to the landing zone. Ok, now I remove that reward and instead give a reward for soft landing with (1 - contactGroundSpeed). It works again! Ok, so I combine two rewards to (2 - distanceScaled - contactGroundSpeed) and train from scratch. Now PPO completely ignores the distanceScaled and only learns to soft land. The cumulative reward goes near 1, as it learned to land softly but completely ignores the distance. The agent does have the needed observations, please see the code below.

    Here is the agent code - https://hatebin.com/pigfypfuem
    I could also give you my github repo for the project if you have time to look at this. I know this example is very trivial (moon landing from OpenAI) and 1 year ago I had no problems using same rewards for a 2D rocket landing with Unity ML agents. It did also work 2 days ago for my current 3D plane landing, but now 3 days in a row I cant figure the out the problem.

    Could you rephrase it please? Do I accumulate the reward in OnCollisionEnter and set the reward and EndEpisode in OnActionReceived? That is what I am doing now in onActionReceived:

     
    christophergoy likes this.
  8. christophergoy

    christophergoy

    Joined:
    Sep 16, 2015
    Posts:
    735
    The only thing that looks suspicious to me is that you're using the velocity from the 'other' collision in your collision detection method. I would expect you to be using the agent's velocity
     
    DonPuno likes this.