Question: Making an ML agent handle collisions in a racing game.

Discussion in 'ML-Agents' started by nscrivanich112325, Mar 25, 2021.

  1. nscrivanich112325

    nscrivanich112325

    Joined:
    Aug 14, 2018
    Posts:
    12
    Hello,

    I'm currently using ML-Agents for a car racing game where the AI races against the player around the track. I'm having trouble getting the agents to properly handle collisions: whenever they drive head-on into a barrier during training, they get stuck and never reverse to free themselves. I have a picture below:

    upload_2021-3-25_17-31-50.png

    I tried placing a trigger at the front of the car and giving the AI a penalty for hitting the throttle instead of the brake while touching a barrier (brake and reverse are mapped to the same command).

    Code (CSharp):
    AddReward((-AIThrottle + AIBrake) * 0.5f);
    However, this was unsuccessful and resulted in the AI not moving at all in the later stages of training.

    Does anybody know a good way to implement this behavior with ML-Agents, or have any advice on how to properly handle collisions using ML-Agents?

    Here are the rewards/penalties I'm currently using:
    • Reward to encourage faster driving:
      Code (CSharp):
      AddReward(Mathf.Clamp01(carController.speed / 200f) * 0.1f);
    • Penalty when initially colliding with a barrier.
    • Reward for going through a checkpoint.
    • Penalty for facing the wrong direction.
    An episode ends when a car does not make it through a checkpoint within a certain time frame or when the car finishes the race (3 laps).
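
    For reference, the two dense per-step terms from the list above (the speed reward and the throttle-into-wall penalty) can be collected into one pure function. This is only a sketch: the class and parameter names are illustrative, not from the project, though the constants are the ones quoted in this post. Factoring it out of the MonoBehaviour makes the shaping testable outside Unity.

    ```csharp
    using System;

    // Hypothetical helper consolidating the dense shaping rewards described above.
    public static class RewardShaping
    {
        public static float StepReward(float speed, bool frontCol, float throttle, float brake)
        {
            float r = 0f;

            // Speed shaping: only above 15 units, clamped so it never exceeds +0.1 per step.
            if (speed > 15f)
                r += Math.Clamp(speed / 200f, 0f, 1f) * 0.1f;

            // Nose-against-wall shaping: penalize throttle, reward brake/reverse,
            // while the front trigger overlaps a barrier.
            if (frontCol)
                r += (-throttle + brake) * 0.5f;

            return r;
        }
    }
    ```

    Inside the agent this would be called once per decision step, e.g. `AddReward(RewardShaping.StepReward(carController.speed, frontCol, AIThrottle, AIBrake));`.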


    Below is the configuration for the hyperparameters and the AIAgent code:

    Hyperparameters:

    behaviors:
      Race:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 20480
          learning_rate: 3.0e-4
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 10000000
        time_horizon: 64
        summary_freq: 1000000

    Code (CSharp):

    private void Awake()
    {
        carController = GetComponent<RCC_CarControllerV3>();
        curReset = resets[Random.Range(0, (resets.Count - 1))];
    }

    private void Start()
    {
        checkPointScript.OnCarCorrectCheckPoint += OnCorrectCheckPoint;
        checkPointScript.OnCarWrongCheckPoint += OnWrongCheckPoint;
    }

    void OnCorrectCheckPoint(object sender, TrackCheckPoints.CheckPointSystemArgs e)
    {
        if (e.CarTransform == carCollider)
        {
            AddReward(1f);
            secondsCount = 0;

            if (e.last)
            {
                laps++;
                if (laps == 3)
                {
                    CheckEndEpisode();
                }
            }
        }
    }

    private void FixedUpdate()
    {
        secondsCount += Time.fixedDeltaTime;

        if (secondsCount >= maxEpisodeTime)
        {
            CheckEndEpisode();
        }
        speed = carController.speed;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 checkPos = checkPointScript.getNextCheckpoint(carCollider).position;
        Vector3 dirToTarget = (checkPos - transform.position).normalized;
        float dirDot = Vector3.Dot(transform.forward, dirToTarget);

        if (dirDot < 0.1f)
        {
            AddReward(-1f);
        }

        sensor.AddObservation(dirDot);
        sensor.AddObservation(dirToTarget);
        sensor.AddObservation(frontCol);
        sensor.AddObservation(Mathf.Clamp01(carController.speed / 200f));
        sensor.AddObservation(transform.forward);
        sensor.AddObservation(leftSteerAngle / 64f);
        sensor.AddObservation(rightSteerAngle / 64f);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        AIThrottle = vectorAction[0];
        if (AIThrottle < 0.0f)
        {
            AIThrottle = 0.0f;
        }

        AIBrake = vectorAction[1];
        if (AIBrake < 0.0f)
        {
            AIBrake = 0.0f;
        }

        AISteer = vectorAction[2];

        if (frontCol)
        {
            AddReward((-AIThrottle + AIBrake) * 0.5f);
        }

        if (carController.speed > 15f)
        {
            AddReward(Mathf.Clamp01(carController.speed / 200f) * 0.1f);
        }
    }

    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = Input.GetAxis(RCC_Settings.Instance.Xbox_triggerRightInput);
        actionsOut[1] = Input.GetAxis(RCC_Settings.Instance.Xbox_triggerLeftInput);
        actionsOut[2] = Input.GetAxis(RCC_Settings.Instance.Xbox_horizontalInput);
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.tag == "Wall" || collision.gameObject.tag == "AICar" || collision.gameObject.tag == "Player")
        {
            AddReward(-1f);
        }
    }

    private void OnTriggerEnter(Collider other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            frontCol = true;
        }
    }

    private void OnTriggerExit(Collider other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            frontCol = false;
        }
    }

    public override void OnEpisodeBegin()
    {
        this.transform.position = curReset.position;
        this.transform.rotation = curReset.rotation;
        checkPointScript.ResetCheckPoints(carCollider);
        secondsCount = 0;
        laps = 0;
    }

    void CheckEndEpisode()
    {
        curReset = resets[Random.Range(0, (resets.Count - 1))];
        if (curReset.GetComponent<ResetSpawn>().CheckSpawn())
        {
            EndEpisode();
        }
    }

    Any help on this is much appreciated. Thank you.
     
    Last edited: Mar 30, 2021
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi, have you tried using ray cast sensors to detect the other cars and walls in the game?
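
    For later readers: attaching one is usually just a matter of adding the RayPerceptionSensorComponent3D component and listing the tags it should detect. A minimal runtime sketch, assuming the PascalCase property names of recent ML-Agents releases (these vary across versions, and in practice the component is normally configured in the Inspector instead):

    ```csharp
    using System.Collections.Generic;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    // Hypothetical setup script; the values below are illustrative, not tuned.
    public class RaySensorSetup : MonoBehaviour
    {
        void Awake()
        {
            var rays = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
            rays.RayLength = 30f;        // how far each ray reaches
            rays.RaysPerDirection = 5;   // rays fanned out to each side of center
            rays.DetectableTags = new List<string> { "Wall", "AICar", "Player" };
        }
    }
    ```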
     
  3. nscrivanich112325

    nscrivanich112325

    Joined:
    Aug 14, 2018
    Posts:
    12
    Hello,

    Yes, I forgot to mention that I'm using the Ray Perception sensor 3D component.

    upload_2021-3-25_20-21-59.png
     
  4. nscrivanich112325

    nscrivanich112325

    Joined:
    Aug 14, 2018
    Posts:
    12
     

    Attached Files:

  5. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hey,
    there are a few other things in your code that I'm not quite sure I understand.

    You are adding reward values that are really high. You usually want to keep them within the -1 to +1 range.
    I also see you are setting the speed in FixedUpdate. Do you want to set that from your action buffer instead?
     
    nscrivanich112325 likes this.
  6. nscrivanich112325

    nscrivanich112325

    Joined:
    Aug 14, 2018
    Posts:
    12
    Hey, thanks for the response. I did try changing the code to keep the value of the rewards between 0 and 1 (the issue still persists). I updated the code and the hyperparameters in the original post. The speed variable that you see in FixedUpdate is just a public variable that does nothing; I only have it there so I can see the speed of each agent in the Inspector during training.

    Despite having a penalty for facing the wrong way, the agents cannot grasp the fact that they should only go one way. Is there anything I'm missing regarding the observations? I passed in the dot product of the car's forward vector and the vector from the car to the next checkpoint, so the agent can observe the correct direction to go in. Perhaps I just need to train them more, although they don't seem to be improving at this point. :(
     
    Last edited: Mar 30, 2021
  7. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    It looks like you have a pretty complex reward function. Perhaps a better reward for direction would be to reward the agent for how well it is pointing in the right direction, instead of penalizing it for not pointing in the right direction.

    For example I'd remove:
    Code (CSharp):
    if (dirDot < 0.1f)
    {
        AddReward(-1f);
    }
    in favor of:
    Code (CSharp):
    AddReward(dirDot);
    or something similar.

    This is much easier for the neural network to maximize than getting no reward signal at all for going the right direction and then being penalized once it crosses an arbitrary threshold you set.
    Also, if you ever decide to change that if statement, you'll need to retrain your model.

    I'd also remove the penalty for colliding with anything.

    Try to simplify your reward function:
    - Try to avoid conditional rewards unless absolutely necessary
    - Try to give rewards that reflect how well the network is doing
    - For example, it gets a higher reward for facing the checkpoint more directly, and a lower reward for not.
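
    The dense alignment reward suggested above can be written as a tiny pure function. One caveat worth flagging as an assumption: since dirDot is the dot product of two unit vectors it lies in [-1, 1], and an unscaled per-step reward can accumulate far beyond the recommended [-1, +1] range over a long episode, so a small scale factor (0.01f here, purely illustrative) is often applied:

    ```csharp
    // Sketch of a scaled dense alignment reward: +scale when the car faces the next
    // checkpoint dead-on, -scale when it faces directly away, linear in between.
    public static class AlignmentShaping
    {
        public static float AlignmentReward(float dirDot, float scale = 0.01f)
        {
            return dirDot * scale;
        }
    }
    ```

    In the agent this would replace the thresholded penalty, e.g. `AddReward(AlignmentShaping.AlignmentReward(dirDot));` each decision step.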

    Let me know if that helps.
     
    dschu and nscrivanich112325 like this.