Question Train Machine Learn Agent to Drive the Standard Asset Car from Unity

Discussion in 'ML-Agents' started by DiogoQueiroz, Mar 17, 2021.

  1. DiogoQueiroz

    DiogoQueiroz

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hi all,

    After searching the Internet extensively, my team and I are creating this thread to ask for help and resources.
    We are creating a procedural maze with a start position and a random-ish end position. Our idea is to have a machine learning agent drive the car provided by Unity's Standard Assets, but we are not having any success with it. We managed to make a simpler agent run through the maze and find the end, but for some reason the car keeps running into walls. We have tried different hyperparameters and observations, and we also tried PPO, SAC, and even imitation learning.

    If anyone has any advice or resources, I would appreciate the help.
    Below is the agent code --> CarAgent.cs
    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Trailing comments give the number of floats each observation adds.
        sensor.AddObservation(transform.localPosition); // 3
        sensor.AddObservation(this.transform.forward); // 3
        sensor.AddObservation(this.transform.InverseTransformPoint(target.transform.position)); // 3
        sensor.AddObservation(this.transform.InverseTransformVector(carBody.velocity)); // 3
        sensor.AddObservation(m_Car.CurrentSteerAngle); // 1
        sensor.AddObservation(m_Car.CurrentSpeed); // 1
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        if (completedRace) return;

        MoveAgent(actions);

        // Fell below the floor: reset to the start point and end the episode.
        if (transform.localPosition.y < -0.5f)
        {
            StopCar();
            transform.localPosition = raceManager.startPoint;
            EndEpisode();
        }

        // Car is tipping over.
        if (carBody.transform.up.y < 0.75f)
        {
            StopCar();
            EndEpisode();
        }

        // Ran out of steps.
        if (StepCount == MaxStep)
        {
            StopCar();
            EndEpisode();
        }

        // Small time penalty on every step.
        AddReward(-1f / MaxStep);
    }

    private void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Target"))
        {
            //StartCoroutine(Finished());
            StopCar();
            completedRace = true;
            AddReward(1f);
        }
    }

    private void OnCollisionEnter(Collision other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            AddReward(-0.05f);
        }
    }

    private void OnCollisionStay(Collision other)
    {
        if (other.gameObject.CompareTag("Wall"))
        {
            AddReward(-1f / MaxStep);
        }
    }
    And this is the last trainer configuration we tried --> CarAgent.yaml
    Code (CSharp):
    behaviors:
      CarAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 64
          buffer_size: 10240
          learning_rate: 1e-3
          beta: 1e-2
          epsilon: 0.15
          lambd: 0.93
          num_epoch: 8
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.05
            gamma: 0.99
            encoding_size: 256
            learning_rate: 3e-4
        keep_checkpoints: 5
        max_steps: 1e8
        time_horizon: 128
        summary_freq: 10000
        threaded: true
    Thanks.
     
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @DiogoQueiroz,
    Have you tried using the raycast sensor to detect the walls? We previously helped an internal team train karts to drive; they trained relatively quickly and learned to avoid the walls.

    Your per-step reward penalty may be incentivizing the agent to "kill itself" more quickly to avoid accumulating a lower reward.

    Could you also post your MoveAgent method?
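
To make the per-step penalty concern concrete, here is a rough back-of-the-envelope sketch in Python. It uses the reward values from the CarAgent.cs code above (-1/MaxStep per step, -0.05 per wall hit, +1 at the target) and ignores the extra OnCollisionStay penalty. The MaxStep value of 5000 and the step and collision counts in each scenario are made-up numbers for illustration only, not values from the project.

```python
# Back-of-the-envelope episode returns for the reward scheme posted above.
# MAX_STEP and the scenario numbers are assumptions, not project values.
MAX_STEP = 5000                  # hypothetical MaxStep
STEP_PENALTY = -1.0 / MAX_STEP   # applied once per step
WALL_HIT_PENALTY = -0.05         # applied once per new wall collision
GOAL_REWARD = 1.0                # applied when the target is reached


def episode_return(steps, wall_hits=0, reached_goal=False):
    """Sum of rewards for one episode under the posted scheme."""
    total = steps * STEP_PENALTY + wall_hits * WALL_HIT_PENALTY
    if reached_goal:
        total += GOAL_REWARD
    return total


# Agent flips over or falls off after 100 steps, ending the episode early.
early_exit = episode_return(steps=100)
# Agent drives for the whole episode, hits walls 10 times, never finds the goal.
full_failure = episode_return(steps=MAX_STEP, wall_hits=10)
# Agent reaches the goal after 1500 steps with 2 wall hits.
success = episode_return(steps=1500, wall_hits=2, reached_goal=True)

print(f"early exit:   {early_exit:+.2f}")
print(f"full failure: {full_failure:+.2f}")
print(f"success:      {success:+.2f}")
```

Under these assumed numbers, ending the episode early scores better than driving for the full episode without reaching the goal, so an agent that rarely finds the target has little reason to stay alive.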
     
  3. DiogoQueiroz

    DiogoQueiroz

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hi @christophergoy, I believe we are using raycasts to detect walls, though we might not be using them correctly because this area is new to us. Below is a snapshot; you can also see in it that the raycasts detect the finish point as well.
    upload_2021-3-17_11-7-20.png

    And how could we make the car avoid killing itself and go directly to the finish point?
    Below is the code for MoveAgent
    Code (CSharp):
    private void MoveAgent(ActionBuffers actionBuffers)
    {
        // var discreteActions = actionBuffers.DiscreteActions;
        // float accel = 0;
        // float steer = 0;
        //
        // var action = discreteActions[0];
        // switch (action)
        // {
        //     case 1:
        //         accel = 1f;
        //         break;
        //     case 2:
        //         accel = -1f;
        //         break;
        //     case 3:
        //         steer = 1f;
        //         break;
        //     case 4:
        //         steer = -1f;
        //         break;
        // }

        //var continuousActions = actionBuffers.ContinuousActions;
        //var accel = Mathf.Clamp(continuousActions[0], -1f, 1f);
        //var steer = Mathf.Clamp(continuousActions[1], -1f, 1f);

        float forwardAmout = 0f;
        float turnAmout = 0f;

        // Branch 0: 0 = no throttle, 1 = forward, 2 = reverse/brake.
        switch (actionBuffers.DiscreteActions[0])
        {
            case 0:
                forwardAmout = 0f;
                break;
            case 1:
                forwardAmout = +1f;
                break;
            case 2:
                forwardAmout = -1f;
                break;
        }

        // Branch 1: 0 = straight, 1 = steer one way, 2 = steer the other way.
        switch (actionBuffers.DiscreteActions[1])
        {
            case 0:
                turnAmout = 0f;
                break;
            case 1:
                turnAmout = +1f;
                break;
            case 2:
                turnAmout = -1f;
                break;
        }

        // Standard Assets CarController.Move(steering, accel, footbrake, handbrake).
        m_Car.Move(turnAmout, forwardAmout, forwardAmout, 0f);
    }
     
  4. casbas

    casbas

    Joined:
    Nov 22, 2015
    Posts:
    2
    Hi @christophergoy,

    I'm on @DiogoQueiroz's team, and I'd like to complement his answer with some more info.
    We are trying to train the agent by increasing the complexity little by little. Training starts in a small, empty area like the one in Diogo's snapshot, and after about 30 episodes the complexity increases a little. Once the complexity is at its maximum, we increase the maze size and start the process again. Should we try a different flow?

    Also, we are currently using a Decision Requester period of 3. As the agent moves fast, we tried a lower value (1), but then the agent basically doesn't move away from the start. What should we look at to find the best value for this?

    I have been running a training session for almost 10 million steps and the agent keeps getting stuck against the wall like this... upload_2021-3-17_11-52-33.png
     
  5. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Thanks for all of the info @DiogoQueiroz and @casbas,
    Just to clarify, do the walls and the goal target have different tags that are detectable by the raycasts? I could imagine a situation where it thinks the goals and the walls are the same if they aren't differentiated. It may see the wall and think it's headed toward the goal.
     
  6. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    This sounds reasonable to me. There is a proper workflow for this within ML-Agents called curriculum learning that you could use. It allows you to pass different environment parameters to the Unity environment from Python based on how well the agent is doing in the current curriculum.

    A decision period of 3 sounds reasonable; you could try bumping it up to 5 to see if you get better results.
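
As a rough illustration of the idea behind curriculum learning, the Python sketch below advances to a harder lesson only once the mean episode reward over a recent window reaches a threshold, rather than after a fixed number of episodes. This is not the ML-Agents API (the built-in curriculum is configured in the trainer YAML file); the `Curriculum` class, the lesson values, and the thresholds are all hypothetical.

```python
from collections import deque


class Curriculum:
    """Toy curriculum: move to the next lesson once the mean episode reward
    over the last `min_lesson_length` episodes reaches the lesson threshold."""

    def __init__(self, lessons, min_lesson_length=30):
        # lessons: list of (maze_complexity, reward_threshold); hypothetical values.
        self.lessons = lessons
        self.index = 0
        self.recent = deque(maxlen=min_lesson_length)

    @property
    def maze_complexity(self):
        return self.lessons[self.index][0]

    def report_episode(self, reward):
        """Record one episode reward; return True if the lesson advanced."""
        self.recent.append(reward)
        threshold = self.lessons[self.index][1]
        window_full = len(self.recent) == self.recent.maxlen
        is_last = self.index == len(self.lessons) - 1
        mean_reward = sum(self.recent) / len(self.recent)
        if window_full and not is_last and mean_reward >= threshold:
            self.index += 1
            self.recent.clear()  # start measuring the new lesson from scratch
            return True
        return False


# Three lessons of increasing maze complexity, using a short window for the demo.
curriculum = Curriculum(
    lessons=[(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)], min_lesson_length=3
)
for reward in [0.1, 0.6, 0.7, 0.8, 0.9]:
    advanced = curriculum.report_episode(reward)
    print(reward, curriculum.maze_complexity, advanced)
```

Gating on performance means the maze only gets harder once the agent has actually learned the current stage, which a fixed count of about 30 episodes does not guarantee.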
     
  7. DiogoQueiroz

    DiogoQueiroz

    Joined:
    Feb 19, 2019
    Posts:
    9
    Hi @christophergoy, we have different tags, as shown below. The walls are a single mesh with a mesh collider; could this be an issue for the detection?
    upload_2021-3-17_12-36-24.png


    But if we increase the Decision Requester period, does this mean that the agent will take longer to make a decision?
     
  8. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Yes, with a decision period of 5 the agent will make a decision every 5 steps.
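
For a feel of what the decision period means in wall-clock terms, here is a small Python sketch. It assumes the environment steps once per physics update at Unity's default fixed timestep of 0.02 s; if the project changed that setting, the numbers differ. Which exact step within each period triggers the decision is an implementation detail, so the 0-based indexing here is only illustrative.

```python
FIXED_TIMESTEP = 0.02  # seconds; Unity's default fixed timestep, assumed unchanged


def decisions_per_second(decision_period, fixed_timestep=FIXED_TIMESTEP):
    """How many decisions the agent makes per simulated second."""
    return 1.0 / (decision_period * fixed_timestep)


def decision_steps(total_steps, decision_period):
    """Steps (0-based) on which a new decision would be requested."""
    return [step for step in range(total_steps) if step % decision_period == 0]


for period in (1, 3, 5):
    rate = round(decisions_per_second(period), 1)
    print(f"period={period}: {rate} decisions/s, steps {decision_steps(10, period)}")
```

A longer period gives each chosen action more time to have a visible effect on a fast-moving car, at the cost of slower reactions.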
     
  9. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    For the kart game we worked with, the raycasts were spread all around the vehicle. I'm not sure whether your car can back up, but it doesn't seem to have any raycast vision behind the front bumper, which may make it think it can just back up and turn a certain way when in fact it cannot.
     
  10. DiogoQueiroz

    DiogoQueiroz

    Joined:
    Feb 19, 2019
    Posts:
    9
    We do have more ray casts, I just selected one by mistake.
    upload_2021-3-17_12-44-7.png
     
  11. DiogoQueiroz

    DiogoQueiroz

    Joined:
    Feb 19, 2019
    Posts:
    9


    So, this is a short GIF of our car moving through the maze. Below are the graphs from TensorBoard.
    upload_2021-3-17_14-21-42.png
    upload_2021-3-17_14-21-54.png
    upload_2021-3-17_14-22-15.png

    I'm not sure whether these graphs look good or whether the values are trending the way they should. Any insights?
    Thanks for all the help.
     
  12. WaxyMcRivers

    WaxyMcRivers

    Joined:
    May 9, 2016
    Posts:
    59
    I'm working on something similar and found a few things useful:

    This may seem obvious and you probably have done this: play using heuristic mode and log all rewards, go around with the car and test all possible cases to make sure rewards/penalties are being sent to the agent with the values you'd expect.
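
A minimal sketch of the kind of bookkeeping this implies, written in Python purely to show the idea (in Unity the same tally would be kept in C# and printed with a log call). The `RewardLedger` class, the source names, and the MaxStep value are hypothetical; the reward values mirror the ones in the CarAgent.cs code posted earlier in the thread.

```python
from collections import defaultdict


class RewardLedger:
    """Accumulates rewards per named source so the totals can be inspected."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, source, value):
        self.totals[source] += value
        self.counts[source] += 1

    def summary(self):
        # Maps each source to (number of times fired, rounded total reward).
        return {s: (self.counts[s], round(self.totals[s], 4)) for s in self.totals}


MAX_STEP = 5000  # hypothetical MaxStep
ledger = RewardLedger()

# Simulate a short manual drive: 200 steps, two wall bumps, then the target.
for _ in range(200):
    ledger.add("step_penalty", -1.0 / MAX_STEP)
ledger.add("wall_enter", -0.05)
ledger.add("wall_enter", -0.05)
ledger.add("target", 1.0)

for source, (count, total) in ledger.summary().items():
    print(f"{source}: fired {count} times, total {total}")
```

If a source never fires during a manual drive, or fires far more often than expected, that points at a tag, layer, or collider problem before any training time is spent.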

    Curiosity + penalties can cause a survivorship bias (mentioned above). I found it worthwhile to go as bare-bones as possible with the network/RL algorithm parameters and aim for the simplest version of your goal (i.e. no curiosity and the simplest version of the task).

    Training only on the first part of the curriculum (empty area with walls) and getting a stable model that doesn't run into walls gives you something you can pass to --initialize-from for the next step (pretraining). This can also be a sanity check that things are coded properly. If your car is still running into walls after training in the open area, something is wrong with your perception.

    You can consider using GAIL and/or Behavioral Cloning to jump-start your learning a little via demonstrations. This page in the ML-Agents docs is very informative. If you do this, you will most likely have to create a sparser reward system.

    I have to give huge props to mbaske: his videos on YouTube and his repos are great for learning from. His grid sensor example has a self-driving car that you may be able to draw inspiration from.

    My naive guess is that either your perception is messed up (tags/layers) or your rewards aren't conveying the concept of your goal to the agent.
     
  13. casbas

    casbas

    Joined:
    Nov 22, 2015
    Posts:
    2
    Hey @WaxyMcRivers, thanks for the hints. We are still trying to find a good way to train it.
    At the moment I'm trying to get a stable model, as you said, in just an empty area.

    This YouTube channel has a lot of good stuff; I hope it will help. Thanks!