Training car to drive backwards

Discussion in 'ML-Agents' started by Adrian-S492, Mar 13, 2020.

  1. Adrian-S492

    Hello,

    I have been working on a self-driving car for a few days. I managed to train the car to drive on an oval track, but on a road with more complicated curves the car doesn't use the brakes at all. I am using RayPerceptionSensor3D, the car's velocity, and the current steer angle as observations.

    I figured it might be a good idea to train it to drive both forward and backward on a plain straight road, but I still have issues.

    When I spawn the car so that it can drive forward, the agent learns to drive flawlessly after about 300k steps (final cumulative reward of about 13).
    When I rotate the car 180 degrees so that the only reasonable choice is to drive backward, it doesn't work even after 3M steps. The agent drives forward even though it is already at the end of the track, and it leaves the track in less than 0.5 s (cumulative reward of about 0.04).

    This is the core of my AgentAction method.

    Input is continuous, just throttle and turn:
    Code (CSharp):
    float turn = Mathf.Clamp(vectorAction[0], -1f, 1f);
    float throttle = Mathf.Clamp(vectorAction[1], -1f, 1f);
    arcadeCar.UpdateInput(throttle, turn);
    I give a small reward for moving, which depends on the car's speed, a reward for reaching the end of the track, and a negative reward for leaving the track:
    Code (CSharp):
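    if (checkpointsReached > 0)
    {
        // Reached the end of the track: reward and end the episode.
        AddReward(1f);
        Done();
    }
    else if (IsOnRoad())
    {
        // Small per-step reward that grows with speed in either direction,
        // minus a constant time penalty.
        reward = Mathf.Abs(arcadeCar.GetSpeed()) * 0.001f - 0.001f;
        AddReward(reward);
    }
    else
    {
        // Left the track: assign the penalty before ending the episode.
        reward = -1f;
        SetReward(reward);
        Done();
    }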
    I use the default trainer_config file, with an increased max_steps value.

    Interestingly, I managed to train the car to drive backward when I changed the throttle input to:
    Code (CSharp):
    float throttle = Mathf.Clamp(vectorAction[1], -1f, 0f);
    Does anyone have an idea how else I can train the car to drive backward, other than forcing it by removing the ability to drive forward?
     
  2. mbaske

    How do you calculate arcadeCar.GetSpeed()? If the speed value is based on forward velocity, then using it as a reward signal will likely prevent the agent from going backward.
     
  3. Adrian-S492

    You are right, speed is calculated based on forward velocity:
    Code (CSharp):
    public float GetSpeed()
    {
        Vector3 velocity = rb.velocity;
        Vector3 wsForward = rb.transform.rotation * Vector3.forward;
        float vProj = Vector3.Dot(velocity, wsForward);
        Vector3 projVelocity = vProj * wsForward;
        float speed = projVelocity.magnitude * Mathf.Sign(vProj);
        return speed;
    }
    Can you explain why it prevents the agent from going backward? The returned speed is negative when the car drives backward, which is why the reward is based on the absolute value of the speed, so the reward stays positive.
     
  4. mbaske

    EDIT: Sorry, I missed the abs() - I think you can simplify this method quite a bit though by using Transform.InverseTransformVector to get the local/relative velocity of the car. Local velocity z would then be forward/backward speed, and you can use the absolute value of that. However, have you tried setting rewards only for waypoints, but ignoring speed? Theoretically, the agent *should* be able to figure out it needs to get to the waypoints/rewards as fast as possible without receiving explicit speed rewards.
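
    As a rough sketch of that simplification (the class name LocalSpeedProbe and both method names are made up for illustration and are not part of the original ArcadeCar code), the local-velocity version could look something like this. It assumes the car's transform has unit scale, because Transform.InverseTransformVector is affected by scale; Transform.InverseTransformDirection is the scale-independent alternative.

```csharp
using UnityEngine;

// Hypothetical helper for illustration; not part of the original ArcadeCar class.
public class LocalSpeedProbe : MonoBehaviour
{
    // Assumed to reference the car's Rigidbody (assign it in the Inspector).
    [SerializeField] private Rigidbody rb;

    // Signed speed along the car's local forward (z) axis:
    // positive when moving forward, negative when reversing.
    public float GetSignedSpeed()
    {
        // Convert the world-space velocity into the car's local space.
        // With a unit-scale transform, z is the forward/backward speed.
        Vector3 localVelocity = rb.transform.InverseTransformVector(rb.velocity);
        return localVelocity.z;
    }

    // Direction-agnostic speed, for a reward that should not favor forward motion.
    public float GetAbsoluteSpeed()
    {
        return Mathf.Abs(GetSignedSpeed());
    }
}
```

    With a unit-scale transform this should return the same value as the GetSpeed() method posted above, since projVelocity.magnitude * Mathf.Sign(vProj) reduces to vProj, the dot product of the velocity with the forward axis.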
     
  5. Adrian-S492

    I tried the idea with Transform.InverseTransformVector, but it doesn't really change anything.

    The second idea sounds good, but it doesn't work either. I set multiple checkpoints along the way and changed the reward from

    reward = Mathf.Abs(transform.InverseTransformVector(carRb.velocity).z) * 0.001f - 0.001f;
    to

    reward = -0.001f;

    On average it learns to drive forward after 200k-400k steps.

    I tried training it to drive backward multiple times. It did manage to learn it, but only in 1 out of 10 attempts. I tried different beta and epsilon values, but nothing works. It almost always learns that the best option is to drive off the track as fast as possible.