Question Racing Simulator ML-agents

Discussion in 'ML-Agents' started by Coolzy, Sep 14, 2020.

  1. Coolzy

    Joined:
    Mar 11, 2018
    Posts:
    3
    Hi, I'm having trouble setting up the OnActionReceived() function for ML-Agents. I'm using the Realistic Car Controller V3 from the Asset Store. I created a race track and everything works perfectly, except that the car agent behaves randomly. Can anyone please give me some insight into how I should do this? All help is much appreciated.

    Code (CSharp):
    public override void OnActionReceived(float[] vectorAction)
    {
        controller.gasInput = Mathf.Clamp(vectorAction[0], 0, 1f);
        controller.brakeInput = Mathf.Clamp(vectorAction[1], 0, 1f);
        controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f);
    }
    Gas input is for accelerating, values are 0-1 in the controller script.
    Brake input is for braking, values are 0-1 in the controller script.
    Steer input is for steering, values are -1 to 1 in the controller script.

    These are normally driven by Input.GetAxis on the Horizontal and Vertical axes, as seen in the Heuristic method:

    Code (CSharp):
    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = 0;
        actionsOut[1] = 0;
        actionsOut[2] = 0;
        actionsOut[3] = 0;

        if (Input.GetAxis("Vertical") == 1)
        {
            // Accelerating
            actionsOut[0] = 1;
        }
        else if (Input.GetAxis("Vertical") == -1)
        {
            // Braking
            actionsOut[1] = 1;
        }
        else if (Input.GetAxis("Horizontal") == 1)
        {
            // Steer right
            actionsOut[2] = 1;
        }
        else if (Input.GetAxis("Horizontal") == -1)
        {
            // Steer left
            actionsOut[3] = 1;
        }
    }
    I'll accept all criticism, as I'm very new to ML-Agents. Thank you for all comments.
     
  2. andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Initially, the agent will behave randomly in order to 'explore' the state and action space. Over time, the behavior should converge to something that seems 'intentional', provided you've formulated your reward function and observation space reasonably. This can take a long time depending on your problem. I would let it run for 5M timesteps and monitor your training in TensorBoard to see whether your reward is increasing properly. https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-Tensorboard.md

    Additionally, it looks like your heuristic uses 4 actions, whereas your OnActionReceived uses 3. I believe the 3 actions for steering/gas/brake make sense.
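
    As a rough illustration of a heuristic that matches the 3 actions, here is a minimal sketch in plain C# with no Unity dependencies. The class HeuristicSketch and the method AxesToActions are hypothetical names, not part of ML-Agents, and the sketch assumes the axis readings lie in -1 to 1 and that OnActionReceived reads index 0 as gas, 1 as brake and 2 as steer, as in the first post.

```csharp
using System;

// Plain C# sketch; class and method names are hypothetical, not part of ML-Agents.
public static class HeuristicSketch
{
    // Converts two axis readings (assumed to be in [-1, 1]) into the
    // three actions [gas, brake, steer] used by OnActionReceived in the first post.
    public static float[] AxesToActions(float vertical, float horizontal)
    {
        var actions = new float[3];
        // Gas: positive half of the vertical axis, limited to [0, 1].
        actions[0] = Math.Min(1f, Math.Max(vertical, 0f));
        // Brake: negative half of the vertical axis, limited to [0, 1].
        actions[1] = Math.Min(1f, Math.Max(-vertical, 0f));
        // Steer: a single signed value, left is negative and right is positive.
        actions[2] = Math.Max(-1f, Math.Min(1f, horizontal));
        return actions;
    }
}
```

    Inside Heuristic(float[] actionsOut), the three values could be copied into actionsOut, passing Input.GetAxis("Vertical") and Input.GetAxis("Horizontal") as the two arguments. Unlike the == 1 comparisons in an else-if chain, this keeps intermediate axis values and allows steering while accelerating or braking.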
     
  3. m4l4

    Joined:
    Jul 28, 2020
    Posts:
    81
    Not sure if this helps, but the continuous action space outputs values between -1 and 1.
    Clamping the values to 0-1 like you did means that you are ignoring the negative half of the range for vectorAction[0] and vectorAction[1].

    I think a better approach is to clamp the raw output values between -1 and 1 (it's done automatically, but as suggested by the ML team, it's better to do it a second time), then remap the values to the desired range.

    Code (CSharp):
    public override void OnActionReceived(float[] vectorAction)
    {
        // Clamp the raw outputs to the expected [-1, 1] range.
        controller.gasInput = Mathf.Clamp(vectorAction[0], -1f, 1f);
        controller.brakeInput = Mathf.Clamp(vectorAction[1], -1f, 1f);
        controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f);

        // Remap gas and brake from [-1, 1] to [0, 1]; steering already uses [-1, 1].
        controller.gasInput = Map(controller.gasInput, -1f, 1f, 0f, 1f);
        controller.brakeInput = Map(controller.brakeInput, -1f, 1f, 0f, 1f);
    }

    // The 1st range is the original one, the 2nd is the desired range.
    public float Map(float value, float low1, float high1, float low2, float high2)
    {
        float mappedValue = low2 + (value - low1) * (high2 - low2) / (high1 - low1);

        if (value < low1 || value > high1 || mappedValue < low2 || mappedValue > high2)
        {
            Debug.Log("Warning, outputs out of range!!!");
        }
        return mappedValue;
    }
    That way, a gasInput value of -0.2 doesn't get ignored but is treated as a +0.4 output.
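
    To compare the two approaches side by side, here is a minimal sketch in plain C# with no Unity dependencies. ActionMappingSketch, ClampOnly and ClampThenRemap are hypothetical names used only for illustration, and the sketch assumes a raw action nominally in -1 to 1 that should end up in 0 to 1.

```csharp
using System;

// Plain C# sketch; class and method names are hypothetical.
public static class ActionMappingSketch
{
    // Clamping straight to [0, 1]: every negative raw value collapses to 0.
    public static float ClampOnly(float raw) =>
        Math.Max(0f, Math.Min(1f, raw));

    // Clamping to [-1, 1] first, then remapping linearly to [0, 1]:
    // -1 becomes 0, 0 becomes 0.5 and 1 becomes 1.
    public static float ClampThenRemap(float raw)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, raw));
        return (clamped + 1f) / 2f;
    }
}
```

    With a raw value of -0.2, ClampOnly returns 0 while ClampThenRemap returns roughly 0.4, so the whole output range contributes to the gas and brake inputs. One consequence is that a raw value of 0 now means half input rather than no input.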