How to control the car using heuristic option in ML-Agents

Discussion in 'ML-Agents' started by ammad99, Dec 9, 2020.

  1. ammad99

    Joined: Nov 13, 2020. Posts: 14

    Hello Folks,
    I hope everyone is doing fine. I am new to Unity and new to C# coding. I am trying to move the car from the Standard Assets through the CarAgent script. The goal is reinforcement learning: the car can accelerate, brake or do nothing (3 actions) in order to maintain a distance from a target object, let's say another car that is moving at a certain speed. For reference, I am attaching my C# scripts.
    One thing I would like to mention: I can control the car if I attach the CarUserControl script instead of using the CarAgent script (Heuristic option). I don't know what I am missing here. Something related to void FixedUpdate and void Awake is missing, I guess, but I am not sure.

    I would highly appreciate it if someone could help me with this.
     


  2. m4l4

    Joined: Jul 28, 2020. Posts: 81

    Since it's a Standard Asset, I doubt there's an error in its code.
    Maybe the problem is in the agent setup.

    The Heuristic function is called inside the RequestDecision() function.
    Attach a Decision Requester script to the agent. The script will call RequestDecision() automatically every x steps.
    The agent will gather observations for the inputs, but without a neural net to process them, it will ask the user to provide outputs through the Heuristic function.
     
  3. ammad99

    Many thanks, m4l4, for your input.

    I already had the Decision Requester attached to my car, but I think what I was missing was the Decision Period, which I set to 1 (a decision is requested every Academy step, not every second), and the check mark on "Take Actions Between Decisions".
    In the Behavior Parameters I changed the space size to 3 (which means I have 3 actions: accelerate, brake and do nothing).

    For future reference, if someone is having the same problem,
    the updated CarAgent script is as follows:

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using Unity.MLAgents.Policies;
    using UnityStandardAssets.CrossPlatformInput;

    namespace UnityStandardAssets.Vehicles.Car
    {
        [RequireComponent(typeof(CarController))]
        public class CarAgent : Agent
        {
            private Vector3 originalPosition;
            private Vector3 Targetorginalposition;
            private BehaviorParameters behaviorParameters;
            private CarController carController;
            private Rigidbody rbody;

            public Transform Target;

            public override void Initialize()
            {
                // Remember where the car and the target start
                originalPosition = this.transform.localPosition;
                Targetorginalposition = Target.localPosition;

                behaviorParameters = GetComponent<BehaviorParameters>();
                carController = GetComponent<CarController>();
                rbody = carController.GetComponent<Rigidbody>();

                Reset();
            }

            public override void OnEpisodeBegin()
            {
                Reset();
            }

            private void Reset()
            {
                // Put the car and the target back at their start positions
                this.transform.localPosition = originalPosition;
                Target.localPosition = Targetorginalposition;
            }

            public override void CollectObservations(VectorSensor sensor)
            {
                sensor.AddObservation(Target.localPosition);
                sensor.AddObservation(this.transform.localPosition);
            }

            public override void OnActionReceived(float[] vectorAction)
            {
                var direction = Mathf.FloorToInt(vectorAction[0]);

                switch (direction)
                {
                    case 0: // do nothing, i.e. idle
                        break;

                    case 1: // move forward
                        carController.Move(0f, 1f, 0f, 0f);
                        break;

                    case 2: // brake / move backwards
                        carController.Move(0f, 0f, -1f, 0f);
                        break;
                }

                float distanceToTarget = Vector3.Distance(this.transform.localPosition, Target.localPosition);

                // Floats are compared approximately; an exact == rarely holds
                if (Mathf.Approximately(distanceToTarget, 8.0f))
                {
                    SetReward(1.0f);
                }
                // An error could occur here: use && (|| is always true)
                else if (distanceToTarget >= 7.5f && distanceToTarget <= 7.9f)
                {
                    SetReward(0.5f);
                }
                else if (distanceToTarget < 7.5f)
                {
                    EndEpisode(); // could be Reset
                }
                else if (Mathf.Approximately(Target.localPosition.z, 1990.486f))
                {
                    EndEpisode();
                }
                // AddReward(-1f / MaxStep);
            }

            public override void Heuristic(float[] actionsOut)
            {
                actionsOut[0] = 0; // default: do nothing
                if (Input.GetKey(KeyCode.UpArrow))
                {
                    actionsOut[0] = 1;
                }
                else if (Input.GetKey(KeyCode.DownArrow))
                {
                    actionsOut[0] = 2;
                    // carController.Move(0f, 0f, -1f, 0);
                }
            }
        }
    }
     
  4. m4l4

    Looking at the code, it seems you are using a discrete action space.
    For a car control problem, a continuous action space might be more appropriate.

    Think about it this way:
    With discrete actions, you choose either to press the pedal or not, but if you do, you go full throttle only.
    There's no middle ground, since you are not choosing HOW MUCH you want to accelerate.
    The same goes for steering and braking.

    With continuous control, the agent will output floats between -1 and 1. You can then multiply the values by your maxSpeed and maxSteer variables to get the desired acceleration or steering angle.
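
    A minimal sketch of that scaling in plain C# (no Unity dependency). The class name, the Scale helper and the MaxSteerAngle value are hypothetical, chosen only for illustration; inside an agent the input would come from the continuous action array.

```csharp
using System;

public static class ContinuousControl
{
    // Hypothetical limit, not taken from the Standard Assets car.
    public const float MaxSteerAngle = 25f;

    // Clamp a raw policy output to [-1, 1], then scale it to [-max, max].
    public static float Scale(float action, float max)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, action));
        return clamped * max;
    }
}
```

    For example, ContinuousControl.Scale(0.5f, ContinuousControl.MaxSteerAngle) gives half of the maximum steering angle, and any out-of-range output is limited to the maximum.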
     
  5. ammad99

    Once again, thanks m4l4, that is very good advice. I modified the code accordingly.

    Changes made in the code:
    // ActionBuffers needs: using Unity.MLAgents.Actuators;
    public override void OnActionReceived(ActionBuffers actions)
    {
        var accelerate = Mathf.Clamp(actions.ContinuousActions[0], 0f, 1f); // value ranges from 0 to 1 for throttle
        var brake = Mathf.Clamp(actions.ContinuousActions[1], -1f, 0f); // value ranges from -1 to 0 for brake
        if (accelerate >= 0 || brake <= 0) // always true after clamping, so it should also work without the if statement
        {
            carController.Move(0f, accelerate, brake, 0f);
        }
    }
     
  6. m4l4

    You are welcome. I've worked on a similar project myself and I remember the headache.

    Avoid clamping the outputs like that.
    Clamp(val, 0, 1) means that everything below 0 will be read as 0. That way you are ignoring half of the output range.
    Instead, remap the values from the (-1, 1) range to the (0, 1) range.

    You can first clamp the output to (-1, 1) (it does it automatically, but the docs say it's good practice to do it anyway),
    then remap it with:
    action[0] = (action[0] + 1f) * 0.5f;

    You'll get a float between 0 and 1, and no part of the output will be ignored or misinterpreted.
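
    As a small, hedged sketch, the clamp-then-remap step can be wrapped in a helper. This is plain C# with no Unity dependency, and the names ActionRemap and ToUnitRange are made up for illustration:

```csharp
using System;

public static class ActionRemap
{
    // Clamp a raw output to [-1, 1], then map it linearly onto [0, 1]:
    // -1 -> 0, 0 -> 0.5, 1 -> 1.
    public static float ToUnitRange(float action)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, action));
        return (clamped + 1f) * 0.5f;
    }
}
```

    Unlike Clamp(val, 0, 1), negative outputs still map to distinct throttle values here instead of all collapsing to 0.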
     
  7. ammad99

    Yes, you are right :) Thanks!