Question Racing Simulator ML-agents

Coolzy · Sep 14, 2020

Hi, I'm having trouble setting up the OnActionReceived() function for ML-Agents. I'm using the Realistic Car Controller V3 from the Asset Store. I created a race track and everything works perfectly, apart from random behaviour from the car agent. Can anyone please give me some insight into how I should do this? All help is much appreciated.

Code (CSharp):

public override void OnActionReceived(float[] vectorAction)
{
    controller.gasInput = Mathf.Clamp(vectorAction[0], 0, 1f);
    controller.brakeInput = Mathf.Clamp(vectorAction[1], 0, 1f);
    controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f);
}

Gas input is for accelerating, values are 0-1 in the controller script.
Brake input is for braking, values are 0-1 in the controller script.
Steer input is for steering, values are -1 to 1 in the controller script.

These are normally driven by GetAxis with the "Horizontal" and "Vertical" axes, as seen in the heuristic method:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = 0;
    actionsOut[1] = 0;
    actionsOut[2] = 0;
    actionsOut[3] = 0;

    if (Input.GetAxis("Vertical") == 1)
    {
        //Accelerating
        actionsOut[0] = 1;
    }
    else if (Input.GetAxis("Vertical") == -1)
    {
        //Braking
        actionsOut[1] = 1;
    }
    else if (Input.GetAxis("Horizontal") == 1)
    {
        //Steer Right
        actionsOut[2] = 1;
    }
    else if (Input.GetAxis("Horizontal") == -1)
    {
        //Steer Left
        actionsOut[3] = 1;
    }
}

I welcome all criticism, as I'm very new to ml-agents. Thank you for any comments.

andrewcoh_unity · Sep 14, 2020

Initially, the agent will behave randomly in order to 'explore' the state and action space. Over time, the behavior should converge to something that seems 'intentional', given that you've formulated your reward function and observation space reasonably. This can take a long time depending on your problem. I would let it run for 5M timesteps and monitor your training on tensorboard to see if your reward is increasing properly. https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-Tensorboard.md

Additionally, it looks like your heuristic uses 4 actions whereas your OnActionReceived uses 3. I believe the 3 actions for steering/gas/brake make sense.
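
As an illustration of the three-action layout (gas, brake, steer), here is a minimal sketch of how the heuristic could fill the same three slots that OnActionReceived reads. HeuristicMapping and Fill are hypothetical names, not part of ML-Agents or the car controller, and the sketch assumes the clamp-based OnActionReceived from the original post, where gas and brake are read as 0 to 1 and steering as -1 to 1. It is plain C# with no Unity dependency, so the axis values are passed in as parameters.

```csharp
using System;

public static class HeuristicMapping
{
    // Hypothetical helper: fills a 3-element action array (gas, brake, steer)
    // from axis values in [-1, 1], such as those returned by Input.GetAxis.
    public static void Fill(float[] actionsOut, float vertical, float horizontal)
    {
        // Positive vertical input becomes gas, negative becomes brake.
        actionsOut[0] = Math.Max(vertical, 0f);
        actionsOut[1] = Math.Max(-vertical, 0f);

        // Steering keeps its sign: left is negative, right is positive.
        actionsOut[2] = Math.Min(Math.Max(horizontal, -1f), 1f);
    }
}
```

Inside Heuristic this would be called as HeuristicMapping.Fill(actionsOut, Input.GetAxis("Vertical"), Input.GetAxis("Horizontal")). Using the axis values directly, instead of an else-if chain that compares them to exactly 1 or -1, also lets you steer and accelerate at the same time.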

m4l4 · Sep 14, 2020

Not sure if this helps, but the continuous action space outputs values between -1 and 1.
Clamping the values the way you did means that you ignore the negative half of the range for vectorAction[0] and vectorAction[1].

I think a better approach is to clamp the raw output values between -1 and 1 (it's done automatically, but as suggested by the ML team, it's better to do it a second time), then remap the values to the desired range.

Code (CSharp):

public override void OnActionReceived(float[] vectorAction)
{
    controller.gasInput = Mathf.Clamp(vectorAction[0], -1f, 1f);
    controller.brakeInput = Mathf.Clamp(vectorAction[1], -1f, 1f);
    controller.steerInput = Mathf.Clamp(vectorAction[2], -1f, 1f);

    //Remap gas and brake from [-1, 1] to [0, 1]
    controller.gasInput = Map(controller.gasInput, -1, 1, 0, 1);
    controller.brakeInput = Map(controller.brakeInput, -1, 1, 0, 1);
}

//1st range is the original one, 2nd is the desired range
public float Map(float value, float low1, float high1, float low2, float high2)
{
    float mappedValue = low2 + (value - low1) * (high2 - low2) / (high1 - low1);

    if (value < low1 || value > high1 || mappedValue < low2 || mappedValue > high2)
    {
        Debug.Log("Warning, outputs out of range!!!");
    }

    return mappedValue;
}

That way, a raw gas value of -0.2 doesn't get ignored; it is treated as a +0.4 output.
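
For the specific case of going from the -1 to 1 range to the 0 to 1 range, the remap reduces to (x + 1) / 2. A minimal standalone sketch of that arithmetic follows. ActionRemap and ToUnit are hypothetical names used only for illustration, and the code is plain C# so it can be checked outside Unity.

```csharp
using System;

public static class ActionRemap
{
    // Hypothetical helper: clamps a raw action to [-1, 1],
    // then maps it linearly onto [0, 1].
    public static float ToUnit(float raw)
    {
        float clamped = Math.Min(Math.Max(raw, -1f), 1f);
        return (clamped + 1f) * 0.5f;
    }
}
```

With this mapping a raw value of -1 gives 0, a raw value of 1 gives 1, and -0.2 gives roughly 0.4. Note that a raw value of 0 gives 0.5, so a policy whose outputs sit near zero applies about half gas and half brake at the same time.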
