Resolved Using imitation learning in Karting Microgame

Wolf00007 · May 5, 2021

Hello,
I'm new to ML-Agents and have been experimenting with the Karting Microgame tutorials. I would like to add imitation learning to my game, and I was wondering how to make my Agent Kart take its movement input from the keyboard. Below is the Kart Agent script used by the agent karts.

So I saw in one of the tutorials that I should override a Heuristic function and add it to this script to allow me to use the arrow keys to record actions of my kart. But how do I do that with this script? I am struggling to find the things I should modify in order to make that work. Any tips would be much appreciated!

Code (CSharp):

using KartGame.KartSystems;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;
using Random = UnityEngine.Random;

namespace KartGame.AI
{
    /// <summary>
    /// Sensors hold information such as the position of rotation of the origin of the raycast and its hit threshold
    /// to consider a "crash".
    /// </summary>
    [System.Serializable]
    public struct Sensor
    {
        public Transform Transform;
        public float RayDistance;
        public float HitValidationDistance;
    }

    /// <summary>
    /// We only want certain behaviours when the agent runs.
    /// Training would allow certain functions such as OnAgentReset() be called and execute, while Inferencing will
    /// assume that the agent will continuously run and not reset.
    /// </summary>
    public enum AgentMode
    {
        Training,
        Inferencing
    }

    /// <summary>
    /// The KartAgent will drive the inputs for the KartController.
    /// </summary>
    public class KartAgent : Agent, IInput
    {
        #region Training Modes
        [Tooltip("Are we training the agent or is the agent production ready?")]
        public AgentMode Mode = AgentMode.Training;
        [Tooltip("What is the initial checkpoint the agent will go to? This value is only for inferencing.")]
        public ushort InitCheckpointIndex;
        #endregion

        #region Senses
        [Header("Observation Params")]
        [Tooltip("What objects should the raycasts hit and detect?")]
        public LayerMask Mask;
        [Tooltip("Sensors contain ray information to sense out the world, you can have as many sensors as you need.")]
        public Sensor[] Sensors;
        [Header("Checkpoints"), Tooltip("What are the series of checkpoints for the agent to seek and pass through?")]
        public Collider[] Colliders;
        [Tooltip("What layer are the checkpoints on? This should be an exclusive layer for the agent to use.")]
        public LayerMask CheckpointMask;
        [Space]
        [Tooltip("Would the agent need a custom transform to be able to raycast and hit the track? " +
                 "If not assigned, then the root transform will be used.")]
        public Transform AgentSensorTransform;
        #endregion

        #region Rewards
        [Header("Rewards"), Tooltip("What penalty is given when the agent crashes?")]
        public float HitPenalty = -1f;
        [Tooltip("How much reward is given when the agent successfully passes the checkpoints?")]
        public float PassCheckpointReward;
        [Tooltip("Should typically be a small value, but we reward the agent for moving in the right direction.")]
        public float TowardsCheckpointReward;
        [Tooltip("Typically if the agent moves faster, we want to reward it for finishing the track quickly.")]
        public float SpeedReward;
        [Tooltip("Reward the agent when it keeps accelerating")]
        public float AccelerationReward;
        #endregion

        #region ResetParams
        [Header("Inference Reset Params")]
        [Tooltip("What is the unique mask that the agent should detect when it falls out of the track?")]
        public LayerMask OutOfBoundsMask;
        [Tooltip("What are the layers we want to detect for the track and the ground?")]
        public LayerMask TrackMask;
        [Tooltip("How far should the ray be when casted? For larger karts - this value should be larger too.")]
        public float GroundCastDistance;
        #endregion

        #region Debugging
        [Header("Debug Option")] [Tooltip("Should we visualize the rays that the agent draws?")]
        public bool ShowRaycasts;
        #endregion

        ArcadeKart m_Kart;
        bool m_Acceleration;
        bool m_Brake;
        float m_Steering;
        int m_CheckpointIndex;
        bool m_EndEpisode;
        float m_LastAccumulatedReward;

        void Awake()
        {
            m_Kart = GetComponent<ArcadeKart>();
            if (AgentSensorTransform == null) AgentSensorTransform = transform;
        }

        void Start()
        {
            // If the agent is training, then at the start of the simulation, pick a random checkpoint to train the agent.
            OnEpisodeBegin();
            if (Mode == AgentMode.Inferencing) m_CheckpointIndex = InitCheckpointIndex;
        }

        void Update()
        {
            if (m_EndEpisode)
            {
                m_EndEpisode = false;
                AddReward(m_LastAccumulatedReward);
                EndEpisode();
                OnEpisodeBegin();
            }
        }

        void LateUpdate()
        {
            switch (Mode)
            {
                case AgentMode.Inferencing:
                    if (ShowRaycasts)
                        Debug.DrawRay(transform.position, Vector3.down * GroundCastDistance, Color.cyan);

                    // We want to place the agent back on the track if the agent happens to launch itself outside of the track.
                    if (Physics.Raycast(transform.position + Vector3.up, Vector3.down, out var hit, GroundCastDistance, TrackMask)
                        && ((1 << hit.collider.gameObject.layer) & OutOfBoundsMask) > 0)
                    {
                        // Reset the agent back to its last known agent checkpoint
                        var checkpoint = Colliders[m_CheckpointIndex].transform;
                        transform.localRotation = checkpoint.rotation;
                        transform.position = checkpoint.position;
                        m_Kart.Rigidbody.velocity = default;
                        m_Steering = 0f;
                        m_Acceleration = m_Brake = false;
                    }
                    break;
            }
        }

        void OnTriggerEnter(Collider other)
        {
            var maskedValue = 1 << other.gameObject.layer;
            var triggered = maskedValue & CheckpointMask;

            FindCheckpointIndex(other, out var index);

            // Ensure that the agent touched the checkpoint and the new index is greater than the m_CheckpointIndex.
            if (triggered > 0 && index > m_CheckpointIndex || index == 0 && m_CheckpointIndex == Colliders.Length - 1)
            {
                AddReward(PassCheckpointReward);
                m_CheckpointIndex = index;
            }
        }

        void FindCheckpointIndex(Collider checkPoint, out int index)
        {
            for (int i = 0; i < Colliders.Length; i++)
            {
                if (Colliders[i].GetInstanceID() == checkPoint.GetInstanceID())
                {
                    index = i;
                    return;
                }
            }
            index = -1;
        }

        float Sign(float value)
        {
            if (value > 0)
            {
                return 1;
            }
            if (value < 0)
            {
                return -1;
            }
            return 0;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(m_Kart.LocalSpeed());

            // Add an observation for direction of the agent to the next checkpoint.
            var next = (m_CheckpointIndex + 1) % Colliders.Length;
            var nextCollider = Colliders[next];
            if (nextCollider == null)
                return;

            var direction = (nextCollider.transform.position - m_Kart.transform.position).normalized;
            sensor.AddObservation(Vector3.Dot(m_Kart.Rigidbody.velocity.normalized, direction));

            if (ShowRaycasts)
                Debug.DrawLine(AgentSensorTransform.position, nextCollider.transform.position, Color.magenta);

            m_LastAccumulatedReward = 0.0f;
            m_EndEpisode = false;
            for (var i = 0; i < Sensors.Length; i++)
            {
                var current = Sensors[i];
                var xform = current.Transform;
                var hit = Physics.Raycast(AgentSensorTransform.position, xform.forward, out var hitInfo,
                    current.RayDistance, Mask, QueryTriggerInteraction.Ignore);

                if (ShowRaycasts)
                {
                    Debug.DrawRay(AgentSensorTransform.position, xform.forward * current.RayDistance, Color.green);
                    Debug.DrawRay(AgentSensorTransform.position, xform.forward * current.HitValidationDistance,
                        Color.red);

                    if (hit && hitInfo.distance < current.HitValidationDistance)
                    {
                        Debug.DrawRay(hitInfo.point, Vector3.up * 3.0f, Color.blue);
                    }
                }

                if (hit)
                {
                    if (hitInfo.distance < current.HitValidationDistance)
                    {
                        m_LastAccumulatedReward += HitPenalty;
                        m_EndEpisode = true;
                    }
                }

                sensor.AddObservation(hit ? hitInfo.distance : current.RayDistance);
            }

            sensor.AddObservation(m_Acceleration);
        }

        public override void OnActionReceived(float[] vectorAction)
        {
            base.OnActionReceived(vectorAction);
            InterpretDiscreteActions(vectorAction);

            // Find the next checkpoint when registering the current checkpoint that the agent has passed.
            var next = (m_CheckpointIndex + 1) % Colliders.Length;
            var nextCollider = Colliders[next];
            var direction = (nextCollider.transform.position - m_Kart.transform.position).normalized;
            var reward = Vector3.Dot(m_Kart.Rigidbody.velocity.normalized, direction);

            if (ShowRaycasts) Debug.DrawRay(AgentSensorTransform.position, m_Kart.Rigidbody.velocity, Color.blue);

            // Add rewards if the agent is heading in the right direction
            AddReward(reward * TowardsCheckpointReward);
            AddReward((m_Acceleration && !m_Brake ? 1.0f : 0.0f) * AccelerationReward);
            AddReward(m_Kart.LocalSpeed() * SpeedReward);
        }

        public override void OnEpisodeBegin()
        {
            switch (Mode)
            {
                case AgentMode.Training:
                    m_CheckpointIndex = Random.Range(0, Colliders.Length - 1);
                    var collider = Colliders[m_CheckpointIndex];
                    transform.localRotation = collider.transform.rotation;
                    transform.position = collider.transform.position;
                    m_Kart.Rigidbody.velocity = default;
                    m_Acceleration = false;
                    m_Brake = false;
                    m_Steering = 0f;
                    break;
                default:
                    break;
            }
        }

        void InterpretDiscreteActions(float[] actions)
        {
            m_Steering = actions[0] - 1f;
            m_Acceleration = actions[1] >= 1.0f;
            m_Brake = actions[1] < 1.0f;
        }

        public InputData GenerateInput()
        {
            return new InputData
            {
                Accelerate = m_Acceleration,
                Brake = m_Brake,
                TurnInput = m_Steering
            };
        }
    }
}

christophergoy · May 5, 2021

Hi @Wolf00007,
You are correct that you need to override the Heuristic method in this script.
The Heuristic method passes a float[] that you would then write to. That array would then get passed to OnActionReceived and your agent would use that array as input.

So just override Heuristic, fill the array with the keyboard input values to match the actions that are read in OnActionReceived.
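To make the flow described above concrete, here is a minimal sketch in plain C#. `SketchAgent`, `RecordingAgent` and `StepWithHeuristic` are hypothetical stand-ins invented for illustration, not ML-Agents classes; they only mimic the idea that the array filled in Heuristic is the one OnActionReceived later reads. In the real package, the Behavior Type setting decides whether Heuristic or a model fills the array.

```csharp
using System;

// Hypothetical stand-in for the agent loop; NOT the real ML-Agents Agent class.
public abstract class SketchAgent
{
    // Fills actionsOut with the manually chosen actions for this step.
    public abstract void Heuristic(float[] actionsOut);

    // Consumes the actions, regardless of who produced them.
    public abstract void OnActionReceived(float[] vectorAction);

    // One decision step in "heuristic" mode: the array written by
    // Heuristic is the same array handed to OnActionReceived.
    public void StepWithHeuristic(int actionCount)
    {
        var actions = new float[actionCount];
        Heuristic(actions);
        OnActionReceived(actions);
    }
}

// Hypothetical agent that just remembers what it received.
public sealed class RecordingAgent : SketchAgent
{
    public float[] LastReceived = Array.Empty<float>();

    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = 2f; // pretend a steering key is held
        actionsOut[1] = 1f; // pretend the accelerate key is held
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Copy the values so later steps cannot overwrite them.
        LastReceived = (float[])vectorAction.Clone();
    }
}
```

The point of the sketch is that whatever encoding OnActionReceived expects is the encoding Heuristic has to write.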

Let me know if you have any other questions.

Wolf00007 · May 5, 2021

I have added values like below:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetButton("Accelerate");
    actionsOut[2] = Input.GetButton("Brake");
}

But I'm getting this error for both the Accelerate and Brake actions:

Which is expected because these two actions are of bool type in the code:

Code (CSharp):

bool m_Acceleration;
bool m_Brake;
float m_Steering;

So I'm not sure how to add accelerating and braking in this script.
Thanks for your help btw, and if you need more info from my side, let me know.

christophergoy · May 5, 2021

If GetButton returns a bool, you can do something like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetButton("Accelerate") ? 1 : 0;
    actionsOut[2] = Input.GetButton("Brake") ? 1 : 0;
}

And if you are setting those variables elsewhere, you need to convert back.
It looks like this is done for you in the method InterpretDiscreteActions.

Wolf00007 · May 5, 2021

I also tried like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetAxis("Vertical");
}

And it kinda worked because I can move the Agent but the Agent still wants to move by itself! I did set the Behavior Type to Heuristic Only and the agent does not have any neural network assigned so I'm not sure what I'm missing here...

Wolf00007 · May 5, 2021

christophergoy said: ↑

If GetButton returns a bool, you can do something like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetButton("Accelerate") ? 1 : 0;
    actionsOut[2] = Input.GetButton("Brake") ? 1 : 0;
}

I just did that and the error was gone, but after hitting Play I got hundreds of errors like this:

christophergoy · May 5, 2021

Ah, so it looks like accelerate was using a float value between -1 and 1. You need to add another discrete action in your behavior parameters so that you have 3 discrete actions instead of 2.

christophergoy · May 5, 2021

Wolf00007 said: ↑

I also tried like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    actionsOut[1] = Input.GetAxis("Vertical");
}

And it kinda worked because I can move the Agent but the Agent still wants to move by itself! I did set the Behavior Type to Heuristic Only and the agent does not have any neural network assigned so I'm not sure what I'm missing here...

This approach works as well. It may have something to do with the default value of the vertical axis. Since it was using floats before, it may be getting messed up now that you are using discrete actions (ints).

Wolf00007 · May 5, 2021

christophergoy said: ↑

Ah, so it looks like accelerate was using a float value between -1 and 1. You need to add another discrete action in your behavior parameters so that you have 3 discrete actions instead of 2

I already have two branches with 3 actions per branch

Again, as this is a microgame, this was already prepared like this so I'm not sure why it has two branches.

christophergoy · May 5, 2021

ah, ok. So that means for the neural network output:
steering looks like:
-1 = turn left
0 = go straight
1 = turn right

for accelerate/brake
-1 = brake
0 = do nothing
1 = accelerate

so you need to map your actions accordingly from the keyboard to these values.

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    actionsOut[0] = Input.GetAxis("Horizontal");
    var brakeVal = Input.GetButton("Brake") ? -1 : 0;
    actionsOut[1] = Input.GetButton("Accelerate") ? 1 : brakeVal;
}

christophergoy · May 5, 2021

Sorry for the misunderstanding on my part. Let me know if that helps.

Wolf00007 · May 5, 2021

christophergoy said: ↑

Sorry for the misunderstanding on my part. let me know if that helps

That's okay, thank you for the quick responses.

I have tried the mapping like above and the errors are gone but once again, the agent is still trying to move by itself. He seems to be going backwards and steering left only for some reason. I can countersteer his steering but that's it. Do you have any other ideas?
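One way to see why the kart reverses and steers left with no key pressed is to run the idle keyboard values through InterpretDiscreteActions by hand. The sketch below copies the three decoding expressions from the KartAgent script into a standalone class; the names `IdleInputTrace` and `Decode` are made up for illustration, and the sketch assumes the floats written in Heuristic reach OnActionReceived unchanged.

```csharp
using System;

// Hypothetical standalone helper. Only the three decoding expressions
// are taken from InterpretDiscreteActions in the KartAgent script.
public static class IdleInputTrace
{
    public static (float steering, bool accelerate, bool brake) Decode(float[] actions)
    {
        float steering = actions[0] - 1f;     // 0 -> -1, 1 -> 0, 2 -> +1
        bool accelerate = actions[1] >= 1.0f; // 1 or 2 -> accelerate
        bool brake = actions[1] < 1.0f;       // 0 -> brake
        return (steering, accelerate, brake);
    }
}
```

With no key pressed, the Heuristic from the previous post writes 0 to both entries. `IdleInputTrace.Decode(new float[] { 0f, 0f })` then gives a steering value of -1 with brake set and accelerate cleared, which matches a kart that backs up while turning one way.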

christophergoy · May 6, 2021

Sorry, I was wrong again.

Code (CSharp):

void InterpretDiscreteActions(float[] actions)
{
    m_Steering = actions[0] - 1f;
    m_Acceleration = actions[1] >= 1.0f;
    m_Brake = actions[1] < 1.0f;
}

Steering looks like it's:
0 is turn left
1 is to go straight
2 is to turn right

Accelerate looks like:
0 for brake
1 for do nothing
2 for accelerate

which would change the code I sent you to look like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    var brake = 0f;
    var idle = 1f;
    var accelerate = 2f;
    actionsOut[0] = Input.GetAxis("Horizontal");
    var brakeVal = Input.GetButton("Brake") ? brake : idle;
    actionsOut[1] = Input.GetButton("Accelerate") ? accelerate : brakeVal;
}

Wolf00007 · May 6, 2021


Now the car goes forward by itself, still steering left.

I noticed that my class currently uses both Agent and IInput (I'm not sure what IInput is, as Visual Studio does not recognize it):

Code (CSharp):

public class KartAgent : Agent, IInput

But when I deleted IInput, the agent could not move on its own anymore, while I still could drive it through Heuristic. I was able to record some data through the Demonstration Recorder and it looks good (I think), but this seems wrong, as I have to disable agent movement (and isn't that what Heuristic should do?).

Wolf00007 · May 6, 2021

Okay I changed the following:

Code (CSharp):

void InterpretDiscreteActions(float[] actions)
{
    m_Steering = actions[0];
    m_Brake = actions[1] < 1.0f;
    m_Acceleration = actions[1] > 1.0f;
}

without deleting the IInput class and it works as well, but again, it does not work when I then switch to Inference behavior rather than Heuristic. Is it okay to change the code for recording with Demonstration Recorder and then change the code back when training the Agent? It looks like I can't get it to work for both Inference and Heuristic without modifying the code when switching between the two... :/

christophergoy · May 6, 2021

You shouldn't modify the code to record demos since that modified code is how you need to get things to work. I think if you spend a bit more time tinkering you'll eventually get it working.

Wolf00007 · May 7, 2021

I think I found the issue. Agents do not have "idle" as an action: they either go forward or brake/go backwards. This is why in Heuristic behavior they go either forward or backwards by themselves (which one depends on the values we write to "actionsOut").

This is why I added "+1" to the steering input to balance out the "-1" in the InterpretDiscreteActions function. Also, I changed the rest to be like this:

Code (CSharp):

public override void Heuristic(float[] actionsOut)
{
    var brake = 0f;
    var accelerate = 1f;
    actionsOut[0] = Input.GetAxis("Horizontal") + 1;
    actionsOut[1] = Input.GetButton("Brake") ? brake : accelerate;
}

The car still goes forward by itself but I don't see how I could change this without changing the way Agents move (so modifying the rest of the code). But I was able to record the demos like that and all seems to work just fine.

However, maybe I'm wrong and this is incorrect, so if anyone has any other ideas, please let me know.
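A possible refinement of the Heuristic above, offered as a sketch rather than a confirmed fix: a discrete branch of size 3 only ever produces the indices 0, 1 and 2, while Input.GetAxis("Horizontal") + 1 can produce fractional values in between. The helper below snaps keyboard input to those indices. `KartHeuristicMapping`, `SteeringIndex`, `ThrottleIndex` and the 0.5 threshold are all hypothetical choices for illustration, and the throttle mapping keeps the "no idle" behaviour found in this thread.

```csharp
using System;

// Hypothetical helper; not part of the Karting Microgame or ML-Agents.
public static class KartHeuristicMapping
{
    // Maps a horizontal axis reading in [-1, 1] to a steering index that
    // InterpretDiscreteActions turns into -1, 0 or +1 after subtracting 1.
    public static float SteeringIndex(float horizontalAxis)
    {
        const float threshold = 0.5f; // assumed dead zone, tune as needed
        if (horizontalAxis > threshold) return 2f;
        if (horizontalAxis < -threshold) return 0f;
        return 1f; // keep the wheel centered
    }

    // The throttle branch has no idle state in InterpretDiscreteActions:
    // values below 1 brake, values of 1 or more accelerate.
    public static float ThrottleIndex(bool brakeHeld)
    {
        return brakeHeld ? 0f : 1f;
    }
}

// Possible use inside the agent's Heuristic override (requires Unity):
//   actionsOut[0] = KartHeuristicMapping.SteeringIndex(Input.GetAxisRaw("Horizontal"));
//   actionsOut[1] = KartHeuristicMapping.ThrottleIndex(Input.GetButton("Brake"));
```

Because the mapping lives only in Heuristic, InterpretDiscreteActions stays untouched, so the same script can be used for recording demonstrations and for inference.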
