Hello, I started learning how to use ML-Agents a couple of days ago, and I have set up a simple reinforcement learning environment for an AI driver. The environment has a straight road. The agent starts at one edge of the road facing the other edge, and there's a checkpoint about 10 units from the agent. The checkpoints are empty game objects visualized with gizmos, and they are childed to an empty game object with a component named "RaceTrack", which handles the behavior of the checkpoints. If more than one checkpoint object is childed to this RaceTrack object, they form a track. In the current environment, I use a single checkpoint.

The agent is supposed to drive towards the checkpoint. For the most part it does, but I'm not sure if that's because the checkpoint is too close to miss, or because the agent actually learned the behavior needed to reach it. Over an hour into the training, when watching the agent move, it sometimes drives straight towards the checkpoint, and sometimes it turns to the left or right, misses it, and falls out of the environment. I'm not sure why it does that, because as far as I understand, with reinforcement learning the agent is supposed to figure out which actions give it the most reward, and it should eventually learn how to get rewards consistently. That is how I set up my environment: reaching the checkpoint gives the agent the highest reward, while steering off and falling incurs the biggest penalty.
These are the result graphs on my TensorBoard: This is the agent class, DriverAgent:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class DriverAgent : Agent
{
    [SerializeField] RaceTrack raceTrack;
    [SerializeField] float waypointDistanceThreshold = 1.42f;

    VehicleMovement vehicleMovement;
    Rigidbody rigidBody;
    Vector3 initialAgentPosition;
    Quaternion initialAgentRotation;
    int currentWaypointIndex = 0;
    Vector3 dirToTarget;

    public override void Initialize()
    {
        vehicleMovement = GetComponent<VehicleMovement>();
        rigidBody = GetComponent<Rigidbody>();
        initialAgentPosition = transform.localPosition;
        initialAgentRotation = transform.localRotation;
        currentWaypointIndex = raceTrack.FirstWaypointIndex;
    }

    public override void OnEpisodeBegin()
    {
        // Reset agent velocity
        rigidBody.angularVelocity = Vector3.zero;
        rigidBody.velocity = Vector3.zero;

        // Reset the agent to the initial starting position and rotation
        transform.localPosition = initialAgentPosition;
        transform.localRotation = initialAgentRotation;

        // Reset current waypoint index
        currentWaypointIndex = raceTrack.FirstWaypointIndex;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Agent
        sensor.AddObservation(vehicleMovement.Speed);
        sensor.AddObservation(transform.position.normalized);
        sensor.AddObservation(transform.InverseTransformVector(rigidBody.velocity.normalized));
        sensor.AddObservation(transform.forward);
        sensor.AddObservation(transform.right);

        // Waypoints
        if (raceTrack != null)
        {
            sensor.AddObservation(raceTrack.GetWaypointPosition(currentWaypointIndex));
            dirToTarget = (raceTrack.GetWaypointPosition(currentWaypointIndex) - transform.position).normalized;
            sensor.AddObservation(transform.InverseTransformDirection(dirToTarget));
        }
    }

    private void OnCollisionEnter(Collision other)
    {
        if (other.gameObject.CompareTag("Wall") ||
            other.gameObject.CompareTag("Obstacle") ||
            other.gameObject.CompareTag("Vehicle"))
        {
            AddReward(-0.01f);
            Debug.Log("Collided with " + other.gameObject.tag);
            EndEpisode();
        }
    }

    public void MoveAgent(float[] vectorAction)
    {
        float acceleration = vectorAction[0];
        float steering = vectorAction[1];
        float braking = vectorAction[2];

        vehicleMovement.Accelerate(acceleration);
        vehicleMovement.Steer(steering);
        vehicleMovement.Brake(braking);

        // Increase reward when moving toward waypoint
        float velocityAlignment = Vector3.Dot(dirToTarget, rigidBody.velocity);
        AddReward(0.001f * velocityAlignment);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        MoveAgent(vectorAction);

        // Distance from agent to current target waypoint
        float distanceToTarget = Vector3.Distance(transform.position, raceTrack.GetWaypointPosition(currentWaypointIndex));

        // If reached LAST target waypoint
        if (distanceToTarget < waypointDistanceThreshold && raceTrack.GetNextIndex(currentWaypointIndex) == 0)
        {
            SetReward(1.0f);
            Debug.Log("Reached latest waypoint (" + raceTrack.GetWaypoint(currentWaypointIndex).name + ") - Episode ended");
            EndEpisode();
        }
        // If reached target waypoint
        else if (distanceToTarget < waypointDistanceThreshold)
        {
            SetReward(0.5f);
            Debug.Log("Reached waypoint: " + raceTrack.GetWaypoint(currentWaypointIndex).name);
            currentWaypointIndex++;
        }

        // If fell down
        if (transform.localPosition.y < 0)
        {
            SetReward(-0.25f);
            Debug.Log("Fell down");
            EndEpisode();
        }
    }

    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = Input.GetAxis("Vertical");
        actionsOut[1] = Input.GetAxis("Horizontal");
        actionsOut[2] = 0;
        if (Input.GetKey(KeyCode.Space))
            actionsOut[2] = 1;
    }
}
```

I really appreciate any help. Thanks!

EDIT: It seems like the rewards have stabilized and are more consistent, but the agent still steers away from the checkpoint.

EDIT 2: I moved the checkpoint and increased the distance between it and the agent, and it seems like the agents handle it.
Although, there was a big drop in rewards, which recovered right after resuming the training.

EDIT 3: It seems like the "random" drops in reward still happen, and I'm not sure why. There was a drop right after resuming training just past the 5 million step mark, and another one during training after the 5.5 million mark.
I'm not sure, but can the rewards confuse the agent away from its goal? This is the current reward setup, as shown in the code in the main post:
- Decrease the reward by 0.01 when colliding with a wall, obstacle, or vehicle.
- Increase the reward by 0.001f * Vector3.Dot(dirToTarget, rigidBody.velocity) when moving toward the waypoint.
- Set the reward to 1 when the last waypoint is reached.
- Set the reward to 0.5 when any waypoint other than the last is reached.
- Set the reward to -0.25 when the agent falls down.
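One subtlety worth keeping in mind about that setup: in ML-Agents, AddReward accumulates onto the reward for the current step, while SetReward overwrites whatever has been accumulated for that step only; it does not wipe the rest of the episode's return. A toy Python model of that bookkeeping (the class and the numbers are illustrative, not ML-Agents code):

```python
class StepReward:
    """Toy model of ML-Agents per-step reward bookkeeping."""
    def __init__(self):
        self.step_reward = 0.0     # reward accumulated since the last decision
        self.episode_return = 0.0  # sum over the whole episode

    def add_reward(self, r):
        self.step_reward += r      # AddReward: accumulate onto this step

    def set_reward(self, r):
        self.step_reward = r       # SetReward: overwrite this step only

    def end_step(self):
        self.episode_return += self.step_reward
        self.step_reward = 0.0

# Example: shaping reward is added, then SetReward(1.0) fires on reaching a waypoint
sr = StepReward()
sr.add_reward(0.001 * 5.0)   # velocity-alignment shaping for this step
sr.set_reward(1.0)           # reaching the waypoint wipes this step's shaping
sr.end_step()
print(sr.episode_return)     # 1.0, not 1.005
```

So when SetReward(1.0f) fires on the step a waypoint is reached, it replaces only that step's shaping reward; all the shaping collected on earlier steps still counts toward the episode return.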
It's possible this line is causing an issue:

Increase rewards when moving toward waypoint - 0.001f * Vector3.Dot(dirToTarget, rigidBody.velocity)

When the agent moves past the waypoint, it's no longer facing the direction of the waypoint and starts getting negative reward. It may actually be more valuable to end the episode and take the -0.25 than to remain alive and keep accumulating the negative reward. Try removing that line and leaving everything else the same.
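To put rough numbers on that trade-off (the speed and step counts below are made up for illustration, not measured from the project): driving straight away from the waypoint at about 10 units/s makes the dot product roughly -10, i.e. a shaping term of -0.01 per decision step:

```python
shaping_per_step = 0.001 * -10.0  # Dot(dirToTarget, velocity) ~ -10 when driving straight away
fall_penalty = -0.25              # SetReward(-0.25f) when the agent falls off

# Accumulated shaping penalty after surviving N decision steps while moving away
for steps in (10, 25, 50):
    total = round(steps * shaping_per_step, 4)
    print(steps, total)  # 10 -0.1 | 25 -0.25 | 50 -0.5

# Break-even is at 25 steps: past that point, the accumulated negative shaping
# is worse than just driving off the edge and taking the -0.25 once.
```

With these illustrative numbers, any episode where the agent survives more than about 25 steps while pointed the wrong way is punished harder than simply falling off, which is the incentive problem described above.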
Thank you for your reply, I will try removing the line. Is what you described the case even though the waypoint index is updated when an action is received from the agent? When the agent reaches a waypoint, the index either goes up by 1 or gets reset to 0, depending on whether it is the last waypoint. If the agent has passed a waypoint that is not the last one (the next waypoint index is not 0), 1 is added to the current index, so if the index is 1, it becomes 2. If it's the last waypoint (the next waypoint index is 0), the episode ends and currentWaypointIndex is reset to 0. This is what I check in this block of code in the OnActionReceived method:

```csharp
// If reached LAST target waypoint
if (distanceToTarget < waypointDistanceThreshold && raceTrack.GetNextIndex(currentWaypointIndex) == 0)
{
    SetReward(1.0f);
    Debug.Log("Reached latest waypoint (" + raceTrack.GetWaypoint(currentWaypointIndex).name + ") - Episode ended");
    EndEpisode();
}
// If reached target waypoint
else if (distanceToTarget < waypointDistanceThreshold)
{
    SetReward(0.5f);
    Debug.Log("Reached waypoint: " + raceTrack.GetWaypoint(currentWaypointIndex).name);
    currentWaypointIndex++;
}
```

Edit: Actually, I think I understand your theory. Please correct me if I'm wrong. By just driving towards the waypoint, as fast and as directly as possible, the agent earns more reward. When it reaches the waypoint, the reward is just set (SetReward) to 1, which is probably less than what it would be getting by just continuing to drive towards the waypoint.
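That reading matches the arithmetic. With some assumed numbers — one decision per FixedUpdate at 50 Hz and a drive of ~10 units/s straight at a waypoint 10 units away (none of these values are taken from the project) — the shaping collected on the way is a sizeable fraction of the terminal reward:

```python
speed = 10.0               # units/s, driving straight at the waypoint (assumed)
distance = 10.0            # units from the agent to the waypoint (assumed)
decisions_per_second = 50  # one decision per FixedUpdate (assumed)

steps = int(distance / speed * decisions_per_second)  # 50 decision steps to arrive
shaping_total = steps * 0.001 * speed                 # Dot(dirToTarget, velocity) ~= speed on a straight run
terminal = 1.0                                        # SetReward(1.0f) on the final step

print(steps, shaping_total)  # 50 0.5 -> half of the terminal reward
```

With a longer drive, a higher speed, or a higher decision rate, the accumulated shaping can exceed the terminal 1.0 entirely, at which point getting to the waypoint is no longer the most rewarding part of the episode.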
Hello! I tried your suggestion and commented out that specific line in the code, and the simulation has been running for about 7 hours now. While the graphs seem somewhat better, the agent itself still sometimes steers off its target. At the start it steers to the left, then to the right, and sometimes it doesn't steer away at all. This is what it looked like when the agent was steering left: Board: Console: This is what it looked like when the agent was steering right: Board: Console:
UPDATE 2: I replaced the waypoint system with a cube, like in the Roller Ball tutorial project in the ML-Agents GitHub repository. It seems like the agents are tracking the target a lot better now. They now do what they couldn't do after 6 hours with the waypoint system. This is the new script using only a cube:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RollerAgent : Agent
{
    [SerializeField] Transform target;
    [SerializeField] float waypointDistanceThreshold = 1.42f;

    VehicleMovement vehicleMovement;
    Rigidbody rigidBody;
    Vector3 initialAgentPosition;
    Quaternion initialAgentRotation;

    public override void Initialize()
    {
        vehicleMovement = GetComponent<VehicleMovement>();
        rigidBody = GetComponent<Rigidbody>();
        initialAgentPosition = transform.localPosition;
        initialAgentRotation = transform.localRotation;
    }

    public override void OnEpisodeBegin()
    {
        // If the Agent fell, zero its momentum
        rigidBody.angularVelocity = Vector3.zero;
        rigidBody.velocity = Vector3.zero;
        transform.localPosition = initialAgentPosition;
        transform.localRotation = initialAgentRotation;

        // Move the target to a new spot
        target.localPosition = new Vector3(Random.Range(-3f, 8f), 0.5f, Random.Range(60f, 70f));
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Target and Agent positions
        sensor.AddObservation(target.localPosition);
        sensor.AddObservation(transform.localPosition);

        // Agent velocity
        sensor.AddObservation(rigidBody.velocity.x);
        sensor.AddObservation(rigidBody.velocity.z);
    }

    private void OnCollisionEnter(Collision other)
    {
        if (other.gameObject.CompareTag("Wall") ||
            other.gameObject.CompareTag("Obstacle") ||
            other.gameObject.CompareTag("Vehicle"))
        {
            AddReward(-0.01f);
            Debug.Log("Collided with " + other.gameObject.tag);
            EndEpisode();
        }
    }

    public void MoveAgent(float[] vectorAction)
    {
        float acceleration = vectorAction[0];
        float steering = vectorAction[1];
        float braking = vectorAction[2];

        vehicleMovement.Accelerate(acceleration);
        vehicleMovement.Steer(steering);
        vehicleMovement.Brake(braking);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        MoveAgent(vectorAction);

        // Rewards
        float distanceToTarget = Vector3.Distance(transform.localPosition, target.localPosition);

        // Reached target
        if (distanceToTarget < waypointDistanceThreshold)
        {
            SetReward(1.0f);
            EndEpisode();
        }

        // Fell off platform
        if (transform.localPosition.y < 0)
        {
            EndEpisode();
        }
    }

    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = Input.GetAxis("Vertical");
        actionsOut[1] = Input.GetAxis("Horizontal");
        actionsOut[2] = 0;
        if (Input.GetKey(KeyCode.Space))
            actionsOut[2] = 1;
    }
}
```
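One thing that may help either version of the agent: the observations feed raw world-scale positions into the network (the target's z alone ranges up to 70), and the ML-Agents docs recommend scaling vector observations to roughly [-1, +1]. A Python sketch of the usual remapping formula (the helper name and the example ranges are illustrative, not project code):

```python
def normalize(value, min_v, max_v):
    """Map a raw observation from [min_v, max_v] into [-1, 1]."""
    return 2.0 * (value - min_v) / (max_v - min_v) - 1.0

# Example: the target z is spawned with Random.Range(60f, 70f)
print(normalize(60.0, 60.0, 70.0))  # -1.0
print(normalize(65.0, 60.0, 70.0))  # 0.0
print(normalize(70.0, 60.0, 70.0))  # 1.0
```

In the C# agent this would mean remapping each position component by its known range before passing it to AddObservation, so all inputs reach the network at a similar scale.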
I don't know if there is actually something buggy with the waypoints, because I tried implementing the waypoint system in the Roller Ball project in place of the cube target. After editing the project, I ran a training session with the original Roller Ball project for about 45 minutes, then did the same with the new Roller Ball project with the waypoints, and they gave pretty much the same results. The orange line is from the project without the waypoints, and the blue line is from the project with the waypoints.