
Question: Is my agent using its camera sensor observations? Why is it so terrible (video) after 500,000 steps?

Discussion in 'ML-Agents' started by maxkcy, Mar 5, 2023.

  maxkcy

    Joined: Oct 11, 2021
    Posts: 62
    Why is my agent not learning from its camera sensor? Or do I just need to give it more training time?

    I was having trouble training my game's agent with a camera sensor, so I opened a practice project and, instead of position observations, gave the agent only a camera sensor and a camera. From the sources I've learned from, that is all I needed to do. The observations are 32x32 without grayscale (it wasn't learning with grayscale either; I hoped full color would help, but the result was the same).
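
The 32x32 color setup described above can be sanity-checked with plain arithmetic. The sketch below is a hypothetical, Unity-free helper (`CameraObsCheck`, `ValueCount`, `MinSide`, and `FitsEncoder` are illustrative names, not ML-Agents API). It assumes the camera sensor's observation shape is (height, width, channels) with 1 channel for grayscale and 3 for RGB, and it uses the minimum input sizes that the ML-Agents training-configuration documentation lists for `vis_encode_type` (simple 20x20, nature_cnn 36x36, resnet 15x15, match3 5x5, fully_connected no minimum). If your ML-Agents version documents different limits, adjust the table.

```csharp
using System;

// Hypothetical helper (not part of ML-Agents): counts the values in a camera
// observation and checks the resolution against per-encoder minimum sizes.
public static class CameraObsCheck
{
    // Shape is (height, width, channels): 1 channel if grayscale, 3 (RGB) otherwise.
    public static int ValueCount(int width, int height, bool grayscale) =>
        width * height * (grayscale ? 1 : 3);

    // Minimum side length per vis_encode_type, as listed in the docs (assumed).
    // fully_connected has no convolutional layers, so any size >= 1 is accepted.
    public static int MinSide(string encoder) => encoder switch
    {
        "simple" => 20,
        "nature_cnn" => 36,
        "resnet" => 15,
        "match3" => 5,
        "fully_connected" => 1,
        _ => throw new ArgumentException($"Unknown encoder: {encoder}")
    };

    // True if both sides of the image meet the encoder's minimum.
    public static bool FitsEncoder(int width, int height, string encoder) =>
        Math.Min(width, height) >= MinSide(encoder);

    public static void Main()
    {
        // The 32x32 setup from the post, with and without grayscale.
        Console.WriteLine(ValueCount(32, 32, grayscale: false)); // 3072
        Console.WriteLine(ValueCount(32, 32, grayscale: true));  // 1024
        Console.WriteLine(FitsEncoder(32, 32, "simple"));        // True
        Console.WriteLine(FitsEncoder(32, 32, "nature_cnn"));    // False
    }
}
```

So a 32x32 image is large enough for the default `simple` encoder but would be too small if the config switched to `nature_cnn`.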




    This AI feels blind. Is it even getting observations from its camera sensor? I'm not sure whether this is relevant, but I can't use NVIDIA drivers (so do I need to install something?). After 500,000 steps it moves in one direction and reaches the goal only about 15% of the time.
    On the other hand, the AI that observed only position vectors reached the goal just fine.

    Am I missing something I need to install?

    I can see now why my earlier AI

    Since the video, it has been trained to 1,500,000 steps; it mostly sits around until the end of time (pun intended) and doesn't pick up guns or shoot.

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class CamAgent : Agent
    {

        [SerializeField] private Transform _targetTransform;
        [SerializeField] private SpriteRenderer _fieldSR;
        [SerializeField] private float EpisodeTime = 600f; // time budget per episode, in seconds
        private float _time = 0f;                           // elapsed time in the current episode


        public override void CollectObservations(VectorSensor sensor)
        {
            // Normalized time remaining: 1 at episode start, 0 at timeout.
            sensor.AddObservation(Mathf.Clamp01(1f - _time / EpisodeTime));
        }

        private void FixedUpdate()
        {
            _time += Time.fixedDeltaTime;
            // Time out once the elapsed time reaches the budget.
            if (_time >= EpisodeTime)
            {
                SetReward(-1f);
                EndEpisode();
                _fieldSR.color = Color.red;
                CancelInvoke();
                Invoke(nameof(ResetFieldColor), 5f);
            }
        }
        public override void OnEpisodeBegin()
        {
            transform.localPosition = Vector3.zero;
            // Respawn the target until it is at least 3 units away in x or at least 2 units away in y.
            do { _targetTransform.localPosition = new Vector2(Random.Range(-6f, 6f), Random.Range(-3.5f, 3.5f)); }
            while (Mathf.Abs(_targetTransform.localPosition.x - transform.localPosition.x) < 3f &&
                    Mathf.Abs(_targetTransform.localPosition.y - transform.localPosition.y) < 2f);
            _time = 0f;
        }
        public override void OnActionReceived(ActionBuffers actions)
        {
            float moveX = actions.ContinuousActions[0];
            float moveY = actions.ContinuousActions[1];
            float moveSpeed = 2.5f;

            transform.position += new Vector3(moveX, moveY, 0) * Time.deltaTime * moveSpeed;
        }

        public override void Heuristic(in ActionBuffers actionsOut)
        {
            var continuousActions = actionsOut.ContinuousActions;
            continuousActions[0] = Input.GetAxis("Horizontal");
            continuousActions[1] = Input.GetAxis("Vertical");
        }

        private void OnTriggerEnter2D(Collider2D collision)
        {

            if (collision.TryGetComponent<GoalComponent>(out GoalComponent goal))
            {
                // 1 plus the fraction of time left, so the reward stays in [1, 2]
                // and reaching the goal sooner earns more.
                SetReward(1f + Mathf.Clamp01(1f - _time / EpisodeTime));
                EndEpisode();
                _fieldSR.color = Color.green;
                CancelInvoke();
                Invoke(nameof(ResetFieldColor), 5f);
            }
            if (collision.TryGetComponent<WallsComponent>(out WallsComponent wall))
            {
                SetReward(-1f);
                EndEpisode();
                _fieldSR.color = Color.red;
                CancelInvoke();
                Invoke(nameof(ResetFieldColor), 5f);
            }
        }
        private void ResetFieldColor() => _fieldSR.color = Color.white;
    }

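
Two details in a script like this shape what the agent is trained on regardless of the camera: a timeout test of the form `EpisodeTime < 0` never fires because the 600-second field is never decremented, and a goal reward of `1f + EpisodeTime` is a constant 601 set against wall and timeout penalties of -1. The sketch below is a hypothetical, Unity-free model of that logic (`EpisodeLogic`, `TimedOut`, `RemainingFraction`, and `GoalReward` are illustrative names, not ML-Agents API). It assumes the intent was a bonus for reaching the goal sooner, and it shows a timer that compares elapsed time against the budget, a normalized time-remaining value that actually changes during the episode, and a goal reward bounded to [1, 2].

```csharp
using System;

// Hypothetical, Unity-free model of the agent's timer, observation, and reward
// logic, assuming a fixed time budget of 600 seconds per episode.
public static class EpisodeLogic
{
    public const float EpisodeTime = 600f;

    // Time out once the elapsed time reaches the budget. A check like
    // EpisodeTime < 0 is never true here, since the budget itself never changes.
    public static bool TimedOut(float elapsed) => elapsed >= EpisodeTime;

    // Fraction of the budget still left, clamped to [0, 1]. Unlike the fixed
    // EpisodeTime value, this varies during the episode, so it is informative.
    public static float RemainingFraction(float elapsed) =>
        Math.Max(0f, Math.Min(1f, (EpisodeTime - elapsed) / EpisodeTime));

    // Goal reward: 1 plus the remaining fraction, so it stays within [1, 2].
    public static float GoalReward(float elapsed) => 1f + RemainingFraction(elapsed);

    public static void Main()
    {
        Console.WriteLine(TimedOut(599f));   // False
        Console.WriteLine(TimedOut(600f));   // True
        Console.WriteLine(GoalReward(0f));   // 2
        Console.WriteLine(GoalReward(300f)); // 1.5
        Console.WriteLine(GoalReward(900f)); // 1
    }
}
```

Keeping the success reward on the same order of magnitude as the -1 penalties is a design choice here, not an ML-Agents requirement.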
    I am using the default generated configuration.yaml.

    Yes, I know about curiosity and imitation learning. I'm not going to use them this early in a practice demo whose only task is reaching the goal; I think I should be able to figure it out without them. I am using self-play and curiosity in my current main project, and perhaps imitation learning if needed. But for a practice project, the default YAML should be enough.
     
    Last edited: Mar 5, 2023