
Question Agent REALLY seems to like walls...unless it can *see* the goal in RayPerceptionSensor?

Discussion in 'ML-Agents' started by mrmiketheripper, Apr 23, 2021.

  1. mrmiketheripper

    mrmiketheripper

    Joined:
    Mar 13, 2019
    Posts:
    6
    Unity 2020.3.3f / ml-agents 0.23.0 / communicator 1.3.0 / PyTorch 1.7.0 on macOS Big Sur 11.2.3

    I've been excitedly playing with ML-Agents on and off for about a month now. I followed some of the basic tutorials on YouTube and have been trying to find ANY kind of reading material I can on the topic. Unfortunately, most searches seem to lead back to the same few threads, so I thought I'd ask my question here.



    I have a basic training environment set up. It's a small plane surrounded by 4 walls. The agent has a RayPerceptionSensor3D mounted at about mid height. It casts rays all around the environment and only detects two tags: "Obstacle" and "Goal". The sensor's layer mask covers the "Default" and "TerrainLayer" layers; the walls are on "TerrainLayer" and the goal object is on the "Default" layer. I have confirmed that the RayPerceptionSensor3D does indeed detect the walls and the goal, but how to use it well, and how to strengthen those associations, is still fairly fuzzy to me.
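
    For reference, the same sensor settings expressed in code would look roughly like the sketch below (illustrative only: the property names assume the current com.unity.ml-agents API, and the ray length / ray count values are placeholders rather than my exact inspector settings).

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;
    using Unity.MLAgents.Sensors;

    // Hypothetical setup sketch mirroring the inspector configuration described above.
    public class RaySensorSetupSketch : MonoBehaviour
    {
        void Awake()
        {
            var ray = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
            // Only these two tags produce "hit" observations; anything else reads as a miss.
            ray.DetectableTags = new List<string> { "Obstacle", "Goal" };
            // Restrict the raycasts to the layers the walls and goal live on.
            ray.RayLayerMask = LayerMask.GetMask("Default", "TerrainLayer");
            ray.RayLength = 10f;      // placeholder length, tuned relative to the room size
            ray.RaysPerDirection = 3; // 3 rays per side plus the center ray
        }
    }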

    The agent also has 9 observations of its own:
    - Normalized agent position (x,y,z)
    - Normalized goal position (x,y,z)
    - Agent Forward (x,y,z)





    My agent script detects when the Agent hangs out against a wall for too long and ends the episode with a negative reward.

    For the first 40 episodes, the agent just needs to move from its spawn to the goal. It receives a +1 reward for touching the goal and an additive -0.075 reward while it's touching any walls. After 40 attempts the agent gets pretty good at this, so I add obstacles by switching the environment. The environment size stays the same, but 3 walls are placed inside it, also on the "TerrainLayer" layer with the "Obstacle" tag.
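
    (The environment switch can be as simple as counting episodes and toggling the extra walls once the count passes 40; a hypothetical sketch, with placeholder names, is below.)

    Code (CSharp):
    using UnityEngine;

    // Sketch: enable the interior obstacle walls after enough episodes.
    public class CurriculumSwitcherSketch : MonoBehaviour
    {
        [SerializeField] GameObject ObstacleWalls;      // placeholder parent of the 3 extra walls
        [SerializeField] int EpisodesBeforeWalls = 40;

        int _episodeCount;

        // Hooked up to the agent's OnEpisodeBegin UnityEvent, for example.
        public void RegisterEpisode()
        {
            _episodeCount++;
            if (_episodeCount > EpisodesBeforeWalls)
                ObstacleWalls.SetActive(true);
        }
    }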



    Once the walls are added, the agent does REALLY well if the goal is within immediate sight of the RayPerceptionSensor3D. However, if the agent is on one side of a wall and the goal is on the other side, it just seems to continually push itself into the wall and grind against it until I end the episode. I would expect odd behaviour like this early on while it's figuring things out, but this seems to be all it does. Occasionally it will even fail the simple environment (no walls) simply because it moves into a corner, locks onto the wall, and takes the negative reward until it's respawned.

    I did have one training run overnight (about 6 million steps) that ended with a good cumulative reward, but when I ran that brain in the environment I observed something similar: sometimes the agent would go right for the goal, other times it would seemingly give up and run itself into walls.

    I'm not quite sure what I'm doing wrong here, and as stated previously, the limited threads on this fairly new topic make finding answers quite confusing. I've tried tweaking things such as the length of the rays. Initially the rays were long enough to reach and touch every side of the room, which I strongly believe confused the AI into thinking it had limited moves.

    I tried shortening the rays significantly hoping it would push the AI away when it gets close, but it just seems to latch onto the wall once it sees it.

    A few days before posting this thread, I discovered I was incorrectly normalizing the coordinates and thought that would be the fix. Now I calculate the bounds of the level and normalize the coordinates against them. From my debug view it works and is correct, but I'm not so sure it made much of a difference.

    I have, of course, tried tuning hyperparameters, but that doesn't seem to make a huge difference in the training.

    To me, it seems like the AI is not understanding that hugging walls is being negatively reinforced. I hesitate to make the negative rewards *too* strong (below -1) since that could potentially screw with the reward normalization?

    I *had* a small negative reward every step, but removed it because I thought that was being detrimental to training around the walls.
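
    If I were to add it back, I'd probably scale it so the total over a full episode stays around -1, something like the sketch below (assuming MaxStep is set on the Agent component; the class name is just for illustration).

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Sketch of a bounded per-step ("existential") penalty.
    public class StepPenaltySketch : Agent
    {
        public override void OnActionReceived(ActionBuffers actions)
        {
            // Accumulates to roughly -1 over a full episode, so it never
            // dwarfs the +1 reward for reaching the goal.
            if (MaxStep > 0)
                AddReward(-1f / MaxStep);

            // ... movement and wall-timeout logic would go here ...
        }
    }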

    Hyperparameters:
    Code (YAML):
    behaviors:
      FindExit:
        framework: pytorch
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 2048
        #   batch_size: 4096
        #   buffer_size: 10240
          learning_rate: 3.0e-4
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 6.0e6
        time_horizon: 64
        summary_freq: 12000
        threaded: true
    Agent:
    Code (CSharp):
    using UnityEngine;
    using UnityEngine.Events;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityStandardAssets.Characters.ThirdPerson;

    public class MyAgent : Agent
    {
        private Vector3 _SpawnPoint;

        [Header("References")]
        private ThirdPersonCharacter _thirdPersonController;
        [SerializeField] BoundsCollector _LevelBounds;
        [SerializeField] Transform ExtractionPoint;
        [SerializeField] Testing_FindRandomPosition Training_RandomizeBoundsContainer;

        [Header("Events")]
        [SerializeField] UnityEvent _OnEpisodeBegin;
        [SerializeField] UnityEvent _OnEpisodePass, _OnEpisodeFail;

        [Header("Properties")]
        [SerializeField] private bool _IsTouchingWall = false;
        [SerializeField] float _TimeTouchingWall = 0f;
        [SerializeField] float MoveSpeed = 2f;

        [Header("Observations")]
        [Tooltip("Now Normalized Extraction Point coordinate")]
        [SerializeField] public Vector3 NormalizedDistanceFromGoal;
        [SerializeField] public Vector3 NormalizedPlayerPosition;

        [Header("Input")]
        [SerializeField] private Vector3 _MovementVector3 = Vector3.zero;

        private bool jump = false;
        private bool fullyGrounded = false;

        public override void Initialize()
        {
            base.Initialize();
            _thirdPersonController = GetComponent<ThirdPersonCharacter>();
            _SpawnPoint = transform.position;
        }

        private void Respawn()
        {
            transform.position = _SpawnPoint;
        }

        public override void OnEpisodeBegin()
        {
            base.OnEpisodeBegin();

            _MovementVector3 = Vector3.zero;

            _TimeTouchingWall = 0f;

            _OnEpisodeBegin?.Invoke();

            transform.position = _SpawnPoint;
        }

        public override void Heuristic(in ActionBuffers actionsOut)
        {
            var discreteOut = actionsOut.DiscreteActions;

            if (Input.GetKey(KeyCode.A)) discreteOut[0] = 1; // L
            else if (Input.GetKey(KeyCode.D)) discreteOut[0] = 2; // R

            if (Input.GetKey(KeyCode.W)) discreteOut[0] = 3; // Up/forward
            else if (Input.GetKey(KeyCode.S)) discreteOut[0] = 4; // down/backward
        }

        public override void OnActionReceived(ActionBuffers actionsOut)
        {
            var discreteActions = actionsOut.DiscreteActions;
            switch ((int)discreteActions[0])
            {
                case 1: // L
                    _MovementVector3 = MoveSpeed * Vector3.left;
                    break;
                case 2: // R
                    _MovementVector3 = MoveSpeed * Vector3.right;
                    break;
                case 3: // Up
                    _MovementVector3 = MoveSpeed * Vector3.forward;
                    break;
                case 4: // down
                    _MovementVector3 = MoveSpeed * Vector3.back;
                    break;
                case 0:
                    _MovementVector3 = Vector3.zero;
                    break;
            }

            _thirdPersonController.Move(_MovementVector3, false, jump);
            jump = false;

            if (_IsTouchingWall
                && _TimeTouchingWall > 75f)
            {
                _IsTouchingWall = false;
                _TimeTouchingWall = 0f;

                SetReward(-.1f);
                EndEpisode();
                _OnEpisodeFail?.Invoke();
            }
        }

        private void FixedUpdate()
        {
            if (_IsTouchingWall)
            {
                _TimeTouchingWall += 1.0f * Time.fixedDeltaTime;
            }

            if (transform.position.y < -10f)
            {
                SetReward(-0.5f);
                EndEpisode();
            }
        }

        private void OnCollisionExit(Collision collision) => OnCollisionTriggerExit(collision.collider);
        private void OnTriggerExit(Collider other) => OnCollisionTriggerExit(other);
        private void OnCollisionEnter(Collision collision) => OnCollisionTriggerEnterStay(collision.collider);
        private void OnTriggerEnter(Collider other) => OnCollisionTriggerEnterStay(other);

        private void OnCollisionTriggerExit(Collider other)
        {
            if ((other.gameObject.tag == "Water"
                || other.gameObject.tag == "Obstacle") && _IsTouchingWall)
            {
                _IsTouchingWall = false;
            }
        }

        private void OnCollisionTriggerEnterStay(Collider other)
        {
            if (other.gameObject.tag == "Goal")
            {
                SetReward(1f);
                EndEpisode();
                _OnEpisodePass?.Invoke();
            }

            if (other.gameObject.tag == "Water"
                || other.gameObject.tag == "Obstacle")
            {
                if (_IsTouchingWall == false) _IsTouchingWall = true;

                AddReward(-.075f);
                //EndEpisode();
                //_OnEpisodeFail?.Invoke();
            }
        }

        private Vector3 NormalizePositions(Vector3 input, Vector3 min, Vector3 max, out Vector3 vec)
        {
            vec.x = (input.x - min.x) / (max.x - min.x);
            vec.y = (input.y - min.y) / (max.y - min.y);
            vec.z = (input.z - min.z) / (max.z - min.z);
            return vec;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // the bounds are determined on the fly when the level starts,
            // so this is a simple getter.
            var b = _LevelBounds.GetGroupedBounds;

            // OBSERVATION #1 - 3x float (x, y, z)
            if (ExtractionPoint != null)
            {
                NormalizePositions(ExtractionPoint.position, b.min, b.max, out NormalizedDistanceFromGoal);

                sensor.AddObservation(NormalizedDistanceFromGoal);
            }
            else
            {
                Debug.Log($"My extraction point is null.", gameObject);
                sensor.AddObservation(Vector3.zero);
            }

            NormalizePositions(transform.position, b.min, b.max, out NormalizedPlayerPosition);
            NormalizedPlayerPosition.y += .28f;

            // 3x float (x,y,z)
            sensor.AddObservation(NormalizedPlayerPosition);

            // 3x float (x,y,z)
            sensor.AddObservation(transform.forward);

            //AddReward(-0.00006f);
        }
    }
    As is, if the AI happens to spawn close to the goal cube and can see it, it goes right for it. Other than that, it seems to be "no thoughts, head empty" when it comes to interpreting the goal's normalized position.

    If there's any other information you guys need, let me know. Thank you in advance for your help with figuring this new & exciting technology out!
     
    Last edited: Apr 23, 2021
  2. mrmiketheripper

    mrmiketheripper

    Joined:
    Mar 13, 2019
    Posts:
    6
    Video of the agent after about 200,000 steps:



    EDIT: Something else I thought of, but I don't *think* it's messing the AI up that much:



    You can see there's about a 0.6 difference in Y between the normalized goal position and player position. It's pretty small, but could it be huge to the AI? Another interesting observation: when the AI doesn't know what to do, it wants to go to the upper limits on X/Z.
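
    Maybe it would also be worth observing the relative offset from the agent to the goal instead of two absolute positions, so a constant Y difference just becomes one fixed component rather than shifting both observations. A rough sketch of what I mean (the field names and half-extent scale are placeholders, not my actual setup):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;

    // Sketch: observe the agent-to-goal offset, scaled to roughly [-1, 1].
    public class RelativeGoalObservationSketch : Agent
    {
        [SerializeField] Transform GoalTransform;      // placeholder for the extraction point
        [SerializeField] float LevelHalfExtent = 10f;  // placeholder: half the playable area size

        public override void CollectObservations(VectorSensor sensor)
        {
            Vector3 toGoal = (GoalTransform.position - transform.position) / LevelHalfExtent;
            sensor.AddObservation(toGoal);             // 3 floats
            sensor.AddObservation(transform.forward);  // 3 floats
        }
    }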
     
    Last edited: Apr 23, 2021
  3. mrmiketheripper

    mrmiketheripper

    Joined:
    Mar 13, 2019
    Posts:
    6
    Update: Even after removing all Ray Perception Sensors and making sure the only observations were NormalizedPosition, NormalizedGoalPosition, and NormalizedRotation, it still seems to only want to run into walls and mostly run towards the two extremes (x=1, y=1) of the map. It seems to have zero interest in moving towards the goal.
     
  4. mrmiketheripper

    mrmiketheripper

    Joined:
    Mar 13, 2019
    Posts:
    6
    Ah, I think I've figured it out. I stripped everything down to a basic set: no ray sensors, just observing the normalized goal position and normalized player position, and it still performed very poorly. Well, it turns out I made the mistake of basing my player off the Standard Assets ThirdPersonCharacter controller, which continually rotates whatever GameObject it's applied to. This was definitely causing confusion on the AI's part. I changed it to a super simple Rigidbody movement script with no rotation, and within 50,000 steps the AI has already grasped the basic concept of moving towards its goal. It even figured out how to (eventually) go around the walls.
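
    For reference, the replacement mover is basically along the lines of the sketch below (illustrative, not the exact script).

    Code (CSharp):
    using UnityEngine;

    // Minimal no-rotation Rigidbody mover (sketch).
    [RequireComponent(typeof(Rigidbody))]
    public class SimpleAgentMoverSketch : MonoBehaviour
    {
        public float MoveSpeed = 2f;

        Rigidbody _rb;

        void Awake()
        {
            _rb = GetComponent<Rigidbody>();
            // Lock rotation so the body never spins the way the ThirdPersonCharacter did.
            _rb.freezeRotation = true;
        }

        // Called from the agent with the world-space direction picked in OnActionReceived.
        public void Move(Vector3 direction)
        {
            _rb.MovePosition(_rb.position + direction * MoveSpeed * Time.fixedDeltaTime);
        }
    }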



    I'm doing a test run now with a RayPerceptionSensor3D attached that *just* detects walls. I'm excited to see how this run goes!
     
  5. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Sorry for the delayed response, but glad you got it sorted out...