
Survival Shooter AI not learning

Discussion in 'ML-Agents' started by vaculikjan, Mar 23, 2021.

  1. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3
    Hey guys, I've been working on this on and off for the past two weeks, trying to teach an agent to play Unity's Survival Shooter (the tutorial series, a.k.a. Nightmares).

    The agent collects observations through a pair of Ray Perception Sensors looking for enemies and any obstacles in the way. It also knows its position and rotation.

    The best result I've gotten was an AI that could score about 1500 points. For that I used a continuous action space where the agent could move around the battlefield and shoot; turning was done around a fixed axis, so pressing left started the agent turning left, and vice versa for right, as in the sketch below. No matter how long I ran the training, however, it didn't get better (about 10,000,000 steps).
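
    To be concrete, the fixed-axis turning looked roughly like this (a simplified sketch, not my exact code; rotationSpeed and the turn action are placeholders):

    Code (CSharp):
    //Sketch of the fixed-axis turning: one continuous action in [-1, 1]
    //spins the agent around the world Y axis each step.
    void Turn (float turnAction) //negative = left, positive = right
    {
        transform.Rotate (0f, turnAction * rotationSpeed * Time.deltaTime, 0f);
    }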

    So I also tried a twin-stick approach where the agent turns toward the direction a joystick is facing (basically toward the coordinates given by two axes), which would be more precise and, for the agent, essentially instantaneous instead of having to rotate all the way around. This approach, however, bore no fruit at all; after 7,000,000 steps the agent is still pretty directionless.

    For rewards I give a small reward whenever the agent hits an enemy (0.01), a bigger reward for killing an enemy (0.05), and a negative reward (-0.1) for getting hit by an enemy.

    I suspect there might also be a problem with shooting at higher time scales, though as far as I can tell it should work as intended even then. I'm calling Shoot directly from the agent script, but I'm keeping this implementation since it's the one from the original project.

    Code (CSharp):
    void FixedUpdate ()
    {
        timer += Time.deltaTime;

        if (Input.GetButton ("Fire1") && timer >= timeBetweenBullets && Time.timeScale != 0)
        {
            Shoot ();
        }

        if (timer >= timeBetweenBullets * effectsDisplayTime)
        {
            DisableEffects ();
        }
    }

    public void Shoot ()
    {
        timer = 0f;

        //gunAudio.Play ();

        gunLight.enabled = true;

        gunParticles.Stop ();
        gunParticles.Play ();

        gunLine.enabled = true;
        gunLine.SetPosition (0, transform.position);

        shootRay.origin = transform.position;
        shootRay.direction = transform.forward;

        if (Physics.Raycast (shootRay, out shootHit, range, shootableMask))
        {
            EnemyHealth enemyHealth = shootHit.collider.GetComponent <EnemyHealth> ();
            if (enemyHealth != null)
            {
                enemyHealth.TakeDamage (damagePerShot, shootHit.point);
            }
            gunLine.SetPosition (1, shootHit.point);
        }
        else
        {
            gunLine.SetPosition (1, shootRay.origin + shootRay.direction * range);
        }
    }
    I am using curiosity for training; these are the hyperparameters:

    Code (csharp):
    behaviors:
      Shooter:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 8192
          learning_rate: 0.00003
          beta: 0.001
          epsilon: 0.2
          lambd: 0.925
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 64
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.95
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 64
            learning_rate: 3.0e-4
        keep_checkpoints: 10
        max_steps: 5000000
        time_horizon: 256
        summary_freq: 10000
        threaded: true
    Any advice as to why the agent isn't learning, or what I could change?
     
  2. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Hey,
    Have you tried training without curiosity? Sometimes making things as simple as possible gets you better results.
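
    For example, you could strip the reward signals down to just the extrinsic one (a sketch based on the config you posted):

    Code (csharp):
    reward_signals:
      extrinsic:
        gamma: 0.95
        strength: 1.0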
     
  3. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3
    Hey, thanks for the suggestion; however, I've tried both with and without curiosity, to no avail. Curiosity actually produced the best results, though that might just be because it was also the longest run.
     
  4. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Could you share your CollectObservations and OnActionReceived functions, and how your rewards work?

    Are you observing global rotation or local? It could make a big difference.

    Having your shooting logic in FixedUpdate is a problem; it needs to be translated into OnActionReceived and Heuristic, along the lines of the sketch below.
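
    Roughly like this (a sketch, not tested against your project; the action index and the "Fire1" binding are just examples):

    Code (CSharp):
    //Sketch: the shoot decision comes out of the action buffer in
    //OnActionReceived, and Heuristic writes player input into that same
    //slot so you can still drive the agent manually.
    public override void OnActionReceived (ActionBuffers actionBuffers)
    {
        // ... movement / rotation actions ...

        if (actionBuffers.ContinuousActions[4] > 0f)
        {
            Shoot (); //the fire-rate timer still gates the actual shot
        }
    }

    public override void Heuristic (in ActionBuffers actionsOut)
    {
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis ("Horizontal");
        continuousActions[1] = Input.GetAxis ("Vertical");
        //rotation actions omitted here for brevity
        continuousActions[4] = Input.GetButton ("Fire1") ? 1f : -1f;
    }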
     
  5. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3

    This is the code for the two methods:

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor) { //Observations for coordinates, rotation and health

        sensor.AddObservation(transform.rotation.y);
        sensor.AddObservation(transform.position.x);
        sensor.AddObservation(transform.position.z);
        sensor.AddObservation(pHealth.currentHealth);
    }

    public override void OnActionReceived(ActionBuffers actionBuffers) { //Actions available to the agent

        var continuousActions = actionBuffers.ContinuousActions;

        //Movement floats
        float h = continuousActions[0];
        float v = continuousActions[1];

        //Rotation floats
        float hr = continuousActions[2];
        float vr = continuousActions[3];

        //Creating quaternion for rotation
        lookDirection = new Vector3(hr, 0, -vr);
        lookRotation = Quaternion.LookRotation(lookDirection, Vector3.up);

        step = rotationSpeed * Time.deltaTime;

        if (continuousActions[4] > 0) {
            Shoot();
        }

        Move(h, v);
        Animating(h, v);
    }
    The shooting logic is in a different script altogether; I only call the Shoot method. I invoke it like so from the agent script:

    Code (CSharp):
    void Shoot() {
        if (pShooting.timer >= timeBetweenBullets && Time.timeScale != 0) {
            pShooting.Shoot ();
        }
    }

    As for rewards:

    Code (CSharp):
    public void TakeDamage (int amount, Vector3 hitPoint)
    {
        if (isDead) return;

        player.AddReward(0.01f); //Positive reward for hitting an enemy

        enemyAudio.Play ();

        currentHealth -= amount;

        hitParticles.transform.position = hitPoint;
        hitParticles.Play();

        if (currentHealth <= 0)
        {
            Death ();
        }
    }
    Code (CSharp):
    public void StartSinking ()
    {
        GetComponent <UnityEngine.AI.NavMeshAgent> ().enabled = false;
        GetComponent <Rigidbody> ().isKinematic = true;
        isSinking = true;
        ScoreManager.score += scoreValue;
        Destroy (gameObject, 2f);
        player.AddReward(0.05f); //Positive reward for killing an enemy
    }
    Code (CSharp):
    public void TakeDamage (int amount)
    {
        damaged = true;

        player.AddReward(-0.2f); //Negative reward for getting damaged

        currentHealth -= amount;

        healthSlider.value = currentHealth;

        playerAudio.Play ();

        if (currentHealth <= 0 && !isDead)
        {
            //Death disabled for purposes of training the agent
            //Death ();
        }
    }
     
  6. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Hey, thanks for posting the code. Observing the global position and rotation (transform.position.x, transform.rotation.y) can make it hard for the neural network to learn, since these values aren't normalized.

    For rotation, you could do transform.localRotation.y.

    For the position, it would be best if you could normalize the values somehow. You can usually achieve this by getting the center position of the area your agents work in, subtracting that from your agent's position, and then dividing the x and z results by the extents of the area bounds.
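
    Something like this, for example (a sketch; areaCenter and areaExtents are placeholders for whatever bounds your level has):

    Code (CSharp):
    public override void CollectObservations (VectorSensor sensor)
    {
        //Observe the position relative to the play area so the values
        //land roughly in [-1, 1]; areaCenter and areaExtents describe
        //the level bounds.
        Vector3 relative = transform.position - areaCenter;
        sensor.AddObservation (relative.x / areaExtents.x);
        sensor.AddObservation (relative.z / areaExtents.z);
    }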

    Long story short: It is hard for neural networks to learn on sets of non-normalized values.