
Survival Shooter AI not learning

Discussion in 'ML-Agents' started by vaculikjan, Mar 23, 2021.

  1. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3
    Hey guys, I've been working on this on and off for the past two weeks, trying to teach an agent to play Unity's Survival Shooter (the tutorial series, a.k.a. Nightmares).

    The agent collects observations through a pair of Ray Perception Sensors looking for enemies and any obstacles in the way. It also knows its position and rotation.

    The best result I've gotten was an AI that could score about 1500 points. For that I used a continuous action space where the agent could move around the battlefield and shoot; turning was done around a fixed axis, so pressing left started the agent turning left, and vice versa for right, as in the sketch below. No matter how long I ran the training, however, it didn't get better (about 10,000,000 steps).
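
    To be concrete, the fixed-axis turning looked roughly like this (a simplified sketch, not my exact code; rotationSpeed and the turn action are placeholders):

    Code (CSharp):
    //Sketch of the fixed-axis turning: one continuous action in [-1, 1]
    //spins the agent around the world Y axis each step.
    void Turn (float turnAction) //negative = left, positive = right
    {
        transform.Rotate (0f, turnAction * rotationSpeed * Time.deltaTime, 0f);
    }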

    So I also tried a twin-stick approach where the agent turns toward the direction a joystick is facing (basically toward the coordinates given by two axes), which would be more precise and, for the agent, essentially instantaneous instead of having to rotate all the way around. This approach, however, bore no fruit at all; after 7,000,000 steps the agent is still pretty directionless.

    For rewards I give a small reward whenever the agent hits an enemy (0.01), a bigger reward for killing an enemy (0.05), and a negative reward (-0.1) for getting hit by an enemy.

    I suspect there might also be a problem with shooting at higher time scales, though as far as I can tell it should work as intended even then. I'm calling Shoot directly from the agent script, but I'm keeping this implementation since it's the one from the original project.

    Code (CSharp):
    void FixedUpdate ()
    {
        timer += Time.deltaTime;

        if (Input.GetButton ("Fire1") && timer >= timeBetweenBullets && Time.timeScale != 0)
        {
            Shoot ();
        }

        if (timer >= timeBetweenBullets * effectsDisplayTime)
        {
            DisableEffects ();
        }
    }

    public void Shoot ()
    {
        timer = 0f;

        //gunAudio.Play ();

        gunLight.enabled = true;

        gunParticles.Stop ();
        gunParticles.Play ();

        gunLine.enabled = true;
        gunLine.SetPosition (0, transform.position);

        shootRay.origin = transform.position;
        shootRay.direction = transform.forward;

        if (Physics.Raycast (shootRay, out shootHit, range, shootableMask))
        {
            EnemyHealth enemyHealth = shootHit.collider.GetComponent <EnemyHealth> ();
            if (enemyHealth != null)
            {
                enemyHealth.TakeDamage (damagePerShot, shootHit.point);
            }
            gunLine.SetPosition (1, shootHit.point);
        }
        else
        {
            gunLine.SetPosition (1, shootRay.origin + shootRay.direction * range);
        }
    }
    I am using curiosity for training; these are the hyperparameters:

    Code (csharp):
    behaviors:
      Shooter:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 8192
          learning_rate: 0.00003
          beta: 0.001
          epsilon: 0.2
          lambd: 0.925
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 64
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.95
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 64
            learning_rate: 3.0e-4
        keep_checkpoints: 10
        max_steps: 5000000
        time_horizon: 256
        summary_freq: 10000
        threaded: true
    Any advice as to why the agent isn't learning, or what I could change?
     
  2. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Hey,
    Have you tried training without curiosity? Sometimes making things as simple as possible gets you better results.
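
    For example, you could strip the reward signals down to just the extrinsic one (a sketch based on the config you posted):

    Code (csharp):
    reward_signals:
      extrinsic:
        gamma: 0.95
        strength: 1.0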
     
  3. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3
    Hey, thanks for the suggestion; however, I've tried both with and without curiosity, to no avail. Curiosity actually produced the best results, though that might just be because it was also the longest run.
     
  4. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Could you share your CollectObservations and OnActionReceived functions, and how your rewards work?

    Are you observing global rotation or local? It could make a big difference.

    Having your shooting logic in FixedUpdate is a problem; it needs to be translated into OnActionReceived and Heuristic, along the lines of the sketch below.
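
    Roughly like this (a sketch, not tested against your project; the action index and the "Fire1" binding are just examples):

    Code (CSharp):
    //Sketch: the shoot decision comes out of the action buffer in
    //OnActionReceived, and Heuristic writes player input into that same
    //slot so you can still drive the agent manually.
    public override void OnActionReceived (ActionBuffers actionBuffers)
    {
        // ... movement / rotation actions ...

        if (actionBuffers.ContinuousActions[4] > 0f)
        {
            Shoot (); //the fire-rate timer still gates the actual shot
        }
    }

    public override void Heuristic (in ActionBuffers actionsOut)
    {
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis ("Horizontal");
        continuousActions[1] = Input.GetAxis ("Vertical");
        //rotation actions omitted here for brevity
        continuousActions[4] = Input.GetButton ("Fire1") ? 1f : -1f;
    }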
     
  5. vaculikjan

    Joined: Jul 30, 2020 · Posts: 3

    This is the code for the two methods:

    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor) { //Observations for coordinates, rotation and health

        sensor.AddObservation(transform.rotation.y);
        sensor.AddObservation(transform.position.x);
        sensor.AddObservation(transform.position.z);
        sensor.AddObservation(pHealth.currentHealth);
    }

    public override void OnActionReceived(ActionBuffers actionBuffers) { //Actions available to the agent

        var continuousActions = actionBuffers.ContinuousActions;

        //Movement floats
        float h = continuousActions[0];
        float v = continuousActions[1];

        //Rotation floats
        float hr = continuousActions[2];
        float vr = continuousActions[3];

        //Creating quaternion for rotation
        lookDirection = new Vector3(hr, 0, -vr);
        lookRotation = Quaternion.LookRotation(lookDirection, Vector3.up);

        step = rotationSpeed * Time.deltaTime;

        if (continuousActions[4] > 0) {
            Shoot();
        }

        Move(h, v);
        Animating(h, v);
    }
    The shooting logic is in a different script altogether; I only call the Shoot method. I invoke it like so from the agent script:

    Code (CSharp):
    void Shoot() {
        if (pShooting.timer >= timeBetweenBullets && Time.timeScale != 0) {
            pShooting.Shoot ();
        }
    }

    As for rewards:

    Code (CSharp):
    public void TakeDamage (int amount, Vector3 hitPoint)
    {
        if (isDead) return;

        player.AddReward(0.01f); //Positive reward for hitting an enemy

        enemyAudio.Play ();

        currentHealth -= amount;

        hitParticles.transform.position = hitPoint;
        hitParticles.Play();

        if (currentHealth <= 0)
        {
            Death ();
        }
    }
    Code (CSharp):
    public void StartSinking ()
    {
        GetComponent <UnityEngine.AI.NavMeshAgent> ().enabled = false;
        GetComponent <Rigidbody> ().isKinematic = true;
        isSinking = true;
        ScoreManager.score += scoreValue;
        Destroy (gameObject, 2f);
        player.AddReward(0.05f); //Positive reward for killing an enemy
    }
    Code (CSharp):
    public void TakeDamage (int amount)
    {
        damaged = true;

        player.AddReward(-0.2f); //Negative reward for getting damaged

        currentHealth -= amount;

        healthSlider.value = currentHealth;

        playerAudio.Play ();

        if (currentHealth <= 0 && !isDead)
        {
            //Death disabled for purposes of training the agent
            //Death ();
        }
    }
     
  6. christophergoy

    Joined: Sep 16, 2015 · Posts: 735
    Hey, thanks for posting the code. Observing the global position and rotation (transform.position.x, transform.rotation.y) can make it hard for the neural network to learn, since these values aren't normalized.

    For rotation, you could do transform.localRotation.y.

    For the position, it would be best if you could normalize the values somehow. You can usually achieve this by getting the center position of the area your agents work in, subtracting that from your agent's position, and then dividing the x and z results by the extents of the area bounds.
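
    Something like this, for example (a sketch; areaCenter and areaExtents are placeholders for whatever bounds your level has):

    Code (CSharp):
    public override void CollectObservations (VectorSensor sensor)
    {
        //Observe the position relative to the play area so the values
        //land roughly in [-1, 1]; areaCenter and areaExtents describe
        //the level bounds.
        Vector3 relative = transform.position - areaCenter;
        sensor.AddObservation (relative.x / areaExtents.x);
        sensor.AddObservation (relative.z / areaExtents.z);
    }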

    Long story short: It is hard for neural networks to learn on sets of non-normalized values.