Rewards for free-for-all shooter

Discussion in 'ML-Agents' started by oriolgalceran, Mar 30, 2022.

  1. oriolgalceran

    oriolgalceran

    Joined:
    Jul 22, 2018
    Posts:
    2
    Hey guys,

    I'm trying to train a shooter agent: 10 agents share a playing field with some walls randomly distributed inside it. Each agent has a forward-pointing gun with a fixed fire rate and limited ammo. For observations I have a 3D ray perception sensor covering 360° with 16 rays, plus the agent's local position, velocity, and rotation on the Y axis. For actions I apply force on two axes, a rotational force, and a shoot/don't-shoot signal. The gun is automatic and keeps firing as long as the shoot action is greater than 0.

    The game works like this: I spawn the board with random walls, I spawn the players at random places, and they shoot each other. Each time one dies, I set its reward to -1, end its episode, and destroy it. The game can end in two ways:

    a) one agent is left (I assign it a +1 reward, end the episode, and start again)
    b) the game reaches a fixed number of frames (I take the maximum damage dealt and assign rewards proportionally: if the top agent dealt 200 HP of damage it gets a reward of 1, the second with 100 HP gets 0.5, and so on)
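    The timeout scheme in (b) amounts to normalizing each agent's damage by the top scorer's. A minimal sketch of that calculation (function and variable names are illustrative, not from the post):

    ```python
    def timeout_rewards(damage_dealt):
        """Scale each agent's reward by damage relative to the top scorer.

        damage_dealt: dict mapping agent id -> total HP of damage dealt.
        Returns a dict mapping agent id -> reward in [0, 1].
        """
        max_damage = max(damage_dealt.values())
        if max_damage == 0:
            # Nobody dealt any damage; return 0 for everyone
            # rather than dividing by zero.
            return {agent: 0.0 for agent in damage_dealt}
        return {agent: dmg / max_damage for agent, dmg in damage_dealt.items()}

    # Example from the post: top agent dealt 200 HP, runner-up 100 HP.
    rewards = timeout_rewards({"a": 200, "b": 100, "c": 50})
    # rewards == {"a": 1.0, "b": 0.5, "c": 0.25}
    ```

    In ML-Agents these values would then be passed to each agent via `SetReward` before calling `EndEpisode`.
    
    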

    Right now I'm training about 40 arenas at once, and after about 30 hours the agents just use up all their ammo while spinning around and basically stay where they are. Is there anything I'm doing wrong?

    Thanks!

    Code (YAML):
    behaviors:
      Agent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 3.0e-4
          beta: 0.1
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 2
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
          curiosity:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
        max_steps: 500000000
        time_horizon: 1000
        summary_freq: 10000