Rewards for free-for-all shooter

Discussion in 'ML-Agents' started by oriolgalceran, Mar 30, 2022.

  1. oriolgalceran

    oriolgalceran

    Joined:
    Jul 22, 2018
    Posts:
    2
    Hey guys,

    I'm trying to train a shooter agent: 10 agents share a playing field with some walls randomly distributed inside it. Each agent has a forward-pointing gun with a fixed fire rate and limited ammo. For observations I have a 3D ray perception sensor covering 360° with 16 rays, plus the agent's local position, velocity, and rotation on the Y axis. For actions I apply force on two axes, a rotational force, and a shoot/don't-shoot signal. The gun is automatic and keeps firing as long as the shoot action is greater than 0.

    The game works like this: I spawn the board with random walls, I spawn the players at random places, and they shoot each other. Each time one dies, I set its reward to -1, end its episode, and destroy it. The game can end in two ways:

    a) one agent is left (I assign it a +1 reward, end the episode, and start again)
    b) the game reaches a fixed number of frames (I take the maximum damage dealt and assign rewards proportionally: if the top agent dealt 200 HP of damage it gets a reward of 1, the second with 100 HP gets 0.5, and so on)
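    The timeout scheme in (b) amounts to normalizing each agent's damage by the top scorer's. A minimal sketch of that calculation (function and variable names are illustrative, not from the post):

    ```python
    def timeout_rewards(damage_dealt):
        """Scale each agent's reward by damage relative to the top scorer.

        damage_dealt: dict mapping agent id -> total HP of damage dealt.
        Returns a dict mapping agent id -> reward in [0, 1].
        """
        max_damage = max(damage_dealt.values())
        if max_damage == 0:
            # Nobody dealt any damage; return 0 for everyone
            # rather than dividing by zero.
            return {agent: 0.0 for agent in damage_dealt}
        return {agent: dmg / max_damage for agent, dmg in damage_dealt.items()}

    # Example from the post: top agent dealt 200 HP, runner-up 100 HP.
    rewards = timeout_rewards({"a": 200, "b": 100, "c": 50})
    # rewards == {"a": 1.0, "b": 0.5, "c": 0.25}
    ```

    In ML-Agents these values would then be passed to each agent via `SetReward` before calling `EndEpisode`.
    
    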

    Right now I'm training about 40 arenas at once, and after about 30 hours the agents just use up all their ammo while spinning around and basically stay where they are. Is there anything I'm doing wrong?

    Thanks!

    Code (YAML):
    behaviors:
      Agent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 20480
          learning_rate: 3.0e-4
          beta: 0.1
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 2
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
          curiosity:
            strength: 0.01
            gamma: 0.99
            encoding_size: 128
        max_steps: 500000000
        time_horizon: 1000
        summary_freq: 10000