
Question: ML-Agents suddenly gives up

Discussion in 'ML-Agents' started by Rudaisvells, Feb 17, 2023.

  1. Rudaisvells

    Joined: Oct 10, 2013
    Posts: 2
    Hello!
    I have created a game (a little bit similar to pool or snooker) where the AI needs to hit a puck, which in turn needs to knock the correct-color pawns into the corner pockets.
    [Attached screenshot: upload_2023-2-17_18-53-12.png]
    I'm starting to train it with the simplest possible situations: with the puck and the correct pawn only, or with the puck and two pawns, one correct and one of the opponent's.

    The AI needs to decide 4 continuous actions: the x and z coordinates of the puck, the angle of the hit, and the power of the hit.
    It observes a camera and the coordinates of all objects.

    Rewards work like this:
    For scoring a pawn: +10
    For hitting the correct pawn: +3
    For near misses (puck to pawn, or pawn to pocket): +0.1 to +0.4, depending on how close it got
    For any error (hitting the wrong pawn, scoring the puck, or missing all pawns): -10

    The episode ends and the game is reset after each hit (once all pieces have stopped, been scored, or gone off the table).

    The AI slowly gets better and better at getting this right.
    The mean reward slowly climbs from around -9 to +5.5, but then sometimes, after 200k to 800k episodes, it suddenly gives up and starts to repeat the same decision over and over.

    My YAML file looks like this:
    Code (YAML):
    behaviors:
      AiHitterS:
        trainer_type: ppo

        hyperparameters:
          # Hyperparameters common to PPO and SAC
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 3.0e-4
          learning_rate_schedule: linear

          # PPO-specific hyperparameters
          beta: 5.0e-3
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3

        # Configuration of the neural network (common to PPO/SAC)
        network_settings:
          vis_encode_type: simple
          normalize: true
          hidden_units: 128
          num_layers: 2
          # memory
          memory:
            sequence_length: 64
            memory_size: 256

        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1

        # Trainer configurations common to all trainers
        max_steps: 50.0e5
        time_horizon: 64
        summary_freq: 10000
        keep_checkpoints: 5
        checkpoint_interval: 50000
        threaded: true
        init_path: null

        # self-play
        self_play:
          window: 10
          play_against_latest_model_ratio: 0.5
          save_steps: 50000
          swap_steps: 2000
          team_change: 100000

        # use TensorFlow backend
        framework: tensorflow

    engine_settings:
      width: 84
      height: 84
      quality_level: 5
      time_scale: 1.0
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    I tried both normalize: true and normalize: false, but my AI collapses with either setting.

    What am I doing wrong? Is this still too hard, and are the results too random, for the AI to learn?
     
  2. hughperkins

    Joined: Dec 3, 2022
    Posts: 191
    RL is pretty unstable in general, and it's quite hard to get stable learning. Adding some entropy regularization can sometimes help. I have a video about entropy regularization at [video link]; it's not targeting your specific problem, but the principle still stands.
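    In ML-Agents' PPO trainer, the strength of the entropy regularization is the beta hyperparameter (the config above already sets it to 5.0e-3). A minimal, illustrative sketch of what strengthening it would look like, with a value picked only as an example rather than a tuned recommendation:
    Code (YAML):
    behaviors:
      AiHitterS:
        hyperparameters:
          # beta scales the entropy bonus in the PPO loss; a larger value keeps
          # the policy more random for longer, which can work against the kind of
          # premature collapse described above. 1.0e-2 is an illustrative value only.
          beta: 1.0e-2
    Watching the Policy/Entropy curve in TensorBoard can also help here: a sharp drop around the point where the agent starts repeating the same action suggests the entropy bonus is too weak.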
     
  3. Rudaisvells

    Joined: Oct 10, 2013
    Posts: 2
    Thanks for your reply. I will definitely try this.