Question Weird change in cumulative reward

Discussion in 'ML-Agents' started by Hsgngr, Jul 7, 2020.

  1. Hsgngr
    Joined:
    Dec 28, 2015
    Posts:
    61
I was working on a project and my cumulative reward changed so strangely that I thought I should post it here.
[Attached image: upload_2020-7-7_4-51-8.png (cumulative reward plot)]
I read that curiosity can lead to this kind of behavior; however, I am only using the extrinsic reward.
My configuration file:

Code (YAML):
behaviors:
  PandemicAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 6
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512 #256
      num_layers: 4 #2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 1.0e7
    time_horizon: 128
    summary_freq: 10000
    threaded: true
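As an aside on the reward_signals section: with only the extrinsic signal enabled, gamma controls how strongly future rewards are discounted. A minimal sketch of what that discounting means (the reward sequence below is made up purely for illustration, not taken from this environment):

```python
# Illustrative sketch, not ML-Agents code: how gamma from the config
# above weights future rewards when computing a discounted return.
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over a trajectory of rewards."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# With gamma = 0.99, a reward 100 steps in the future still retains
# roughly 37% of its value, so the agent has an incentive to reach
# cubes even if they are far away:
print(round(0.99 ** 100, 2))  # 0.37

# A single reward of 1.0 arriving at step 100 of an otherwise
# reward-free trajectory:
print(discounted_return([0.0] * 99 + [1.0]))  # 0.99**99, about 0.37
```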
The task is simple: the blue agent tries to collect yellow cubes as fast as possible.
[Attached image: upload_2020-7-7_4-53-2.png (screenshot of the environment)]

Any idea why this happened?
     
  2. BotAcademy
    Joined:
    May 15, 2020
    Posts:
    32
Do you have a max_step set for each episode, or is it a continuous environment that does not reset? My guess is the agent encountered a situation it couldn't get out of, like being stuck in a corner, or a flaw in the neural network developed during training where the agent wants to go forward based on the pixel values in a corner and therefore consistently walks into it.
     
  3. Hsgngr
    Joined:
    Dec 28, 2015
    Posts:
    61
There was a max_step, and it wasn't a continuous environment, so I don't think that was it. When I watched the simulation, I saw the cube kept spinning in place rather than going to the reward. @BotAcademy
     
  4. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
That's interesting. Hopefully someone from the dev team can help you out!
     
  5. andrewcoh_unity
    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
Would you mind sharing your policy/value loss and policy entropy curves? Also, you could try running with threaded: false, which might help stability.
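As background on why those curves matter: policy entropy measures how random the policy still is, and a reward collapse like the one plotted above often coincides with entropy dropping sharply (the policy locking onto one action, such as spinning in place). A minimal sketch of discrete-action entropy, with made-up distributions for illustration:

```python
import math

# Illustrative sketch, not the ML-Agents implementation: Shannon entropy
# of a discrete action distribution, -sum(p * ln p).
def policy_entropy(probs):
    """Entropy of an action probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]    # maximally exploratory policy
collapsed = [0.97, 0.01, 0.01, 0.01]  # nearly deterministic, e.g. "always spin"

print(policy_entropy(uniform))    # ln(4), about 1.386
print(policy_entropy(collapsed))  # about 0.168
```

If the entropy curve in TensorBoard flatlines near zero around the same step the cumulative reward crashes, that points to the policy collapsing rather than a problem with the reward signal itself.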