
Question: Help me figure out this behavior

Discussion in 'ML-Agents' started by EternalMe, Jul 13, 2022.

  1. EternalMe

    EternalMe

    Joined:
    Sep 12, 2014
    Posts:
    183
    So I have a simple agent: a cube with a collider and a rigidbody, with a plane (also with a collider) beneath it. The agent observes its velocity, its position, and the current force applied. The actions are forces in the x and z directions. When the agent goes off the plane, I give it a -1 reward and restart the episode. There are no positive rewards.
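
    Roughly, the setup looks like this (a simplified sketch, not my exact code; the class name, force scale, and reset position are placeholders):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class SurviveAgent : Agent
    {
        Rigidbody rb;
        Vector3 currentForce;

        public override void Initialize()
        {
            rb = GetComponent<Rigidbody>();
        }

        public override void OnEpisodeBegin()
        {
            // Reset to the middle of the plane with no momentum.
            rb.velocity = Vector3.zero;
            rb.angularVelocity = Vector3.zero;
            transform.localPosition = new Vector3(0f, 0.5f, 0f);
            currentForce = Vector3.zero;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(rb.velocity);              // 3 floats
            sensor.AddObservation(transform.localPosition);  // 3 floats
            sensor.AddObservation(currentForce);             // 3 floats
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Two continuous actions: force along x and z.
            currentForce = new Vector3(actions.ContinuousActions[0], 0f,
                                       actions.ContinuousActions[1]) * 10f;
            rb.AddForce(currentForce);

            // Fell off the plane: -1 and restart. No positive rewards anywhere.
            if (transform.localPosition.y < 0f)
            {
                SetReward(-1f);
                EndEpisode();
            }
        }
    }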

    At the beginning, as expected, it goes in all directions and falls off the plane again and again. But it gets better over time and does so less and less. At some point it doesn't go over the edge at all (Mean Reward: 0.000), but it is still pretty active in the middle. However, when I run the training for longer, it freezes and doesn't move at all. It's like it is traumatized from all the falling and prefers to sit still, just to be 1000000% sure.

    So the questions are:

    1) Why does the freezing happen? Can somebody explain it in terms of RL?
    2) What would be a strategy to avoid this, so the agent remains active in the middle?
    3) How is it that, when I stop training and restart with `--resume`, the agent becomes active again? It feels like starting from 10% again. This is kind of a critical issue for me.

    My config:

    Code (YAML):
    behaviors:
      Survive:
        trainer_type: sac
        hyperparameters:
          learning_rate: 0.0003
          learning_rate_schedule: constant
          batch_size: 1024
          buffer_size: 1000000
          buffer_init_steps: 0
          tau: 0.005
          steps_per_update: 20.0
          save_replay_buffer: false
          init_entcoef: 1.0
          reward_signal_steps_per_update: 20.0
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            network_settings:
              hidden_units: 256
            learning_rate: 0.0003
        keep_checkpoints: 5
        max_steps: 15000000
        time_horizon: 100
        summary_freq: 20000
     
  2. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    147
    Seems like your training is working as expected: you aren't giving it any reward for moving, so why would it move?
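
    E.g. something like this, using the agent outline from the first post (an untested sketch; the numbers are illustrative, tune them for your scene):

    Code (CSharp):
    // Inside OnActionReceived, after applying the force:
    AddReward(0.0005f);                          // tiny per-step survival bonus
    AddReward(0.0005f * rb.velocity.magnitude);  // pay for actually moving

    if (transform.localPosition.y < 0f)
    {
        SetReward(-1f);   // SetReward overwrites this step's shaping rewards
        EndEpisode();
    }

    The survival bonus keeps staying on the plane worth something, and the velocity term makes standing still strictly worse than moving, so the agent has a reason to stay active in the middle.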
     
  3. EternalMe

    EternalMe

    Joined:
    Sep 12, 2014
    Posts:
    183
    OK, so when training starts, the network is randomized, so the agent takes random actions based on its observations. When the agent goes off the plane, it gets -1. So it adjusts the network through learning, and at some point it doesn't fail any more. Still, it is active in the middle sector. From there on, the agent is not receiving any rewards, neither negative nor positive. So theoretically the network is not changing. And if it's not, what makes the agent go completely still after a longer period? And why does it start to move again when I resume training?

    So yes, at a very high level you could say it's expected, but I am here for a bit deeper knowledge. And this behavior is not very intuitive, especially the `--resume` part.

    My current guess is that it has something to do with `init_entcoef` and `entropy`.
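
    To spell out the guess: SAC maximizes a maximum-entropy objective, roughly

    Code (LaTeX):
    J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
             \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]

    where \alpha is the entropy coefficient that `init_entcoef` initializes (to 1.0 here) and that SAC then tunes automatically. If \alpha shrinks once the falling penalty is avoided, there is less and less pressure to keep acting, which would fit the freeze; and if `--resume` resets or perturbs that coefficient's state, that would fit the renewed activity. That's my working theory, anyway.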
     
    Last edited: Jul 14, 2022
  4. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    147
    Ah, I see, your problem is more about why it stops trying to explore.
    Did you try increasing the curiosity strength?