
Question: Help me figure out this behavior

Discussion in 'ML-Agents' started by EternalMe, Jul 13, 2022.

  1. EternalMe

    EternalMe

    Joined:
    Sep 12, 2014
    Posts:
    183
    So I have a simple agent: a cube with a collider and a rigidbody, with a plane (also with a collider) beneath it. The agent observes its velocity, its position, and the current force applied. The actions are forces in the x and z directions. When the agent goes off the plane, I give it a -1 reward and restart the episode. There are no positive rewards.
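
    Roughly, the setup looks like this (a simplified sketch, not my exact code; the class name, force scale, and reset position are placeholders):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    public class SurviveAgent : Agent
    {
        Rigidbody rb;
        Vector3 currentForce;

        public override void Initialize()
        {
            rb = GetComponent<Rigidbody>();
        }

        public override void OnEpisodeBegin()
        {
            // Reset to the middle of the plane with no momentum.
            rb.velocity = Vector3.zero;
            rb.angularVelocity = Vector3.zero;
            transform.localPosition = new Vector3(0f, 0.5f, 0f);
            currentForce = Vector3.zero;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            sensor.AddObservation(rb.velocity);              // 3 floats
            sensor.AddObservation(transform.localPosition);  // 3 floats
            sensor.AddObservation(currentForce);             // 3 floats
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Two continuous actions: force along x and z.
            currentForce = new Vector3(actions.ContinuousActions[0], 0f,
                                       actions.ContinuousActions[1]) * 10f;
            rb.AddForce(currentForce);

            // Fell off the plane: -1 and restart. No positive rewards anywhere.
            if (transform.localPosition.y < 0f)
            {
                SetReward(-1f);
                EndEpisode();
            }
        }
    }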

    At the beginning, as expected, it goes in all directions and falls off the plane again and again. But it gets better over time and does so less and less. At some point it doesn't go over the edge at all (Mean Reward: 0.000), but it is still pretty active in the middle. However, when I run the training for longer, it freezes and doesn't move at all. It's like it is traumatized from all the falling and prefers to sit still, just to be 1000000% sure.

    So the questions are:

    1) Why does the freezing happen? Can somebody explain it in terms of RL?
    2) What would be a strategy to avoid this, so the agent remains active in the middle?
    3) How is it that, when I stop training and restart with `--resume`, the agent becomes active again? It feels like starting from 10% again. This is kind of a critical issue for me.

    My config:

    Code (YAML):
    behaviors:
      Survive:
        trainer_type: sac
        hyperparameters:
          learning_rate: 0.0003
          learning_rate_schedule: constant
          batch_size: 1024
          buffer_size: 1000000
          buffer_init_steps: 0
          tau: 0.005
          steps_per_update: 20.0
          save_replay_buffer: false
          init_entcoef: 1.0
          reward_signal_steps_per_update: 20.0
        network_settings:
          normalize: true
          hidden_units: 512
          num_layers: 3
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
          curiosity:
            strength: 0.02
            gamma: 0.99
            network_settings:
              hidden_units: 256
            learning_rate: 0.0003
        keep_checkpoints: 5
        max_steps: 15000000
        time_horizon: 100
        summary_freq: 20000
     
  2. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    147
    Seems like your training is working as expected: you aren't giving it any reward for moving, so why would it move?
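
    E.g. something like this, using the agent outline from the first post (an untested sketch; the numbers are illustrative, tune them for your scene):

    Code (CSharp):
    // Inside OnActionReceived, after applying the force:
    AddReward(0.0005f);                          // tiny per-step survival bonus
    AddReward(0.0005f * rb.velocity.magnitude);  // pay for actually moving

    if (transform.localPosition.y < 0f)
    {
        SetReward(-1f);   // SetReward overwrites this step's shaping rewards
        EndEpisode();
    }

    The survival bonus keeps staying on the plane worth something, and the velocity term makes standing still strictly worse than moving, so the agent has a reason to stay active in the middle.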
     
  3. EternalMe

    EternalMe

    Joined:
    Sep 12, 2014
    Posts:
    183
    OK, so when training starts, the network is randomized, so the agent takes random actions based on its observations. When the agent goes off the plane, it gets -1. So it adjusts the network through learning, and at some point it doesn't fail any more. Still, it is active in the middle sector. From there on, the agent is not receiving any rewards, neither negative nor positive. So theoretically the network is not changing. And if it's not, what makes the agent go completely still after a longer period? And why does it start to move again when I resume training?

    So yes, at a very high level you could say it's expected, but I am here for a bit deeper knowledge. And this behavior is not very intuitive, especially the `--resume` part.

    My current guess is that it has something to do with `init_entcoef` and `entropy`.
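
    To spell out the guess: SAC maximizes a maximum-entropy objective, roughly

    Code (LaTeX):
    J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
             \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]

    where \alpha is the entropy coefficient that `init_entcoef` initializes (to 1.0 here) and that SAC then tunes automatically. If \alpha shrinks once the falling penalty is avoided, there is less and less pressure to keep acting, which would fit the freeze; and if `--resume` resets or perturbs that coefficient's state, that would fit the renewed activity. That's my working theory, anyway.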
     
    Last edited: Jul 14, 2022
  4. smallg2023

    smallg2023

    Joined:
    Sep 2, 2018
    Posts:
    147
    Ah, I see, your problem is more about why it stops trying to explore.
    Did you try increasing the curiosity strength?