Question Behavioural Cloning with GAIL agent behaving weird

Discussion in 'ML-Agents' started by shohan4556, Apr 28, 2021.

  1. shohan4556

    shohan4556

    Joined:
    Feb 3, 2016
    Posts:
    11
    I have been trying to reproduce similar behaviour by following this tutorial



    I am using behavioural cloning with GAIL, but after a lot of training the agent is not performing as expected. I am using discrete actions for an autonomous agent (e.g. a car) and trying to make it behave exactly like the demonstration. I am also using the same time-scale in training and in the game. Could someone please assist me? This is my first attempt at using the ML-Agents toolkit.

    • Unity Version: 2019.4.2
    • OS + version: macOS
    • ML-Agents version: 1.0.6

    Code (YAML):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-4
        buffer_size: 10240
        epsilon: 0.2
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 9.0e7
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        sequence_length: 64
        summary_freq: 10000
        use_recurrent: false
        vis_encode_type: simple
        time_horizon: 128
        hidden_units: 512
        reward_signals:
            extrinsic:
                strength: 0.01
                gamma: 0.99

    RaceAgent:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-4
        buffer_size: 10240
        epsilon: 0.2
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 50000000
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        sequence_length: 64
        summary_freq: 10000
        use_recurrent: false
        vis_encode_type: simple
        time_horizon: 128
        hidden_units: 512
        behavioral_cloning:
            demo_path: expert_6.demo
            strength: 1.0
            steps: 100000
        reward_signals:
            extrinsic:
                strength: 0.1 #0.01
                gamma: 0.99
            curiosity:
                strength: 0.02 #0.02
                gamma: 0.99 #0.90
                encoding_size: 256
            gail:
                strength: 1.0
                gamma: 0.99
                encoding_size: 128
                demo_path: expert_6.demo
     
  2. celion_unity

    celion_unity

    Joined:
    Jun 12, 2019
    Posts:
    289
    Hi,
    A few thoughts on your configuration:
    * I don't think you need curiosity for a racing game; I would recommend removing it.
    * With an extrinsic strength of 0.1 and behavioral_cloning and gail strengths of 1.0, your agent is basically ignoring all the rewards from the environment and just trying to match the demonstrations. So if your demonstrations aren't very good, or there aren't many of them, or they don't cover all the scenarios in the game, the agent is not going to behave well.
    * If you want a more human-like behavior that still performs well, I'd recommend raising the extrinsic strength, lowering the gail strength, and removing behavioral_cloning.
    * If you want the agent to perform as well as possible, but get a head start on training using the demonstrations, remove gail and just use behavioral_cloning.
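    To make that last option concrete, here is a minimal sketch of what the RaceAgent section could look like for "head start from demos, then learn from the environment". The specific strength values are illustrative, not tuned; the demo path is the one from the original post.
    Code (YAML):
    RaceAgent:
        trainer: ppo
        batch_size: 1024
        buffer_size: 10240
        max_steps: 50000000
        time_horizon: 128
        hidden_units: 512
        # Pretrain from demonstrations only; gail and curiosity removed.
        behavioral_cloning:
            demo_path: expert_6.demo
            strength: 0.5      # illustrative: below 1.0 so BC fades relative to extrinsic reward
            steps: 100000      # stop cloning after this many steps
        reward_signals:
            extrinsic:
                strength: 1.0  # let the environment reward dominate after the head start
                gamma: 0.99
    With this setup the agent imitates the demonstrations early on, then optimizes the environment reward for the rest of training.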