
Question: ML-Agents suddenly gives up

Discussion in 'ML-Agents' started by Rudaisvells, Feb 17, 2023.

  1. Rudaisvells

    Joined: Oct 10, 2013
    Posts: 2
    Hello!
    I have created a game (a little bit similar to pool or snooker) where the AI needs to hit a puck, which in turn needs to knock the correct-color pawns into the corner pockets.
    [Attached screenshot: upload_2023-2-17_18-53-12.png]
    I'm starting to train it with the simplest possible situations: with the puck and the correct pawn only, or with the puck and two pawns, one correct and one of the opponent's.

    The AI needs to decide 4 continuous actions: the x and z coordinates of the puck, the angle of the hit, and the power of the hit.
    It observes a camera and the coordinates of all objects.

    Rewards work like this:
    For scoring a pawn: +10
    For hitting the correct pawn: +3
    For near misses (puck to pawn, or pawn to pocket): +0.1 to +0.4, depending on how close it got
    For any error (hitting the wrong pawn, scoring the puck, or missing all pawns): -10

    The episode ends and the game is reset after each hit (once all pieces have stopped, been scored, or gone off the table).

    The AI slowly gets better and better at getting this right.
    The mean reward slowly climbs from around -9 to +5.5, but then sometimes, after 200k to 800k episodes, it suddenly gives up and starts to repeat the same decision over and over.

    My YAML file looks like this:
    Code (YAML):
    behaviors:
      AiHitterS:
        trainer_type: ppo

        hyperparameters:
          # Hyperparameters common to PPO and SAC
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 3.0e-4
          learning_rate_schedule: linear

          # PPO-specific hyperparameters
          beta: 5.0e-3
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3

        # Configuration of the neural network (common to PPO/SAC)
        network_settings:
          vis_encode_type: simple
          normalize: true
          hidden_units: 128
          num_layers: 2
          # memory
          memory:
            sequence_length: 64
            memory_size: 256

        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1

        # Trainer configurations common to all trainers
        max_steps: 50.0e5
        time_horizon: 64
        summary_freq: 10000
        keep_checkpoints: 5
        checkpoint_interval: 50000
        threaded: true
        init_path: null

        # self-play
        self_play:
          window: 10
          play_against_latest_model_ratio: 0.5
          save_steps: 50000
          swap_steps: 2000
          team_change: 100000

        # use TensorFlow backend
        framework: tensorflow

    engine_settings:
      width: 84
      height: 84
      quality_level: 5
      time_scale: 1.0
      target_frame_rate: -1
      capture_frame_rate: 60
      no_graphics: false
    I tried both normalize: true and normalize: false, but my AI collapses with either setting.

    What am I doing wrong? Is this still too hard, and are the results too random, for the AI to learn?
     
  2. hughperkins

    Joined: Dec 3, 2022
    Posts: 191
    RL is pretty unstable in general, and it's quite hard to get stable learning. Adding some entropy regularization can sometimes help. I have a video about entropy regularization at [video link]; it's not targeting your specific problem, but the principle still stands.
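    In ML-Agents' PPO trainer, the strength of the entropy regularization is the beta hyperparameter (the config above already sets it to 5.0e-3). A minimal, illustrative sketch of what strengthening it would look like, with a value picked only as an example rather than a tuned recommendation:
    Code (YAML):
    behaviors:
      AiHitterS:
        hyperparameters:
          # beta scales the entropy bonus in the PPO loss; a larger value keeps
          # the policy more random for longer, which can work against the kind of
          # premature collapse described above. 1.0e-2 is an illustrative value only.
          beta: 1.0e-2
    Watching the Policy/Entropy curve in TensorBoard can also help here: a sharp drop around the point where the agent starts repeating the same action suggests the entropy bonus is too weak.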
     
  3. Rudaisvells

    Joined: Oct 10, 2013
    Posts: 2
    Thanks for your reply. I will definitely try this.