Question My agent is not learning, Mean Reward is not increasing

Discussion in 'ML-Agents' started by jloskot, Jul 20, 2023.

  jloskot


    Sep 26, 2021
    Hi everyone,

    I hope you are having a fantastic day!

    I am getting desperate about training my agent. It is a simple agent with ray sensors: 18 for detecting the ground and 15 for detecting the target.
    The agent can move forward, backward and rotate left and right.
    When an episode starts, I randomly generate the map. It is a simple grid where local position zero is the agent's spawn point and the target spawns at a random position. I am trying to teach the agent to reach the target.
    The rewards are:
    +1 for hitting the target
    -1 for falling off the floor
    -1 when reaching Max Step (1000)
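
    As a rough illustration of what this reward scheme implies for the reported Mean Reward, here is a minimal Python sketch. The outcome labels (hit_target, fell_off, max_step) and both function names are hypothetical and only used for this example. Because every episode ends with either +1 or -1, a mean of exactly -1 can only occur when no episode reached the target.

```python
# Terminal reward per episode outcome, taken from the list above.
# The outcome labels are hypothetical names chosen for this sketch.
TERMINAL_REWARDS = {
    "hit_target": 1.0,   # +1 for hitting the target
    "fell_off": -1.0,    # -1 for falling off the floor
    "max_step": -1.0,    # -1 when reaching Max Step (1000)
}


def episode_reward(outcome):
    """Return the terminal reward for a single episode outcome."""
    return TERMINAL_REWARDS[outcome]


def mean_reward(outcomes):
    """Average terminal reward over a list of episode outcomes."""
    return sum(episode_reward(o) for o in outcomes) / len(outcomes)


# Every episode fails: the mean sits exactly at -1.
all_failures = mean_reward(["fell_off", "max_step", "fell_off"])

# A single success among four episodes already lifts the mean.
one_success = mean_reward(["fell_off", "hit_target", "max_step", "fell_off"])

print(all_failures)
print(one_success)
```

    Under these assumptions, any movement of the mean above -1 signals that at least some episodes end at the target.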

    My config file:
    Code (CSharp):
    behaviors:
      AgentBeh_v2:
        trainer_type: ppo
        hyperparameters:
          batch_size: 2048
          buffer_size: 32768
          learning_rate: 0.003
          beta: 5.0e-4
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
          beta_schedule: constant
          epsilon_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 3
          memory:
            use_recurrent: true
            memory_size: 128
            sequence_length: 128
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 0.95
          curiosity:
            strength: 0.05
            gamma: 0.99
        max_steps: 5000000
        time_horizon: 2048
        summary_freq: 10000
    The agent is unable to learn even after a few million steps. The Mean Reward does not improve from -1. The agent often gets stuck at a corner of the floor, not far from its spawn point. There has to be something fundamentally wrong with my config file.

    Any help, any ideas are very much appreciated!

    Best regards,

  jloskot


    Sep 26, 2021
    Hi everyone,

    I managed to resolve the issue. For others who might be facing something similar, here is what helped me:

    I was giving the negative reward only once the agent had fallen off the "white ground" and hit the "lava" below. However, between leaving the ground and hitting the lava there was about a second during which the agent was falling but could still "move".
    I suspect that the agent failed to connect the negative reward with the act of falling off the white ground.
    I adjusted the scene so that the agent gets the negative reward immediately when it leaves the white ground. From that point on, the agent learned rather quickly: after 200k steps it could solve even slightly more complicated mazes.
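
    One way to picture why the delay may have hurt is to look at discounted returns. The sketch below is plain Python, not ML-Agents code. It uses gamma = 0.99 from the config above, and FALL_STEPS = 50 is a hypothetical stand-in for "about a second" of falling (the real number of decisions per second depends on the project settings). It compares the return credited to the step that left the ground when the penalty arrives late versus immediately.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical: number of agent decisions made while falling (~1 second).
FALL_STEPS = 50

# Delayed penalty: the agent leaves the ground at t=0, keeps acting for
# FALL_STEPS steps in mid-air, and only then receives -1 on hitting the lava.
delayed = [0.0] * FALL_STEPS + [-1.0]

# Immediate penalty: -1 on the very step that leaves the ground.
immediate = [-1.0]

g_delayed = discounted_returns(delayed)
g_immediate = discounted_returns(immediate)

print(f"step off the edge, delayed penalty:   {g_delayed[0]:.3f}")
print(f"last mid-air action, delayed penalty: {g_delayed[-2]:.3f}")
print(f"step off the edge, immediate penalty: {g_immediate[0]:.3f}")
```

    Under these assumptions, the step that actually caused the fall receives a weaker signal than the irrelevant actions taken in mid-air, which fits the suspicion that the agent could not tie the penalty to leaving the ground. With the immediate penalty, the full -1 lands on the responsible step.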
