Question: Large Drops in Episode Length/Reward

Discussion in 'ML-Agents' started by nemo_dev, Jan 31, 2023.

  1. nemo_dev (Joined: Jun 16, 2021; Posts: 4)
    Hello! I am trying to train an agent that can balance a ball on a post. However, the agent keeps experiencing massive drops in reward and episode length during training. Any ideas what could be causing this?
    There are 4 observations: position, velocity, post angle, and post angular velocity.
    There are 2 continuous actions: force left and force right.
    The agent is rewarded on each update by: (how close the angle is to 0) * (episode length)^2 * (delta time)
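To make the scale of this reward concrete, here is a minimal Python sketch of the formula as described above. It is not the agent's actual code: `MAX_ANGLE_RAD`, `angle_closeness`, `step_reward`, and `episode_return` are hypothetical names, the closeness measure is an assumed linear falloff, and the episode length is assumed to be counted in elapsed steps.

```python
import math

# Hypothetical: the post angle (radians) at which "closeness to 0" reaches zero.
MAX_ANGLE_RAD = math.pi / 2


def angle_closeness(angle_rad: float) -> float:
    """Assumed closeness measure: 1.0 when upright, 0.0 at MAX_ANGLE_RAD or beyond."""
    return max(0.0, 1.0 - abs(angle_rad) / MAX_ANGLE_RAD)


def step_reward(angle_rad: float, episode_length: float, delta_time: float) -> float:
    """Reward for one update: closeness * episode_length^2 * delta_time."""
    return angle_closeness(angle_rad) * episode_length ** 2 * delta_time


def episode_return(num_steps: int, delta_time: float, angle_rad: float = 0.0) -> float:
    """Undiscounted sum of step rewards over an episode held at a fixed angle."""
    return sum(
        step_reward(angle_rad, step, delta_time)
        for step in range(1, num_steps + 1)
    )


if __name__ == "__main__":
    dt = 0.02  # assumed fixed timestep in seconds
    for steps in (100, 1000):
        print(steps, episode_return(steps, dt))
```

Under these assumptions the per-step reward grows with the square of the episode length, so a perfectly balanced episode that lasts ten times longer collects roughly a thousand times more total reward.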
    My hyperparameters are as follows:
    Code (YAML):
    behaviors:
      CartBalanceAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 4096
          learning_rate: 0.0003
          beta: 0.01
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 50000000
        time_horizon: 64
        summary_freq: 60000
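For reference, the arithmetic implied by these settings can be sketched in Python. This is an illustration only, not ML-Agents code: the function names are hypothetical, and the sketch assumes that each policy update splits the buffer into minibatches of `batch_size` for `num_epoch` passes, and that the `linear` schedule decays the learning rate from its initial value toward zero at `max_steps`.

```python
# Values copied from the configuration above.
BATCH_SIZE = 128
BUFFER_SIZE = 4096
NUM_EPOCH = 3
LEARNING_RATE = 0.0003
MAX_STEPS = 50_000_000


def minibatches_per_epoch() -> int:
    """Number of minibatches the buffer is split into for one pass."""
    return BUFFER_SIZE // BATCH_SIZE


def gradient_steps_per_update() -> int:
    """Total gradient steps taken each time the buffer fills (assumed behavior)."""
    return minibatches_per_epoch() * NUM_EPOCH


def linear_learning_rate(step: int) -> float:
    """Assumed linear schedule: LEARNING_RATE at step 0, approaching 0 at MAX_STEPS."""
    fraction_left = max(0.0, 1.0 - step / MAX_STEPS)
    return LEARNING_RATE * fraction_left


if __name__ == "__main__":
    print("minibatches per epoch:", minibatches_per_epoch())
    print("gradient steps per update:", gradient_steps_per_update())
    print("learning rate halfway through:", linear_learning_rate(MAX_STEPS // 2))
```

Under these assumptions each full buffer yields 32 minibatches per pass and 96 gradient steps per update, and the learning rate has halved by step 25,000,000.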
    Any help/feedback is appreciated!