Question Large Drops in Episode Length/Reward

Discussion in 'ML-Agents' started by nemo_dev, Jan 31, 2023.

    Hello! I am trying to train an agent to balance a ball on a post. However, the agent keeps experiencing large, sudden drops in reward and episode length during training. Any ideas what could be causing this?
    There are 4 observations: position, velocity, post angle, and post angular velocity.
    There are 2 continuous actions: force left and force right.
    The agent is rewarded on each update by: (how close the angle is to 0) * (episode length ^ 2) * (delta time).
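To make the reward formula above concrete, here is a rough sketch in Python (not the actual Unity C# code). Several details are assumptions, since the post does not specify them: "how close the angle is to 0" is modeled as a linear falloff from 1 at 0 degrees to 0 at a hypothetical `max_angle_deg`, "episode length" is taken as elapsed seconds, and delta time is fixed at 0.02 s. The function names are made up for illustration.

```python
# Sketch of the reward described in the post, under the stated assumptions.

def step_reward(angle_deg, elapsed, dt, max_angle_deg=90.0):
    """Per-update reward: closeness * (episode length ^ 2) * delta time."""
    # Assumed closeness measure: 1.0 when upright, 0.0 at max_angle_deg or beyond.
    closeness = max(0.0, 1.0 - abs(angle_deg) / max_angle_deg)
    return closeness * elapsed ** 2 * dt


def episode_return(num_steps, dt=0.02, angle_deg=0.0):
    """Sum of per-update rewards for an episode held at a constant angle."""
    total = 0.0
    for step in range(1, num_steps + 1):
        elapsed = step * dt  # assumed: episode length measured in seconds
        total += step_reward(angle_deg, elapsed, dt)
    return total


if __name__ == "__main__":
    # Compare total return for episodes of increasing length.
    for n in (50, 500, 5000):
        print(n, episode_return(n))
```

Under these assumptions the per-update reward grows quadratically with episode length, so the summed return grows roughly cubically: an episode that lasts 10 times longer earns close to 1000 times the total reward.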
    My hyperparameters are as follows:
    Code (YAML):
    behaviors:
      CartBalanceAgent:
        trainer_type: ppo
        hyperparameters:
          batch_size: 128
          buffer_size: 4096
          learning_rate: 0.0003
          beta: 0.01
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 256
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 50000000
        time_horizon: 64
        summary_freq: 60000
    Any help/feedback is appreciated!
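For reference, the arithmetic implied by the hyperparameters above can be sketched in Python. This is only an illustration, not ML-Agents code: the helper names are hypothetical, and it assumes the PPO trainer splits each full buffer into minibatches of `batch_size`, makes `num_epoch` passes over it, and that the `linear` schedule decays the learning rate toward zero at `max_steps`.

```python
# Values copied from the config in the post.
config = {
    "batch_size": 128,
    "buffer_size": 4096,
    "num_epoch": 3,
    "learning_rate": 0.0003,
    "max_steps": 50_000_000,
    "time_horizon": 64,
}


def minibatches_per_update(buffer_size, batch_size):
    """Number of minibatches one full buffer is split into."""
    return buffer_size // batch_size


def gradient_steps_per_update(buffer_size, batch_size, num_epoch):
    """Gradient steps taken each time the buffer fills (assumed PPO behavior)."""
    return minibatches_per_update(buffer_size, batch_size) * num_epoch


def linear_lr(step, max_steps, initial_lr):
    """Assumed linear schedule: initial_lr at step 0, zero at max_steps."""
    return initial_lr * max(0.0, 1.0 - step / max_steps)


if __name__ == "__main__":
    c = config
    print("minibatches per update:",
          minibatches_per_update(c["buffer_size"], c["batch_size"]))
    print("gradient steps per update:",
          gradient_steps_per_update(c["buffer_size"], c["batch_size"], c["num_epoch"]))
    # Trajectory segments of at most time_horizon steps that fit in one buffer.
    print("segments per buffer:", c["buffer_size"] // c["time_horizon"])
    print("learning rate at halfway:",
          linear_lr(c["max_steps"] // 2, c["max_steps"], c["learning_rate"]))
```

With the values in the post, each policy update works through 32 minibatches for 3 epochs, and the learning rate is half its initial value at 25 million steps.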