
Question Weird change in cumulative reward

Discussion in 'ML-Agents' started by Hsgngr, Jul 7, 2020.

  1. Hsgngr

    Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
I was working on a project and my cumulative reward changed so weirdly that I thought I should post this.
upload_2020-7-7_4-51-8.png
I read that curiosity can lead to this kind of behavior; however, I am only using an extrinsic reward.
My configuration file:

Code (YAML):
behaviors:
  PandemicAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 6
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512 #256
      num_layers: 4 #2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 1.0e7
    time_horizon: 128
    summary_freq: 10000
    threaded: true
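As a side note on the config above: the "Cumulative Reward" curve ML-Agents plots is the raw per-episode sum of rewards, while the return the PPO trainer actually optimizes is discounted by the extrinsic gamma: 0.99. A minimal illustrative sketch of the difference (not ML-Agents code):

```python
# Illustrative sketch: gamma-discounted return vs. the raw cumulative
# reward that shows up in the TensorBoard "Cumulative Reward" curve.
def discounted_return(rewards, gamma=0.99):
    """Sum of per-step rewards, each discounted by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

rewards = [1.0, 1.0, 1.0]  # hypothetical per-step rewards
print(sum(rewards))                            # cumulative reward: 3.0
print(discounted_return(rewards, gamma=0.5))   # discounted return: 1.75
```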
The task is simple: the blue agent tries to collect yellow cubes as fast as possible.
    upload_2020-7-7_4-53-2.png

Any idea why this happened?
     
  2. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
Do you have a max_step set for an episode, or is it a continuous environment that does not reset? I'd guess the agent encountered a situation it couldn't get out of, like being stuck in a corner, or a flaw developed in the neural network during training where the agent wants to go forward based on the pixel values in a corner and therefore keeps walking into it.
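The role of a step limit here can be shown with a toy loop (pure illustration, not the ML-Agents API): with max_step set, even an agent stuck in a corner gets its episode ended and reset instead of running forever.

```python
# Toy illustration of episode truncation: a "stuck" agent that never
# reaches its goal still has the episode end once max_step is hit.
def run_episode(policy, max_step=100):
    steps = 0
    done = False
    while not done and steps < max_step:
        done = policy(steps)  # policy returns True once the goal is reached
        steps += 1
    return steps

stuck_agent = lambda step: False   # never reaches the goal
print(run_episode(stuck_agent))    # 100 -> truncated, so the episode resets
```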
     
  3. Hsgngr

    Hsgngr

    Joined:
    Dec 28, 2015
    Posts:
    61
There was a maximum_step, and it wasn't a continuous environment, so I don't think it was that. When I looked at the simulation, I saw the cube kept spinning rather than going to the reward. @BotAcademy
     
  4. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
That's interesting. Hopefully someone from the dev team can help you out!
     
  5. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
Would you mind sharing your policy/value loss and policy entropy curves? Also, you could try running with threaded: false, which might help stability.
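For readers following along: policy entropy measures how random the action distribution is, and a sudden reward collapse like the one above often coincides with entropy dropping toward zero, i.e. the policy deterministically repeating one action (such as spinning in place). A small stdlib-only sketch of the quantity (illustrative, not the ML-Agents implementation):

```python
import math

# Shannon entropy of a discrete action distribution: H = -sum(p * ln p).
# High entropy = exploratory policy; near zero = one action dominates.
def policy_entropy(probs):
    return sum(-p * math.log(p) for p in probs if p > 0)

print(policy_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386 (uniform over 4 actions)
print(policy_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0 (collapsed policy)
```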