Question Weird change in cumulative reward

Discussion in 'ML-Agents' started by Hsgngr, Jul 7, 2020.

  1. Hsgngr
    Joined:
    Dec 28, 2015
    Posts:
    61
I was working on a project and my cumulative reward changed so strangely that I thought I should post it here.
[Attached image: upload_2020-7-7_4-51-8.png (cumulative reward plot)]
I read that curiosity can lead to this kind of behavior; however, I am only using the extrinsic reward.
My configuration file:

Code (YAML):
behaviors:
  PandemicAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 6
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512 #256
      num_layers: 4 #2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    checkpoint_interval: 500000
    max_steps: 1.0e7
    time_horizon: 128
    summary_freq: 10000
    threaded: true
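As an aside on the reward_signals section: with only the extrinsic signal enabled, gamma controls how strongly future rewards are discounted. A minimal sketch of what that discounting means (the reward sequence below is made up purely for illustration, not taken from this environment):

```python
# Illustrative sketch, not ML-Agents code: how gamma from the config
# above weights future rewards when computing a discounted return.
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**k * r_k over a trajectory of rewards."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# With gamma = 0.99, a reward 100 steps in the future still retains
# roughly 37% of its value, so the agent has an incentive to reach
# cubes even if they are far away:
print(round(0.99 ** 100, 2))  # 0.37

# A single reward of 1.0 arriving at step 100 of an otherwise
# reward-free trajectory:
print(discounted_return([0.0] * 99 + [1.0]))  # 0.99**99, about 0.37
```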
The task is simple: the blue agent tries to collect yellow cubes as fast as possible.
[Attached image: upload_2020-7-7_4-53-2.png (screenshot of the environment)]

Any idea why this happened?
     
  2. BotAcademy
    Joined:
    May 15, 2020
    Posts:
    32
Do you have a max_step set for each episode, or is it a continuous environment that does not reset? My guess is the agent encountered a situation it couldn't get out of, like being stuck in a corner, or a flaw in the neural network developed during training where the agent wants to go forward based on the pixel values in a corner and therefore consistently walks into it.
     
  3. Hsgngr
    Joined:
    Dec 28, 2015
    Posts:
    61
There was a max_step, and it wasn't a continuous environment, so I don't think that was it. When I watched the simulation, I saw the cube kept spinning in place rather than going to the reward. @BotAcademy
     
  4. BotAcademy

    BotAcademy

    Joined:
    May 15, 2020
    Posts:
    32
That's interesting. Hopefully someone from the dev team can help you out!
     
  5. andrewcoh_unity
    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
Would you mind sharing your policy/value loss and policy entropy curves? Also, you could try running with threaded: false, which might help stability.
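As background on why those curves matter: policy entropy measures how random the policy still is, and a reward collapse like the one plotted above often coincides with entropy dropping sharply (the policy locking onto one action, such as spinning in place). A minimal sketch of discrete-action entropy, with made-up distributions for illustration:

```python
import math

# Illustrative sketch, not the ML-Agents implementation: Shannon entropy
# of a discrete action distribution, -sum(p * ln p).
def policy_entropy(probs):
    """Entropy of an action probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]    # maximally exploratory policy
collapsed = [0.97, 0.01, 0.01, 0.01]  # nearly deterministic, e.g. "always spin"

print(policy_entropy(uniform))    # ln(4), about 1.386
print(policy_entropy(collapsed))  # about 0.168
```

If the entropy curve in TensorBoard flatlines near zero around the same step the cumulative reward crashes, that points to the policy collapsing rather than a problem with the reward signal itself.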