
Resolved Cumulative reward decreases when episode length is also decreasing.

Discussion in 'ML-Agents' started by meldeg, Mar 1, 2023.

  1. meldeg

    meldeg

    Joined:
    Jan 22, 2021
    Posts:
    5
    Hi,


    The cumulative reward is decreasing while the episode length is also decreasing. The agent receives 20 points when the episode ends with a win. Why does the cumulative reward decrease even though the agent is getting better and better at solving the task? (Training stats for the agent are attached.)

    Capture9.JPG

    Rewards:

    -0.2 for colliding with border
    2 for taking a treasure from chamber
    4 for taking this treasure to own chamber
    20 for winning the game.
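
    To make the arithmetic of this reward scheme concrete, here is a minimal sketch that sums an episode's reward from event counts. The reward values are the ones listed above; the function name and the event counts are hypothetical and chosen only for illustration, not taken from the project. It shows one way the total could fall while episodes get shorter, namely if a quicker win involves fewer rewarded events; it is not a diagnosis of this particular run.

```python
# Reward values as listed in the post.
BORDER_PENALTY = -0.2     # colliding with the border
TREASURE_PICKUP = 2.0     # taking a treasure from a chamber
TREASURE_DELIVERY = 4.0   # taking the treasure to the agent's own chamber
WIN_BONUS = 20.0          # winning the game


def cumulative_reward(border_hits, pickups, deliveries, won):
    """Sum the undiscounted reward of one episode from its event counts."""
    total = border_hits * BORDER_PENALTY
    total += pickups * TREASURE_PICKUP
    total += deliveries * TREASURE_DELIVERY
    if won:
        total += WIN_BONUS
    return total


# Hypothetical event counts, for illustration only.
long_episode = cumulative_reward(border_hits=5, pickups=4, deliveries=4, won=True)
short_episode = cumulative_reward(border_hits=0, pickups=2, deliveries=2, won=True)

print(long_episode, short_episode)
```

    With these made-up counts the shorter episode earns less in total even though it has no border collisions, because it contains fewer pickup and delivery events.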



    Conf file.
    Code (YAML):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 5.0e5
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 20000
        use_recurrent: false
        vis_encode_type: simple
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.02
                gamma: 0.99
                encoding_size: 64
                learning_rate: 3.0e-3

    PlayerAgent:
        time_horizon: 256
        batch_size: 4096
        buffer_size: 40960
        hidden_units: 512
        max_steps: 5.0e6
        beta: 7.5e-3

    GitHub repository (you need to import the ML-Agents 1.0.8 package into the project manually):

    https://github.com/Badsalt/AI

    /Melvin
     


  2. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Cumulative reward is the total reward earned in the episode.

    Imagine you earn $1 per hour. You work 8 hours. Your cumulative reward is $8.

    Now imagine that you are being paid to do a task. You learn to do it faster, in only 3 hours. Now your cumulative reward is only $3.
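
    The analogy above can be written as a one-line calculation. This is only a sketch of the per-timestep case it describes; the function name is hypothetical.

```python
def cumulative_reward(reward_per_step, steps):
    """Total reward when the same reward is paid on every step."""
    return reward_per_step * steps


slow = cumulative_reward(1.0, 8)  # $1 per hour for 8 hours
fast = cumulative_reward(1.0, 3)  # same task finished in 3 hours

print(slow, fast)
```

    When reward is paid per step, finishing sooner lowers the total even though the behaviour has improved.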
     
  3. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    It's OK for cumulative reward to go down, by the way.
     
  4. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Oh, you don't have per-timestep rewards. Never mind then :p This is not the reason then :p
     
  5. hughperkins

    hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    I'd recommend watching what the agent is actually doing. How is it earning reward? Why is the episode getting shorter? There's either a bug in your code, or the agent is somehow 'gaming' your rewards. Looking at what it actually does will likely give you some insight.
     
  6. meldeg

    meldeg

    Joined:
    Jan 22, 2021
    Posts:
    5
    I solved the problem by setting the final reward based on the number of steps taken.
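
    The post does not give the exact formula, so the following is only a sketch of one possible step-based final reward. It assumes a linear scale: the 20-point win reward is multiplied by the fraction of the step budget still unused. The function name, the linear form, and the example step counts are all assumptions for illustration. In an ML-Agents agent the two inputs would presumably come from the agent's own step counter and step limit (for example `StepCount` and `MaxStep` on the `Agent` class).

```python
def final_reward(steps_taken, max_steps, base_reward=20.0):
    """Scale the win reward by the unused fraction of the step budget.

    Assumed linear form (not the poster's confirmed formula):
    winning immediately yields base_reward, winning on the last
    allowed step yields 0.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Clamp so out-of-range step counts cannot produce odd rewards.
    steps_taken = min(max(steps_taken, 0), max_steps)
    remaining_fraction = 1.0 - steps_taken / max_steps
    return base_reward * remaining_fraction


# Hypothetical episodes with a 1000-step budget.
quick_win = final_reward(steps_taken=250, max_steps=1000)
slow_win = final_reward(steps_taken=750, max_steps=1000)

print(quick_win, slow_win)
```

    With a reward of this shape, a faster win earns a larger final reward, so cumulative reward and episode length no longer pull in opposite directions.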