
EndEpisode() called multiple times?

Discussion in 'ML-Agents' started by nightrooster52, Jan 12, 2021.

  1. nightrooster52

    Joined:
    Jan 12, 2021
    Posts:
    3
    Hi,

    I have an issue where my cumulative reward is unexpectedly going negative.
    The only place in my code where I subtract from the reward is where I want to slash 90% of the agent's total reward on a failure condition.

    This is the code:
    if (failureCondition){
        SetReward(-(GetCumulativeReward() * 0.90f));
        EndEpisode();
    }

    Outside this code, only positive rewards are given.

    How could this code push my reward below zero?

    Is the OnActionReceived() function called again after EndEpisode?

    Does EndEpisode only have an effect when a decision is requested? (My decision period is 5.)
     
  2. andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    SetReward sets the reward for the entire timestep, i.e. any AddReward or SetReward called before it within the timestep will not contribute to the reward for that timestep. I see there is a negative sign in

    SetReward(-(GetCumulativeReward()*0.90f));

    So if GetCumulativeReward() > 0 this will set the reward to a negative value.

    Additionally, if you want to affect rewards administered in the past, a suggested alternative is to accumulate the reward in a variable and then administer it all at once.
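
    A rough sketch of that pattern; the class, field, and method names here are just placeholders, not something in the ML-Agents API:

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // Accumulate reward in a field during the step, then hand it to the
    // trainer all at once when the action is processed.
    public class AccumulatingAgent : Agent
    {
        private float pendingReward;

        // Call this from game logic whenever something reward-worthy happens.
        public void AccumulateReward(float amount)
        {
            pendingReward += amount;
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // ... apply the actions here ...

            // Administer everything collected since the last decision in one call.
            AddReward(pendingReward);
            pendingReward = 0f;
        }
    }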
     
  3. nightrooster52

    Joined:
    Jan 12, 2021
    Posts:
    3
    My point is that the cumulative reward for an episode is going negative, which should be impossible with the above code.

    e.g.
    CumulativeReward = 1
    SetReward(-(GetCumulativeReward() * 0.90f));
    CumulativeReward = 0.1  << this should be the result

    Even if it runs again, it should become 0.01, 0.001, and so on, never crossing 0.
     
  4. nightrooster52

    Joined:
    Jan 12, 2021
    Posts:
    3
    I've been debugging further, and I think there may be an issue with the calculation of the mean reward.

    I set up Debug.LogWarning calls to fire if I EVER subtract from the reward outside the failure condition (no hits).
    I also logged the cumulative reward after the failure condition runs, and it never falls below 0.
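
    Something like this is what I mean by the guard (the class and method names are just mine for illustration, not ML-Agents API):

    using UnityEngine;
    using Unity.MLAgents;

    public class FailureCheckedAgent : Agent
    {
        private bool failureCondition;

        // All reward goes through this wrapper so I can catch any negative
        // reward handed out outside the failure condition.
        private void GiveReward(float amount)
        {
            if (amount < 0f && !failureCondition)
            {
                Debug.LogWarning("Negative reward " + amount + " outside the failure condition");
            }
            AddReward(amount);
        }
    }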

    However, when I train, the mean reward shows up as negative... I'm really scratching my head here.

    Is the mean reward per step or per episode? I can see that if it's per step, the large punishments on failure might push some statistic negative, but even the mean per step should still be positive since the totals are positive.
     
  5. vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    You use "SetReward" instead of "AddReward". This means that the reward the agent received during the step is overwritten, but it is still part of GetCumulativeReward. So during the last step of the agent's life, a reward was given (let's say a reward of 1), then the agent dies and the code you posted gets called. The reward of 1 is taken away and replaced with a penalty of 0.9, so the total cumulative reward after the code above runs is -0.9. If you want it to be 0.1 (which I think is what you want), you should call "AddReward" instead.
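
    In other words, with the same failureCondition check from the original post, something along these lines should give you the 0.1 you expect:

    if (failureCondition){
        // AddReward subtracts 90% of what has accumulated so far instead of
        // overwriting this step's reward the way SetReward does.
        AddReward(-(GetCumulativeReward() * 0.90f));
        EndEpisode();
    }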