Understanding Reward functions

Discussion in 'ML-Agents' started by infinityplusb, May 1, 2020.

  1. infinityplusb

    Joined:
    Jan 26, 2017
    Posts:
    3
    I have a simple project in which the user clicks on the screen and a GameObject spawns.
    I've managed to incorporate ml-agents into the game, but when running in Heuristic mode (i.e. with me controlling), the reward appears to get reset to zero constantly, so my player doesn't continue to gain or lose reward.
    I'm modelling my interaction on the Penguin example https://www.immersivelimit.com/tutorials/unity-ml-agents-penguins which works fine.
    Reading the docs, I noticed the statement "The reward value is reset to zero when the agent receives a new decision." in https://github.com/Unity-Technologi...Learning-Environment-Design-Agents.md#rewards and was wondering what that refers to, and where a "decision" is defined.

    Question: what circumstances would reset the reward to zero, if it isn't being reset by code I wrote?
     
  2. christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @infinityplusb,
    You can set the reward at every step. A step consists of the following:
    1. The agent makes observations.
    2. Those observations are used to request a decision.
    3. The request returns a set of actions, which the agent uses to act.
    4. You (the user) set rewards to let the agent know whether its actions should be encouraged or not.
    Each reward is associated with a step, to either encourage or discourage the behavior in that step.
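The four parts of a step listed above can be sketched as a small loop. This is only a toy model in plain Python, not ML-Agents code: the function names (`observe`, `decide`, `reward_for`, `run_steps`), the integer "state", and the random stand-in policy are all assumptions made for illustration.

```python
import random


def observe(state):
    # Part 1: the agent collects observations (here, just the toy state).
    return [state]


def decide(observations, rng):
    # Parts 2-3: a decision is requested and an action comes back.
    # A trained policy would go here; this stand-in picks -1 or +1 at random.
    return rng.choice([-1, 1])


def reward_for(state, action):
    # Part 4: a user-defined reward. In this toy, moving toward 0 is
    # encouraged (+1) and anything else is discouraged (-1).
    return 1.0 if abs(state + action) < abs(state) else -1.0


def run_steps(n_steps, start=5, seed=0):
    rng = random.Random(seed)
    state = start
    history = []
    for _ in range(n_steps):
        obs = observe(state)
        action = decide(obs, rng)
        reward = reward_for(state, action)  # reward belongs to this step only
        state += action
        history.append((obs[0], action, reward))
    return history
```

Each entry in the returned history pairs one step's observation and action with the reward assigned for that step, which is the association the list describes.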

    Does that answer your question?

    You can find more on reward functions here.
     
  3. infinityplusb

    Hi @christophergoy
    Thanks. I know I can set the reward; the issue is that something (not me) is resetting it every second or so.
    I have made a simple task that adds a negative reward each time I click. This works.
    I haven't explicitly set anything to add a positive reward, but the cumulative reward keeps going back to zero, approximately every second.

    I'm trying to uncover why. I don't think it's my code, but rather some mechanism in the backend.
    Does that make sense?
     
    Last edited: May 4, 2020
  4. christophergoy

    Hi @infinityplusb,
    The reward is reset at every step. So if you set a reward in one step, it is back to 0 in the next step. The reward for a step helps train the network by encouraging or discouraging the behavior in that step.

    If you are only setting rewards based on user interaction, then I imagine the reward would be close to zero all of the time, as there are 60 steps per second by default.
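To make the arithmetic concrete, here is a minimal sketch in plain Python (not ML-Agents code) of one second of steps in which a single click adds a reward. The 60 steps per second figure is taken from the post above; the function name `step_rewards`, the click at step 10, and the -1.0 reward are assumptions chosen for illustration.

```python
STEPS_PER_SECOND = 60  # figure quoted in the post above


def step_rewards(click_steps, click_reward=-1.0, n_steps=STEPS_PER_SECOND):
    """Return the reward seen at each step when only clicks add reward."""
    rewards = []
    for step in range(n_steps):
        reward = 0.0  # the step reward starts from zero on every step
        if step in click_steps:
            reward += click_reward  # hypothetical click handler
        rewards.append(reward)
    return rewards


rewards = step_rewards(click_steps={10})
nonzero = sum(1 for r in rewards if r != 0.0)
print(f"{nonzero} of {len(rewards)} steps carry a non-zero reward")
```

With one click per second, the per-step value is non-zero for a single step and zero for the rest, which is why a display of the step reward looks as if it is constantly reset.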
     
  5. infinityplusb

    Thanks @christophergoy
    Apologies, what I was referring to, and getting confused about, was in fact Agent.GetCumulativeReward().
    As I understand it, this shouldn't reset every step. However, it is the value I am trying to output, and it is getting reset periodically!
    When I run the Penguin example, it works fine, but when I implement the same code in my own scenario, the value from GetCumulativeReward does in fact reset to zero.
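The difference between a per-step reward and a cumulative reward can be illustrated with a toy model in plain Python. This is not the ML-Agents implementation: `ToyAgent` and its members are hypothetical, and the model assumes that the cumulative total is cleared only when an episode ends, either explicitly or when a step limit is reached. Whether a step limit is what causes the periodic reset described here is not confirmed anywhere in this thread; it is only one thing that may be worth ruling out.

```python
class ToyAgent:
    """Toy stand-in for an agent; not the ML-Agents implementation."""

    def __init__(self, max_step=0):
        self.max_step = max_step  # 0 means "no step limit" in this toy
        self.step_count = 0
        self.step_reward = 0.0
        self.cumulative_reward = 0.0

    def add_reward(self, amount):
        # A reward counts toward the current step and the episode total.
        self.step_reward += amount
        self.cumulative_reward += amount

    def end_episode(self):
        # Assumption: ending an episode clears both values.
        self.step_count = 0
        self.step_reward = 0.0
        self.cumulative_reward = 0.0

    def step(self):
        # The per-step reward is cleared on every step; the cumulative
        # total survives until the episode ends.
        self.step_reward = 0.0
        self.step_count += 1
        if self.max_step and self.step_count >= self.max_step:
            self.end_episode()


agent = ToyAgent(max_step=60)
agent.add_reward(-1.0)
for _ in range(59):
    agent.step()
total_before_limit = agent.cumulative_reward  # still -1.0
agent.step()                                  # 60th step ends the episode
total_after_limit = agent.cumulative_reward   # back to 0.0
```

In this toy, an agent created with `max_step=0` keeps its cumulative total indefinitely, while one with a limit of 60 steps loses it once per 60 steps.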
     
  6. christophergoy
