
Other: What effect would adding previous observations and their rewards to the agent's current observations have?

Discussion in 'ML-Agents' started by Creaturtle, Feb 14, 2021.

  1. Creaturtle

    Creaturtle

    Joined:
    Jan 24, 2018
    Posts:
    33
    If the agent was fed, along with its current observations, the previous observations and their rewards, would the agent be able to use that information to learn faster?
     
  2. Creaturtle

    Creaturtle

    Joined:
    Jan 24, 2018
    Posts:
    33
    Specifically, if the components of the reward function (loss of reward due to time, taking damage, dealing damage, etc.) were coupled with the inputs that led to the decision, and this was stacked a few times, would the agent be able to learn better?
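    For example, something like this (a rough sketch using the ML-Agents C# API; the class name, stack size, and reward values are only illustrative):

    Code (CSharp):
    using System.Collections.Generic;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;

    public class RewardAwareAgent : Agent
    {
        const int k_Stack = 4;  // how many previous per-step rewards to feed back
        readonly Queue<float> m_PastRewards = new Queue<float>();

        public override void OnEpisodeBegin()
        {
            m_PastRewards.Clear();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // ... current observations (positions, health, etc.) go here ...

            // Append the last k_Stack per-step rewards, oldest first,
            // padding with zeros so the observation size stays constant.
            int pad = k_Stack - m_PastRewards.Count;
            for (int i = 0; i < pad; i++)
                sensor.AddObservation(0f);
            foreach (float r in m_PastRewards)
                sensor.AddObservation(r);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Illustrative reward shaping for this step.
            float stepReward = -0.001f;  // time penalty
            // stepReward += 0.1f * damageDealt - 0.1f * damageTaken;  // etc.
            AddReward(stepReward);

            // Remember it so the next few observations can include it.
            m_PastRewards.Enqueue(stepReward);
            if (m_PastRewards.Count > k_Stack)
                m_PastRewards.Dequeue();
        }
    }

    The Vector Observation Space Size on the Behavior Parameters component would need to include the extra k_Stack values.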
     
  3. ruoping_unity

    ruoping_unity

    Unity Technologies

    Joined:
    Jul 10, 2020
    Posts:
    134
    This is useful when an agent needs some sort of "memory" to decide the best action. For example, in our Hallway example scene the agent needs to remember what it has seen before to decide which goal to go to, so it requires memory of previous observations to train. You can enable this in ML-Agents by specifying the memory section in the config file (see the docs for more details).
    However, this is not always useful, and it doesn't necessarily train faster. Processing such memory needs an extra sequential network structure, which takes more time to run and introduces more parameters. You should only use it when this information is required for the agent to make decisions.
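    For reference, the memory section goes under network_settings in the trainer config and looks roughly like this (the behavior name and values are just examples, and the exact fields depend on your ML-Agents version):

    Code (YAML):
    behaviors:
      Hallway:
        trainer_type: ppo
        network_settings:
          memory:
            memory_size: 128      # size of the agent's recurrent memory
            sequence_length: 64   # length of the experience sequences it is trained over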


    If that information is related to how the agent should act, or if you can imagine it helping you play better if you were playing the game yourself, it would likely help the agent learn better. This applies not only to the "components of the reward function" you mention here, but to all the states and information in the game.
    As I wrote in the first section, stacking them is not necessary; whether it helps depends on your use case.