
Question: Reward/Penalty for a general situation

Discussion in 'ML-Agents' started by rares-c, Jul 26, 2021.

  1. rares-c


    Joined: Jul 15, 2021
    Posts: 1
    Dear community,

    I've been building an agent with ML-Agents and have run into some trouble. The agent takes an action, but the consequences of that action only become visible several steps later. At that point I reward the agent according to those consequences. The problem is that the agent performs several other actions in the meantime, so the reward it receives in the present is really for an action it took in the past. Is there a way to reward an agent for an action taken in the past? Also, how should I penalize the agent for the overall situation rather than for a single action? That is, if multiple actions together lead to a wrong outcome and it is practically impossible to determine which action should have been different, can I give the agent a single penalty for the whole sequence of actions it took?

    Regards,
    Rares
     
  2. ervteng_unity


    Unity Technologies

    Joined: Dec 6, 2018
    Posts: 150
    Luckily, this is exactly the problem RL is designed to solve. When you give an agent a reward at the end of a sequence of actions, you are rewarding the whole trajectory, not just the final action that happened to receive it. The same holds for penalties. If the reward arrives after a LOT of steps (~200+), you can raise the "gamma" (discount factor) parameter above its default of 0.99 so that a reward far in the future still carries meaningful weight for the earlier actions that led to it.
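    To make the point about trajectories and gamma concrete, here is a minimal sketch of the discounted return G_t = r_t + gamma * G_(t+1), the quantity that assigns credit for a delayed reward back to earlier steps. ML-Agents computes this kind of bookkeeping inside its trainers, so you would not write this yourself; the functions and the 300-step episode below are hypothetical and only illustrate the arithmetic. With gamma = 0.99, a reward 200 steps in the future is weighted by roughly 0.99^200 ≈ 0.13, and the effective horizon is about 1 / (1 - gamma) = 100 steps, which is why a higher gamma helps when rewards are very delayed. In the ML-Agents trainer config, gamma is set per reward signal (for the environment reward, under `reward_signals` → `extrinsic` → `gamma`).

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_(t+1) for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def credit_weight(steps_back, gamma):
    """Fraction of a reward that reaches an action taken `steps_back` steps earlier."""
    return gamma ** steps_back


if __name__ == "__main__":
    # Hypothetical 300-step episode: no reward until a single -1 penalty at the last step.
    episode = [0.0] * 299 + [-1.0]
    for gamma in (0.99, 0.995):
        g = discounted_returns(episode, gamma)
        # Every earlier step receives a share of the final penalty.
        print(f"gamma={gamma}: return at first step={g[0]:.4f}, "
              f"weight 200 steps back={credit_weight(200, gamma):.3f}, "
              f"effective horizon ~{1 / (1 - gamma):.0f} steps")
```

    Because the final penalty flows back into the return of every earlier step, the whole sequence of actions is penalized together; the agent does not need you to identify which individual action was at fault.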