Question Positive vs negative rewards

EternalMe · Dec 18, 2022

So is there a difference? Or does only the balance between the upper and lower limits count, so that I could just as well use rewards in the range -2 to -1, with -1 for good stuff?

I might be hallucinating, but from my personal observations there is a difference. Negative rewards seem to encourage the agent to explore alternative actions more, and make it less likely to repeat the negatively rewarded actions. No?

Or is it just the quick termination problem, where the agent decides it's better to fail the whole episode than to accumulate more negative rewards?
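To make the quick termination question concrete, here is a minimal sketch in plain Python (not ml-agents code). It assumes a hypothetical task where "surviving" lasts 100 steps and "failing early" ends the episode after 5 steps, with the same per-step reward in both cases. The step counts, the discount factor, and the function names are all made up for illustration. It compares the plain case (+1 per step for good behaviour) with the shifted case from the question (-1 per step for good behaviour).

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one episode."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def preferred_outcome(shift, long_steps=100, short_steps=5, gamma=0.99):
    """Return which hypothetical outcome has the higher discounted return.

    The reward for good behaviour is 1.0 + shift on every step, so
    shift=-2.0 corresponds to "-1 for good stuff".
    """
    per_step = 1.0 + shift
    survive = discounted_return([per_step] * long_steps, gamma)
    fail_early = discounted_return([per_step] * short_steps, gamma)
    return "survive" if survive > fail_early else "fail early"


if __name__ == "__main__":
    for shift in (0.0, -2.0):
        print(f"shift={shift}: higher return for '{preferred_outcome(shift)}'")
```

Under these assumptions, shifting every reward by a constant is not neutral when episodes can end at different times: with all-negative rewards the shorter episode collects less total penalty, so ending early looks better to the agent.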

The ml-agents documentation recommends being careful with negative rewards and not using them excessively, as the agent won't learn well. This is also a pointer for me. However, I think we need a somewhat deeper explanation of this to design proper reward models.

GamerLordMat · Dec 18, 2022

Yeah, I don't understand it either, because you would have to understand reinforcement learning in detail (it's on my to-do list).

But negative rewards tend to block the agent in my experience. A big negative reward for dropping the ball makes the agent stop moving and just try not to let the ball fall. Sometimes negative rewards are needed to avoid exploits and to push the agent hard toward the expected result.
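One possible reading of the "agent stops moving" observation, sketched in plain Python with made-up numbers (none of this comes from ml-agents): assume moving earns a small progress reward per step but carries some chance of dropping the ball, while standing still earns nothing and never drops it. The parameter values and function names below are hypothetical.

```python
def expected_step_value(progress_reward, drop_probability, drop_penalty):
    """Expected one-step reward for moving: progress minus expected penalty."""
    return progress_reward - drop_probability * drop_penalty


def best_action(drop_penalty, progress_reward=0.1, drop_probability=0.05):
    """Compare moving against standing still (assumed to be worth 0.0)."""
    move = expected_step_value(progress_reward, drop_probability, drop_penalty)
    stay = 0.0
    return "move" if move > stay else "stay"


if __name__ == "__main__":
    for penalty in (1.0, 10.0):
        print(f"drop penalty {penalty}: best action is '{best_action(penalty)}'")
```

With these assumed numbers, a small penalty still leaves moving worthwhile, while a penalty ten times larger makes standing still the better choice in expectation, which would look like a "blocked" agent.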

EternalMe · Dec 18, 2022

And this is the part from docs:

Positive rewards are often more helpful to shaping the desired behavior of an agent than negative rewards. Excessive negative rewards can result in the agent failing to learn any meaningful behavior.

I did some general reading on RL, but it seems to depend on the actual implementation, so an answer from the ml-agents team would be nice.
