How to debug reward design in RL training?

gzrjzcx · Mar 11, 2020

Hello, I am an indie game developer, and I have successfully used ML-Agents to develop my game AI (check
).

In this thread I would like to ask how to detect errors in reward calculations. Taking my game as an example, I encountered a situation where the cumulative reward converged to an ideal level and all other metrics looked fine (I trained with PPO). However, when I put the model into the actual game environment, the inferred actions were weird. The agent did not act as I intended: it dropped blocks randomly instead of stacking them on the column.

My guess is that one possible cause is incorrect credit assignment. Actions are selected according to a probability distribution; specifically, at each step we choose the action with the highest probability in order to get the maximum credit for the current state. So the problem looks like rewards being assigned incorrectly (we use rewards to lead the agent toward the correct actions). In the end I found that some rewards were calculated incorrectly in a few places, and I fixed the bug. But the debugging process was time-consuming, so I want to find better ways to locate reward errors.

For now, the only approach I can offer is a very straightforward one: log all significant events (i.e. the events that can generate a lot of reward). Then, once the model behaves strangely, I can watch the video to find the weird behaviours and use them to quickly locate the relevant events. From there, I find the relevant rewards by checking which of those events influence the rewards.
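To make the logging idea concrete, here is a rough sketch in plain Python (not ML-Agents code). The RewardLedger class, its method names, and the event names are all hypothetical and only for illustration. It assumes every reward can be routed through one place that tags it with a step number and the event that produced it, so that a weird moment in the video can be mapped back to the rewards given around that step.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RewardLedger:
    """Hypothetical helper: records each reward with its step and source event."""
    entries: list = field(default_factory=list)

    def add(self, step: int, source: str, amount: float) -> float:
        # Record the reward, then return it so the caller can still pass it on
        # to whatever actually applies the reward to the agent.
        self.entries.append((step, source, amount))
        return amount

    def totals_by_source(self) -> dict:
        # Sum rewards per event type; a surprisingly large total is a hint.
        totals = defaultdict(float)
        for _, source, amount in self.entries:
            totals[source] += amount
        return dict(totals)

    def around(self, step: int, window: int = 5) -> list:
        # Rewards given within `window` steps of a weird moment in the video.
        return [e for e in self.entries if abs(e[0] - step) <= window]

    def wrong_sign(self, expected_sign: dict) -> list:
        # Entries whose sign contradicts the sign expected for that event.
        return [e for e in self.entries
                if e[2] * expected_sign.get(e[1], 0) < 0]


# Made-up events for a block-stacking game (assumed names and values).
ledger = RewardLedger()
ledger.add(10, "block_stacked", 1.0)
ledger.add(42, "block_dropped_off_column", 0.5)  # bug: should be negative
ledger.add(43, "block_stacked", 1.0)

expected = {"block_stacked": 1, "block_dropped_off_column": -1}
print(ledger.totals_by_source())
print(ledger.wrong_sign(expected))
print(ledger.around(42, window=1))
```

The per-source totals and the sign check can be inspected after every training run, not only once the behaviour already looks wrong, which might catch a miscalculated reward earlier than watching videos.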

I know this is a very naive method, but I don't have a better one. I am now working on an extremely large project, so I need a more efficient way to locate this kind of problem quickly. Any ideas or experiences? Thanks.
