

How to debug reward design in RL training?

Discussion in 'ML-Agents' started by gzrjzcx, Mar 11, 2020.

  gzrjzcx


    May 23, 2019
    Hello, I am an indie game developer, and I have successfully used ML-Agents to develop the AI for my game (check

    In this thread I would like to ask how to detect errors in the rewards. Taking my game as an example: the cumulative reward converged to a good level and all the other metrics looked fine (PPO algorithm). However, when I ran the model in the actual game environment, the inferred actions were weird. The agent did not act as I intended: instead of stacking the blocks onto the column, it dropped them randomly.

    I guessed that one possible cause was incorrect credit assignment. The policy selects actions according to a probability distribution, and training shifts that distribution toward the actions that earn the most reward from the current state. Since rewards are the only signal we use to lead the agent to the correct actions, incorrectly assigned rewards teach it the wrong behavior. Eventually I found that some rewards were calculated incorrectly in a few places, and I fixed the bug. But debugging this was time-consuming, so I want to find better ways to locate reward errors.
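One way to catch a miscalculated reward before a full training run is to test the reward logic in isolation. The following is a minimal sketch, assuming the reward for one block drop can be written as a pure function of a transition. The `Transition` fields, the term names and the reward values are all hypothetical, not taken from the actual game, and the sketch is written in Python for brevity; in an ML-Agents project the same checks would sit next to the C# code that hands rewards to the agent.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical snapshot of one block drop; field names are illustrative only.
@dataclass(frozen=True)
class Transition:
    block_landed_on_column: bool  # the dropped block ended up on the target column
    column_height_before: int
    column_height_after: int
    block_fell_off: bool          # the block missed and fell off the board


def reward_terms(t: Transition) -> Dict[str, float]:
    """Return every reward component separately so each can be checked on its own."""
    return {
        "stack": 1.0 if t.block_landed_on_column else 0.0,
        "height_gain": 0.1 * max(0, t.column_height_after - t.column_height_before),
        "miss_penalty": -0.5 if t.block_fell_off else 0.0,
    }


def total_reward(t: Transition) -> float:
    # The agent would receive the sum of all components for this transition.
    return sum(reward_terms(t).values())


def test_reward_function() -> None:
    stacked = Transition(True, 3, 4, False)
    random_drop = Transition(False, 3, 3, True)
    # Core invariant: stacking must always pay more than dropping a block at random.
    assert total_reward(stacked) > total_reward(random_drop)
    # A block that lands on the column must never trigger the miss penalty.
    assert reward_terms(stacked)["miss_penalty"] == 0.0
    # A drop that leaves the column unchanged must not earn height reward.
    assert reward_terms(random_drop)["height_gain"] == 0.0


if __name__ == "__main__":
    test_reward_function()
    print("reward invariants hold")
```

Keeping each component separate means a failing invariant points at one named term instead of a single opaque total, and running such checks (for example with pytest) after every reward change takes seconds rather than a training run.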

    So far, the only approach I have is a very straightforward one: log every significant event (i.e., events that can generate large rewards). When the model behaves strangely, I watch the replay video to find the weird behaviors, use those behaviors to quickly locate the relevant events, and then find the relevant rewards by checking which of those events influence the reward.
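The event-logging approach described above can be made a bit more systematic by recording every reward together with the event that produced it, then summarizing per event. This is a minimal sketch; `RewardLedger`, its methods and the event names are hypothetical and not part of ML-Agents. Sorting events by their absolute contribution shows whether an unintended event dominates the cumulative reward, and `window` pulls out the rewards around the step where the replay looks wrong. In newer ML-Agents releases, per-event totals like these could probably also be pushed to TensorBoard with `Academy.Instance.StatsRecorder.Add`; check that your version provides it.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (episode, step, event_name, reward_value)
Record = Tuple[int, int, str, float]


@dataclass
class RewardLedger:
    """Hypothetical helper: log every reward together with the event that produced it."""
    records: List[Record] = field(default_factory=list)

    def add(self, episode: int, step: int, event: str, value: float) -> None:
        # Call this wherever the game code hands a reward to the agent.
        self.records.append((episode, step, event, value))

    def totals_by_event(self) -> Dict[str, Tuple[float, int]]:
        """Summed reward and number of occurrences per event name."""
        totals: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for _, _, event, value in self.records:
            totals[event] += value
            counts[event] += 1
        return {event: (totals[event], counts[event]) for event in totals}

    def dominant_events(self) -> List[str]:
        """Event names ordered by absolute contribution to the cumulative reward."""
        totals = self.totals_by_event()
        return sorted(totals, key=lambda e: abs(totals[e][0]), reverse=True)

    def window(self, episode: int, first_step: int, last_step: int) -> List[Record]:
        """All rewards in a step range, e.g. where a weird behavior shows up in the replay."""
        return [r for r in self.records
                if r[0] == episode and first_step <= r[1] <= last_step]


if __name__ == "__main__":
    ledger = RewardLedger()
    ledger.add(episode=0, step=10, event="block_stacked", value=1.0)
    # Simulated bug: a reward that pays for dropping a block anywhere.
    for step in range(11, 17):
        ledger.add(episode=0, step=step, event="block_dropped", value=0.2)
    print(ledger.totals_by_event())
    print(ledger.dominant_events())  # the unintended event comes first
    print(ledger.window(episode=0, first_step=10, last_step=12))
```

With this kind of summary, a reward bug like the simulated one above stands out as soon as the event ranking contradicts the intended behavior, without having to scan the whole replay first.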

    I know it is a very naive method, but I don't have a better one. I am now working on an extremely large project, so I need a more efficient way to locate this kind of problem quickly. Any ideas or experiences? Thanks.