In my latest experiments I'm trying to train an agent to reach and hold a specific velocity as quickly as possible, using 3D physics constrained to a plane (the y position is clamped, as are the x and z rotations). This seemed like a simple goal, but the agents never reach what I'd call a satisfactory result.

Actions:
Torque: -1 to +1
Forward force: 0 to 1

Observations:
Current velocity (split into normalized direction and magnitude)
Target velocity (split the same way)
Y angular velocity

Goal: get the dot product of the current and target directions close to 1 and the speed within a small range of the target speed, while the angular velocity stays low (without that last requirement, the agent can just twirl with increasing thrust).

I've tried rewarding only that goal, combined with an existential penalty, but the reward seems too sparse for the agent to ever find it. I've also tried giving small continuous rewards or penalties for matching each component of the goal. With that, the agent does eventually find the goal, but not quickly.

Any suggestions for different strategies? More generally, is there a good resource for understanding how to structure rewards?
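To make the two reward schemes concrete, here is a minimal Python sketch: a sparse success test for the goal, and one possible dense per-component reward. Everything in it is an assumption for illustration, not tied to any RL library's API: the `PlanarObs` fields, the tolerance constants, `at_goal`, `shaped_reward`, and the `scale`/`step_penalty` defaults are all hypothetical. It also assumes the directions are already normalized 2D vectors in the x/z plane.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec2 = Tuple[float, float]


@dataclass
class PlanarObs:
    # Hypothetical observation layout mirroring the list in the post.
    vel_dir: Vec2        # normalized current velocity direction (x/z plane)
    speed: float         # current velocity magnitude
    target_dir: Vec2     # normalized target velocity direction
    target_speed: float  # target velocity magnitude
    ang_vel_y: float     # yaw rate


# Hypothetical tolerances; tune them for your units.
DIR_DOT_MIN = 0.99   # minimum dot product counted as "aligned"
SPEED_TOL = 0.1      # max absolute speed error counted as "matched"
ANG_VEL_TOL = 0.2    # max |yaw rate| counted as "not spinning"
SPEED_SCALE = 1.0    # speed error at which the speed term reaches 0
ANG_VEL_SCALE = 2.0  # yaw rate at which the spin term reaches 0


def dot(a: Vec2, b: Vec2) -> float:
    return a[0] * b[0] + a[1] * b[1]


def at_goal(o: PlanarObs) -> bool:
    """Sparse success test: all three conditions must hold at the same time."""
    return (dot(o.vel_dir, o.target_dir) >= DIR_DOT_MIN
            and abs(o.speed - o.target_speed) <= SPEED_TOL
            and abs(o.ang_vel_y) <= ANG_VEL_TOL)


def shaped_reward(o: PlanarObs, scale: float = 0.01,
                  step_penalty: float = 0.001) -> float:
    """Dense per-step reward built from the same three components."""
    # Map the dot product from [-1, 1] to [0, 1].
    align = 0.5 * (dot(o.vel_dir, o.target_dir) + 1.0)
    # 1 at zero speed error, falling linearly to 0 at SPEED_SCALE.
    speed_term = max(0.0, 1.0 - abs(o.speed - o.target_speed) / SPEED_SCALE)
    # 1 when not rotating, falling linearly to 0 at ANG_VEL_SCALE.
    spin_term = max(0.0, 1.0 - abs(o.ang_vel_y) / ANG_VEL_SCALE)
    # Multiplying (rather than summing) the terms means no single component
    # can be maxed out on its own to collect reward; the step penalty plays
    # the role of the existential penalty.
    return scale * align * speed_term * spin_term - step_penalty
```

With these defaults, an agent moving at the target velocity without spinning earns `scale - step_penalty` per step, while one heading the opposite way (or spinning at or above `ANG_VEL_SCALE`) earns only `-step_penalty`. Whether a product or a weighted sum of the terms learns faster is a design choice worth testing for this setup.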