I am training agents in a game where each agent represents a vehicle on a terrain. Several vehicles exist at any one time (think of them as two teams of tanks trying to occupy a "king of the hill" point on the terrain by moving and shooting each other). Vehicles are simulated with real physics: they have throttle and steering using wheel colliders, suspension, and so on. The terrain is not flat, and there are obstacles on it that vehicles can't traverse (rocks, trees, slopes that are too steep, etc.).

I already have robust AI in place that uses behavior trees and NavMesh to successfully drive these guys to any given destination point on the terrain. They are quite competent at moving from point A to point B on their own, manipulating steering, throttle, and NavMesh paths to get the job done.

Now I am trying to bring ML-Agents into the project to teach them to be tactical and try to win the game. Maybe the pre-existing AI mentioned above will still be useful, or maybe I should let them learn how to drive via ML-Agents instead and throw away those behavior trees. Either way, I am hoping ML-Agents can provide some of the higher-level tactical thinking here. The agents need to learn that, in order to win the game, they must either capture and hold a "control point" on the terrain (which just means occupying the radius around some central point on the map with no enemies occupying it), OR wipe out all enemy vehicles.

I am trying to start small by simply teaching them to drive to the control point and occupy it. In this first level of training I don't even have any enemy units in the scene yet, so the problem is simplified to "if you just occupy the control point, you will win" and nothing else. This really means: get within some radius of the control point and stay there in order to win. I've struggled to get even this to work.
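For clarity, the win condition boils down to something like this sketch (Python rather than my actual C#; the radius and hold-time values are illustrative, not my real numbers):

```python
import math

def dist2d(a, b):
    """Planar distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

class ControlPoint:
    """Tracks whether an agent has occupied the capture radius long enough to win.
    radius and hold_time are illustrative parameters."""

    def __init__(self, pos, radius=10.0, hold_time=5.0):
        self.pos = pos
        self.radius = radius
        self.hold_time = hold_time
        self.time_inside = 0.0

    def update(self, agent_pos, dt):
        """Accumulate time while the agent is inside the radius; reset when it leaves.
        Returns True once the point counts as captured."""
        if dist2d(agent_pos, self.pos) <= self.radius:
            self.time_inside += dt
        else:
            self.time_inside = 0.0
        return self.time_inside >= self.hold_time
```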
I've had no luck getting these agents to learn anything useful, as far as I can tell, after millions of steps of learning across different attempts. I set this up as a curriculum: in the first lesson, all of the agents spawn just outside the radius (only a few meters outside it), so even if they wandered purely at random they would "win" pretty often. Subsequent lessons then reduce the radius. None of that has helped at all. Is there any high-level advice about this kind of scenario? That would be much appreciated, and in addition I have some specific questions.

What kind of action setup would you recommend? I first tried setting up actions the "classic" way almost all ML-Agents demos are set up: simple "move forward, move right, move back, move left" discrete actions. That was a disaster, because these are real vehicle physics: the agent madly steers left then right and shoves the throttle forward and backward several times per second, which of course means the vehicle never actually gets anywhere. It takes time for the vehicle to gain momentum, and you have to steer in the same direction for some time in order to actually turn. I gave this hundreds of thousands of training steps without deviating much from the ML-Agents defaults, to no avail. I then tried tuning it in many, many ways, including spacing action intervals farther apart and increasing the time between decision steps (even going so far as to separate them by several seconds, up to 20 seconds apart in some tests). I even tried negative rewards for those "stuck states" where you have no velocity because you're fighting yourself. The agents simply never learned anything meaningful about driving; at best they ended up spinning in circles.
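To be concrete about the thrashing problem: nothing in the discrete setup stops the policy from reversing throttle and steering every single decision step. The kind of fix I've been wondering about is rate-limiting the raw actions before they reach the wheel colliders; a minimal sketch of that logic (illustrative Python, not my actual project code; `max_delta` is an assumed per-step cap):

```python
class SmoothedDrive:
    """Rate-limit raw policy actions so inputs can't flip sign instantly.
    max_delta is the largest change allowed per decision step (illustrative)."""

    def __init__(self, max_delta=0.1):
        self.max_delta = max_delta
        self.throttle = 0.0
        self.steer = 0.0

    @staticmethod
    def _step_toward(current, target, max_delta):
        # Move current toward target, clamped to +/- max_delta per call.
        delta = max(-max_delta, min(max_delta, target - current))
        return current + delta

    def apply(self, raw_throttle, raw_steer):
        """raw_throttle / raw_steer come straight from the policy in [-1, 1];
        returns the smoothed values to feed the vehicle."""
        self.throttle = self._step_toward(self.throttle, raw_throttle, self.max_delta)
        self.steer = self._step_toward(self.steer, raw_steer, self.max_delta)
        return self.throttle, self.steer
```

Even if the policy outputs full throttle one step and full reverse the next, the vehicle would only see a gradual ramp.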
Those experiments usually ran with rewards that encouraged getting closer to the objective and discouraged "wasting time" by not making any progress. From that long and painful experiment, like all of the others, I came away wondering whether something fundamental is broken in my setup: they really don't seem to learn *anything*, and I know of no way to see why that might be. In this setup I also simplified things by making sure there were no obstacles and the terrain was mostly flat. Same results.

I have also tried a different approach: taking advantage of the existing navigation AI mentioned above. Instead of discrete "forward, right, back, left" actions, I changed to a "choose a destination x,y coordinate to move to" action, which then used my existing NavMesh and behavior trees to drive to the chosen point. These agents were a lot more interesting because they actually drove to places instead of spinning in circles, but they also never learned anything useful.

These are the only two action schemes I've tried so far. What would be best?

What kind of observations would help? My CollectObservations() has focused on these kinds of things so far (all in normalized spaces, so even coordinates on the terrain are normalized to 0...1 on x and y within the region the agents are playing in):

* The control point position
* The agent's position
* The agent's offset from the control point
* The agent's distance from the control point
* The agent's velocity (and also its speed as a single scalar)
* The agent's angular velocity

...and probably some others I'm forgetting. I've tried many different permutations of all kinds of observations. None of them seem to have any effect on learning. How can I tell whether an observation is contributing to learning? Apparently mine aren't.

So many hyperparameters, and *no* idea what to choose. I've re-read the docs on hyperparameters many times.
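For concreteness, the "getting closer" reward I described is essentially distance-delta shaping: rewarding the per-step change in distance rather than the distance itself. A minimal sketch (illustrative Python; `max_dist` is an assumed normalizer, e.g. the region diagonal):

```python
def progress_reward(prev_dist, curr_dist, max_dist, scale=1.0):
    """Reward the per-step *change* in distance to the control point,
    normalized so the value stays small.
    Positive when the agent moved closer, negative when it moved away,
    zero when it made no progress."""
    return scale * (prev_dist - curr_dist) / max_dist
```

Summed over an episode this telescopes to (start distance - end distance) / max_dist, so the agent can't farm reward by oscillating.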
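And for reference, here is the kind of normalization I described, sketched in Python (my real code is C# calling VectorSensor.AddObservation; the region bounds and max-speed parameter names are illustrative):

```python
def build_observations(agent_pos, point_pos, velocity,
                       region_min, region_size, max_speed):
    """Assemble a normalized observation vector.
    Positions map to [0, 1] within the play region; the offset and
    velocity map to roughly [-1, 1]. All parameter names are illustrative."""
    obs = []
    # agent and control-point positions in [0, 1]
    obs += [(agent_pos[i] - region_min[i]) / region_size[i] for i in range(2)]
    obs += [(point_pos[i] - region_min[i]) / region_size[i] for i in range(2)]
    # offset to the control point in roughly [-1, 1]
    obs += [(point_pos[i] - agent_pos[i]) / region_size[i] for i in range(2)]
    # scalar distance, normalized by the region diagonal
    diag = (region_size[0] ** 2 + region_size[1] ** 2) ** 0.5
    dist = ((point_pos[0] - agent_pos[0]) ** 2
            + (point_pos[1] - agent_pos[1]) ** 2) ** 0.5
    obs.append(dist / diag)
    # velocity in roughly [-1, 1]
    obs += [velocity[i] / max_speed for i in range(2)]
    return obs
```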
On hyperparameters: I still have no idea what to choose, honestly, nor how to evaluate whether tweaking them is making things better or worse. I watch TensorBoard as training progresses to try to get a feel for whether a change was positive or negative, but there is so little progress being made here that it's hard to judge. Any advice on how to set those for this kind of scenario?

Is ML-Agents even appropriate? Is this kind of scenario outside of what ML-Agents is trying to solve? The reason I'm exploring ML-Agents is not this basic "occupy the control point" use case, but the rich emergent behavior I was hoping to unveil later, when enemy vehicles are also on the map and the agents would learn to do things like flank them and evade them on their way to the control point. Given that I can't even make them simply "move to the control point" in the first place, maybe this is a totally unrealistic goal?

I've also been looking carefully at the Unity AI Planner. I can see exactly how that would work here, combined with my existing navigation stuff mentioned above, but I fear it would offer less interesting, emergent behavior. Maybe I'm wrong, though: would that approach be better for a project like this?

Sorry for the wall of text, but any advice would be most appreciated.
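For reference, my trainer config currently looks roughly like this (assuming the newer `behaviors:` YAML format; the behavior name and every value here are just my current guesses, which is exactly the problem):

```yaml
behaviors:
  TankAgent:            # matches the Behavior Name on the agent's Behavior Parameters
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 3.0e-4
      beta: 5.0e-3
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 5.0e6
    time_horizon: 128
    summary_freq: 10000
```

Is there anything in a config like this that obviously needs to change for slow, momentum-heavy vehicle control?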