
Advice for Training Physics-Driven Vehicles in "RTS" Game

Discussion in 'ML-Agents' started by Claytonious, Apr 30, 2020.

  1. Claytonious

    I am training agents in a game where each represents a vehicle on a terrain. There are several extant vehicles at any one time (think of them as 2 teams of tanks trying to occupy a "king of the hill" point on the terrain by moving and shooting each other). Vehicles are simulated with literal physics: they have throttle and steering using wheel colliders, suspension and so on. The terrain is not flat. There are obstacles on the terrain that vehicles can't traverse (rocks, trees, slopes that are too steep, etc.).

    I already have robust AI in place that uses behavior trees and NavMesh to successfully make these guys drive to any given destination point on the terrain. They are quite competent at moving from point A to point B on their own, manipulating steering and throttle and paths on the NavMesh to get the job done.

    Now I am trying to bring ml-agents into the project for the sake of teaching them to be tactical and try to win the game. Maybe the pre-existing AI mentioned above will still be useful, or maybe I should try to let them learn how to drive via ml-agents instead and throw away those behavior trees. Either way, I am hoping ml-agents can provide some of the higher level tactical thinking here.

    The agents need to learn that, in order to win the game, they need to either capture and hold a "control point" on the terrain (which just means, occupy the radius around some central point on the map with no enemies occupying it), OR wipe out all enemy vehicles.

    I am trying to start small by simply teaching them to drive to the control point and occupy it. In this first level of training, I don't even have any enemy units in the scene yet, so the problem is simplified to "if you just occupy the control point, you will win" and nothing else. This really means: get within some radius of the control point and stay there in order to win.

    I've struggled to get even this to work. I've had no luck getting these agents to actually learn anything useful, as far as I can tell, after millions of steps of learning across different attempts. I have set this up as a curriculum, where in the first lesson, all of the agents spawn just outside of the radius (only a few meters outside of it) - so even if they just wandered purely randomly they would "win" pretty often. Then there are subsequent lessons which reduce the radius. None of that has helped at all.

    Any high-level advice about this kind of scenario would be much appreciated. In addition, I have some specific questions:

    In this kind of setup, what kind of action space would you recommend?
    I have tried setting up actions in the "classic" way that almost all ml-agents demos are set up, which is to have simple "move forward, move right, move back, move left" discrete actions. That was a disaster because these are real vehicle physics: the agent madly steers left then right and shoves the throttle forward and backward several times per second, which of course means the vehicle never actually gets anywhere. It takes time for the vehicle to gain momentum, and you have to steer in the same direction for a while in order to actually turn.

    I gave this hundreds of thousands of training steps without deviating much from the ml-agents defaults, to no avail. I then tried tuning it in many ways: making action intervals farther apart, increasing the time between learning steps (even separating them by several seconds, up to 20 seconds apart in some tests), and adding negative rewards for being in those "stuck states" where you have no velocity because you're fighting yourself. The agents simply never learned anything meaningful about driving; at best they ended up spinning in circles. This was usually in the presence of rewards that encouraged getting closer to the objective and discouraged "wasting time" by not making any progress.

    In this long and painful experiment, like all of the others, I came away wondering whether something fundamental is broken about my setup: they really don't seem to learn *anything*, and I know of no way to see why that might be. I also simplified things by making sure there were no obstacles and the terrain was mostly flat. Same results.
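
    For reference, here is roughly what that discrete scheme looked like on my agent (a simplified sketch against the ML-Agents 1.0-style API; the class name and the ApplyThrottle/ApplySteering helpers are just placeholders for my wheel collider code):

    using Unity.MLAgents;
    using UnityEngine;

    public class TankAgent : Agent
    {
        // One discrete branch with 4 actions: forward, right, back, left.
        public override void OnActionReceived(float[] vectorAction)
        {
            switch (Mathf.FloorToInt(vectorAction[0]))
            {
                case 0: ApplyThrottle(1f);  break; // forward
                case 1: ApplySteering(1f);  break; // steer right
                case 2: ApplyThrottle(-1f); break; // reverse
                case 3: ApplySteering(-1f); break; // steer left
            }
        }

        // Placeholders for the actual wheel collider / suspension code.
        void ApplyThrottle(float input) { /* sets WheelCollider.motorTorque */ }
        void ApplySteering(float input) { /* sets WheelCollider.steerAngle */ }
    }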

    I have also tried a different approach, which is to take advantage of the above-mentioned existing navigation AI. Instead of having discrete "forward, right, back, left" actions, I changed to "choose a destination x,y coordinate to move to" actions, which then used my existing navmesh and behavior trees to drive to the chosen point. These agents were a lot more interesting because they actually drove to places instead of spinning in circles, but they also never learned anything useful.
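
    Roughly, that version looked like this (again just a sketch; playAreaOrigin, playAreaSize and the NavMeshAgent hand-off are stand-ins for my actual navigation code):

    // Inside the same Agent subclass: two continuous actions in -1..1 are
    // remapped to an x,z destination inside the play area and handed to the
    // existing NavMesh-based driving AI (requires using UnityEngine.AI).
    public override void OnActionReceived(float[] vectorAction)
    {
        float x = playAreaOrigin.x + (vectorAction[0] + 1f) * 0.5f * playAreaSize.x;
        float z = playAreaOrigin.z + (vectorAction[1] + 1f) * 0.5f * playAreaSize.z;
        navAgent.SetDestination(new Vector3(x, transform.position.y, z));
    }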

    These are the only 2 action schemes I've tried so far. What would be best?

    What kind of observations would help?
    My CollectObservations() has focused on these kinds of things so far (all in normalized spaces, so even coordinates on the terrain are normalized to 0...1 on x and y within the region the agents are playing in):
    * The control point position
    * The agent's position
    * The agent's offset from the control point
    * The agent's distance from the control point
    * The agent's velocity (and also its speed as a single scalar)
    * The agent's angular velocity

    ... and probably some others I'm forgetting. I've tried many different permutations of all kinds of observations. None of them seem to have any effect on learning.
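
    Concretely, the current version looks something like this (simplified; playAreaExtent, maxSpeed, controlPoint and body are stand-ins for my own fields):

    // Requires using Unity.MLAgents.Sensors for VectorSensor.
    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 toTarget = controlPoint.position - transform.position;
        sensor.AddObservation(controlPoint.position / playAreaExtent);      // control point position
        sensor.AddObservation(transform.position / playAreaExtent);         // agent position
        sensor.AddObservation(toTarget / playAreaExtent);                   // offset from control point
        sensor.AddObservation(toTarget.magnitude / playAreaExtent);         // distance to control point
        sensor.AddObservation(body.velocity / maxSpeed);                    // velocity
        sensor.AddObservation(body.velocity.magnitude / maxSpeed);          // speed scalar
        sensor.AddObservation(body.angularVelocity);                        // angular velocity
    }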

    How can I tell whether an observation is contributing to learning? Apparently mine aren't.

    So many hyperparameters - *NO* idea what to choose
    I've re-read the docs on hyperparameters many times. I still have no idea what to choose, honestly, nor how to evaluate whether tweaking these is making things better or worse. I watch Tensorboard as training progresses to try to get a feel for whether a change was positive or negative, but there is so little progress being made here that it's hard to judge. Any advice on how to set those for this kind of scenario?

    Is ml-agents even appropriate?
    Is this kind of scenario outside of what ml-agents is trying to solve? The reason I'm exploring ml-agents is not for this basic use-case of "occupy the control point", but because of the rich emergent behavior I was hoping to unveil later when enemy vehicles are also on the map and the agents would learn to do things like flank them and evade them on their way to the control point. Given that I can't even make them simply "move to the control point" in the first place, maybe this is a totally unrealistic goal?

    I've been looking carefully at the Unity AI Planner. I can see exactly how that would work here, combined with my existing navigation stuff mentioned above, but I fear it would offer less interesting, emergent behavior. Then again, maybe I'm wrong - would that approach be better for a project like this?

    Sorry for that wall of text, but any advice would be most appreciated.
     
  2. ervteng_unity

    Unity Technologies
    Hi Claytonious, this sounds like an interesting game and problem. First, with respect to ML-Agents vs. the AI Planner: ML-Agents tends to be better at reactive behavior, i.e. "if the agent sees X then do Y". If you imagine your game can be played reactively, then ML-Agents is a good choice. But if you require long-term planning, i.e. "if the agent sees X, I need to do A, B, and then C", then the AI Planner is probably a better choice. ML-Agents has a memory feature but it's not terribly long-term. In your case, if you plan on giving the agent the full state (e.g. the location of all the enemies it should be concerned about), then I don't see why ML-Agents wouldn't work.

    I think both control schemes would work, and I'm actually surprised to hear that the vehicle didn't learn to move at all in your first scheme. We've done something similar with a vehicle that required throttle. I think the issue here is that the agent doesn't see that its throttle input is contributing to any change in the observations, since it needs to hit the throttle many times before the vehicle starts moving (and similarly for steering), and therefore never learns that its actions have any utility. You could try a couple of things (in order of difficulty):
    • Enable Stacked Vectors under Vector Observation on your agent. This hopefully will mean that the agent can "see" the change in velocity over time as it applies the throttle. In our problem this was sufficient.
    • Add acceleration as an observation.
    • Have the agent's actions control a state machine, e.g. action 0 toggles throttle on, action 1 turns it off, and feed in the state of the state machine as observations.
    I'd try this experiment with a simple reward (e.g. proportional to forward velocity) and see if the vehicle learns to move.
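
    Something along these lines, just as a sanity check (a rough sketch; body and maxSpeed stand in for your own Rigidbody reference and top speed):

    // In the Agent subclass: a small per-step reward proportional to forward
    // speed, only to verify the vehicle can learn to move at all.
    void FixedUpdate()
    {
        float forwardSpeed = Vector3.Dot(body.velocity, transform.forward);
        AddReward(0.001f * forwardSpeed / maxSpeed);
    }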

    As for the observations, I think yours should work, though it's generally better to make any direction vectors relative (e.g. from the perspective of the vehicle rather than the global space). This usually makes it easier for the agents to learn.
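
    For example, instead of adding a world-space offset directly in CollectObservations, you could express it in the vehicle's local frame (controlPoint and playAreaExtent being whatever you already use):

    // Same information, but relative to the vehicle's own orientation.
    Vector3 toTargetWorld = controlPoint.position - transform.position;
    Vector3 toTargetLocal = transform.InverseTransformDirection(toTargetWorld);
    sensor.AddObservation(toTargetLocal / playAreaExtent);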

    For rewards, I'd try something simple, like a small reward for velocity towards the target (dot product between agent's velocity and direction towards target), and a large reward for being/staying at the control point.
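
    Roughly (with hypothetical names for the target and capture radius):

    // In the Agent subclass: small shaping reward for moving toward the control
    // point, plus a larger reward while inside its capture radius.
    void FixedUpdate()
    {
        Vector3 toTarget = controlPoint.position - transform.position;
        float towardTarget = Vector3.Dot(body.velocity.normalized, toTarget.normalized);
        AddReward(0.001f * towardTarget);

        if (toTarget.magnitude < captureRadius)
        {
            AddReward(0.01f);
        }
    }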

    Hope that helps, if only a bit!
     
  3. andrewcoh_unity

    Unity Technologies
    To add on to @ervteng_unity's comment, the observations

    * The control point position
    * The agent's position
    * The agent's offset from the control point
    * The agent's distance from the control point

    may be adding unnecessary complexity/redundancy to the learning problem. You might be able to convey the same information with just a relative vector from the agent to control point, or a relative unit vector from the agent to the control point and a normalized distance scalar.
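
    In other words, something closer to this (maxDistance standing in for the size of your play area):

    public override void CollectObservations(VectorSensor sensor)
    {
        Vector3 toTarget = controlPoint.position - transform.position;
        // One relative unit vector toward the control point...
        sensor.AddObservation(transform.InverseTransformDirection(toTarget).normalized);
        // ...and one normalized distance scalar.
        sensor.AddObservation(Mathf.Clamp01(toTarget.magnitude / maxDistance));
    }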
     
  4. Claytonious

    Thank you both for the concrete guidance! I shall experiment with these ideas and report on the results. Have a great weekend!
     