General ML-Agents beginner questions

Discussion in 'ML-Agents' started by streeetwalker, May 29, 2020.

  1. streeetwalker


    Jun 4, 2013
    Hey All, I'm just getting started learning ML-Agents and have some questions about general concepts I don't understand yet, so if my terminology is off, please correct me!

    1. If an agent is trained and you apply the resulting brain to multiple game objects, do they all behave in exactly the same way, or is there some variability in their behavior? That is, does the brain result in completely deterministic behavior? (For example, where the environment is fixed, like when the agent drives around a race track.)

    2. How do you organize behaviors that need separate brains? From other answers I have read, I understand (correct me if I am wrong) that if, for example, I want a game NPC to be able to operate a vehicle which has a machine gun on it and a rocket launcher I will need to train it separately for all the vehicle features.

    The NPC will need to operate the vehicle (accelerate, turn to some target), operate a gun (reload, aim, fire at some other target), and operate a rocket launcher (reload, aim, fire, and so on). Therefore I will need to train the NPC by creating an agent script for each behavior, resulting in multiple brains that the "in game" NPC will use.

    So, is that 3 agent scripts - one each for driving, operating a gun, and operating a rocket launcher - or more, or fewer? For example, can the gun and rocket launcher be handled by one agent? Or, at the other extreme, do I need a separate agent for reloading, one for aiming, and one for firing? How do you decide?

    What coordinates the multiple brains - or do they all operate independently? What if, for example, the rocket launcher can only be operated while the vehicle is stopped - doesn't that require the "driving brain" to coordinate with the "rocket launcher brain" in some way?

    2.a (I think related) Can you turn the brain off at any time if, for example, you need to take over control of the NPC by other means - maybe even another brain - and then simply turn it back on? For example, suppose I want my NPC to jump out of a vehicle and start running - do I switch brains on and off?

    3. I need a pointer on how to do guided ("modeled"?) training, where I help the learning by demonstrating - that is, by operating the NPC agent myself. Is that done simply by setting it to Heuristic mode during training, or something else? I'm just looking for some resource(s) to point me in the right direction.

    4. Am I anywhere near thinking about these concepts correctly?

    Thanks for your answers and insights!
    Last edited: May 29, 2020
  2. mbaske


    Dec 31, 2017
    Let me take a crack at this...

    1) In theory, the behavior is completely deterministic. In practice though, slight variations in an agent's observations can cause different outcomes. You'll want to train an agent for being able to cope with different environment conditions. In the race track example, it can be beneficial to vary the track layout, in order to prevent overfitting. Overfitting in this case would mean the agent learns how to drive on one specific track. It might learn that a particular left turn always follows a particular right turn and develop the optimal driving strategy for that. Drop the trained agent on a different track, and it might not handle that too well. Varying the training conditions should result in more flexible agent skills. Also see
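
    A small sketch of both halves of point 1, assuming a made-up two-action policy with fixed weights standing in for a trained model (the names and numbers are hypothetical and have nothing to do with ML-Agents internals). The same observation always yields the same action, but a slightly different observation can land on the other side of a decision boundary:

```python
# Hypothetical two-action policy: fixed weights stand in for a trained model.
WEIGHTS = {
    "steer_left": (1.0, -0.5),
    "steer_right": (-1.0, 0.5),
}


def choose_action(observation):
    """Pick the action with the highest linear score (no sampling involved)."""
    scores = {
        name: sum(w * x for w, x in zip(weights, observation))
        for name, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.get)


# Identical observations give identical actions.
same_a = choose_action((0.30, 0.50))
same_b = choose_action((0.30, 0.50))

# A small change in the first observation value flips the decision.
nudged = choose_action((0.24, 0.50))
```

    Two copies of the agent that start in marginally different positions can therefore drift apart over time, even though the model itself contains no randomness.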

    2) I think it depends on how similar behaviors are, how closely they depend on one another and how often they have to be applied (update frequency). If firing a gun vs a rocket requires the same steps (load > aim > shoot), then both can probably be controlled by the same model. Which weapon the agent picks could depend on how far away the target is and how much ammo it has left, which would be part of the agent's observation space. Otherwise, you can train dedicated models separately and later, during inference, switch them via Agent.SetModel(). Now, you could train yet another agent for deciding which weapon/model to use, but it might be simpler to just have some hardcoded logic for that. Coordinating different agents could also be hardcoded - some controller script might call Agent.RequestDecision() depending on the game state. Perhaps your driver agent can decide to stop which would then enable the shooter agent. If there's a hierarchical dependency between agents, it can be interesting to feed one agent's output actions into another one's observations. For instance, there could be a target chooser agent generating a direction vector which is subsequently fed to the driver agent.
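
    The "hardcoded logic" option could look roughly like the sketch below. It is plain Python rather than Unity code, and every name and threshold in it (GameState, pick_behavior, the 50-unit rocket range) is a hypothetical stand-in. In a Unity project, the equivalent controller script would be the place to switch models or request decisions as described above.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    vehicle_speed: float    # current speed of the vehicle
    target_distance: float  # distance to the current target
    gun_ammo: int
    rockets: int


def pick_behavior(state, rocket_min_range=50.0):
    """Return the name of the behavior (model) that should act this step."""
    vehicle_stopped = state.vehicle_speed < 0.1
    if state.rockets > 0 and state.target_distance >= rocket_min_range:
        # Rockets require a stationary vehicle, so the driver keeps
        # control until the vehicle has actually stopped.
        return "rocket" if vehicle_stopped else "drive"
    if state.gun_ammo > 0 and state.target_distance < rocket_min_range:
        return "gun"
    return "drive"


far_and_stopped = pick_behavior(GameState(0.0, 80.0, 10, 2))
far_and_moving = pick_behavior(GameState(5.0, 80.0, 10, 2))
close_with_ammo = pick_behavior(GameState(5.0, 20.0, 10, 2))
out_of_ammo = pick_behavior(GameState(5.0, 20.0, 0, 0))
```

    Keeping this selection logic outside the trained models means each model only has to learn one skill, and the rule "stop before firing rockets" stays easy to read and change.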

    3) Sounds like you're looking for behavioral cloning.
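
    The idea behind behavioral cloning is to treat recorded (observation, action) pairs from a demonstrator as supervised training data. The toy sketch below uses a lookup table instead of a neural network, and its state and action names are invented for illustration. It is not how ML-Agents implements it, only the core principle of copying what the demonstrator did in each situation:

```python
from collections import Counter, defaultdict


def clone_policy(demonstrations):
    """For each demonstrated state, copy the demonstrator's most common action."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {
        state: actions.most_common(1)[0][0]
        for state, actions in counts.items()
    }


# Hypothetical demonstrations recorded while a human controls the agent.
demos = [
    ("fish_left", "turn_left"),
    ("fish_left", "turn_left"),
    ("fish_left", "forward"),
    ("fish_ahead", "forward"),
    ("fish_right", "turn_right"),
]
policy = clone_policy(demos)
```

    Note that the cloned policy has no entry for states the demonstrator never visited, which is one reason demonstrations are often combined with regular reinforcement learning rather than used alone.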
  3. streeetwalker


    Jun 4, 2013
    Hey thanks a lot for your responses - that definitely helps. I'm going to have to probe further kind of piecemeal for all this to sink in, so here is my first thought.

    This is an informative issue:
    As I posted that set of questions, I was going through Adam Kelly's really great introductory tutorial, Reinforcement Learning Penguins.

    The tutorial trains the mother penguin to catch fish and replicates the training area several times (I understand) to speed up the learning process.

    I hooked up the finished brain, and the mother penguin did fine in that small swimming pool. But then I dropped the mother, baby, and a lot more fish into a much larger pool and it failed badly:

    - I had expected the mother to venture out further to catch fish, but it just swam around in a small area looking for fish - I guess it had learned that all the fish were only in a small area ...?

    So instead of just training in multiple copies of a small area, I should train in areas of different sizes and shapes - and can I run them all at the same time? Like this:

    and even add more variation than that?

    Otherwise, how else would I do it?
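
    One way to picture "more variation" is to draw the area parameters at random for each training episode instead of fixing them. This is a minimal sketch in plain Python, not taken from the tutorial. The function name, radius range, and fish density are all assumptions chosen for illustration:

```python
import math
import random


def sample_training_area(rng, min_radius=5.0, max_radius=40.0,
                         fish_per_unit_area=0.02):
    """Draw a random pool size and scale the fish count with its area."""
    radius = rng.uniform(min_radius, max_radius)
    area = math.pi * radius ** 2
    fish_count = max(1, round(area * fish_per_unit_area))
    return {"radius": radius, "fish_count": fish_count}


# Seeded generator so the sampled layouts are reproducible between runs.
rng = random.Random(0)
areas = [sample_training_area(rng) for _ in range(5)]
```

    Each parallel training area could be given its own sampled layout at the start of every episode, so the agent sees both small and large pools during the same training run.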