Resolved How to train multiple agents with same behavior, different RL algos?

Discussion in 'ML-Agents' started by justkittenaround, Mar 9, 2021.

  1. justkittenaround

    justkittenaround

    Joined:
    Sep 28, 2020
    Posts:
    21
    Hi. I have many agents in my model that all have the same behavior. I want them to be trained using separate RL controllers.

    Currently, I have the appropriate scripts for training the agent with PPO attached to a GameObject prefab. At the start, I instantiate the prefab in the environment, once for however many agents I want. I can get this to train, but the behavior of each agent is pretty bad. They're supposed to move around, but they kind of just jiggle, and the reward is extremely varied.

    I am wondering if training the agents this way uses one RL algorithm for all the agents collectively. Is that the case? If it is, how can I train each agent separately even though they all have the same behavior and behavior name?

    Each agent should have its own PPO model controlling it. How do I do this?
     
  2. mbaske

    mbaske

    Joined:
    Dec 31, 2017
    Posts:
    473
    Training with multiple agent instances sharing the same behaviour should reduce training time. The more agents there are, the more experiences the RL algorithm can collect per update step. This also helps with generalization: collectively, a bunch of agents will likely encounter rare states more frequently than a single one.

    If you really want to train agents individually, you would have to assign them distinct behaviours. This doesn't mean they would learn different skills though, just that you would set distinct behaviour names and duplicate the corresponding entry in your trainer config YAML file. However, this doesn't make a lot of sense, as it runs counter to the advantages you get from multiple agents learning the same behaviour in the first place.
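
    For illustration, a duplicated trainer config might look something like this sketch. The behaviour names `AgentA` and `AgentB` are made-up placeholders (each would have to match the Behavior Name set on that agent's Behavior Parameters component), and the hyperparameters shown are just example values, not recommendations:

    ```yaml
    behaviors:
      AgentA:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          learning_rate: 3.0e-4
        max_steps: 500000
      AgentB:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1024
          learning_rate: 3.0e-4
        max_steps: 500000
    ```

    Each behaviour name gets its own trainer instance, so the two agents would collect experience and update their policies independently.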

    And I doubt that would really solve your issue. What you're describing sounds more like there's some problem with your agent design. Might be the observations, might be the rewards. Maybe you could post your agent code? Sometimes, simply choosing a larger decision interval can reduce jiggly actions enough for better learning.
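
    The decision interval is normally set on the agent's DecisionRequester component in the Inspector, but as a rough sketch it can also be adjusted from code (the component and field names below are ML-Agents' DecisionRequester API; the value 10 is just an example):

    ```csharp
    // Attach this alongside your Agent and DecisionRequester components.
    using Unity.MLAgents;
    using UnityEngine;

    public class DecisionIntervalSetup : MonoBehaviour
    {
        void Awake()
        {
            // A larger DecisionPeriod means the agent requests a new decision
            // less often (here every 10 academy steps instead of every step),
            // which tends to smooth out jittery actions during training.
            var requester = GetComponent<DecisionRequester>();
            requester.DecisionPeriod = 10;
            // Keep repeating the last action between decisions.
            requester.TakeActionsBetweenDecisions = true;
        }
    }
    ```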
     
  3. justkittenaround

    justkittenaround

    Joined:
    Sep 28, 2020
    Posts:
    21
    Ah I see. This was very helpful! Thank you!