
Multiple brains support

Discussion in 'ML-Agents' started by donamin1, Apr 23, 2020.

  1. donamin1

    Hi

    I'm trying to train a hierarchical policy structure, where a high-level policy uses discrete actions to choose one of the low-level policies at each time step. The chosen low-level policy then samples a continuous action to control the agent. I have two agents (on teams 0 and 1) that use the same policy structure. I've implemented everything and it is ready to train. The problem is that when I start training, ML-Agents only detects the high-level policy and does not train the low-level policies at all. Does anyone know how to solve this?
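    To make the two-level structure concrete, here is a minimal Python sketch that runs independently of ML-Agents. Everything in it is an assumption for illustration: the linear "policies", the softmax sampling for the discrete choice, the Gaussian sampling for the continuous action, and the names `HighLevelPolicy`, `LowLevelPolicy` and `hierarchical_step` are all hypothetical, not the poster's code or an ML-Agents API. It only shows the control flow: one discrete decision selects which continuous policy produces the action.

```python
import math
import random

# Hypothetical sketch of a hierarchical policy; not ML-Agents code.

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class HighLevelPolicy:
    """Discrete policy: picks which low-level policy runs this step."""
    def __init__(self, weights):
        self.weights = weights  # one weight vector per low-level option

    def choose(self, obs, rng):
        logits = [sum(w * o for w, o in zip(row, obs)) for row in self.weights]
        probs = softmax(logits)
        # Sample an option index from the categorical distribution.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

class LowLevelPolicy:
    """Continuous policy: Gaussian action whose mean is linear in obs."""
    def __init__(self, weights, std):
        self.weights = weights  # one weight vector per action dimension
        self.std = std

    def act(self, obs, rng):
        return [rng.gauss(sum(w * o for w, o in zip(row, obs)), self.std)
                for row in self.weights]

def hierarchical_step(obs, high, lows, rng):
    option = high.choose(obs, rng)       # discrete decision (high level)
    action = lows[option].act(obs, rng)  # continuous action (chosen low level)
    return option, action

if __name__ == "__main__":
    rng = random.Random(0)
    high = HighLevelPolicy([[1.0, 0.0], [0.0, 1.0]])
    lows = [LowLevelPolicy([[0.5, 0.5]], 0.1),
            LowLevelPolicy([[-1.0, 0.0], [0.0, -1.0]], 0.1)]
    print(hierarchical_step([0.2, 0.8], high, lows, rng))
```

    Note that this gives three separately parameterized policies, so a trainer has to receive experience for each of them; the sketch deliberately leaves out how that experience would be routed.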

    Thanks
    Amin
     
  2. vincentpierre

    Hi,
    I do not have enough information to reproduce what you are seeing. If I understand correctly, you have a total of four agents in your scene: two agents (one on each team) that pick the policy for the other two (also one on each team). One agent can only have one policy at a time, so I suppose you are using Agent.SetModel to change the low-level policy. It is possible that the low-level policy changes too often and the trainers cannot collect long enough episodes to train. If this is the case, you should still see a message saying "No episodes were completed". If you do not see that message at all, the low-level agent has probably not requested any decision during the simulation. Make sure you are either calling Agent.RequestDecision or have a DecisionRequester attached to the low-level Agent.
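    The two failure modes above can be pictured with a toy model. This is an assumption-laden Python sketch, not ML-Agents trainer code: it assumes an agent contributes one step per decision it requests, and that switching the active low-level policy cuts off the segment being collected for the previous one. The function `collect_segments` and its parameters are hypothetical.

```python
# Toy model only; NOT how ML-Agents trainers are implemented.
# Assumptions: one collected step per requested decision, and each switch of
# the active low-level policy ends the segment of the previous policy.

def collect_segments(num_steps, decision_period, switch_every):
    """Return {policy_id: [segment lengths in decisions]}.

    decision_period: request a decision every N steps; None means the agent
        never requests one (no RequestDecision call, no DecisionRequester).
    switch_every: the high level switches between two hypothetical low-level
        policies (ids 0 and 1) every N steps.
    """
    segments = {0: [], 1: []}
    if decision_period is None:
        return segments  # no decisions requested, so nothing is collected
    active, current_len = 0, 0
    for step in range(num_steps):
        if step > 0 and step % switch_every == 0:
            # A switch closes the segment for the previously active policy.
            if current_len:
                segments[active].append(current_len)
            active, current_len = 1 - active, 0
        if step % decision_period == 0:
            current_len += 1
    if current_len:
        segments[active].append(current_len)  # close the final segment
    return segments

if __name__ == "__main__":
    print(collect_segments(1000, 5, 10))
    print(collect_segments(1000, 5, 500))
    print(collect_segments(1000, None, 10))
```

    Under these assumptions, `switch_every=10` with `decision_period=5` leaves each policy with fifty segments of only two decisions; `switch_every=500` gives each policy a single 100-decision segment; and `decision_period=None` collects nothing at all, which corresponds to the case where no message appears.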
     
  3. donamin1

    Hi

    No, I'm not using Agent.SetModel. I wanted to train all of the policies in parallel because I have no pre-trained models at hand and it's difficult to train the policies separately.
    I guess this case is not supported in ML-Agents. I ended up changing my policy structure to a flat one: a single policy that makes all the decisions in one pass.

    Thanks anyway