
Training 2 different behaviors at the same time?

Discussion in 'ML-Agents' started by Julien_Desvergnes, May 4, 2020.

  1. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hi all,

    I'm currently working on my PhD thesis and want to use Unity ML-Agents to train 2 different agents at the same time. The first has to select a way to render an image, and the other should learn to navigate according to what it sees.

    I wanted to know if it's possible to train two agents with two different behaviors at the same time.

    Thanks for your answers!
     
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @unity_ZfF0h4RpIrjFHA,
    You can train multiple different behaviors by clicking on the agent in the Hierarchy, looking in the Inspector, and setting the BehaviorName property in the BehaviorParameters component.
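
    If you'd rather set it from a script than in the Inspector, something roughly like this should work (a minimal sketch; "RendererAgent" is just an example name, and in some ML-Agents versions BehaviorName may only be settable via the Inspector):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents.Policies;

    public class BehaviorNameSetup : MonoBehaviour
    {
        void Awake()
        {
            // Each distinct behavior name gets its own policy and trainer,
            // so the two agent types just need different names here.
            var behaviorParams = GetComponent<BehaviorParameters>();
            behaviorParams.BehaviorName = "RendererAgent";
        }
    }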

     


  3. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hi christophergoy,

    Thanks for your answer. My question was more about the possibility of training 2 different agents at the same time rather than in two consecutive runs, but I actually found out that it works pretty well!
    I have another question: could I inject human input while the training is running, such as pressing a key when a particular situation appears?
    In my work we are now considering human reinforcement learning (if you want to know a little more: https://www.ijcai.org/Proceedings/2019/0884.pdf) and I need to inject some human input during the training.

    Thank you for your time!
     
  4. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @unity_ZfF0h4RpIrjFHA,
    You could always add code to your game which takes user input. I'm just not sure how it will interfere with the rest of the RL loop. Would you make it a sparsely used observation? Would you override actions? Would you want to feed that user input back to the trainer? How would you do so?
     
  5. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hello christophergoy !

    Thanks for your answer. I want to use that feedback to modify the total reward before the reinforcement step. I think it corresponds to your suggestion: "feed that user input back to the trainer".
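
    Something like this is what I have in mind (a rough sketch; the key bindings and reward magnitude are arbitrary, and it only works when training in the Editor or a windowed build where the key presses actually reach Unity):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class HumanFeedback : MonoBehaviour
    {
        public Agent agent;                  // the agent currently being trained
        public float feedbackReward = 0.1f;  // arbitrary magnitude of the human signal

        void Update()
        {
            // Positive/negative human feedback folded into the reward the trainer sees.
            if (Input.GetKeyDown(KeyCode.UpArrow))
                agent.AddReward(feedbackReward);
            if (Input.GetKeyDown(KeyCode.DownArrow))
                agent.AddReward(-feedbackReward);
        }
    }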
     
  6. takenbythedesert

    takenbythedesert

    Joined:
    Sep 25, 2020
    Posts:
    1
    @Julien_Desvergnes How did you make it train two different agents at the same time? I know I can get the code working for each, but how did you make two configs run with mlagents-learn simultaneously? Just running two configs from two console windows?
     
    betike likes this.
  7. Goik_

    Goik_

    Joined:
    Jul 19, 2021
    Posts:
    1
    Anyone?
     
  8. soundgarden134

    soundgarden134

    Joined:
    Dec 27, 2019
    Posts:
    6
    Configure 2 agents in the same config file like this:

    Code (YAML):
    behaviors:
      LionAI:
        trainer_type: ppo
        hyperparameters:
          batch_size: 256
          buffer_size: 4096
          learning_rate: 5.0e-4
          beta: 2.0e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 8000000
        time_horizon: 64
        summary_freq: 40000
      PenguinAI:
        trainer_type: ppo
        hyperparameters:
          batch_size: 256
          buffer_size: 4096
          learning_rate: 5.0e-4
          beta: 2.0e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 8000000
        time_horizon: 64
        summary_freq: 40000
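
    Then a single run picks up both behaviors, e.g. something like mlagents-learn config.yaml --run-id=TwoBehaviors (the file and run-id names are just examples). The trainer creates a separate policy for each behavior name it receives from Unity, so there is no need for two console windows.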
     
    heartingNinja, Spartaqs and Haneferd like this.
  9. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    @Julien_Desvergnes
    Feeding human input is what GAIL is for. Take a look at the GAIL reward config options.
    Essentially you pre-record the human input, then configure GAIL to feed it into the training at a configurable strength. You can either have it use mimicking (behavioral cloning) or the generative adversarial model. It's quite powerful.
    Take a look at the implementation details of GAIL in Unity ML-Agents.
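
    For recording the human input, the usual route is a Heuristic that maps your keys to actions, plus a DemonstrationRecorder component; roughly like this (a sketch only, it assumes two continuous actions and the standard Horizontal/Vertical input axes):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class NavigationAgent : Agent
    {
        // With a DemonstrationRecorder component on the same GameObject and
        // Behavior Type set to Heuristic Only, playing the scene records your
        // keyboard input into a .demo file. The gail (or behavioral_cloning)
        // section of the trainer config then points at that file via demo_path.
        public override void Heuristic(in ActionBuffers actionsOut)
        {
            var continuousActions = actionsOut.ContinuousActions;
            continuousActions[0] = Input.GetAxis("Horizontal");
            continuousActions[1] = Input.GetAxis("Vertical");
        }
    }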
     
  10. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    Relevant to my interests. What about 2 different behaviors?

    Say I have one agent. Let's say it learns walking, and standing up from a fall. But doing both in one NN is just too complex to make work properly. So we make it two NNs, one for walking and one for getting up.

    If code detects a fall, I would like it to switch over to the second behavior; if that fails, I reset back to the start, but if it succeeds within parameters, it switches back to the walking behavior.

    I imagine swapping out the NN file at runtime. But I doubt it will be that easy, since the training process will time out on either of the two behaviors if it doesn't see any feedback from Unity for a while, correct?

    How would one go about this type of thing?
     
  11. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Yes, it will time out during training.

    However, if you really want to have two agents, one for walking and one for getting up, just train them separately.
    For the getting-up agent, just have something push the agent over in a different way each time it stands up. Reward it for standing up successfully and then start the next episode by pushing it over again. Just don't push it over in exactly the same way each time, because otherwise it will only learn one way of getting up.

    For the walking agent, reward it for not falling, or penalize it for falling, or make falling a loss (-1 reward and end the episode).
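
    As a rough sketch of that episode loop for the getting-up agent (all names, forces and thresholds here are made up; the walking agent would just get the mirror image, a -1 and EndEpisode when it falls):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class GetUpAgent : Agent
    {
        public Rigidbody hips;        // hypothetical root body of the character
        public float pushForce = 200f;

        public override void OnEpisodeBegin()
        {
            // Knock the agent over with a different random push each episode
            // so it can't memorise a single way of getting up.
            Vector3 direction = new Vector3(Random.Range(-1f, 1f), 0f, Random.Range(-1f, 1f)).normalized;
            hips.AddForce(direction * pushForce, ForceMode.Impulse);
        }

        void FixedUpdate()
        {
            // Placeholder success check: reward and end the episode once upright.
            if (hips.position.y > 1.0f && Vector3.Dot(hips.transform.up, Vector3.up) > 0.9f)
            {
                SetReward(1f);
                EndEpisode();
            }
        }
    }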

    Having said that, what the agent needs to learn, in terms of for example maintaining balance, is common to both walking and getting up. So I'm not sure that splitting this into two agents is a good idea. There is shared/common learning in both activities.
     
  12. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    There definitely is, but I am reasoning about what to do when tasks get too complex for one agent to perform. At some point it becomes so hard for the agent to distinguish signal from noise, or to reward it properly at the right points, that it becomes hard to get it to learn without a near-infinite amount of time and steps.
     
  13. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Actually, now that I think about it: as long as the Unity instance is running, the Academy will be stepping. And as long as the Academy is stepping, the ML-Agents Python side will NOT time out.
    So you can freely switch between training one agent and training the other.
     
  14. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    It would result in empty steps though, wouldn't it? They would both be "continuous" models. If nothing is gathered during a step, it will surely fall apart? I should just try this, I suppose.
     
  15. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Well, a step for an agent brain happens when a decision is requested by an agent of that type. A step does not happen just because the Academy steps or an agent of a different type performs a decision request.

    You would have to manually trigger the decision requests, since you are effectively alternating between agents. The automatic Decision Requester component cannot be used here.
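
    A minimal sketch of what that manual triggering could look like (the agent fields, behavior names and the OnFell/OnStoodUp hooks are all made up; whatever code detects the fall would call them):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class BehaviorSwitcher : MonoBehaviour
    {
        public Agent walkAgent;   // e.g. BehaviorName "Walker"
        public Agent getUpAgent;  // e.g. BehaviorName "GetUp"
        bool hasFallen;

        void FixedUpdate()
        {
            // Only the active behavior asks for a decision, so the other one
            // simply doesn't step; the Academy keeps stepping either way.
            if (hasFallen)
                getUpAgent.RequestDecision();
            else
                walkAgent.RequestDecision();
        }

        public void OnFell()    { hasFallen = true; }
        public void OnStoodUp() { hasFallen = false; }
    }

    At inference time (not during training), you could instead swap the trained models onto a single agent with Agent.SetModel, if you prefer one GameObject over two separate agents.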