
Training 2 different behaviors at the same time?

Discussion in 'ML-Agents' started by Julien_Desvergnes, May 4, 2020.

  1. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hi all,

    I'm currently working on my PhD thesis and want to use Unity ML-Agents to train 2 different agents at the same time. The first has to select a way to render an image, and the other should learn to navigate according to what it sees.

    I wanted to know if it's possible to train two agents with two different behaviors at the same time.

    Thanks for your answers!
     
  2. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @unity_ZfF0h4RpIrjFHA,
    You can train multiple different behaviors by clicking on the agent in the Hierarchy, looking in the Inspector, and setting the BehaviorName property in the BehaviorParameters component.
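
    If you'd rather set it from a script than in the Inspector, something roughly like this should work (a minimal sketch; "RendererAgent" is just an example name, and in some ML-Agents versions BehaviorName may only be settable via the Inspector):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents.Policies;

    public class BehaviorNameSetup : MonoBehaviour
    {
        void Awake()
        {
            // Each distinct behavior name gets its own policy and trainer,
            // so the two agent types just need different names here.
            var behaviorParams = GetComponent<BehaviorParameters>();
            behaviorParams.BehaviorName = "RendererAgent";
        }
    }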

     


  3. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hi christophergoy,

    Thanks for your answer. My question was more about the possibility of training 2 different agents at the same time rather than in two consecutive runs, but I actually found out that it works pretty well!
    I have another question: could I inject human input while the training is running, such as pressing a key when a particular situation appears?
    In my work we are now considering human reinforcement learning (if you want to know a little more: https://www.ijcai.org/Proceedings/2019/0884.pdf) and I need to inject some human input during the training.

    Thank you for your time!
     
  4. christophergoy

    christophergoy

    Unity Technologies

    Joined:
    Sep 16, 2015
    Posts:
    735
    Hi @unity_ZfF0h4RpIrjFHA,
    You could always add code to your game which takes user input. I'm just not sure how it will interfere with the rest of the RL loop. Would you make it a sparsely used observation? Would you override actions? Would you want to feed that user input back to the trainer? How would you do so?
     
  5. Julien_Desvergnes

    Julien_Desvergnes

    Joined:
    Oct 30, 2019
    Posts:
    4
    Hello christophergoy !

    Thanks for your answer. I want to use that feedback to modify the total reward before the reinforcement step. I think it corresponds to your suggestion: "feed that user input back to the trainer".
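
    Something like this is what I have in mind (a rough sketch; the key bindings and reward magnitude are arbitrary, and it only works when training in the Editor or a windowed build where the key presses actually reach Unity):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class HumanFeedback : MonoBehaviour
    {
        public Agent agent;                  // the agent currently being trained
        public float feedbackReward = 0.1f;  // arbitrary magnitude of the human signal

        void Update()
        {
            // Positive/negative human feedback folded into the reward the trainer sees.
            if (Input.GetKeyDown(KeyCode.UpArrow))
                agent.AddReward(feedbackReward);
            if (Input.GetKeyDown(KeyCode.DownArrow))
                agent.AddReward(-feedbackReward);
        }
    }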
     
  6. takenbythedesert

    takenbythedesert

    Joined:
    Sep 25, 2020
    Posts:
    1
    @Julien_Desvergnes How did you make it train two different agents at the same time? I know I can get the code working for each, but how did you make two configs run with mlagents-learn simultaneously? Just running two configs from two console windows?
     
    betike likes this.
  7. Goik_

    Goik_

    Joined:
    Jul 19, 2021
    Posts:
    1
    Anyone?
     
  8. soundgarden134

    soundgarden134

    Joined:
    Dec 27, 2019
    Posts:
    6
    Configure 2 agents in the same config file like this:

    Code (YAML):
    behaviors:
      LionAI:
        trainer_type: ppo
        hyperparameters:
          batch_size: 256
          buffer_size: 4096
          learning_rate: 5.0e-4
          beta: 2.0e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 8000000
        time_horizon: 64
        summary_freq: 40000
      PenguinAI:
        trainer_type: ppo
        hyperparameters:
          batch_size: 256
          buffer_size: 4096
          learning_rate: 5.0e-4
          beta: 2.0e-4
          epsilon: 0.2
          lambd: 0.95
          num_epoch: 5
          learning_rate_schedule: linear
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 8000000
        time_horizon: 64
        summary_freq: 40000
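
    Then a single run picks up both behaviors, e.g. something like mlagents-learn config.yaml --run-id=TwoBehaviors (the file and run-id names are just examples). The trainer creates a separate policy for each behavior name it receives from Unity, so there is no need for two console windows.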
     
    heartingNinja, Spartaqs and Haneferd like this.
  9. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    @Julien_Desvergnes
    Feeding human input is what GAIL is for. Take a look at the GAIL reward config options.
    Essentially you pre-record the human input, then configure GAIL to feed it into the training at a configurable strength. You can either have it use mimicking (behavioral cloning) or the generative adversarial model. It's quite powerful.
    Take a look at the implementation details of GAIL in Unity ML-Agents.
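
    For recording the human input, the usual route is a Heuristic that maps your keys to actions, plus a DemonstrationRecorder component; roughly like this (a sketch only, it assumes two continuous actions and the standard Horizontal/Vertical input axes):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    public class NavigationAgent : Agent
    {
        // With a DemonstrationRecorder component on the same GameObject and
        // Behavior Type set to Heuristic Only, playing the scene records your
        // keyboard input into a .demo file. The gail (or behavioral_cloning)
        // section of the trainer config then points at that file via demo_path.
        public override void Heuristic(in ActionBuffers actionsOut)
        {
            var continuousActions = actionsOut.ContinuousActions;
            continuousActions[0] = Input.GetAxis("Horizontal");
            continuousActions[1] = Input.GetAxis("Vertical");
        }
    }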
     
  10. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    Relevant to my interests. What about 2 different behaviors?

    Say I have one agent. Let's say it learns walking, and standing up from a fall. But doing both in one NN is just too complex to make work properly. So we make it two NNs, one for walking and one for getting up.

    If code detects a fall, I would like it to switch over to the second behavior; if that fails, I reset back to the start, but if it succeeds within parameters, it switches back to the walking behavior.

    I imagine swapping out the NN file at runtime. But I doubt it will be that easy, since the training process will time out on either of the two behaviors if it doesn't see any feedback from Unity for a while, correct?

    How would one go about this type of thing?
     
  11. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Yes, it will time out during training.

    However, if you really want to have two agents, one for walking and one for getting up, just train them separately.
    For the getting-up agent, just have something push the agent over in a different way each time it stands up. Reward it for standing up successfully and then start the next episode by pushing it over again. Just don't push it over in exactly the same way each time, because otherwise it will only learn one way of getting up.

    For the walking agent, reward it for not falling, or penalize it for falling, or make falling a loss (-1 reward and end the episode).
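
    As a rough sketch of that episode loop for the getting-up agent (all names, forces and thresholds here are made up; the walking agent would just get the mirror image, a -1 and EndEpisode when it falls):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class GetUpAgent : Agent
    {
        public Rigidbody hips;        // hypothetical root body of the character
        public float pushForce = 200f;

        public override void OnEpisodeBegin()
        {
            // Knock the agent over with a different random push each episode
            // so it can't memorise a single way of getting up.
            Vector3 direction = new Vector3(Random.Range(-1f, 1f), 0f, Random.Range(-1f, 1f)).normalized;
            hips.AddForce(direction * pushForce, ForceMode.Impulse);
        }

        void FixedUpdate()
        {
            // Placeholder success check: reward and end the episode once upright.
            if (hips.position.y > 1.0f && Vector3.Dot(hips.transform.up, Vector3.up) > 0.9f)
            {
                SetReward(1f);
                EndEpisode();
            }
        }
    }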

    Having said that, what the agent needs to learn, in terms of for example maintaining balance, is common to both walking and getting up. So I'm not sure that splitting this into two agents is a good idea. There is shared/common learning in both activities.
     
  12. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    There definitely is, but I am reasoning about what to do when tasks get too complex for one agent to perform. At some point it becomes so hard for the agent to distinguish signal from noise, or to reward it properly at the right points, that it becomes hard to get it to learn without a near-infinite amount of time and steps.
     
  13. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Actually, now that I think about it: as long as the Unity instance is running, the Academy will be stepping. And as long as the Academy is stepping, the ML-Agents Python side will NOT time out.
    So you can freely switch between training one agent and training the other.
     
  14. Vince-Chimp

    Vince-Chimp

    Joined:
    May 28, 2018
    Posts:
    43
    It would result in empty steps though, wouldn't it? They would both be "continuous" models. If nothing is gathered during a step, it will surely fall apart? I should just try this, I suppose.
     
  15. ChillX

    ChillX

    Joined:
    Jun 16, 2016
    Posts:
    145
    Well, a step for an agent brain happens when a decision is requested by an agent of that type. A step does not happen just because the Academy steps or an agent of a different type performs a decision request.

    You would have to manually trigger the decision requests, since you are effectively alternating between agents. The automatic Decision Requester component cannot be used here.
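
    A minimal sketch of what that manual triggering could look like (the agent fields, behavior names and the OnFell/OnStoodUp hooks are all made up; whatever code detects the fall would call them):

    Code (CSharp):
    using UnityEngine;
    using Unity.MLAgents;

    public class BehaviorSwitcher : MonoBehaviour
    {
        public Agent walkAgent;   // e.g. BehaviorName "Walker"
        public Agent getUpAgent;  // e.g. BehaviorName "GetUp"
        bool hasFallen;

        void FixedUpdate()
        {
            // Only the active behavior asks for a decision, so the other one
            // simply doesn't step; the Academy keeps stepping either way.
            if (hasFallen)
                getUpAgent.RequestDecision();
            else
                walkAgent.RequestDecision();
        }

        public void OnFell()    { hasFallen = true; }
        public void OnStoodUp() { hasFallen = false; }
    }

    At inference time (not during training), you could instead swap the trained models onto a single agent with Agent.SetModel, if you prefer one GameObject over two separate agents.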