
BoardGame with two different agents

Discussion in 'ML-Agents' started by ki_ha1984, Apr 20, 2020.

  1. ki_ha1984

    Hi, I am new to ML-Agents and I have developed a board game with two opponents.

    There are two agents with different behavior names (BlueAgent and RedAgent) and Team IDs (0, 1).
    I run the training with the following command (and then press Play in the Unity editor):

    mlagents-learn notebooks/trainer_config.yaml --run-id=myGame


    I use a Vector Observation Space Size of 100 and Stacked Vectors of 1 (a 2D array transformed into a C# List), and a Discrete Vector Action Type with two branches.

    The problem is that when I run the training, only one agent is actually training; the second always returns 0 on both branches.
    Also, I noticed that in the command-line window ML-Agents recognizes only one agent (BlueAgent).

    Where is the problem?

    My goal is to end up, after training, with two different NNs, one for each agent.


    2020-04-20 21:27:39 INFO [trainer_controller.py:167] Hyperparameters for the GhostTrainer of brain BlueAgent:

    trainer: ppo
    batch_size: 2048
    beta: 0.001
    buffer_size: 20480
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.99
    learning_rate: 0.0003
    learning_rate_schedule: constant
    max_steps: 6.0e4
    memory_size: 128
    normalize: True
    num_epoch: 2
    num_layers: 2
    time_horizon: 1000
    sequence_length: 64
    summary_freq: 1000
    use_recurrent: False
    vis_encode_type: simple
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
    summary_path: RLGame_BlueAgent
    model_path: ./models/RLGame/BlueAgent
    keep_checkpoints: 5
    self_play:
        window: 10
        play_against_latest_model_ratio: 0.5
        save_steps: 1000
        swap_steps: 25000
        team_change: 200000


    In my trainer_config.yaml file I have the following:
    Code (YAML):
    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        learning_rate_schedule: linear
        max_steps: 5.0e5
        memory_size: 128
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 10000
        use_recurrent: false
        vis_encode_type: simple
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99

    BlueAgent:
        max_steps: 6.0e4
        learning_rate_schedule: constant
        normalize: true
        batch_size: 2048
        buffer_size: 20480
        hidden_units: 128
        num_epoch: 2
        summary_freq: 1000
        time_horizon: 1000
        lambd: 0.99
        beta: 0.001
        self_play:
            window: 10
            play_against_latest_model_ratio: 0.5
            save_steps: 1000
            swap_steps: 25000
            team_change: 200000

    RedAgent:
        max_steps: 6.0e4
        learning_rate_schedule: constant
        normalize: true
        batch_size: 2048
        buffer_size: 20480
        hidden_units: 128
        num_epoch: 2
        summary_freq: 1000
        time_horizon: 1000
        lambd: 0.99
        beta: 0.001
        self_play:
            play_against_latest_model_ratio: 0.5
            save_steps: 1000
            swap_steps: 25000
            team_change: 200000
     
  2. TreyK-47

    Unity Technologies

    I'll kick this over to the team to have a look. Which version of Python and C# are you using?
     
  3. ki_ha1984

    Hi, after I reinstalled ml-agents, it finally worked.
     
  4. ki_ha1984

    Now another question.
    In my board game I use a 2D array representing the positions of the agents on the board: 0 for empty cells, 1 for the first agent, and -1 for the second agent. Since the observation vector only accepts vectors or lists of floats, I used the following list of lists to represent the 2D array.
    Code (CSharp):
    public List<List<float>> stateSpece = new List<List<float>>();
    I update the list like this (for example, I set -1 when the second agent moves to position z, x):
    Code (CSharp):
    stateSpece[z][x] = -1;
    In the CollectObservations function I add each sub-list of stateSpece to the sensor as follows:
    Code (CSharp):
    public override void CollectObservations(VectorSensor sensor)
    {
        // Each item is one row of the board (a List<float>), so the total number of
        // observations added equals the number of cells (25 for a 5x5 board).
        foreach (var item in stateSpece)
        {
            sensor.AddObservation(item);
        }
    }
    For example, with a 5x5 board I will have 5 lists of 5 floats in stateSpece.
    In each agent's Behavior Parameters, I set the Vector Observation Space Size to 5 and Stacked Vectors to 1.

    I want my agents to observe the 2D array cell by cell (in our case, the list of lists).

    Is this logic correct?
    Also, how can I visualize the observations of my agent?
    And does this mean that my NN in the PPO algorithm will have 25 inputs?

    Thank you in advance
     
  5. ki_ha1984

    Also, I tried to make use of the Gym wrapper, with the following code:

    Code (Python):
    # UnityEnv comes from the gym_unity package
    from gym_unity.envs import UnityEnv

    multi_env_name = "myGame.app"
    multi_env = UnityEnv(multi_env_name, worker_id=1, use_visual=False, multiagent=True)
    # Examine environment parameters
    print(str(multi_env))
    and I got a strange output: while the log shows that there are two agents (AgentRed?team=1 and AgentBlue?team=2), the Python code crashes with a UnityGymException saying the environment was launched as a multi-agent environment but there is only one agent in the scene.

    Code (Text):
    2020-04-24 18:29:02 INFO [environment.py:160] Connected to Unity environment with package version 0.15.1-preview and communication version 0.15.0
    2020-04-24 18:29:02 INFO [environment.py:305] Connected new brain:
    AgentRed?team=1
    2020-04-24 18:29:02 INFO [environment.py:305] Connected new brain:
    AgentBlue?team=2
    ---------------------------------------------------------------------------
    UnityGymException                         Traceback (most recent call last)
    <ipython-input-2-bd584fbe2fe1> in <module>
          1 multi_env_name = "myGame.app"
          2
    ----> 3 multi_env = UnityEnv(multi_env_name, worker_id=1, use_visual=False, multiagent=True)
          4
          5 # Examine environment parameters

    ~/Programming/MLAgentUnityEnv/lib/python3.7/site-packages/gym_unity/envs/__init__.py in __init__(self, environment_filename, worker_id, use_visual, uint8_visual, multiagent, flatten_branched, no_graphics, allow_multiple_visual_obs)
        121         self._env.reset()
        122         step_result = self._env.get_step_result(self.brain_name)
    --> 123         self._check_agents(step_result.n_agents())
        124         self._previous_step_result = step_result
        125         self.agent_mapper.set_initial_agents(list(self._previous_step_result.agent_id))

    ~/Programming/MLAgentUnityEnv/lib/python3.7/site-packages/gym_unity/envs/__init__.py in _check_agents(self, n_agents)
        350         elif self._multiagent and n_agents <= 1:
        351             raise UnityGymException(
    --> 352                 "The environment was launched as a mutli-agent environment, however "
        353                 "there is only one agent in the scene."
        354             )

    UnityGymException: The environment was launched as a mutli-agent environment, however there is only one agent in the scene.
    Any idea?
     
  6. vincentpierre

    Regarding the first question, I think giving the 25 inputs is fine; on a 5x5 grid, using a convolution is probably overkill.
    Regarding Gym, the multi-agent gym will be deprecated in future releases, as it only applies to a very small subset of environments in which all the agents in the scene have the same behavior AND request decisions at the same time (which is not the case in your game). For multi-agent environments, we recommend using the UnityEnvironment API directly.
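    A rough sketch of what that could look like for this game (not official sample code; it assumes the 0.15.x low-level API used elsewhere in this thread, that "myGame" is the exported build, and that both behaviors use two discrete action branches):

    Code (Python):
    import numpy as np
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name="myGame")
    env.reset()

    for _ in range(100):  # short interaction loop
        # Groups appear in this list once their agents have requested a decision,
        # e.g. the BlueAgent and RedAgent behaviors with their team IDs.
        for group_name in env.get_agent_groups():
            step_result = env.get_step_result(group_name)
            n = step_result.n_agents()
            if n == 0:
                continue  # no agent of this group requested a decision this step
            print(group_name, "reward:", step_result.reward)
            # Dummy zero actions, one integer per discrete branch;
            # replace these with the actions chosen by your own algorithm.
            env.set_actions(group_name, np.zeros((n, 2), dtype=np.int32))
        env.step()

    env.close()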
     
  7. ki_ha1984

    vincentpierre, thank you very much for your answers.

    First, the 5x5 was just an example; in real applications I always use board sizes bigger than 20x20, which is why I use a CNN. On the other hand, in my opinion deprecating the gym wrapper is not a good idea, because of the huge variety of algorithms available for gym-like environments.

    Last, do you know of any guide for connecting external custom Python algorithms to multi-agent games like mine, through an interface that will not be deprecated?

    I want to use other algorithms besides the default PPO and SAC.
     
  8. vincentpierre

    We will continue to provide a gym wrapper for single-agent environments, but the multi-agent gym we used does not support agents requesting decisions at different frequencies. In my opinion, the gym abstraction is not well suited for multi-agent environments; there are multiple attempts at adapting gym for multi-agent scenarios, but none are easy to make compatible with our framework.

    I do not know of any external Python algorithms that are compatible with our API, although it is possible to write a wrapper for your specific scenario and use any multi-agent algorithm available. (We will not deprecate our UnityEnvironment API, but it might change in the future.)
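    For what it is worth, a wrapper along these lines could work (hypothetical class and method names, under the same 0.15.x low-level API assumptions as the sketch earlier in the thread; a starting point rather than a complete implementation):

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment

    class MultiAgentBoardEnv:
        """Thin dict-based wrapper: one entry per behavior name (e.g. BlueAgent, RedAgent)."""

        def __init__(self, file_name):
            self._env = UnityEnvironment(file_name=file_name)

        def reset(self):
            self._env.reset()
            return {name: self._env.get_step_result(name).obs
                    for name in self._env.get_agent_groups()}

        def step(self, actions):
            # actions: {behavior_name: np.ndarray of shape (n_agents, n_branches)}
            for name, acts in actions.items():
                self._env.set_actions(name, acts)
            self._env.step()
            obs, rewards, dones = {}, {}, {}
            for name in self._env.get_agent_groups():
                result = self._env.get_step_result(name)
                obs[name], rewards[name], dones[name] = result.obs, result.reward, result.done
            return obs, rewards, dones

        def close(self):
            self._env.close()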
     
  9. ki_ha1984

    Do you plan, in future releases, to update the gym wrapper to support asymmetric games with agents that have different Behavior Parameters?
     
  10. vincentpierre

    Hi,

    Gym works best for single-agent environments. We have not found a version of gym that supports asymmetric games that we could use. The gym API was built for single agents; for anything more complex than this, there are multiple variations of gym, each good for a specific use case but not for others. For this reason, we made our own API that works for the types of environments that can be built with ML-Agents.
     
  11. ki_ha1984

    vincentpierre, thank you very much for your answers.
    Another question.
    When I export my game and try it with the Python code below, it recognizes only one agent. Do you have any idea?

    Code (Python):
    import matplotlib.pyplot as plt
    import numpy as np
    import sys

    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfig, EngineConfigurationChannel

    if sys.version_info[0] < 3:
        raise Exception("ERROR: ML-Agents Toolkit (v0.3 onwards) requires Python 3")

    env_name = "myGame"
    engine_configuration_channel = EngineConfigurationChannel()
    env = UnityEnvironment(file_name=env_name, side_channels=[engine_configuration_channel])

    env.reset()

    group_name = env.get_agent_groups()
    print(group_name)
    Output:

    ['RedAgent?team=1']
     
  12. vincentpierre

    This means that, during the first step, only the group 'RedAgent?team=1' is available (the behavior name RedAgent with team 1); in other words, only agents in team 1 with the RedAgent behavior have requested a decision since the reset.
    You need to step the environment a bit to see the other groups appear; they will show up as agents in those groups request decisions.
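    A quick sketch of that (same API as your script, using dummy zero actions and assuming the two discrete action branches described earlier in the thread):

    Code (Python):
    import numpy as np
    from mlagents_envs.environment import UnityEnvironment

    env = UnityEnvironment(file_name="myGame")
    env.reset()

    seen = set(env.get_agent_groups())  # only the group that acted first, e.g. {'RedAgent?team=1'}
    for _ in range(50):  # step a bit so the other team also requests a decision
        for name in env.get_agent_groups():
            n = env.get_step_result(name).n_agents()
            if n > 0:
                env.set_actions(name, np.zeros((n, 2), dtype=np.int32))  # dummy actions, one per branch
        env.step()
        seen.update(env.get_agent_groups())

    print(seen)  # both behavior names should appear once both teams have acted
    env.close()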