Gym Unity - Baselines

Discussion in 'ML-Agents' started by ademord, May 27, 2021.

  1. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    Hello guys,

    I finished my environment in Unity and now I am trying to "export it to gym" to try different algorithms (I will do my own implementations afterwards). I am trying Baselines now and I wrapped the environment as:

    env = UnityToGymWrapper(unity_env, uint8_visual=True, flatten_branched=True, allow_multiple_obs=True)


    And now, from this line:

    model = PPO(MlpPolicy, env, verbose=0)


    I am getting the error:

    NotImplementedError: Tuple(Box(-inf, inf, (91,), float32)) observation space is not supported


    What could I do? I am a bit lost.
     
  2. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    PPO baselines does not support observations of type Tuple(Box(-inf, inf, (91,), float32)) (which I think corresponds to flat vector observations of 91 floats). If you want to use baselines, you need to create an environment with observations and actions that baselines can work with.
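One way past this particular error, sketched under assumptions: since the Tuple holds a single Box, the wrapper below exposes only element 0 of each observation. `FirstObservationOnly` and `FakeTupleEnv` are hypothetical names, and the class is duck-typed to stay dependency-free; with Baselines you would more likely subclass `gym.ObservationWrapper`. As far as the gym_unity documentation describes it, passing `allow_multiple_obs=False` to `UnityToGymWrapper` should also return a single array observation directly.

```python
from types import SimpleNamespace


class FirstObservationOnly:
    """Duck-typed wrapper that exposes element 0 of a tuple/list observation."""

    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space
        # gym.spaces.Tuple keeps its sub-spaces in the `.spaces` attribute.
        self.observation_space = env.observation_space.spaces[0]

    def reset(self):
        return self.env.reset()[0]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs[0], reward, done, info

    def close(self):
        self.env.close()


class FakeTupleEnv:
    """Hypothetical stand-in for the wrapped Unity env: one 91-float vector in a list."""

    action_space = "fake-action-space"
    observation_space = SimpleNamespace(spaces=["fake-box-91"])

    def reset(self):
        return [[0.0] * 91]

    def step(self, action):
        return [[0.0] * 91], 0.0, False, {}

    def close(self):
        pass


env = FirstObservationOnly(FakeTupleEnv())
print(len(env.reset()))  # 91
```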
     
  3. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    I am using raycasts and one boolean (so yes, vector obs as you mention). How can I know what kind of observations Baselines works with? Do I check which observation inputs the algorithm supports, or do I try to change my observations? I just need some direction.
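As a rough rule of thumb for the question above (an assumption based on the Stable-Baselines3 documentation, not something stated in this thread): SB3 accepts `Box`, `Discrete`, `MultiDiscrete` and `MultiBinary` observation spaces, plus `Dict` in releases that ship `MultiInputPolicy`, and it rejects `Tuple`. The hypothetical helper below only inspects the class name of a space so that it runs without gym installed; the `Box` and `Tuple` classes are empty stand-ins.

```python
# Space types assumed to be supported (check the docs of your installed version).
SUPPORTED = {"Box", "Discrete", "MultiDiscrete", "MultiBinary", "Dict"}


def check_observation_space(space):
    """Return a short verdict based on the class name of the space."""
    name = type(space).__name__
    if name in SUPPORTED:
        return f"{name}: should be accepted"
    if name == "Tuple":
        return "Tuple: not supported, unwrap it or convert it to a Dict"
    return f"{name}: unknown space type, check the library documentation"


class Box:  # stand-in for gym.spaces.Box
    pass


class Tuple:  # stand-in for gym.spaces.Tuple
    pass


print(check_observation_space(Tuple()))
print(check_observation_space(Box()))
```

In a real script you would call it as `check_observation_space(env.observation_space)` before constructing the model.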
     
  4. vincentpierre

    vincentpierre

    Joined:
    May 5, 2017
    Posts:
    160
    I have not worked with PPO baselines in a while; I think you will have better luck looking at their documentation or issues page. If my memory is correct, it should work on single visual observations, but I really am not sure.
     
  5. simmax21

    simmax21

    Joined:
    Sep 13, 2021
    Posts:
    2
    Hi, I have a similar problem with stable_baselines3. How did you solve it?
     
  6. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    @simmax21 I had to make a custom environment with the help of @aakarshanc01.
    If you can further improve on this code, that would also be amazing for me and for other people who come after us:
    Code (python):
    import uuid
    from collections import defaultdict

    import gym
    import numpy as np
    from gym_unity.envs import UnityToGymWrapper
    # Assumption: UE is the author's alias for UnityEnvironment.
    from mlagents_envs.environment import UnityEnvironment as UE
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
    from mlagents_envs.side_channel.side_channel import IncomingMessage, SideChannel
    from mlagents_envs.side_channel.stats_side_channel import EnvironmentStats, StatsAggregationMethod

    # NOTE: config, rank, env_callback, wandb_run_identifier, pretty_print and Colors
    # are defined elsewhere in the project and are not shown in this post.


    def get_wandb_ue_env():
        # engine config
        engine_channel = EngineConfigurationChannel()
        engine_channel.set_configuration_parameters(time_scale=config.time_scale)
        # side channels
        channel = SB3StatsRecorder()
        # environment
        env = UE(config.env_path,
                 seed=1,
                 worker_id=rank,
                 base_port=5000 + rank,
                 no_graphics=config.no_graphics,
                 side_channels=[engine_channel, channel])

        return env


    class CustomEnv(gym.Env):
        def __init__(self):
            super().__init__()

            env = get_wandb_ue_env()
            env = UnityToGymWrapper(env, allow_multiple_obs=True)

            self.env = env
            self.action_space = self.env.action_space
            self.action_size = self.env.action_size
            # NOTE: a later reply in this thread reports that these keys must be str, not int.
            self.observation_space = gym.spaces.Dict({
                0: gym.spaces.Box(low=0, high=1, shape=(27, 60, 3)),  # alternative shape: (40, 90, 3)
                1: gym.spaces.Box(low=0, high=1, shape=(20, 40, 1)),  # alternative shape: (56, 121, 1)
                2: gym.spaces.Box(low=-np.inf, high=np.inf, shape=(400,))
            })

        @staticmethod
        def tuple_to_dict(s):
            # The wrapper returns the three observations as a sequence; re-key them for the Dict space.
            return {0: s[0], 1: s[1], 2: s[2]}

        def reset(self):
            return self.tuple_to_dict(self.env.reset())

        def step(self, action):
            s, r, d, info = self.env.step(action)
            return self.tuple_to_dict(s), float(r), d, info

        def close(self):
            self.env.close()
            # Free this worker's slot so the next env can reuse the id/port.
            global rank
            rank -= 1

        def render(self, mode="human"):
            return self.env.render(mode=mode)


    class SB3StatsRecorder(SideChannel):
        """
        Side channel that receives (string, float) pairs from the environment, so that they can eventually
        be passed to a StatsReporter.
        """

        def __init__(self) -> None:
            # >>> uuid.uuid5(uuid.NAMESPACE_URL, "com.unity.ml-agents/StatsSideChannel")
            # UUID('a1d8f7b7-cec8-50f9-b78b-d3e165a78520')
            super().__init__(uuid.UUID("a1d8f7b7-cec8-50f9-b78b-d3e165a78520"))
            pretty_print("Initializing SB3StatsRecorder", Colors.FAIL)
            self.stats: EnvironmentStats = defaultdict(list)
            self.i = 0
            self.wandb_tables: dict = {}

        def on_message_received(self, msg: IncomingMessage) -> None:
            """
            Receive the message from the environment, and save it for later retrieval.

            :param msg: incoming message holding a key, a float value and an aggregation type
            :return: None
            """
            key = msg.read_string()
            val = msg.read_float32()
            agg_type = StatsAggregationMethod(msg.read_int32())

            self.stats[key].append((val, agg_type))

            # assign different Drone[id] to each subprocess within this wandb run
            key = key.split("/")[1]
            self.i += 1

            if env_callback is not None and wandb_run_identifier == "test":
                my_table_id: str = "Performance[{}]".format(wandb_run_identifier)
                env_callback(my_table_id, key, val)

        def get_and_reset_stats(self) -> EnvironmentStats:
            """
            Returns the current stats, and resets the internal storage of the stats.

            :return: the stats collected since the previous call
            """
            s = self.stats
            self.stats = defaultdict(list)
            return s

  7. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
    I then register this environment through the gym registration method and call it everywhere else as gym.make("my_id"). Since the environment pulls from the config file, it can always adapt to different builds, and I don't need any more code to register "new" builds.

    Also, something to take into account is that SubprocVecEnv is a bit unstable, at least for me. No context from any previous variables is passed into the subprocesses, so the training has to be fully separated and then brought back; you might choose a different strategy for that. I decided to limit myself to one trainer instead of a vectorized env for now and just train for ~20 hours.
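Given that no context from earlier variables reaches the subprocesses, a common pattern is to hand each worker its rank explicitly through a factory function instead of relying on a shared global. A minimal sketch, where `make_env`, `env_cls` and `FakeEnv` are hypothetical names; in Stable-Baselines3, a list of such callables is what `SubprocVecEnv` and `DummyVecEnv` take as their first argument.

```python
def make_env(env_cls, rank, base_port=5000):
    """Return a zero-argument factory that builds one env for the given rank."""

    def _init():
        # Everything the worker needs is captured in this closure,
        # so nothing has to be read from module-level state.
        return env_cls(worker_id=rank, base_port=base_port + rank)

    return _init


class FakeEnv:
    """Hypothetical stand-in for the Unity-backed environment."""

    def __init__(self, worker_id, base_port):
        self.worker_id = worker_id
        self.base_port = base_port


env_fns = [make_env(FakeEnv, rank) for rank in range(3)]
envs = [fn() for fn in env_fns]
print([(e.worker_id, e.base_port) for e in envs])
# [(0, 5000), (1, 5001), (2, 5002)]
```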
     
  8. ademord

    ademord

    Joined:
    Mar 22, 2021
    Posts:
    49
  9. TiranianHoward

    TiranianHoward

    Joined:
    Sep 21, 2021
    Posts:
    4
    @ademord thank you for the custom env.
    Just a little change: changing the dict keys from int to str makes it work.

    Edit:
    This was the error
    raise TypeError("module name should be a string. Got {}".format(
    TypeError: module name should be a string. Got int
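The error text appears to match the one PyTorch raises when a submodule is registered under a non-string name, which would explain why integer keys in the observation Dict fail. A small sketch of the change described above, using a hypothetical `obs_` prefix for the key names; the keys of `observation_space` would need to be renamed the same way.

```python
def tuple_to_dict(observations, prefix="obs_"):
    """Key each observation by a string such as 'obs_0' instead of an int."""
    return {f"{prefix}{i}": obs for i, obs in enumerate(observations)}


print(tuple_to_dict(("camera", "depth", "rays")))
# {'obs_0': 'camera', 'obs_1': 'depth', 'obs_2': 'rays'}
```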
     
  10. TiranianHoward

    TiranianHoward

    Joined:
    Sep 21, 2021
    Posts:
    4
    @ademord sorry, but do you know how to pass Academy.Instance.StatsRecorder data to Python?

    I want to log all the data that is recorded by Academy.Instance.StatsRecorder from my Unity-converted-to-gym env.
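The `SB3StatsRecorder` side channel posted earlier in this thread listens on the StatsSideChannel UUID, so values sent through `Academy.Instance.StatsRecorder` should arrive in its `stats` dictionary as lists of `(value, aggregation)` pairs. A hedged sketch of reducing that dictionary for logging; `summarize_stats` is a hypothetical helper, and the `Aggregation` enum is a local stand-in for ML-Agents' `StatsAggregationMethod` (its numeric values here are assumptions).

```python
from enum import Enum
from statistics import mean


class Aggregation(Enum):
    """Local stand-in for StatsAggregationMethod (values are assumptions)."""
    AVERAGE = 0
    MOST_RECENT = 1
    SUM = 2


def summarize_stats(stats):
    """Collapse {key: [(value, aggregation), ...]} into {key: float}."""
    summary = {}
    for key, entries in stats.items():
        if not entries:
            continue
        values = [value for value, _ in entries]
        method = entries[-1][1]  # use the aggregation of the latest entry
        if method is Aggregation.SUM:
            summary[key] = sum(values)
        elif method is Aggregation.MOST_RECENT:
            summary[key] = values[-1]
        else:
            summary[key] = mean(values)
    return summary


stats = {
    "Drone/Speed": [(1.0, Aggregation.AVERAGE), (3.0, Aggregation.AVERAGE)],
    "Drone/Collisions": [(1.0, Aggregation.SUM), (2.0, Aggregation.SUM)],
}
print(summarize_stats(stats))
# {'Drone/Speed': 2.0, 'Drone/Collisions': 3.0}
```

With the real channel, the input would come from `channel.get_and_reset_stats()`, called periodically (for example from a training callback) and forwarded to whatever logger you use.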