
Question: ML-Agents Gym Wrapper

Discussion in 'ML-Agents' started by seifmostafa7347, Jan 31, 2023.

  1. seifmostafa7347

    Joined:
    Nov 2, 2021
    Posts:
    22
    I've been searching for a way to build my own Unity environment as a gym environment so that I can run my Python code on a cloud notebook (Kaggle/Colab).
    I've found several links referring to Unity gym wrappers; however, all of them are broken (even some from as recently as two months ago). Did they drop the feature or change the documentation links?
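    For reference, the wrapper itself still exists; it now ships inside the mlagents_envs Python package (you can see the import used later in this thread), which may be why the older gym-unity links are dead. Below is a minimal sketch, not taken from the docs: the build path is a placeholder, and no_graphics is an assumption that usually helps on headless cloud machines.

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

    # Launch a standalone build (path is a placeholder); no_graphics avoids needing a display
    unity_env = UnityEnvironment("path/to/your/built_env", no_graphics=True)
    env = UnityToGymWrapper(unity_env)

    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())  # old gym-style 4-tuple
    env.close()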
     
  2. hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Last edited: Feb 2, 2023
  3. seifmostafa7347

    Joined:
    Nov 2, 2021
    Posts:
    22
    This is precisely what I was looking for, thanks! You have a new subscriber :D
     
  4. chiaradivece

    Joined:
    Nov 17, 2020
    Posts:
    2
    Hello, following up on this: I downloaded ML-Agents release 21 and the latest available stable-baselines3 (which uses gymnasium). However, this causes an issue with the UnityToGymWrapper, because sb3 expects gymnasium.spaces.box.Box while the wrapper provides gym.spaces.box.Box. I tried the following:

    import gymnasium as gym

    However, it doesn't do the trick. I had to downgrade sb3 to 1.8.0, which is the latest version that still supports gym, but I'd like to move to newer versions since gym is no longer supported.
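    One possible workaround, sketched below, is to bridge the old gym-style API that UnityToGymWrapper exposes to the gymnasium API that newer sb3 expects, using the shimmy compatibility package (pip install shimmy). This is an untested sketch against release 21, not a confirmed fix, and it assumes GymV21CompatibilityV0 accepts an already-constructed env.

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
    from shimmy.openai_gym_compatibility import GymV21CompatibilityV0  # assumption: shimmy is installed

    unity_env = UnityEnvironment(None)  # connect to the Editor, as in the script below
    gym_env = UnityToGymWrapper(unity_env, uint8_visual=False, flatten_branched=False, allow_multiple_obs=False)

    # Wrap the old-gym env so it exposes gymnasium spaces (e.g. gymnasium.spaces.Box) to sb3
    env = GymV21CompatibilityV0(env=gym_env)

    I believe newer sb3 releases also attempt this conversion automatically when shimmy is installed, so it may even be enough to just install the package, but I haven't verified that.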

    This is the code that I'm using to train the agent:

    Code (Python):
    import gym
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
    import os
    import warnings
    warnings.filterwarnings('ignore')
    import sys
    import numpy as np
    import time
    import argparse
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.env_util import make_vec_env  # creation of parallel environments
    from stable_baselines3.common.logger import configure
    from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
    from stable_baselines3.common.callbacks import BaseCallback

    from utils import ALGOS, SaveOnBestTrainingRewardCallback, linear_schedule

    import wandb

    np.random.seed(2)


    def main(args):
        """
        :param args: (ArgumentParser) the input arguments
        """

        algo = args.algo
        model_class = ALGOS[algo]

        save_dir = os.path.join(os.path.dirname(__file__), '../results')
        res_dir = os.path.join(save_dir, args.res_dir)
        model_dir = os.path.join(res_dir, args.model_dir)
        logs_dir = os.path.join(save_dir, args.tensorboard_log)
        logger_dir = os.path.join(logs_dir, args.model_dir)

        if args.pretrained == 'True':
            model_dir_pretrain = os.path.join(res_dir, args.model_dir_pretrain)

        os.makedirs(res_dir, exist_ok=True)
        os.makedirs(logs_dir, exist_ok=True)
        os.makedirs(logger_dir, exist_ok=True)

        channel = EngineConfigurationChannel()
        env = UnityEnvironment(None, side_channels=[channel])
        # env = UnityEnvironment('built_scenes/UnityVolumeRendering', side_channels=[channel], base_port=5004)
        channel.set_configuration_parameters(time_scale=4)
        env = UnityToGymWrapper(env, uint8_visual=False, flatten_branched=False, allow_multiple_obs=False)
        env.reset()
        env = Monitor(env, logger_dir, allow_early_resets=True)
        # env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
        env = DummyVecEnv([lambda: env])

        logger = configure(logger_dir, ["stdout", "csv", "log", "tensorboard"])

        wandb.init(
            # set the wandb project where this run will be logged
            project="AgentTransl",
            name=args.model_dir,
        )

        # Setting the policy to "MlpPolicy" means that we are giving a state vector as input to our model.
        # There are only two other options:
        # - CnnPolicy, if you provide images as input;
        # - MultiInputPolicy, for handling multiple inputs
        if algo == 'ppo':

            print(f'RL Algorithm: {model_class}')

            if args.pretrained == 'False':
                model = model_class("MlpPolicy", env, verbose=1)
                model.set_logger(logger)
                print('training')
                callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=logger_dir)
                model.learn(total_timesteps=args.n_train_timesteps, callback=callback)
                model.save(model_dir)
                print('model saved')
                del model
                print('model deleted')
            else:
                model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
                model.set_logger(logger)
                print('fine tuning model')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir)
                model.save(model_dir)
                print('new model saved')
                del model
                print('model deleted')
        elif algo == 'td3':
            print(f'RL Algorithm: {model_class}')
            n_actions = env.action_space.shape[-1]
            action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

            if args.pretrained == 'False':
                model = model_class("MlpPolicy", env, action_noise=action_noise, verbose=1, tensorboard_log=logs_dir, seed=0)
                model.set_logger(logger)
                print('training')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
                model.save(model_dir)
                print('model saved')
                del model
                print('model deleted')
            else:
                model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
                model.set_logger(logger)
                print('fine tuning model')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
                model.save(model_dir)
                print('new model saved')
                del model
                print('model deleted')
                # print score of the model

        env.close()

        print('training completed')


    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='Train agent on custom env')
        parser.add_argument('--algo', default='ppo', type=str, required=False, choices=list(ALGOS.keys()), help='RL Algorithm')
        parser.add_argument('--res_dir', type=str, help='Directory to save results')
        parser.add_argument('--model_dir', type=str, help='Directory to save model.zip')
        parser.add_argument('--policy', default='MlpPolicy')
        parser.add_argument('--tensorboard_log', type=str, help='Tensorboard log dir')
        parser.add_argument('--monitor', type=str, help='Monitor wrapper filename')
        parser.add_argument('--n_train_timesteps', default=200000, required=False, type=int, help='Maximum number of timesteps for training')
        parser.add_argument('--pretrained', type=str, default='False', required=False, help='Boolean to determine if training must start from an existing model')
        parser.add_argument('--model_dir_pretrain', type=str, required=False, help='Directory to load the pretrained model')
        args = parser.parse_args()

        main(args)

    Please note that I'm using
    env = UnityEnvironment(None, side_channels=[channel])

    because with release 19 I was getting the following prompt to start the training (this is not happening now with release 21):
    [INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.

    If anyone could help me understand what is causing these issues, or has encountered them before, I would REALLY appreciate the help! Also, please let me know if you need further details.
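    For anyone reading later: passing file_name=None makes mlagents_envs wait for the Editor's Play button, while passing a build path launches a standalone player directly. A short sketch of the latter, reusing the commented-out line from the script above (the path is just that example, not a real requirement):

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

    channel = EngineConfigurationChannel()
    # Launch the standalone build instead of waiting for the Editor
    env = UnityEnvironment('built_scenes/UnityVolumeRendering', side_channels=[channel], base_port=5004)
    channel.set_configuration_parameters(time_scale=4)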
     
  5. petroben

    Joined:
    Nov 23, 2023
    Posts:
    1
    It seems like you might be encountering broken links or outdated information. Unity provides a framework called "ml-agents" (Machine Learning Agents) that enables integration with Unity environments for reinforcement learning. You can check the official GitHub repository for the latest documentation and resources:

    GitHub Repository: ML-Agents

    Ensure that you are referring to the latest documentation and follow the instructions there to set up your Unity environment as a gym environment for reinforcement learning in Python. If you encounter specific issues, the GitHub repository's issue tracker can be a helpful resource for seeking assistance or reporting problems.
     
    Last edited: Dec 5, 2023
  6. chiaradivece

    Joined:
    Nov 17, 2020
    Posts:
    2
    Hello @petroben, thank you for your reply! I'm already using ML-Agents and the mlagents-envs package, but I'm not interested in using the built-in ML-Agents trainers; with release 19 of ML-Agents I'm able to integrate the stable-baselines3 implementations of RL algorithms instead. The main issue is that newer releases of stable-baselines3 no longer support gym, which has been replaced by gymnasium, and this causes an issue with the new release of ML-Agents.
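    For anyone who wants to see the mismatch concretely, here is a quick check in a Python shell (a sketch; it assumes both gym and gymnasium are installed, and the env connects to the Editor once you press Play):

    Code (Python):
    import gymnasium
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

    env = UnityToGymWrapper(UnityEnvironment(None))  # press Play in the Editor when it starts listening
    print(type(env.observation_space))                              # gym.spaces.box.Box (old gym API)
    print(isinstance(env.observation_space, gymnasium.spaces.Box))  # False -> the check newer sb3 trips over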