
Question: ML-Agents Gym Wrapper

Discussion in 'ML-Agents' started by seifmostafa7347, Jan 31, 2023.

  1. seifmostafa7347

    Joined:
    Nov 2, 2021
    Posts:
    22
    I've been searching for a way to build my own Unity environment as a gym environment so that I can run my Python code on a cloud notebook (Kaggle/Colab).
    I've found several links referring to Unity gym wrappers; however, all of them are broken (even some from as recently as two months ago). Did they drop the feature or change the documentation links?
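    For reference, the wrapper itself still exists; it now ships inside the mlagents_envs Python package (you can see the import used later in this thread), which may be why the older gym-unity links are dead. Below is a minimal sketch, not taken from the docs: the build path is a placeholder, and no_graphics is an assumption that usually helps on headless cloud machines.

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

    # Launch a standalone build (path is a placeholder); no_graphics avoids needing a display
    unity_env = UnityEnvironment("path/to/your/built_env", no_graphics=True)
    env = UnityToGymWrapper(unity_env)

    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())  # old gym-style 4-tuple
    env.close()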
     
  2. hughperkins

    Joined:
    Dec 3, 2022
    Posts:
    191
    Last edited: Feb 2, 2023
  3. seifmostafa7347

    Joined:
    Nov 2, 2021
    Posts:
    22
    This is precisely what I was looking for, thanks! You have a new subscriber :D
     
  4. chiaradivece

    Joined:
    Nov 17, 2020
    Posts:
    2
    Hello, following up on this: I downloaded ML-Agents release 21 and the latest available stable-baselines3 (which uses gymnasium). However, this causes an issue with the UnityToGymWrapper, because sb3 expects gymnasium.spaces.box.Box while the wrapper provides gym.spaces.box.Box. I tried the following:

    import gymnasium as gym

    However, it doesn't do the trick. I had to downgrade sb3 to 1.8.0, which is the latest version that still supports gym, but I'd like to move to newer versions since gym is no longer supported.
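    One possible workaround, sketched below, is to bridge the old gym-style API that UnityToGymWrapper exposes to the gymnasium API that newer sb3 expects, using the shimmy compatibility package (pip install shimmy). This is an untested sketch against release 21, not a confirmed fix, and it assumes GymV21CompatibilityV0 accepts an already-constructed env.

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
    from shimmy.openai_gym_compatibility import GymV21CompatibilityV0  # assumption: shimmy is installed

    unity_env = UnityEnvironment(None)  # connect to the Editor, as in the script below
    gym_env = UnityToGymWrapper(unity_env, uint8_visual=False, flatten_branched=False, allow_multiple_obs=False)

    # Wrap the old-gym env so it exposes gymnasium spaces (e.g. gymnasium.spaces.Box) to sb3
    env = GymV21CompatibilityV0(env=gym_env)

    I believe newer sb3 releases also attempt this conversion automatically when shimmy is installed, so it may even be enough to just install the package, but I haven't verified that.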

    This is the code that I'm using to train the agent:

    Code (Python):
    import gym
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
    import os
    import warnings
    warnings.filterwarnings('ignore')
    import sys
    import numpy as np
    import time
    import argparse
    from stable_baselines3.common.monitor import Monitor
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.env_util import make_vec_env  # creation of parallel environments
    from stable_baselines3.common.logger import configure
    from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
    from stable_baselines3.common.callbacks import BaseCallback

    from utils import ALGOS, SaveOnBestTrainingRewardCallback, linear_schedule

    import wandb

    np.random.seed(2)


    def main(args):
        """
        :param args: (ArgumentParser) the input arguments
        """

        algo = args.algo
        model_class = ALGOS[algo]

        save_dir = os.path.join(os.path.dirname(__file__), '../results')
        res_dir = os.path.join(save_dir, args.res_dir)
        model_dir = os.path.join(res_dir, args.model_dir)
        logs_dir = os.path.join(save_dir, args.tensorboard_log)
        logger_dir = os.path.join(logs_dir, args.model_dir)

        if args.pretrained == 'True':
            model_dir_pretrain = os.path.join(res_dir, args.model_dir_pretrain)

        os.makedirs(res_dir, exist_ok=True)
        os.makedirs(logs_dir, exist_ok=True)
        os.makedirs(logger_dir, exist_ok=True)

        channel = EngineConfigurationChannel()
        env = UnityEnvironment(None, side_channels=[channel])
        # env = UnityEnvironment('built_scenes/UnityVolumeRendering', side_channels=[channel], base_port=5004)
        channel.set_configuration_parameters(time_scale=4)
        env = UnityToGymWrapper(env, uint8_visual=False, flatten_branched=False, allow_multiple_obs=False)
        env.reset()
        env = Monitor(env, logger_dir, allow_early_resets=True)
        # env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
        env = DummyVecEnv([lambda: env])

        logger = configure(logger_dir, ["stdout", "csv", "log", "tensorboard"])

        wandb.init(
            # set the wandb project where this run will be logged
            project="AgentTransl",
            name=args.model_dir,
        )

        # Setting the policy to "MlpPolicy" means that we are giving a state vector as input to our model.
        # There are only two other options:
        # - CnnPolicy, if you provide images as input;
        # - MultiInputPolicy, for handling multiple inputs
        if algo == 'ppo':

            print(f'RL Algorithm: {model_class}')

            if args.pretrained == 'False':
                model = model_class("MlpPolicy", env, verbose=1)
                model.set_logger(logger)
                print('training')
                callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=logger_dir)
                model.learn(total_timesteps=args.n_train_timesteps, callback=callback)
                model.save(model_dir)
                print('model saved')
                del model
                print('model deleted')
            else:
                model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
                model.set_logger(logger)
                print('fine tuning model')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir)
                model.save(model_dir)
                print('new model saved')
                del model
                print('model deleted')
        elif algo == 'td3':
            print(f'RL Algorithm: {model_class}')
            n_actions = env.action_space.shape[-1]
            action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

            if args.pretrained == 'False':
                model = model_class("MlpPolicy", env, action_noise=action_noise, verbose=1, tensorboard_log=logs_dir, seed=0)
                model.set_logger(logger)
                print('training')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
                model.save(model_dir)
                print('model saved')
                del model
                print('model deleted')
            else:
                model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
                model.set_logger(logger)
                print('fine tuning model')
                model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
                model.save(model_dir)
                print('new model saved')
                del model
                print('model deleted')
                # print score of the model

        env.close()

        print('training completed')


    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='Train agent on custom env')
        parser.add_argument('--algo', default='ppo', type=str, required=False, choices=list(ALGOS.keys()), help='RL Algorithm')
        parser.add_argument('--res_dir', type=str, help='Directory to save results')
        parser.add_argument('--model_dir', type=str, help='Directory to save model.zip')
        parser.add_argument('--policy', default='MlpPolicy')
        parser.add_argument('--tensorboard_log', type=str, help='Tensorboard log dir')
        parser.add_argument('--monitor', type=str, help='Monitor wrapper filename')
        parser.add_argument('--n_train_timesteps', default=200000, required=False, type=int, help='Maximum number of timesteps for training')
        parser.add_argument('--pretrained', type=str, default='False', required=False, help='Boolean to determine if training must start from an existing model')
        parser.add_argument('--model_dir_pretrain', type=str, required=False, help='Directory to load the pretrained model')
        args = parser.parse_args()

        main(args)

    Please note that I'm using
    env = UnityEnvironment(None, side_channels=[channel])

    because with release 19 I was getting the following prompt to start the training (this is not happening now with release 21):
    [INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.

    If anyone could help me understand what is causing these issues, or has encountered them before, I would REALLY appreciate the help! Also, please let me know if you need further details.
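    For anyone reading later: passing file_name=None makes mlagents_envs wait for the Editor's Play button, while passing a build path launches a standalone player directly. A short sketch of the latter, reusing the commented-out line from the script above (the path is just that example, not a real requirement):

    Code (Python):
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

    channel = EngineConfigurationChannel()
    # Launch the standalone build instead of waiting for the Editor
    env = UnityEnvironment('built_scenes/UnityVolumeRendering', side_channels=[channel], base_port=5004)
    channel.set_configuration_parameters(time_scale=4)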
     
  5. petroben

    Joined:
    Nov 23, 2023
    Posts:
    1
    It seems like you might be encountering broken links or outdated information. Unity provides a framework called "ml-agents" (Machine Learning Agents) that enables integration with Unity environments for reinforcement learning. You can check the official GitHub repository for the latest documentation and resources:

    GitHub Repository: ML-Agents

    Ensure that you are referring to the latest documentation and follow the instructions there to set up your Unity environment as a gym environment for reinforcement learning in Python. If you encounter specific issues, the GitHub repository's issue tracker can be a helpful resource for seeking assistance or reporting problems.
     
    Last edited: Dec 5, 2023
  6. chiaradivece

    Joined:
    Nov 17, 2020
    Posts:
    2
    Hello @petroben, thank you for your reply! I'm already using ML-Agents and the mlagents-envs package, but I'm not interested in using the built-in ML-Agents trainers; with release 19 of ML-Agents I'm able to integrate the stable-baselines3 implementations of RL algorithms instead. The main issue is that newer releases of stable-baselines3 no longer support gym, which has been replaced by gymnasium, and this causes an issue with the new release of ML-Agents.
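    For anyone who wants to see the mismatch concretely, here is a quick check in a Python shell (a sketch; it assumes both gym and gymnasium are installed, and the env connects to the Editor once you press Play):

    Code (Python):
    import gymnasium
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

    env = UnityToGymWrapper(UnityEnvironment(None))  # press Play in the Editor when it starts listening
    print(type(env.observation_space))                              # gym.spaces.box.Box (old gym API)
    print(isinstance(env.observation_space, gymnasium.spaces.Box))  # False -> the check newer sb3 trips over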