INFO:mlagents_envs:Connected new brain: Orbit?team=0
INFO:mlagents.trainers:Hyperparameters for the PPOTrainer of brain Orbit:
        trainer:        ppo
        batch_size:     1024
        beta:   0.005
        buffer_size:    10240
        epsilon:        0.2
        hidden_units:   128
        lambd:  0.95
        learning_rate:  0.0003
        learning_rate_schedule: linear
        max_steps:      1.0e8
        memory_size:    256
        normalize:      False
        num_epoch:      3
        num_layers:     2
        time_horizon:   64
        sequence_length:        64
        summary_freq:   10000
        use_recurrent:  False
        vis_encode_type:        simple
        reward_signals:
          extrinsic:
            strength:   1.0
            gamma:      0.99
        summary_path:   orbit_0_Orbit
        model_path:     ./models/orbit_0/Orbit
        keep_checkpoints:       5
2020-04-07 17:21:52.478674: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.trainers:Orbit lesson changed. Now in lesson 1: platformDifficulty -> 1.1
Traceback (most recent call last):
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ww\AppData\Local\Continuum\anaconda3\envs\ml-agents-14-1\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 479, in main
    run_cli(parse_command_line())
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 475, in run_cli
    run_training(run_seed, options)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 320, in run_training
    tc.start_learning(env_manager)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 218, in start_learning
    self.reset_env_if_ready(env_manager, global_step)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 270, in reset_env_if_ready
    self.end_trainer_episodes(env, lessons_incremented)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 236, in end_trainer_episodes
    self._reset_env(env)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 151, in _reset_env
    env.reset(config=sampled_reset_param)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\env_manager.py", line 54, in reset
    manager.end_episode()
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 208, in end_episode
    self._clean_agent_data(_gid)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 187, in _clean_agent_data
    del self.episode_rewards[global_id]
KeyError: '$0-4'

Orbit.yaml:

Orbit:
  measure: progress
  thresholds: [-3.0, -3.0, -3.0]
  min_lesson_length: 20
  signal_smoothing: true
  parameters:
    platformDifficulty: [0.0, 1.1, 2.2, 3.3]

My command:

mlagents-learn config/trainer_config.yaml --run-id=orbit_0 --train --curriculum=config/curricula/Orbit.yaml
thresholds: [-3.0, -3.0, -3.0]

thresholds (float array) - Points in the value of measure where the lesson should be increased.
Hi,

As pointed out above, your threshold values are negative numbers. Since you are using progress, each threshold should be a ratio of steps taken to max steps, i.e. a value between 0 and 1. For example, if you are running for a total of 10 million timesteps and want your lessons to change at 1 million and 2 million timesteps, your thresholds would be:

thresholds: [0.1, 0.2]

Did you mean to use reward instead of progress? Please see the documentation here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Curriculum-Learning.md
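Putting that advice together with the config from the first post, a progress-based curriculum for this run (max_steps: 1.0e8) might look like the sketch below. The specific threshold values here are illustrative assumptions, not values from the thread:

```yaml
Orbit:
  measure: progress            # fraction of max_steps completed, in [0, 1]
  # Assumed example values: with max_steps = 1.0e8 these would fire
  # at roughly 25M, 50M and 75M steps.
  thresholds: [0.25, 0.5, 0.75]
  min_lesson_length: 20
  signal_smoothing: true
  parameters:
    # One more entry than thresholds: lessons 0 through 3.
    platformDifficulty: [0.0, 1.1, 2.2, 3.3]
```

Note that the parameters array needs one more entry than thresholds, since each threshold marks the transition into the next lesson.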
Even with changing to reward I'm hitting the error.

Orbit:
  measure: reward
  thresholds: [5.0, 10.0, 15.0, 20.0]
  min_lesson_length: 5
  signal_smoothing: true
  parameters:
    platformDifficulty: [0.0, 1.0, 2.0, 3.0, 4.0]

(ml-agents-14-1) D:\Documents\GitHub\ml-agents>mlagents-learn config/trainer_config.yaml --run-id=orbit_0 --train --curriculum=config/curricula/Orbit.yaml
WARNING:tensorflow:From c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term

Version information:
  ml-agents: 0.14.1,
  ml-agents-envs: 0.14.1,
  Communicator API: API-14,
  TensorFlow: 2.0.1

INFO:mlagents_envs:Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
INFO:mlagents_envs:Connected new brain: Orbit?team=0
INFO:mlagents.trainers:Hyperparameters for the PPOTrainer of brain Orbit:
        trainer:        ppo
        batch_size:     1024
        beta:   0.005
        buffer_size:    10240
        epsilon:        0.2
        hidden_units:   128
        lambd:  0.95
        learning_rate:  0.0003
        learning_rate_schedule: linear
        max_steps:      1.0e8
        memory_size:    256
        normalize:      False
        num_epoch:      3
        num_layers:     2
        time_horizon:   64
        sequence_length:        64
        summary_freq:   10000
        use_recurrent:  False
        vis_encode_type:        simple
        reward_signals:
          extrinsic:
            strength:   1.0
            gamma:      0.99
        summary_path:   orbit_0_Orbit
        model_path:     ./models/orbit_0/Orbit
        keep_checkpoints:       5
2020-04-09 16:13:49.288449: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.trainers:Orbit lesson changed. Now in lesson 1: platformDifficulty -> 1.0
Traceback (most recent call last):
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ww\AppData\Local\Continuum\anaconda3\envs\ml-agents-14-1\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 479, in main
    run_cli(parse_command_line())
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 475, in run_cli
    run_training(run_seed, options)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 320, in run_training
    tc.start_learning(env_manager)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 218, in start_learning
    self.reset_env_if_ready(env_manager, global_step)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 270, in reset_env_if_ready
    self.end_trainer_episodes(env, lessons_incremented)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 236, in end_trainer_episodes
    self._reset_env(env)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 151, in _reset_env
    env.reset(config=sampled_reset_param)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\env_manager.py", line 54, in reset
    manager.end_episode()
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 208, in end_episode
    self._clean_agent_data(_gid)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 187, in _clean_agent_data
    del self.episode_rewards[global_id]
KeyError: '$0-4'
I'm crashing even when I'm within 0-1 for threshold.

Orbit:
  measure: progress
  thresholds: [.2, .3, .4, .5]
  min_lesson_length: 5
  signal_smoothing: true
  parameters:
    platformDifficulty: [0.0, 1.0, 2.0, 3.0, 4.0]

D:\Documents\GitHub\ml-agents>mlagents-learn config/trainer_config.yaml --run-id=orbit_0 --train --curriculum=config/curricula/Orbit.yaml
WARNING:tensorflow:From c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term

Version information:
  ml-agents: 0.14.1,
  ml-agents-envs: 0.14.1,
  Communicator API: API-14,
  TensorFlow: 2.0.1

INFO:mlagents_envs:Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
INFO:mlagents_envs:Connected new brain: Orbit?team=0
INFO:mlagents.trainers:Hyperparameters for the PPOTrainer of brain Orbit:
        trainer:        ppo
        batch_size:     1024
        beta:   0.005
        buffer_size:    10240
        epsilon:        0.2
        hidden_units:   128
        lambd:  0.95
        learning_rate:  0.0003
        learning_rate_schedule: linear
        max_steps:      1.0e5
        memory_size:    256
        normalize:      False
        num_epoch:      3
        num_layers:     2
        time_horizon:   64
        sequence_length:        64
        summary_freq:   10000
        use_recurrent:  False
        vis_encode_type:        simple
        reward_signals:
          extrinsic:
            strength:   1.0
            gamma:      0.99
        summary_path:   orbit_0_Orbit
        model_path:     ./models/orbit_0/Orbit
        keep_checkpoints:       5
2020-04-16 13:35:43.846804: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.trainers: orbit_0: Orbit: Step: 10000. Time Elapsed: 137.540 s Mean Reward: -16.265. Std of Reward: 19.271. Training.
INFO:mlagents.trainers: orbit_0: Orbit: Step: 20000. Time Elapsed: 272.975 s Mean Reward: -12.294. Std of Reward: 19.108. Training.
INFO:mlagents.trainers:Orbit lesson changed. Now in lesson 1: platformDifficulty -> 1.0
Traceback (most recent call last):
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ww\AppData\Local\Continuum\anaconda3\envs\ml-agents-14-1\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 479, in main
    run_cli(parse_command_line())
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 475, in run_cli
    run_training(run_seed, options)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\learn.py", line 320, in run_training
    tc.start_learning(env_manager)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 218, in start_learning
    self.reset_env_if_ready(env_manager, global_step)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 270, in reset_env_if_ready
    self.end_trainer_episodes(env, lessons_incremented)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 236, in end_trainer_episodes
    self._reset_env(env)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\trainer_controller.py", line 151, in _reset_env
    env.reset(config=sampled_reset_param)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\env_manager.py", line 54, in reset
    manager.end_episode()
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 208, in end_episode
    self._clean_agent_data(_gid)
  File "c:\users\ww\appdata\local\continuum\anaconda3\envs\ml-agents-14-1\lib\site-packages\mlagents\trainers\agent_processor.py", line 187, in _clean_agent_data
    del self.episode_rewards[global_id]
KeyError: '$0-4'
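For what it's worth, the thresholds themselves seem to be behaving as intended here: assuming measure: progress advances a lesson once steps / max_steps crosses a threshold, the expected change points for this run (max_steps: 1.0e5 in the trainer config above) can be sketched as:

```python
# Hedged sketch: assumes a "progress" threshold t fires once
# steps / max_steps >= t (values in [0, 1]).
max_steps = 1.0e5            # from the trainer config above
thresholds = [0.2, 0.3, 0.4, 0.5]  # from Orbit.yaml above

# Steps at which each lesson change is expected to fire.
change_points = [int(t * max_steps) for t in thresholds]
print(change_points)  # [20000, 30000, 40000, 50000]
```

That matches the log, where the lesson changed right after the Step 20000 summary; the crash happens in the environment reset that follows the lesson change, not in the threshold logic.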
Is there a reason you are using v0.14.1? I am looking at the code for version 0.15.1, and it seems this part of the code changed between 0.15.0 and 0.15.1 (exactly on the line you are reporting). I think it is possible that, when using a curriculum, agents are deleted when Python sends a reset command to the environment; if an Agent then sends a "Done" signal to Python afterwards, Python may try to delete the agent AGAIN even though it was already deleted during the previous reset, causing this error. I would recommend updating to 0.15.1 or, if that is not possible, modifying this method directly to safely clean the agent data: https://github.com/Unity-Technologi...nts/mlagents/trainers/agent_processor.py#L179
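To illustrate the "safely clean the agent data" idea, here is a minimal sketch. It is not the real AgentProcessor from mlagents.trainers.agent_processor (which holds more per-agent state); it only models two of its dicts to show how a tolerant cleanup avoids the KeyError when the same agent id is cleaned twice:

```python
# Hedged sketch of the workaround, NOT the actual ml-agents class.
class AgentProcessorSketch:
    def __init__(self):
        self.episode_rewards = {}
        self.episode_steps = {}

    def _clean_agent_data(self, global_id):
        # dict.pop with a default makes a second cleanup of the same
        # agent id (e.g. a late "Done" arriving after a curriculum
        # reset already removed it) a no-op instead of a KeyError,
        # unlike `del self.episode_rewards[global_id]`.
        self.episode_rewards.pop(global_id, None)
        self.episode_steps.pop(global_id, None)

proc = AgentProcessorSketch()
proc.episode_rewards["$0-4"] = 1.5
proc.episode_steps["$0-4"] = 64
proc._clean_agent_data("$0-4")
proc._clean_agent_data("$0-4")  # second call no longer raises KeyError
```

The same effect can be had with a try/except KeyError around each del; pop with a default is just the more compact form.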
Hello vincentpierre, the link provided above no longer seems to work: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Curriculum-Learning.md Is there a new page documenting how to use it?