Hi, I'm trying to use GAIL with visual observations and I'm having some trouble figuring out how to set everything up. This is the error I get with 128 as the encoding size in the yaml config; the visual input is 84x84x3. Not really sure what to make of the 10.3 MiB:

```
Traceback (most recent call last):
  File "C:\Users\jezrd\Anaconda3\envs\mla-r1\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\jezrd\Anaconda3\envs\mla-r1\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\trainer_controller.py", line 350, in trainer_update_func
    trainer.advance()
  File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 151, in advance
    self._process_trajectory(t)
  File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\ppo\trainer.py", line 161, in _process_trajectory
    self.update_buffer, training_length=self.policy.sequence_length
  File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\buffer.py", line 283, in resequence_and_append
    batch_size=batch_size, training_length=training_length
  File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\buffer.py", line 51, in extend
    self += list(np.array(data))
MemoryError: Unable to allocate 10.3 MiB for an array with shape (128, 84, 84, 3) and data type float32
```

Here's the config file:

```yaml
SymbolFinder:
  use_recurrent: true
  sequence_length: 64
  num_layers: 3
  hidden_units: 256
  memory_size: 128
  beta: 1.0e-2
  num_epoch: 8
  buffer_size: 32768
  batch_size: 2048
  max_steps: 10000000
  summary_freq: 10000
  time_horizon: 96
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
    curiosity:
      strength: 0.015
      gamma: 0.99
      encoding_size: 128
    gail:
      gamma: 0.99
      strength: 0.5
      encoding_size: 128
      learning_rate: 0.0003
      use_actions: false
      demo_path: demos/5goals2.demo
```

I'm not 100% sure it's the right set-up. The training works normally without GAIL.
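For scale, the failing allocation itself is tiny. A quick back-of-envelope check (plain NumPy, nothing ML-Agents-specific) reproduces the 10.3 MiB figure straight from the batch shape in the error, which suggests the process address space was already nearly exhausted rather than this one array being too big:

```python
import numpy as np

# Size of a float32 array with the shape from the MemoryError:
# (batch_size, height, width, channels) = (128, 84, 84, 3)
n_bytes = 128 * 84 * 84 * 3 * np.dtype(np.float32).itemsize
print(f"{n_bytes / 2**20:.1f} MiB")  # → 10.3 MiB
```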
Since memory allocation might be an issue, here's my set-up: Windows 10, 64 GB RAM, 24 GB GPU. The demonstration file is ~160 MB. After dropping curiosity and reducing the encoding size to 64, it started training. The memory load for the GPU is 5/24 GB and for RAM it's 36/64 GB (but that's mostly Chrome and 50 other apps running as well).
Did a few tests and I see that the size of the demonstration file and the number of environments also matter. A 50 MB file with num-envs set to two works fine, but a bigger ~500 MB file with a few more environment instances fails. So I'm guessing the demonstration file is loaded into memory for each agent separately? Would it make sense to start with a smaller pool of instances, bootstrap the training with GAIL, and then resume training without GAIL, leaving the action/observation space and the number of layers/hidden units unchanged?
Hi andrej_, if you keep running into memory issues, you can train with GAIL and then resume, still with GAIL, but pointing at a smaller demo file and with the GAIL reward signal strength set to 0 (so that the small demo file is effectively ignored). However, the best way is still to avoid overloading your memory and to keep training with the same setup.
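A minimal sketch of that config change for the resumed run, mirroring the structure of the config posted above (the demo filename here is a placeholder, not a real file):

```yaml
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  gail:
    strength: 0.0          # signal zeroed out, so the demos no longer shape the reward
    gamma: 0.99
    encoding_size: 64
    demo_path: demos/small_placeholder.demo   # hypothetical smaller demo file
```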
The exe file itself is 625 kB. I'm not entirely sure what the issue is here, as a few things seem to influence it (num-envs, batch size, demo recording file size...). Does the GAIL implementation try to load the whole demo recording at once, once for each environment instance, or does it sample from it? I tend to build the environment and launch it from the command line with the num-envs argument rather than duplicating the agents n times in the editor... would that help here? It would be n learners/agents but only a single Unity instance.
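I can't confirm from the error alone how the demos are held in memory, but if the recording really is deserialized once per environment instance, a rough estimate shows how quickly that adds up. The 4x expansion factor below (for decoding the compressed .demo into float32 observation buffers) is purely an assumption, as is the per-instance loading itself:

```python
# Rough, assumption-laden estimate of demo memory across environment instances.
demo_mb_on_disk = 500   # size of the .demo recording, MB
expansion_factor = 4    # assumed blow-up when decoded into float32 buffers
num_envs = 8            # instances launched with the num-envs argument

total_mb = demo_mb_on_disk * expansion_factor * num_envs
print(f"~{total_mb / 1024:.1f} GB")  # ~15.6 GB
```

If loading really is per-instance, that alone would explain why the 500 MB recording fails where the 50 MB one trains fine.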
Here ml-agents was training for ~80k steps: single agent/env instance run from the editor, batch size 512, buffer size 32768, and the demo recording was ~41 MB. I still have plenty of RAM and VRAM (~30 and ~12 GB free respectively), but starting training eats up a lot of my C: drive space (~20-30 GB), so I'm wondering whether it's caching something and at some point runs out of free space. If you prefer, I can post this as an issue on GitHub.
If I ever get this working I'll definitely post a few notes on how and why... so far most of the runs look like this: Not sure what the odds are of two separate runs, with different hyper-parameters, getting a sudden 'collapse' after nearly 2M steps... edit: both of those runs were PPO + curiosity, without GAIL.
With curiosity, it's possible to get reward collapses later on, as the agent stops being curious about the things you want it to be curious about =P and goes off to explore other things. Do you have the plots for the Extrinsic and Curiosity rewards (in TensorBoard, under Policy)? Usually this can be solved by making the Curiosity reward much smaller than the Extrinsic reward, so that the agent prioritizes the latter.
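As a sketch of that rebalancing, using the config posted earlier in the thread as a base (the exact strength value is illustrative, not a tuned recommendation):

```yaml
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  curiosity:
    strength: 0.005   # illustrative: dropped further below extrinsic than the original 0.015
    gamma: 0.99
    encoding_size: 64
```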
But I should add that all 3 runs might have had some small changes to the environment and/or hyperparameters. Nothing big, but it's still not a 100% perfect comparison.