
GAIL with visual observations

Discussion in 'ML-Agents' started by andrzej_, Jun 13, 2020.

  1. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Hi,
    I'm trying to use GAIL with visual observations and I'm having some trouble figuring out how to set everything up. This is the error I get with 128 as the encoding size in the YAML config and an 84x84x3 visual input. I'm not really sure what to make of the 10.3 MiB:


    Traceback (most recent call last):
    File "C:\Users\jezrd\Anaconda3\envs\mla-r1\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
    File "C:\Users\jezrd\Anaconda3\envs\mla-r1\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
    File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\trainer_controller.py", line 350, in trainer_update_func
    trainer.advance()
    File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 151, in advance
    self._process_trajectory(t)
    File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\ppo\trainer.py", line 161, in _process_trajectory
    self.update_buffer, training_length=self.policy.sequence_length
    File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\buffer.py", line 283, in resequence_and_append
    batch_size=batch_size, training_length=training_length
    File "e:\dok\_ml\ml-agents-release_2\ml-agents\mlagents\trainers\buffer.py", line 51, in extend
    self += list(np.array(data))
    MemoryError: Unable to allocate 10.3 MiB for an array with shape (128, 84, 84, 3) and data type float32
    Here's the config file:
    SymbolFinder:
        use_recurrent: true
        sequence_length: 64
        num_layers: 3
        hidden_units: 256
        memory_size: 128
        beta: 1.0e-2
        num_epoch: 8
        buffer_size: 32768
        batch_size: 2048
        max_steps: 10000000
        summary_freq: 10000
        time_horizon: 96
        reward_signals:
            extrinsic:
                strength: 1.0
                gamma: 0.99
            curiosity:
                strength: 0.015
                gamma: 0.99
                encoding_size: 128
            gail:
                gamma: 0.99
                strength: 0.5
                encoding_size: 128
                learning_rate: 0.0003
                use_actions: false
                demo_path: demos/5goals2.demo

    I'm not 100% sure it's the right set-up. The training works normally without GAIL.

    Since memory allocation might be an issue, here is my set-up: Windows 10, 64 GB RAM, 24 GB GPU memory. The demonstration file is ~160 MB of data.


    After dropping curiosity and reducing the encoding size to 64, it started training. The GPU memory load is 5/24 GB and RAM is at 36/64 GB (but that's mostly Chrome and 50 other apps running as well).
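A quick way to read the 10.3 MiB figure: it is simply the size of one float32 array with the shape quoted in the error. The sketch below reproduces that number and, under the assumption that a full update buffer holds one float32 frame per step (an assumption for illustration, not something confirmed in this thread), estimates the footprint of `buffer_size: 32768` visual observations. The helper name `array_mib` is made up for this example.

```python
import math


def array_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense array; 4 bytes per element matches float32."""
    return math.prod(shape) * bytes_per_element / 2**20


# Shape taken from the MemoryError message above.
print(f"{array_mib((128, 84, 84, 3)):.1f} MiB")  # -> 10.3 MiB

# Assumption: the update buffer stores one float32 84x84x3 frame per step.
buffer_size = 32768
print(f"{array_mib((buffer_size, 84, 84, 3)) / 1024:.2f} GiB")  # -> 2.58 GiB
```

If a 10.3 MiB request fails, the process has most likely already used up the memory available to it, so the totals (buffers, demo data, number of environments) matter more than the size of this one array.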
     
  2. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    I did a few tests and I see that the size of the demonstration file and the number of environments also matter. A 50 MB file with num-envs set to two works fine, but a bigger file (~500 MB) with a few more environment instances fails. So I'm guessing the demonstration file is loaded into memory for each agent separately?

    Would it make sense to start with a smaller pool of instances, bootstrap the training with GAIL, and then resume training without GAIL, leaving the action/observation space and the number of layers/hidden units unchanged?
     
  3. vincentgao88

    vincentgao88

    Unity Technologies

    Joined:
    Feb 7, 2018
    Posts:
    21
    Hi andrzej_, if you keep running into memory issues, you can train with GAIL first and then resume with a smaller demo file and the GAIL reward signal strength set to 0 (so that the small demo file is effectively ignored). However, the best approach is still to avoid overloading your memory and keep training with the same setup.
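As a rough illustration of the suggestion above, the sketch below treats the behavior config as a plain Python dict and derives a second-phase config in which the GAIL entry is kept but its strength is 0 and its demo path points at a smaller file. The helper `disable_gail` and the file name `demos/small.demo` are hypothetical; only the keys and values copied from the config earlier in the thread come from the original post.

```python
import copy


def disable_gail(config):
    """Return a copy of a behavior config with the GAIL strength set to 0."""
    updated = copy.deepcopy(config)
    gail = updated.get("reward_signals", {}).get("gail")
    if gail is not None:
        gail["strength"] = 0.0
    return updated


# Subset of the first-phase settings posted earlier in the thread.
first_phase = {
    "hidden_units": 256,
    "num_layers": 3,
    "reward_signals": {
        "extrinsic": {"strength": 1.0, "gamma": 0.99},
        "gail": {
            "strength": 0.5,
            "gamma": 0.99,
            "demo_path": "demos/5goals2.demo",
        },
    },
}

second_phase = disable_gail(first_phase)
# Hypothetical smaller recording, used only so the GAIL entry stays valid.
second_phase["reward_signals"]["gail"]["demo_path"] = "demos/small.demo"

print(second_phase["reward_signals"]["gail"])
```

The network-shape settings (`hidden_units`, `num_layers`) are left untouched so that the resumed run can reuse the saved model.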
     
  4. vincentgao88

    vincentgao88

    Unity Technologies

    Joined:
    Feb 7, 2018
    Posts:
    21
    Also, how much memory is your environment taking? Could that be the cause of the memory issue?
     
  5. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    The exe file itself is 625 kB.
    I'm not entirely sure what the issue is here, as several things seem to influence it (num-envs, batch size, demo recording file size...). Does the GAIL implementation try to load the whole demo recording at once, and for each environment instance, or does it sample from it?
    I tend to build the environment and launch it from the command line with the num-envs argument rather than duplicating the agents n times in the editor... would duplicating them in the editor help here? It would be n learners/agents but only a single Unity instance.
     
  6. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    upload_2020-6-16_16-6-18.png
    Here ML-Agents was training for ~80k steps.
    A single agent/env instance was run from the editor, with batch size 512 and buffer size 32768.
    The demo recording was ~41 MB.
    I still have plenty of RAM and VRAM (~30 and ~12 GB respectively), but starting training eats up a lot of my C: drive space (~20-30 GB), so I'm wondering if it's caching something and at some point runs out of free space.
    If you prefer, I can post this as an issue on GitHub.
     
  7. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    If I ever get this working, I'll definitely post a few notes on how and why... so far most of the runs look like this:
    upload_2020-6-17_10-40-24.png
    Not sure what the odds are of two separate runs, with different hyperparameters, both suffering a sudden 'collapse' after nearly 2M steps...
    Edit: both of those runs were PPO + curiosity, without GAIL.
     
    Last edited: Jun 17, 2020
  8. ervteng_unity

    ervteng_unity

    Unity Technologies

    Joined:
    Dec 6, 2018
    Posts:
    150
    With curiosity, it's possible to get reward collapses later as the agent stops being curious about the things you want it to be curious about =P, and goes off to explore other things.

    Do you have the plots for the Extrinsic and Curiosity rewards (in TensorBoard under Policy)? Usually this can be solved by making the Curiosity reward much smaller than the Extrinsic reward, so that the agent prioritizes the extrinsic reward.
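To make the "much smaller" advice concrete, here is a minimal sketch that compares how much each reward signal contributes once its strength is applied. It assumes the inputs are raw (unscaled) mean rewards per step; whether the TensorBoard plots show scaled or unscaled values is not established in this thread. The function name `weighted_shares` and the example reward values are made up for illustration; the strengths are the ones from the config posted above.

```python
def weighted_shares(mean_rewards, strengths):
    """Share of the combined reward magnitude contributed by each signal."""
    weighted = {
        name: abs(mean_rewards[name]) * strengths[name] for name in mean_rewards
    }
    total = sum(weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}


# Hypothetical raw mean rewards per step, not measurements from these runs.
mean_rewards = {"extrinsic": 0.02, "curiosity": 0.5}
strengths = {"extrinsic": 1.0, "curiosity": 0.015}

for name, share in weighted_shares(mean_rewards, strengths).items():
    print(f"{name}: {share:.0%} of the combined reward")
```

If curiosity ends up with a large share late in training, lowering its strength is one way to keep the agent focused on the extrinsic reward.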
     
  9. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    Here's the plot for two runs with curiosity and one with GAIL and no curiosity
    upload_2020-6-19_8-56-20.png
    Policy:
    upload_2020-6-19_8-56-38.png
    Losses:
    upload_2020-6-19_8-57-5.png
     
  10. andrzej_

    andrzej_

    Joined:
    Dec 2, 2016
    Posts:
    81
    But I should add that all three runs might have had some small changes to the environment and/or hyperparameters. Nothing big, but it's still not a perfectly controlled comparison.