
Question ML Agents Torch -- not using GPU?

Discussion in 'ML-Agents' started by BrainSlugs83, Feb 22, 2021.

  1. BrainSlugs83

    BrainSlugs83

    Joined:
    Jun 18, 2015
    Posts:
    38
    I'm using ML Agents to train a simple model... -- and it's destroying my CPU and not even touching my GPU... (65% CPU usage vs 6% GPU usage...)

    I'm not seeing any warnings that my GPU isn't supported (or isn't enabled) -- but just to double check, I wrote a quick script to see what python says:
    Code (python):
    import torch
    print('version:        ' + torch.__version__)
    print('cuda available: ' + str(torch.cuda.is_available()))
    print('current device: ' + str(torch.cuda.current_device()))
    print('device[0]     : ' + str(torch.cuda.device(0)))
    print('device count  : ' + str(torch.cuda.device_count()))
    print('device name   : ' + torch.cuda.get_device_name(0))
    And here's the output:
    Code (console):
    version:        1.7.0+cu110
    cuda available: True
    current device: 0
    device[0]     : <torch.cuda.device object at 0x000001889B5FA3C8>
    device count  : 1
    device name   : GeForce GTX 1080
    Nothing really looks out of place to me there... -- and my python-fu may be weak, but I think this indicates it's an issue with ml agents and not with my python install, right?

    So, I tried upgrading my ml agents in Unity to the latest beta version... and still no dice. -- Do I have to do something special to tell ml agents that it should use my GPU on Windows?

    What is my next troubleshooting step?
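
    (One standalone check, independent of ML-Agents, would be to force a big tensor op onto the GPU and see whether the CUDA graph in Task Manager moves. A minimal sketch, assuming only torch is installed:)
    Code (python):
    import time
    import torch

    # Sanity check outside ML-Agents: allocate a large matrix on the GPU,
    # multiply it with itself, and confirm the result lives on cuda:0.
    assert torch.cuda.is_available()
    x = torch.randn(8192, 8192, device="cuda")

    start = time.time()
    y = x @ x
    torch.cuda.synchronize()  # CUDA ops are async; wait for the kernel to finish
    print(f"matmul took {time.time() - start:.3f}s on {y.device}")
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")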
     
    Last edited: Feb 22, 2021
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Hi @BrainSlugs83

    It looks like torch is using your GPU. With PPO, smaller networks, small batch sizes, and no visual observations, the GPU won't really make that much of a difference, unfortunately. If you want to see greater GPU utilization just to test, you can try one of our example environments that use visual observations, e.g. GridWorld.
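
    To illustrate the point (a rough standalone sketch, not ML-Agents code; the layer and batch sizes are placeholders): a small vector-observation network updates so quickly that there is little for the GPU to accelerate, which is why training stays CPU-bound.
    Code (python):
    import time
    import torch
    import torch.nn as nn

    def time_updates(device: str, n: int = 200) -> float:
        # Small vector-observation policy: two hidden layers of 128 units.
        net = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 2),
        ).to(device)
        opt = torch.optim.Adam(net.parameters(), lr=3e-4)
        obs = torch.randn(1000, 32, device=device)      # one 1000-sample minibatch
        target = torch.randn(1000, 2, device=device)
        start = time.time()
        for _ in range(n):
            loss = ((net(obs) - target) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        if device == "cuda":
            torch.cuda.synchronize()                    # wait for queued GPU work
        return time.time() - start

    print("cpu :", time_updates("cpu"))
    if torch.cuda.is_available():
        print("cuda:", time_updates("cuda"))
    For a network this small the CUDA timing usually isn't dramatically better than the CPU one, whereas a visual encoder over 84x84 camera observations (as in GridWorld) would be.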
     
  3. BrainSlugs83

    BrainSlugs83

    Joined:
    Jun 18, 2015
    Posts:
    38
    Hey, thanks for the reply!

    To be clear about my goals: what I'd really like to see is less CPU utilization, so that I can still use my computer while it trains in the background -- even if it takes a little longer to train, if it offloaded more work to the GPU, I'd be a very happy camper.

    (i.e. it's okay if it doesn't "benefit" from the GPU, as long as it can offload the task there and let me use my computer -- because 65% CPU utilization [with spikes up to 80-90%] on a 5GHz Coffee Lake i7 really brings Windows to its knees and makes the rest of the computer unusable...)
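
    (For the "keep the rest of the computer usable" part specifically, here is a sketch of two things that can help independently of ML-Agents. It assumes psutil is installed and that the calls run inside the process that hosts the trainer -- otherwise you can simply lower the priority of the mlagents-learn/python process from Task Manager's Details tab.)
    Code (python):
    import psutil
    import torch

    # Cap torch's intra-op CPU thread pool; fewer threads means lower total
    # CPU usage at the cost of slower updates.
    torch.set_num_threads(2)

    # Windows-only priority constant: drop this process to below-normal
    # priority so foreground applications stay responsive while training
    # continues in the background.
    psutil.Process().nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)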

    1.) Is there a way to validate that the code is actually using the GPU? (I'd like to be sure this isn't a bug in ml agents -- I've read that the python code has to be written in a certain way to use the GPU, i.e. it has to call ".cuda()" on objects, and there are other things that could go wrong.) Is trying the GridWorld example the best way to test this? (See also the sketch after this list of questions.)
    1.a.) Possible feature request: maybe adding a message at the console indicating that the GPU is being used would eliminate some confusion for end users like me? (i.e. it looks like these "how do I make it use my GPU" threads are common).

    2.) RE: small batch sizes: is it possible that tweaking my hyperparameters will result in less CPU utilization during training? Right now my batch size is 1000 and my buffer size is 12000 (the docs seemed to indicate that this was within the appropriate ranges for PPO?). See also the sketch after the YAML below.
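
    (Re: question 1, a minimal sketch of what "using the GPU" looks like from torch's side: a module only runs on CUDA if its parameters and its inputs were moved there with .cuda()/.to("cuda"), so checking parameter devices and allocated GPU memory is one way to validate it. The throwaway nn.Linear below is just a stand-in for the trainer's policy, which the normal mlagents-learn workflow doesn't expose directly.)
    Code (python):
    import torch
    import torch.nn as nn

    def report_devices(module: nn.Module) -> None:
        """Print which device each parameter of a module lives on."""
        print({name: str(p.device) for name, p in module.named_parameters()})
        print("GPU memory allocated:", torch.cuda.memory_allocated(), "bytes")

    report_devices(nn.Linear(4, 2).cuda())   # expect cuda:0 for both parameters
    If torch.cuda.memory_allocated() stays at (near) zero inside the training process, nothing was ever moved onto the GPU.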

    My YAML file is as follows (maybe there's something obvious that's misconfigured here?):
    Code (YAML):
    behaviors:
      DriveCar:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1000
          buffer_size: 12000
          learning_rate: 0.0003
          beta: 0.001
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 5.0e20 # try to train forever
        time_horizon: 64
        summary_freq: 12000
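
    (For question 2, a back-of-the-envelope sketch of how those two numbers translate into optimizer work: ML-Agents collects buffer_size experiences and then makes num_epoch passes over them in batch_size chunks.)
    Code (python):
    # Values copied from the config above.
    buffer_size = 12000
    batch_size = 1000
    num_epoch = 3

    updates_per_buffer = num_epoch * buffer_size // batch_size
    print(updates_per_buffer)   # 36 minibatch updates per 12000 collected steps
    With a 2x128-unit network those 36 updates are cheap, so most of the CPU time is likely the Unity simulation and experience collection rather than the optimizer -- which these hyperparameters won't change much.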

    By the way, I see you work at Unity -- and I'd like to pass on some feedback:
    Overall, I'm super impressed with this project: I left my network training overnight, and this morning my physics-based cars were doing three-point turnarounds at dead ends of my virtual city and driving on the right side of the road! That's seriously amazing. Great work. If there's one thing I would change about ml-agents, it would be the setup process: it would be nice if Unity could optionally manage the Python/Torch/CUDA install (the same way it manages the Java and Android SDK installs), and if I could edit the hyperparameters in a properties pane instead of hand-editing YAML files. I think that would make the technology MUCH more accessible to folks like me -- one of the reasons I waited so long to try this out was the pain point of setting it up manually.
     
  4. AngrySamsquanch

    AngrySamsquanch

    Joined:
    Dec 28, 2014
    Posts:
    24
    If you open Task Manager in Windows and select the Performance tab, then GPU, each of the graphs in the GPU panel has a dropdown that can be switched to show CUDA usage.
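
    A programmatic alternative, if you'd rather log it than watch the graph (a sketch assuming the pynvml package, which wraps NVIDIA's NVML library, is installed via pip):
    Code (python):
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0, i.e. the GTX 1080
    try:
        while True:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            print(f"GPU {util.gpu}% | VRAM {mem.used / 1e6:.0f} MB used")
            time.sleep(1.0)
    finally:
        pynvml.nvmlShutdown()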