
Question ML Agents Torch -- not using GPU?

Discussion in 'ML-Agents' started by BrainSlugs83, Feb 22, 2021.

  1. BrainSlugs83

    BrainSlugs83

    Joined:
    Jun 18, 2015
    Posts:
    38
    I'm using ML Agents to train a simple model... -- and it's destroying my CPU and not even touching my GPU... (65% CPU usage vs 6% GPU usage...)

    I'm not seeing any warnings that my GPU isn't supported (or isn't enabled) -- but just to double check, I wrote a quick script to see what python says:
    Code (python):
    import torch
    print('version:        ' + torch.__version__)
    print('cuda available: ' + str(torch.cuda.is_available()))
    print('current device: ' + str(torch.cuda.current_device()))
    print('device[0]     : ' + str(torch.cuda.device(0)))
    print('device count  : ' + str(torch.cuda.device_count()))
    print('device name   : ' + torch.cuda.get_device_name(0))
    And here's the output:
    Code (console):
    version:        1.7.0+cu110
    cuda available: True
    current device: 0
    device[0]     : <torch.cuda.device object at 0x000001889B5FA3C8>
    device count  : 1
    device name   : GeForce GTX 1080
    Nothing really looks out of place to me there... -- and my python-fu may be weak, but I think this indicates it's an issue with ml agents and not with my python install, right?

    So, I tried upgrading my ml agents in Unity to the latest beta version... and still no dice. -- Do I have to do something special to tell ml agents that it should use my GPU on Windows?

    What is my next troubleshooting step?
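
    (One standalone check, independent of ML-Agents, would be to force a big tensor op onto the GPU and see whether the CUDA graph in Task Manager moves. A minimal sketch, assuming only torch is installed:)
    Code (python):
    import time
    import torch

    # Sanity check outside ML-Agents: allocate a large matrix on the GPU,
    # multiply it with itself, and confirm the result lives on cuda:0.
    assert torch.cuda.is_available()
    x = torch.randn(8192, 8192, device="cuda")

    start = time.time()
    y = x @ x
    torch.cuda.synchronize()  # CUDA ops are async; wait for the kernel to finish
    print(f"matmul took {time.time() - start:.3f}s on {y.device}")
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")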
     
    Last edited: Feb 22, 2021
  2. andrewcoh_unity

    andrewcoh_unity

    Unity Technologies

    Joined:
    Sep 5, 2019
    Posts:
    162
    Hi @BrainSlugs83

    It looks like torch is using your GPU. With PPO, smaller networks, small batch sizes, and no visual observations, the GPU won't really make that much of a difference, unfortunately. If you want to see greater GPU utilization just to test, you can try one of our example environments that use visual observations, e.g. GridWorld.
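
    To illustrate the point (a rough standalone sketch, not ML-Agents code; the layer and batch sizes are placeholders): a small vector-observation network updates so quickly that there is little for the GPU to accelerate, which is why training stays CPU-bound.
    Code (python):
    import time
    import torch
    import torch.nn as nn

    def time_updates(device: str, n: int = 200) -> float:
        # Small vector-observation policy: two hidden layers of 128 units.
        net = nn.Sequential(
            nn.Linear(32, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 2),
        ).to(device)
        opt = torch.optim.Adam(net.parameters(), lr=3e-4)
        obs = torch.randn(1000, 32, device=device)      # one 1000-sample minibatch
        target = torch.randn(1000, 2, device=device)
        start = time.time()
        for _ in range(n):
            loss = ((net(obs) - target) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        if device == "cuda":
            torch.cuda.synchronize()                    # wait for queued GPU work
        return time.time() - start

    print("cpu :", time_updates("cpu"))
    if torch.cuda.is_available():
        print("cuda:", time_updates("cuda"))
    For a network this small the CUDA timing usually isn't dramatically better than the CPU one, whereas a visual encoder over 84x84 camera observations (as in GridWorld) would be.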
     
  3. BrainSlugs83

    BrainSlugs83

    Joined:
    Jun 18, 2015
    Posts:
    38
    Hey, thanks for the reply!

    To be clear about my goals: what I'd really like to see is less CPU utilization, so that I can still use my computer while it trains in the background -- even if it takes a little longer to train, if it offloaded more work to the GPU, I'd be a very happy camper.

    (i.e. it's okay if it doesn't "benefit" from the GPU, as long as it can offload the task there and let me use my computer -- because 65% CPU utilization [with spikes up to 80-90%] on a 5GHz Coffee Lake i7 really brings Windows to its knees and makes the rest of the computer unusable...)
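
    (For the "keep the rest of the computer usable" part specifically, here is a sketch of two things that can help independently of ML-Agents. It assumes psutil is installed and that the calls run inside the process that hosts the trainer -- otherwise you can simply lower the priority of the mlagents-learn/python process from Task Manager's Details tab.)
    Code (python):
    import psutil
    import torch

    # Cap torch's intra-op CPU thread pool; fewer threads means lower total
    # CPU usage at the cost of slower updates.
    torch.set_num_threads(2)

    # Windows-only priority constant: drop this process to below-normal
    # priority so foreground applications stay responsive while training
    # continues in the background.
    psutil.Process().nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)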

    1.) Is there a way to validate that the code is actually using the GPU? (I'd like to be sure this isn't a bug in ml agents -- I've read that the python code has to be written in a certain way to use the GPU, i.e. it has to call ".cuda()" on objects, and there are other things that could go wrong.) Is trying the GridWorld example the best way to test this? (See also the sketch after this list of questions.)
    1.a.) Possible feature request: maybe adding a message at the console indicating that the GPU is being used would eliminate some confusion for end users like me? (i.e. it looks like these "how do I make it use my GPU" threads are common).

    2.) RE: small batch sizes: is it possible that tweaking my hyperparameters will result in less CPU utilization during training? Right now my batch size is 1000 and my buffer size is 12000 (the docs seemed to indicate that this was within the appropriate ranges for PPO?). See also the sketch after the YAML below.
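
    (Re: question 1, a minimal sketch of what "using the GPU" looks like from torch's side: a module only runs on CUDA if its parameters and its inputs were moved there with .cuda()/.to("cuda"), so checking parameter devices and allocated GPU memory is one way to validate it. The throwaway nn.Linear below is just a stand-in for the trainer's policy, which the normal mlagents-learn workflow doesn't expose directly.)
    Code (python):
    import torch
    import torch.nn as nn

    def report_devices(module: nn.Module) -> None:
        """Print which device each parameter of a module lives on."""
        print({name: str(p.device) for name, p in module.named_parameters()})
        print("GPU memory allocated:", torch.cuda.memory_allocated(), "bytes")

    report_devices(nn.Linear(4, 2).cuda())   # expect cuda:0 for both parameters
    If torch.cuda.memory_allocated() stays at (near) zero inside the training process, nothing was ever moved onto the GPU.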

    My YAML file is as follows (maybe there's something obvious that's misconfigured here?):
    Code (YAML):
    behaviors:
      DriveCar:
        trainer_type: ppo
        hyperparameters:
          batch_size: 1000
          buffer_size: 12000
          learning_rate: 0.0003
          beta: 0.001
          epsilon: 0.2
          lambd: 0.99
          num_epoch: 3
          learning_rate_schedule: linear
        network_settings:
          normalize: true
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        keep_checkpoints: 5
        max_steps: 5.0e20 # try to train forever
        time_horizon: 64
        summary_freq: 12000
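
    (For question 2, a back-of-the-envelope sketch of how those two numbers translate into optimizer work: ML-Agents collects buffer_size experiences and then makes num_epoch passes over them in batch_size chunks.)
    Code (python):
    # Values copied from the config above.
    buffer_size = 12000
    batch_size = 1000
    num_epoch = 3

    updates_per_buffer = num_epoch * buffer_size // batch_size
    print(updates_per_buffer)   # 36 minibatch updates per 12000 collected steps
    With a 2x128-unit network those 36 updates are cheap, so most of the CPU time is likely the Unity simulation and experience collection rather than the optimizer -- which these hyperparameters won't change much.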

    By the way, I see you work at Unity -- and I'd like to pass on some feedback:
    Overall, I'm super impressed with this project: I left my network training overnight, and this morning my physics-based cars were doing three-point turnarounds at dead ends of my virtual city and driving on the right side of the road! That's seriously amazing. Great work. If there's one thing I would change about ml-agents, it would be the setup process: it would be nice if Unity could optionally manage the Python/Torch/CUDA install (the same way it manages the Java and Android SDK installs), and if I could edit the hyperparameters in a properties pane instead of hand-editing YAML files. I think that would make the technology MUCH more accessible to folks like me -- one of the reasons I waited so long to try this out was the pain point of setting it up manually.
     
  4. AngrySamsquanch

    AngrySamsquanch

    Joined:
    Dec 28, 2014
    Posts:
    24
    If you open Task Manager in Windows and select the Performance tab, then GPU, each of the graphs in the GPU panel has a dropdown that can be switched to show CUDA usage.
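
    A programmatic alternative, if you'd rather log it than watch the graph (a sketch assuming the pynvml package, which wraps NVIDIA's NVML library, is installed via pip):
    Code (python):
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0, i.e. the GTX 1080
    try:
        while True:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            print(f"GPU {util.gpu}% | VRAM {mem.used / 1e6:.0f} MB used")
            time.sleep(1.0)
    finally:
        pynvml.nvmlShutdown()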