
Question: Why does training with the CPU use GPU VRAM and cause [CUDA out of memory]?

Discussion in 'ML-Agents' started by genesiz20898, Apr 11, 2022.

  1. genesiz20898

    genesiz20898

    Joined:
    Jul 12, 2021
    Posts:
    1
    Hello
    I tried using ML-Agents with the CPU (as the docs recommend, since CPU inference/training is normally faster than GPU inference).

    However, when I inspected my resources in Task Manager, it showed that my GPU VRAM was being used by ML-Agents.

    I have no problem with that when the hidden units for the neural network are set to 256. However, if I increase them to 512, ML-Agents uses up all my GPU VRAM and produces this error:

    RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 6.00 GiB total capacity; 4.61 GiB already allocated; 24.62 MiB free; 4.61 GiB reserved in total by PyTorch)

    Why does CPU inference/training require my GPU VRAM and lead to that error? Is there any way to solve it?
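    For reference, a quick way to see whether the installed PyTorch build can see the GPU at all (a minimal snippet, nothing ML-Agents specific, just standard PyTorch calls):

        import torch

        print(torch.__version__)             # e.g. 1.8.2+cu111 (CUDA build) or 1.8.2+cpu
        print(torch.cuda.is_available())     # True means PyTorch can use the GPU and will allocate VRAM when asked to
        if torch.cuda.is_available():
            print(torch.cuda.get_device_name(0))   # e.g. GeForce GTX 1060 6GB

    My guess (I have not checked the trainer code, so this is just an assumption) is that ML-Agents defaults to CUDA whenever a CUDA-enabled PyTorch build is installed, which would explain the VRAM usage even though the docs recommend CPU.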

    I tried uninstalling PyTorch with CUDA (torch==1.8.2+cu111) and reinstalling the version without CUDA (torch==1.8.2+cpu). With the CPU-only PyTorch, ML-Agents no longer uses my GPU VRAM, but the training time per step increased about 5x (I don't know whether that is normal or not, since the docs say that CPU inference is normally faster than GPU inference).
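    (A hedged side note: recent ML-Agents releases also expose a torch device option, so it may be possible to keep the CUDA build of PyTorch installed and still force training onto the CPU with something like:

        mlagents-learn config.yaml --run-id=MyRun --torch-device cpu

    where MyRun is just a placeholder run id and config.yaml stands for the config file below. Check your release's documentation to confirm the option is available in the version you are using.)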

    Here are my Behavior Parameters settings:
    [screenshot attachment: upload_2022-4-11_12-54-40.png]

    And here is my config file:
    [screenshot attachment: upload_2022-4-11_12-58-51.png]

    My system is Windows 10, Intel i5-6500 CPU with 16 GB RAM, and an NVIDIA GeForce GTX 1060 6GB.

    Maybe I did something wrong? I am really new to this area and really need some help. Thank you.
     
    Last edited: Apr 12, 2022
  2. xcao65

    xcao65

    Unity Technologies

    Joined:
    Jan 13, 2022
    Posts:
    2
  3. GamerLordMat

    GamerLordMat

    Joined:
    Oct 10, 2019
    Posts:
    185
    Hello,

    Stupid question, but are you mixing up inference and training? (GPU inference has nothing to do with the training process.) Or are you training while also running inference?
    If you have a camera sensor, I think it uses your graphics card for rendering anyway.