
Question: GPU utilization doesn't spike when running inference on an ONNX model via Barracuda

Discussion in 'Barracuda' started by blee111, Apr 19, 2023.

    blee111

    Joined: Jan 18, 2022
    Posts: 1
    Model: pretrained GoogLeNet from PyTorch, exported as a .onnx file
    Input dimensions: 1x3x224x224
    Configuration: WorkerFactory.Type.ComputePrecompiled (worker setup sketched below)
    Inference run time in Editor (avg) with ComputePrecompiled: 6.33 ms
    Inference run time in Editor (avg) with WorkerFactory.Type.CSharp: 14.9 ms
    Inference run time in Build (avg): 171 ms
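
    For context, this is roughly how the worker is set up (a minimal sketch; the script and field names are placeholders, not my actual project code):

    using Unity.Barracuda;
    using UnityEngine;

    public class GoogLeNetRunner : MonoBehaviour
    {
        public NNModel modelAsset;   // GoogLeNet .onnx imported as an NNModel asset
        private IWorker worker;

        void Start()
        {
            var runtimeModel = ModelLoader.Load(modelAsset);
            // GPU compute-shader path with precompiled kernels
            worker = WorkerFactory.CreateWorker(WorkerFactory.Type.ComputePrecompiled, runtimeModel);
        }

        public Tensor Run(Tensor input)   // Barracuda tensors are NHWC, i.e. 1x224x224x3 here
        {
            worker.Execute(input);
            return worker.PeekOutput();   // output tensor is owned by the worker
        }

        void OnDestroy()
        {
            worker?.Dispose();
        }
    }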

    I'm currently trying to improve the inference run time of GoogLeNet in Unity. Based on the Barracuda docs, I believe I have configured the model to run optimally, but GPU utilization on my Nvidia RTX 3080 Ti does not spike when inference runs. I have confirmed in both Task Manager > Performance and an nvidia-smi terminal that there is no significant change (effectively none) when I run inference on the model.

    Is this the expected behavior or should I be seeing GPU utilization increase when inference is running?

    As a side note, I am also testing Wav2Vec2 in Unity using Microsoft.ML.OnnxRuntime.DirectML, and I am running into the same issue. If I force the ONNX model to run on the CPU, the inference run time is 72 ms (avg), but when I configure the session via SessionOptions to use DML (GPU), the inference run time is 10.75 ms (avg). Same issue: no spike in GPU utilization, but a clear increase in performance when configured for the GPU.
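
    For reference, a minimal sketch of the DML session setup I'm describing (the class name and model path are placeholders):

    using Microsoft.ML.OnnxRuntime;

    public static class Wav2Vec2Session
    {
        // Placeholder path; the real model file lives elsewhere in the project.
        public static InferenceSession CreateDmlSession(string modelPath = "wav2vec2.onnx")
        {
            var options = new SessionOptions();
            options.AppendExecutionProvider_DML(0);   // device 0 = default GPU via DirectML
            return new InferenceSession(modelPath, options);
        }
    }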