Bug: Potato performance when running inference

Discussion in 'ML-Agents' started by YodoLuca, Jun 18, 2023.

  1. YodoLuca

    Joined: May 16, 2018
    Posts: 10

    Hello!

    I developed a large neural network (a ~1k-unit LSTM plus 2 fully connected layers with 4k neurons each; these big numbers were required for the complex task the agent has to perform) for a virtual reality application. The problem appears after training (actually during training too, but I didn't care much then, since training ran automatically, without a human opponent playing): when running inference through the Unity Inference Engine, the application FPS drops to less than 50 (unplayable in VR). My first attempt was to reduce the number of queries to the network from one per physics frame to one every 5 frames (resulting in a dumber agent), and performance improved a little; I can now reach 68/69 FPS, but that is still pretty uncomfortable with a headset on. Turning off only the requests to the NN, the application reaches 200 FPS without any problem, but that defeats the whole project.
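
    In case it helps, here is a minimal sketch of that kind of query-rate reduction using ML-Agents' standard DecisionRequester component (AgentSetup is just an illustrative name; the same fields are exposed on the component in the Inspector):

    Code (CSharp):
    using Unity.MLAgents;
    using UnityEngine;

    // Lowers the decision rate so the policy is queried only
    // every 5th physics step instead of every step.
    public class AgentSetup : MonoBehaviour
    {
        void Awake()
        {
            var requester = GetComponent<DecisionRequester>();
            requester.DecisionPeriod = 5;                 // query the NN every 5 FixedUpdates
            requester.TakeActionsBetweenDecisions = true; // repeat the last action in between
        }
    }
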
    Let's suppose that I am OK with those 70 FPS: on powerful hardware like the machine I'm currently working on, that's fine, but when I try to move to a lower-spec PC, it's a disaster.

    Tests:
    RTX 2080 Super + 32 GB RAM + i9 9990K -> max 70 FPS
    RTX 3060 + 32 GB RAM + i7 11700H -> less than 30 FPS
    GTX 1070 + 16 GB RAM + i7 7700H -> less than 30 FPS too

    All of the machines above can run any VR game smoothly, without difficulty.
    One more piece of information: even with virtual reality turned off, using a normal camera and disabling OpenXR, the FPS stays the same, so the problem is not related to combining VR with inference.

    Is there a way to improve the inference engine's performance? I don't want to give up on ML-Agents, but if it leads to these terrible results, I don't know what to do... In a VR application it is essential to keep the FPS above 72 so as not to give the user motion sickness and to maintain an immersive experience.

    (Sorry, I posted the same question in the Barracuda section, but I'm not sure which group is the right one for this kind of issue.)
     
  2. Luke-Houlihan

    Joined: Jun 26, 2007
    Posts: 303

    Are you doing inference on the GPU or the CPU?
     
  3. YodoLuca

    Joined: May 16, 2018
    Posts: 10

    CPU with Burst. I tried GPU, but the resulting FPS is something like 10/11, while the classic plain-CPU backend is slower than Burst.
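
    For reference, the backend can be switched on the BehaviorParameters component; a minimal sketch of doing it from code (BackendSelector is an illustrative name, and this assumes a recent ML-Agents version where InferenceDevice.Burst exists):

    Code (CSharp):
    using Unity.MLAgents.Policies;
    using UnityEngine;

    // Selects the inference backend at runtime; the same setting is
    // exposed on the BehaviorParameters component in the Inspector.
    public class BackendSelector : MonoBehaviour
    {
        void Awake()
        {
            var behavior = GetComponent<BehaviorParameters>();
            behavior.InferenceDevice = InferenceDevice.Burst; // CPU inference compiled with Burst
        }
    }
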
     
  4. Luke-Houlihan

    Luke-Houlihan

    Joined:
    Jun 26, 2007
    Posts:
    303
    Yeah, those numbers make sense to me; that is an extremely large network, and LSTMs tend to be poor performers. Running VR rendering and a large NN on the same GPU is going to lead to transfer-bottleneck issues on consumer hardware, so it isn't very surprising that you saw worse performance there.

    Unfortunately your best option is to reduce the size of the network. Can you reframe the training so it isn't a memory task? For example, serialize some previous important states and feed them in as observations, along the lines of the sketch below.
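
    Something like this, as a rough sketch (MemoryAgent, RecordState, and the history size are illustrative names and values, not ML-Agents API):

    Code (CSharp):
    using System.Collections.Generic;
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    // Instead of an LSTM, keep the last N "important" states yourself
    // and feed them to the policy as plain vector observations.
    public class MemoryAgent : Agent
    {
        const int HistorySize = 8;
        readonly Queue<Vector3> _history = new Queue<Vector3>();

        // Call this whenever something worth remembering happens.
        public void RecordState(Vector3 state)
        {
            _history.Enqueue(state);
            if (_history.Count > HistorySize) _history.Dequeue();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Fixed-size observation: pad with zeros until the history fills up.
            var states = _history.ToArray();
            for (int i = 0; i < HistorySize; i++)
                sensor.AddObservation(i < states.Length ? states[i] : Vector3.zero);
        }
    }
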
     
  5. YodoLuca

    Joined: May 16, 2018
    Posts: 10

    I tried several times with stacked observations, but I didn't obtain good results. I also tried smaller LSTMs (from 32 up to 1024, in powers of 2), but the only size that worked well was 1024. I will try to redesign the network once again...
    I didn't expect such a massive impact from a single 1024-unit LSTM layer :/

    Thank you for your help!
     
  6. Luke-Houlihan

    Joined: Jun 26, 2007
    Posts: 303

    Yeah, stacked observations have never been useful in my experiments either. The technique I've used to avoid the overhead of a memory task (LSTMs) is to save important observations to a "memory buffer", which is just an array of vectors I'd like the agent to remember.

    Using a buffer sensor (https://github.com/Unity-Technologi...Design-Agents.md#variable-length-observations) you can feed the memory buffer in as observations, and the attention mechanism will learn which observations in the buffer are important for which tasks and ignore the ones that aren't.

    Obviously, if your memory buffer is huge the agent may still have performance issues, but it may be worth a try.
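
    A minimal sketch of the wiring (this assumes a BufferSensorComponent added to the agent and configured in the Inspector; GetRememberedStates and the sizes in the comments are placeholders):

    Code (CSharp):
    using Unity.MLAgents;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    // Feeds a variable-length "memory buffer" to the policy through a
    // BufferSensor; attention then learns which entries matter per task.
    public class BufferMemoryAgent : Agent
    {
        BufferSensorComponent _bufferSensor;

        public override void Initialize()
        {
            // e.g. ObservableSize = 4, MaxNumObservables = 32 in the Inspector
            _bufferSensor = GetComponent<BufferSensorComponent>();
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Append each remembered state; the buffer is cleared every step.
            foreach (var entry in GetRememberedStates())
                _bufferSensor.AppendObservation(entry); // float[] of length ObservableSize
        }

        // Placeholder: return whatever vectors you want the agent to remember.
        float[][] GetRememberedStates()
        {
            return new float[0][];
        }
    }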