Need help understanding GPU bottleneck

Discussion in 'Universal Render Pipeline' started by sylvain_unity893, Jan 22, 2021.

  1. sylvain_unity893

    sylvain_unity893

    Joined:
    Jan 5, 2021
    Posts:
    3
    Hello folks,
    I'm not very experienced in optimizing graphics, and I'm facing a strange situation that I would like to understand in order to find the best course of action. Hopefully somebody here is willing to help.

    We are building a VR interactive story for the Oculus Quest 2 using URP and Unity 2020.2 and we want to establish the level of detail we can use on our characters, especially in the face, for facial expressions. As such, we created a scene that proxies our desired environment and have benchmarked it on the Quest 2. While doing so, I noticed a very weird behaviour where the ceiling and floor of our indoor environment cost around 15% of the GPU by themselves (as reported by OVR Metrics Tool).

    The roof and floor are very simple static boxes with baked indirect lighting coming from a single mixed directional light. Both use the URP/Simple Lit shader. There are two other lights in the scene: a baked rectangle light for a window and a non-shadow-casting spot light.

    I have tried many things but can't come to a conclusion as to why the GPU usage is so high on those surfaces. I know I'm not CPU bound, as the render thread shows some WaitForGFX in the profiler. I know it's not shadow casting, as I disabled all shadows for other objects and still get the same slowdown. However, if I change their shader to Unlit, the GPU usage drops by the same amount as disabling the GameObjects. When I enable the GOs, I notice that the textures per fragment go from 2 to 3 and the instructions per fragment jump from ~60 to ~100. The number of draw calls in the Frame Debugger is 44. All environment GOs are fully static and appear to be static batched, since they get an unnamed mesh in the inspector.

    What sense can I make of this or what other info should I look into to figure out what is going on?

    Thanks for the help!!!

    Scene preview (with the two problematic GOs highlighted)
    upload_2021-1-22_9-29-24.png

    Profiler view with GOs enabled
    upload_2021-1-22_9-33-22.png
     
    Last edited: Jan 22, 2021
  2. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    That's GPU overdraw.
    The floor and ceiling polygons are being rendered before the rest of the objects, and the polygons of the other objects are then rendered over them.
    This is happening BECAUSE of batching, and that's why, if you change the shader to something other than what the rest of the objects in the scene use, it will render faster: the floor and ceiling will have their own batch, separate from the rest, and will likely be rendered after the rest.
    This happens more on URP than on the Built-in render pipeline because on Built-in the batches were made from identical materials, while URP batches per shader, so all objects with the same shader will be batched into the same polygon soup without being able to sort them to avoid overdraw.

    Generally speaking, batching and instancing are great for reducing the overhead of CPU-GPU communication, but they're not great for actual GPU performance, since they can easily lead to overdraw.

    What you can do:
    You can disable the SRP Batcher altogether in the Advanced section of the UniversalRP settings (which will be bad for performance), or
    you can make a script to specify the rendering order of specific objects like the floor and the ceiling, to ensure they don't batch with others and get rendered last, although I'm not completely sure it will work correctly with URP, since I haven't tried it.
    Look at Renderer.sortingOrder.
    All objects have a sorting order of zero by default, so just give these two objects a sorting order of 100 and check the performance on device.
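
    A minimal sketch of the script described above, assuming it is attached to the floor and ceiling GameObjects. The class name ForceSortingOrder and the serialized field are hypothetical; only Renderer.sortingOrder comes from the post, and whether URP honors it for opaque geometry is untested here.

```csharp
using UnityEngine;

// Hypothetical helper (name is illustrative): attach to the floor and ceiling objects.
[RequireComponent(typeof(Renderer))]
public class ForceSortingOrder : MonoBehaviour
{
    // Renderers default to a sorting order of 0; 100 is the value suggested in the post.
    [SerializeField] private int sortingOrder = 100;

    private void Awake()
    {
        // Apply the order once at startup. Check the Frame Debugger afterwards
        // to confirm whether the draw order actually changed.
        GetComponent<Renderer>().sortingOrder = sortingOrder;
    }
}
```

    Exposing the value in the Inspector makes it easy to try both positive and negative orders on device without recompiling.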

    Let me know if specifying the sorting order works with URP. It does with Built-in. :)
     
  3. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Oh! Wait! There is actually a very helpful setting in URP's Lit shader just for that!
    Go to the materials of your floor and ceiling and open the "Advanced" section at the bottom.
    There is a slider named "Priority". Set it to "-10" and check if that fixes your issues.
    (This assumes your floor and ceiling use different materials than the rest of the scene. If not, make separate materials for them and set the priority to -10.)

    This slider should do the same thing as Renderer.sortingOrder, and it's available directly in the material GUI, which is great!
     
  4. sylvain_unity893

    sylvain_unity893

    Joined:
    Jan 5, 2021
    Posts:
    3
    Hello Joe,
    First of all, thanks for your time. However, I tried your trick and I'm getting exactly the same result. I was already using a different material for the floor, so I created one for the ceiling and set them both to -10 priority. Same result. I then tried 10 instead, thinking it might be reversed, and got exactly the same result. I also made them non-static so they are not batched; same result. I then changed the shader to one that isn't used elsewhere in the scene (from Lit to Simple Lit) and it's better, but they are still using about 6% GPU to render those 2 meshes.

    Any other ideas? And while we're at it, I'm more used to Unreal graphics debugging and I can't find a way to view different graphics debugging views. I looked around for a way to see the overdraw of the scene, and I've seen screenshots of it being a dropdown in the Game view, but I can't find it. I find the graphics debugging tools lacking compared to what I'm used to.

    Thanks again,
     
  5. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Seems like the priority is ignored when batching then. You'll need to use Renderer.sortingOrder by script.
    They might still be batching: either dynamic batching or instancing.
    You'll need to disable Dynamic Batching in the URP settings preset.
    Go to Project Settings -> Graphics -> Scriptable Render Pipeline Settings -> Advanced and uncheck Dynamic Batching if it is enabled.
    (You can disable the SRP Batcher there too for testing.)
    Also check the materials for the ceiling and floor and make sure Advanced -> Enable GPU Instancing is disabled.
    Yeah... seems like the overdraw view is not available on URP and HDRP...
    The "new" scriptable render pipelines are still a work in progress... :rolleyes:

    Actually, yes:
    Use the Frame Debugger slider (top right) to manually move through all the render passes from 0 to 44 and check at which point the floor and ceiling are drawn. If they are being drawn at the same time as other objects, then they are being batched and you have to remove them from the batch.
    Again, make a script to set the Renderer.sortingOrder manually in your floor and ceiling objects. This will remove them from other batches. Then check the rendering order of the passes again in the Frame Debugger.

    I haven't used Unreal, but for me, the Frame Debugger is really all I need to check these things: I just scrub the render passes with the slider to figure things out. Two or three years ago, we didn't even have a Frame Debugger, so we had to TRUST OUR INSTINCTS in Unityland... :D
     
  6. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Oh, another thing:
    By default, Unity disables front-to-back sorting on GPUs that perform early Z sorting by themselves.
    That's supposed to be the case on Snapdragons like the one in the Quest, but I have found that forcing front-to-back sorting of the passes on the CPU is faster nonetheless (even if they don't admit it at Unity).
    So you can write a script to force the camera to sort the geometry before batching and rendering.
    Just make a minimal script that forces it and put it on the center camera of the Quest camera rig.
    Look at Camera.opaqueSortMode
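
    A minimal sketch of such a script, assuming it is attached to the center eye camera of the rig. The class name ForceFrontToBackSort is hypothetical; Camera.opaqueSortMode and the OpaqueSortMode enum are the Unity APIs the post points to. Whether this helps on the Quest 2 is something to measure, not a given.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper (name is illustrative): attach to the camera that renders the scene.
[RequireComponent(typeof(Camera))]
public class ForceFrontToBackSort : MonoBehaviour
{
    private void OnEnable()
    {
        // Ask Unity to sort opaque objects roughly front to back on the CPU,
        // even on GPUs where it would skip that sort by default.
        GetComponent<Camera>().opaqueSortMode = OpaqueSortMode.FrontToBack;
    }

    private void OnDisable()
    {
        // Restore the platform default so the two modes can be compared
        // by toggling this component.
        GetComponent<Camera>().opaqueSortMode = OpaqueSortMode.Default;
    }
}
```

    Toggling the component on and off while watching OVR Metrics Tool gives a quick before/after comparison.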

    Apart from this, I really don't know what else it could be.
    Again, the new render pipelines are kind of unexplored right now... it could very well be some strange internal behaviour of URP not actually accessible.
     
  7. sylvain_unity893

    sylvain_unity893

    Joined:
    Jan 5, 2021
    Posts:
    3
    Hello Joe and thanks again,
    The Frame Debugger is buggy, to say the least. It doesn't appear to clear its render buffer when changing render steps, which makes it almost impossible to know what was rendered. I tried uninstalling and reinstalling Unity but it's still the same... so I haven't used it much.

    Nevertheless, I got it working once in the editor and through the Quest and the floor and ceiling are rendered last but still with the same 15-20% render cost with batching off (Draw calls shot up to 200+). Priority on the shader does seem to work by the way.

    So what's next from here? I'm out of ideas...
     
  8. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Idk... maybe send a bug report to unity including the whole project.
    URP is still a work in progress, so there is a chance you actually found some internal bug and they fix it.
    Try to reproduce the same behaviour on a minimal project and send it to them (Unity menu bar -> help -> bug report)
     
  9. LeRondPoint

    LeRondPoint

    Joined:
    Jan 22, 2021
    Posts:
    10
    Hello Joe,
    I stripped the scene completely and left only the floor and the camera rig. The floor takes 8% GPU (from 22% to 30%). At this point I guess it’s just the fragment shader cost for a ‘large’ screen area.

    Does anybody else have a clue?
     
  10. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Ok, silly question: are you sure you are comparing GPU usages at the same GPU LEVEL?
    If the GPU LEVEL in OVR Metrics is less than 4, it means the GPU has lowered its power consumption because it was under-used. In that case, the GPU USAGE percentage will go up, but the Quest still has plenty of headroom. Think of the GPU LEVEL as the gearbox in a manual car and the GPU USAGE as the revolutions of the engine: a car in 3rd gear at 6000 revs will go slower than the same car in 5th gear at 4000 revs.
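
    The gearbox analogy can be put into rough numbers. The sketch below assumes that usage is reported relative to the current clock, so usage times clock approximates the work actually done. The class and method names are hypothetical, and the clock values in the comments are made-up placeholders, not real Quest 2 frequencies.

```csharp
// Hypothetical helper for comparing readings taken at different GPU levels.
public static class GpuLoadEstimate
{
    // Converts a usage percentage at a given clock into "MHz worth of work".
    // Assumption: usage is measured relative to the clock at the current level.
    public static double EffectiveLoadMHz(double usagePercent, double clockMHz)
    {
        return usagePercent * clockMHz / 100.0;
    }
}

// Example with placeholder clocks (NOT real Quest 2 values):
//   EffectiveLoadMHz(60.0, 400.0) is 240
//   EffectiveLoadMHz(45.0, 600.0) is 270
// The second reading shows a lower percentage but represents more work done.
```

    With real clock values for each level, the same calculation would make usage readings taken at different levels comparable.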

    I don't know if you have read ALL the docs about it, but if you haven't, you absolutely need to.
    Check here on CPU and GPU levels and this blog post about power management.
     
  11. LeRondPoint

    LeRondPoint

    Joined:
    Jan 22, 2021
    Posts:
    10
    Hey,
    So of course in the last test it was at level 2, but in all the ones before that it was at 3. With that said, I’ve never seen the GPU usage % rise in OVR Metrics because of a level drop, or vice versa, so I think it is adjusted for the level. The numbers are not exact but, basically, it goes from level 2 to 3 at 60% and from 3 to 4 at 95%, but the usage % doesn’t drop when it does so.
     
  12. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    No. Not at all. The CPU and GPU levels are different power management levels: with a lower level, the GPU has less power and thus rendering will be slower with the exact same geometry, which will make the usage go up.
    Read the links I posted up there.
     
  13. LeRondPoint

    LeRondPoint

    Joined:
    Jan 22, 2021
    Posts:
    10
    I did, and I understand power levels, but that does not fit what I see in OVR Metrics Tool. It will never hit 90% at level 2 or 60% at level 4, because it dynamically changes the power level when it reaches higher or lower usage numbers. It only makes sense to keep the level in mind if you lock the power levels, which I’m not doing.

    So yes, 8% at level 2 is not the same as 8% at level 3, so perhaps GPU frame time would be a better metric for the forums, though that also depends on the GPU clock. Is there any documentation for the clocks at different power levels?
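
    For reference, one way to read a GPU frame time from inside Unity is the FrameTimingManager API. The sketch below is a rough illustration only: the class name GpuFrameTimeLogger and the logging interval are hypothetical, it assumes Frame Timing Stats is enabled in Player Settings, and whether the Quest 2 with a given graphics API reports valid GPU timings is an assumption that needs checking on device.

```csharp
using UnityEngine;

// Hypothetical helper (name is illustrative): attach to any active GameObject.
public class GpuFrameTimeLogger : MonoBehaviour
{
    // Buffer for the most recent frame's timing data.
    private readonly FrameTiming[] timings = new FrameTiming[1];

    // Log only every N frames to keep logging overhead low (value is arbitrary).
    [SerializeField] private int logEveryNFrames = 72;

    private void Update()
    {
        // Ask Unity to collect timing data for recent frames.
        FrameTimingManager.CaptureFrameTimings();

        if (logEveryNFrames <= 0 || Time.frameCount % logEveryNFrames != 0)
        {
            return;
        }

        // Returns how many entries were filled; 0 means no data is available
        // (not ready yet, or not supported on this platform).
        uint count = FrameTimingManager.GetLatestTimings(1, timings);
        if (count == 0)
        {
            return;
        }

        // gpuFrameTime is reported in milliseconds.
        Debug.Log($"GPU frame time: {timings[0].gpuFrameTime:F2} ms");
    }
}
```

    As noted above, a frame time in milliseconds still depends on the GPU clock, so readings are only comparable at the same GPU level.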