Question How to use the Frame Debugger and Profiler to identify bottlenecks?

Discussion in 'Universal Render Pipeline' started by Sangemdoko, Mar 29, 2023.

  1. Sangemdoko


    Dec 15, 2013

    I'm 3 months away from releasing my first Steam game and I need to improve performance. I improved a ton on the scripting side, but the rendering is just taking too long.

    RenderPipelineManager.DoRenderLoop_Internal() takes around 8-16ms per frame depending on what the camera is looking at. I need to get it down to around 4-6ms to achieve my 60FPS target.
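    For reference, the arithmetic behind that target can be sketched as follows. This is a minimal illustration, not a measurement: the 60 FPS target and the 4-16 ms render costs come from the post, while the function names and the `other_ms` parameter (time spent on scripts, physics and so on) are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Total time available per frame, in milliseconds."""
    return 1000.0 / target_fps


def fits_budget(render_ms: float, other_ms: float, target_fps: float) -> bool:
    """True if rendering plus all other per-frame work fits in one frame."""
    return render_ms + other_ms <= frame_budget_ms(target_fps)


budget = frame_budget_ms(60)  # 1000 / 60, roughly 16.67 ms

# Render loop costs quoted in the post: 8-16 ms today, 4-6 ms as the goal.
for render_ms in (16.0, 8.0, 6.0, 4.0):
    headroom = budget - render_ms
    print(f"render {render_ms:4.1f} ms -> {headroom:5.2f} ms left for everything else")
```

    At 16 ms the render loop alone uses almost the whole frame, which is why the remaining work cannot fit unless rendering drops to the 4-6 ms range.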

    But I don't have enough experience to know how to identify the issues with my rendering. I assume I need to reduce the number of draw calls or reduce the number of vertices, but I don't want to blindly try optimizing things without completely understanding it first.

    To give a bit of context:
    I have 2 cameras, one for the 3D background and another for the 2.5D foreground.

    My Profiler

    I have things like the SRPBatcher.Flush taking 2.04ms on its own for the background camera

    My Frame Debugger

    Both the Foreground and background camera have a ton of draw calls.

    And my game view stats


    I see that I have 631 draw calls, millions of Tris and thousands of batches. But I can't tell what parts are the issue.

    Is there any way I can know how many 'ms' it takes per draw call / draw group in the frame debugger? That would help me identify which parts are slow so that I can change them.

    Could someone recommend some good learning material for this sort of thing?
  2. TheSniperFan


    Jul 18, 2013
    I don't know what version of Unity and URP you use, so some things may have different names on your end.
    Some things I've noticed:
    • That's a lot of SetPass calls
      • Something is probably breaking batching
    • Those books and scrolls that are copy pasted a hundred times?
      • You want to make sure these are rendered using GPU instancing and not batching
      • It's the same use-case as terrain grass/trees: One object rendered a lot of times in different places
      • That's what GPU instancing is there for
    • Is it perhaps the shadow maps (real time shadows) that are taking so long?
    • When using multiple cameras/render textures in URP, you may want to use a separate URP asset for those extra cameras
      • Do all of your cameras need to use all of the RenderFeatures like SSAO?
    • Have you toggled various features (e.g. shadows, post processing, extra cameras) while playing to figure out how much they cost?
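    The toggle-and-measure idea in the last bullet can be turned into a simple comparison. The sketch below assumes you have recorded frame times with a feature on and off; the numbers are made-up placeholders, not data from this thread, and the median is used so that a single spike does not skew the result.

```python
from statistics import median


def feature_cost_ms(frames_on, frames_off):
    """Estimated cost of a feature: median frame time with it on minus with it off."""
    return median(frames_on) - median(frames_off)


# Hypothetical frame times in milliseconds (on, off) per feature.
samples = {
    "shadows": ([14.1, 13.9, 14.3, 19.0], [11.0, 11.2, 10.9, 11.1]),
    "post processing": ([14.0, 14.2, 14.1, 14.0], [13.1, 13.0, 13.2, 13.1]),
}

costs = {name: feature_cost_ms(on, off) for name, (on, off) in samples.items()}

# Most expensive feature first.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: about {cost:.2f} ms per frame")
```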
    Here are some additional pointers:

    • It will tell you why a new batch had to be started
    • You want to minimize the number of SRP batches
      • Reduce the number of shader variants you use
      • The material variants feature released in 2022.1/URP 14 is invaluable for that
      • It allows you to use a material as master material and then create variants from it, which act like prefabs
      • It makes it easy to make sure your game minimizes the amount of different shader variants used
      • I have 5 master materials and all environment materials (minus terrain) are variants of them
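    To illustrate why fewer shader variants tend to mean fewer SRP batches, here is a deliberately simplified model: it treats a batch as a run of consecutive draws that share one shader variant. The real batcher has other break reasons as well, and the variant names below are hypothetical.

```python
from itertools import groupby


def count_batches(variant_sequence):
    """Number of runs of consecutive draws that share one shader variant."""
    return sum(1 for _ in groupby(variant_sequence))


# Hypothetical variant names for six draws, in submission order.
interleaved = ["LitA", "LitB", "LitA", "LitB", "LitA", "LitB"]
single_variant = ["LitA"] * 6

print(count_batches(interleaved))     # 6
print(count_batches(single_variant))  # 1
```

    In this model, six draws that alternate between two variants cost six batches, while the same six draws on one variant collapse into a single batch.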

    • You can either use the old wireframe draw modes, the new rendering debugger or RenderDoc
    • What you're interested in is triangle density
    • Contrary to what many think, it's (often) not the number of triangles that is the issue
      • They can be, but modern GPUs can handle a lot of them
    • Skinny triangles and small triangles are inefficient on your GPU
    • The reason you want to LOD is not so much to reduce the number of triangles, but to keep triangle density in check
      • When a high poly model is far away, it gets smaller on screen
      • This means there will be tons of triangles contributing to the final color of a single pixel
    • RenderDoc has some extra visualization modes Unity doesn't: "Quad Overdraw" and "Triangle Size"
    • Triangle size shows exactly what I've been talking about
      • upload_2023-3-30_23-24-37.png
      • upload_2023-3-30_23-27-35.png
      • My example isn't too bad (terrain should be reduced), but RD does help you to figure out where to look
    • Quad overdraw is also important for performance
      • upload_2023-3-30_23-29-23.png
      • Here we can see that the grass (especially) and the trees (less so) cause the same pixels to get redrawn over and over and over again
    • You want to minimize that
      • Use LODs
      • Use billboards
      • Look into impostor billboards, which are fancy billboards that work almost like magic
      • However, you don't want too many LODs
        • Can cause more batches
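    The triangle density argument above can be made concrete with a back-of-the-envelope estimate. This sketch assumes a perspective camera, a 60 degree vertical field of view, a 1080-pixel-high screen, and an object whose on-screen footprint is roughly square; all of those values and the function names are illustrative assumptions.

```python
import math


def projected_height_px(object_height, distance, fov_deg, screen_height_px):
    """Approximate on-screen height of an object for a perspective camera."""
    visible_height = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)
    return object_height / visible_height * screen_height_px


def triangles_per_pixel(triangle_count, object_height, distance,
                        fov_deg=60.0, screen_height_px=1080):
    """Rough triangle density, treating the object's footprint as a square."""
    side = projected_height_px(object_height, distance, fov_deg, screen_height_px)
    return triangle_count / max(side * side, 1.0)


# A 1 m tall object: 20,000 triangles at full detail, 2,000 in a lower LOD.
for distance in (2.0, 10.0, 50.0):
    full = triangles_per_pixel(20_000, 1.0, distance)
    lod = triangles_per_pixel(2_000, 1.0, distance)
    print(f"{distance:5.1f} m: {full:8.3f} tris/px at full detail, {lod:8.3f} with the LOD")
```

    The triangle count stays the same while the covered area shrinks with the square of the distance, so density climbs quickly; swapping in the lower LOD at range brings it back down.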

    • You may want to figure out which mesh takes too long to render
    • You'll have to use the native profiling tools of your GPU vendor for that
    • I have an AMD GPU, so I'd have to use the Radeon GPU Profiler
    • Bear some things in mind
      • DO NOT profile the editor
      • DO NOT profile development builds
      • Make a release build and create your capture in there
      • These are expert tools and the data they show you may be overwhelming
      • It's still useful to know which objects take too long to render
  3. Sangemdoko


    Dec 15, 2013
    Thank you @TheSniperFan

    I'm using Unity 2020.3.30f and URP 10.8.1

    1) Looking at the frame debugger, it seems the most common reasons for breaking batching are:
    SRP: First call from ScriptableRenderLoopJob
    SRP: Node material requires device state change
    SRP: Node use different shader keywords

    One part I don't understand is that sometimes the exact same material is not being batched in a row and the reason is "First call from ScriptableRenderLoopJob", which doesn't make sense to me.
    Here I have pass 179 and 180. Both are books, and I can't tell what's different or why they are not together.

    Note that this is a custom shader I made with Shader Graph and, as you suggested, I enabled GPU instancing:

    2) Yes, the books are the same prefab copy pasted, and the shader gives them a random color depending on position. So they all have a random color picked from a gradient (all done in the shader).

    Enabling GPU Instancing or disabling it doesn't seem to affect anything at all, so my assumption is that it can't group them together.

    But then I tried to do the same with the scrolls that use the URP/Lit shader and no change there either. So I'm not sure if I'm doing something wrong, or if URP or the frame debugger is acting up.

    3) That's possible, all of these objects can receive and cast shadows.
    I have baked point lights in the scene too from the candles close to the books.

    4) I removed the 3rd camera that wasn't supposed to be there.
    I have two cameras one for the background (Perspective) and another for the foreground (Orthographic).

    The foreground camera is stacked on top of the background one:

    I was not aware that each camera could use a different renderer asset. That looks like a good thing to look into to optimize each camera independently.

    I'm using RenderFeatures for outlines, stencil buffer for explosion masking, etc., so there is definitely stuff I can remove for the background.

    5) Yes, there are a ton of things in the scene so there's not a particular main problem. It's a combination of many, many little problems.

    Thank you so much for the links I'll make sure to read through them thoroughly.
  4. TheSniperFan


    Jul 18, 2013
    • SRP: First call from ScriptableRenderLoopJob
      • The object wasn't batched with the previous batch, because there is no previous batch; this was the first batch
      • Nothing wrong here
    • SRP: Node material requires device state change
    • SRP: Node use different shader keywords
      • Shaders can have optional features
        • Think about how the default shader allows you to plug in height maps to use parallax mapping
        • Very expensive feature you won't want to use everywhere
        • Another example is different blend modes
        • You have to choose one of the available options
      • Shader keywords are used to toggle those shader features
      • Each permutation creates a different shader (shader variants)
      • The SRP batcher doesn't batch shaders, but shader variants
      • If you use the same shader source file on two materials, but one has reflections turned off, it creates two shader variants and they can't be batched
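    The relationship between keywords and variants described above can be sketched as a small model. It is not how Unity stores variants internally; the shader name and keyword names are hypothetical, and the model assumes every keyword is an independent on/off toggle.

```python
from itertools import product


def variant_key(shader, enabled_keywords):
    """A variant is identified by the shader plus its set of enabled keywords."""
    return (shader, frozenset(enabled_keywords))


def all_variants(shader, toggles):
    """Every on/off combination of independent toggle keywords."""
    variants = set()
    for states in product((False, True), repeat=len(toggles)):
        enabled = [kw for kw, on in zip(toggles, states) if on]
        variants.add(variant_key(shader, enabled))
    return variants


# Two materials on the same shader, one with reflections turned off.
with_reflections = variant_key("MyLit", {"_REFLECTIONS"})
without_reflections = variant_key("MyLit", set())
print(with_reflections == without_reflections)  # False

# Three independent toggles allow up to 2**3 variants.
print(len(all_variants("MyLit", ["_REFLECTIONS", "_PARALLAX", "_EMISSION"])))  # 8
```

    The count doubles with every additional toggle, which is why keeping materials on a small set of keyword combinations matters for batching.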

    • It takes more to use GPU instancing on URP
      • One thing that I know is that you have to disable SRP batching on the material
      • Otherwise it'll default to the SRP batcher
    • I'd recommend creating an empty scene with a few of those scrolls
      • Then check with the frame debugger
      • It'll tell you whether it uses the SRP batcher or GPU instancing
    • GPU instancing only works if you use the same mesh, the same material and the same material properties
      • If you want to change material properties while using GPU instancing, you have to use MaterialPropertyBlocks
      • More info here (may be outdated)
      • I don't know if ShaderGraph supports GPU instancing
    • If those prefabs can't move independently, another option would be to use one of those mesh combiner assets
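    The grouping rule for GPU instancing (same mesh, same material, per-instance data supplied separately) can be modelled roughly as below. This is a conceptual sketch rather than Unity code: the per-instance color list stands in for what a MaterialPropertyBlock would carry, and the cap of 1023 instances per draw mirrors the per-call limit documented for Graphics.DrawMeshInstanced, which you should treat as an assumption for your Unity version.

```python
from collections import defaultdict

MAX_INSTANCES_PER_DRAW = 1023  # assumed per-call limit, see the note above


def instanced_draws(objects, max_per_draw=MAX_INSTANCES_PER_DRAW):
    """Group (mesh, material, color) objects into instanced draws.

    Returns one (mesh, material, per_instance_colors) entry per draw call.
    """
    groups = defaultdict(list)
    for mesh, material, color in objects:
        groups[(mesh, material)].append(color)

    draws = []
    for (mesh, material), colors in groups.items():
        # Split oversized groups into chunks that respect the per-draw cap.
        for start in range(0, len(colors), max_per_draw):
            draws.append((mesh, material, colors[start:start + max_per_draw]))
    return draws


# Hypothetical scene: 100 books and 30 scrolls, each with its own color.
scene = [("book", "paper", (i, 0, 0)) for i in range(100)]
scene += [("scroll", "paper", (0, i, 0)) for i in range(30)]

for mesh, material, colors in instanced_draws(scene):
    print(f"{mesh}/{material}: {len(colors)} instances in one draw")
```

    Differing colors do not split a group because they travel as per-instance data, whereas a different mesh or material always starts a new draw.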

    • You can turn off "Render Shadows" on your camera(s) to see if realtime lighting is the culprit
    • Same with post processing

    • How dynamic are those backgrounds?
      • Do they need dynamic lighting?
      • Do objects move?

    • Can you post a high-res screenshot in "Shaded Wireframe" draw mode?