Unity's aggressive overdraw minimization harms performance - kills dynamic batching, GPU instancing

Discussion in 'General Graphics' started by Zergling103, Jun 21, 2018.

  1. Zergling103

    Joined: Aug 16, 2011; Posts: 392
    In our project we've been constantly fighting Unity's stubborn preference for strictly rendering opaque objects from front to back in order to minimize overdraw. While this sounds fine on paper, it ultimately results in slow rendering because it increases the number of draw calls.

    When rendering densely populated scenes such as forests and cityscapes, you reuse the same models over and over, so you can use GPU instancing or dynamic batching to draw all instances of a model very quickly with one draw call.

    However, Unity's strict overdraw-minimization philosophy prevents developers from using these draw-call optimizations in most real-world rendering situations (without hacks and workarounds). Say you have two different meshes scattered across a terrain: Unity could draw all instances of them in two draw calls if it wanted to. Instead, it insists on drawing by distance, so it draws one instance of A, then an instance of B, then maybe a couple of instances of A, a couple of B, and so on, resulting in many draw calls and awful performance.
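    A toy model makes the draw-call cost concrete. The Python sketch below is a simplification, not Unity's actual batching logic: it assumes that only consecutive draws sharing the same mesh and material can be merged into one call, and the scene, names and distances are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawItem:
    mesh: str        # hypothetical mesh name, e.g. "A" or "B"
    material: str    # hypothetical material name
    distance: float  # distance from the camera


def count_draw_calls(items):
    """Count draw calls, assuming only *consecutive* items with the same
    (mesh, material) pair can be merged into one batched/instanced call."""
    calls = 0
    previous_key = None
    for item in items:
        key = (item.mesh, item.material)
        if key != previous_key:
            calls += 1
            previous_key = key
    return calls


def scattered_scene(count, seed=0):
    """Two mesh types scattered at random distances, as in the terrain example."""
    rng = random.Random(seed)
    return [
        DrawItem(mesh, "Mat" + mesh, rng.uniform(1.0, 500.0))
        for mesh in ("A", "B")
        for _ in range(count)
    ]


scene = scattered_scene(1000)

# Strict near-to-far order: A and B instances interleave and split batches.
by_distance = sorted(scene, key=lambda i: i.distance)

# Group by batching state first; distance only orders draws inside a group.
by_state = sorted(scene, key=lambda i: (i.mesh, i.material, i.distance))

print("distance-sorted draw calls:", count_draw_calls(by_distance))
print("state-grouped draw calls:  ", count_draw_calls(by_state))
```

    With two interleaved mesh types, the state-grouped order needs exactly two calls, while the distance-sorted order needs roughly one call per switch between A and B.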

    One horrible solution we came up with was to put each mesh+material combination into a different render queue (think Opaque, Transparent, Overlay, etc.), bypassing the distance-based order, so that everything that can be batched or instanced is rendered consecutively. This works and dramatically increases performance. However, we quickly realized that using property blocks causes Unity to ignore the queue specified on the material and fall back to the queue specified by the shader, which reverts everything to the slow front-to-back ordering.
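    One way to automate this kind of workaround is to hand each distinct mesh+material combination its own queue number. The sketch below assumes Unity's convention that the Geometry queue is 2000 and that queues up to 2500 are still treated as opaque; keeping every combination inside that opaque range (instead of borrowing the Transparent or Overlay queues), the helper names, and the "queue first, then near-to-far" sort are simplifying assumptions, not Unity's exact renderer behaviour.

```python
from dataclasses import dataclass

# Unity's Geometry queue is 2000 and queues up to 2500 count as opaque.
# Giving each combination its own offset is a hypothetical build step.
GEOMETRY_QUEUE = 2000
LAST_OPAQUE_QUEUE = 2500


@dataclass(frozen=True)
class Renderable:
    mesh: str
    material: str
    distance: float


def assign_queues(renderables):
    """Give every distinct (mesh, material) pair its own opaque render queue."""
    queues = {}
    for r in renderables:
        key = (r.mesh, r.material)
        if key not in queues:
            queue = GEOMETRY_QUEUE + len(queues)
            if queue > LAST_OPAQUE_QUEUE:
                raise ValueError("ran out of opaque render queue slots")
            queues[key] = queue
    return queues


def draw_order(renderables, queues):
    """Simplified engine sort: queue first, then near-to-far within a queue."""
    return sorted(renderables, key=lambda r: (queues[(r.mesh, r.material)], r.distance))


scene = [
    Renderable("Tree", "Bark", 5.0),
    Renderable("Rock", "Stone", 6.0),
    Renderable("Tree", "Bark", 7.0),
    Renderable("Rock", "Stone", 8.0),
]
queues = assign_queues(scene)
for r in draw_order(scene, queues):
    print(queues[(r.mesh, r.material)], r.mesh, r.distance)
```

    Both trees now draw back to back (queue 2000) before both rocks (queue 2001), even though a rock sits between them in distance.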

    The next, even more horrible solution was to stop using property blocks (even though we kinda needed them) and instead achieve the same effect by massively duplicating materials in the project via an automated build step. This is pretty awful, but it prevented Unity from ignoring the render queue specified in the material. (Please fix this bug!)

    However, a better solution is in the works: using Renderer.sortingOrder the same way we used the render queue, to bypass Unity's aggressive sorting. Hopefully, when we start using property blocks again, Unity will not decide to throw away the sorting order we specified.

    Anyway, to summarize, there are two issues:

    1. Unity needs a better scheme for managing the trade-off between overdraw and draw call count. Keep in mind that even though graphics APIs are getting lighter and the cost of draw calls will decrease, GPUs keep getting faster, so the cost of overdraw keeps dropping over time in line with Moore's law, a trend that has ceased for CPUs. Grouping things into chunks may be an option (where you can specify the chunk size, with a chunk size of 0 giving the behaviour we have now). It would render the chunks from front to back, but within each chunk, batchable or instanceable objects would be drawn together: DrawCalls.OrderBy(ChunkIndex).ThenBy(BatchingCriteriaLikeMaterialAndMesh).ThenBy(Distance)
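    The proposed chunked ordering can be sketched in Python. Sorting with a tuple key gives the same order as the OrderBy chain in the post: chunk first, then batching state, then distance. The sketch assumes chunks are fixed-width distance bands, that a chunk size of 0 falls back to pure near-to-far order, and that only consecutive draws with identical mesh and material merge; all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    mesh: str
    material: str
    distance: float


def chunk_index(distance, chunk_size):
    """Distance band of a draw; chunk_size 0 keeps strict near-to-far order."""
    if chunk_size <= 0:
        return distance  # every draw effectively forms its own chunk
    return math.floor(distance / chunk_size)


def chunked_order(calls, chunk_size):
    """Chunk first, then batching state (mesh, material), then distance."""
    return sorted(
        calls,
        key=lambda c: (chunk_index(c.distance, chunk_size), c.mesh, c.material, c.distance),
    )


def count_draw_calls(calls):
    # Assumption: only consecutive calls with identical (mesh, material) merge.
    issued = 0
    previous = None
    for c in calls:
        key = (c.mesh, c.material)
        if key != previous:
            issued += 1
            previous = key
    return issued


# Eight draws alternating between meshes A and B at distances 0.5 .. 7.5.
meshes = ["A" if i % 2 == 0 else "B" for i in range(8)]
calls = [DrawCall(m, "Mat" + m, 0.5 + i) for i, m in enumerate(meshes)]
for size in (0, 4, 1000):
    print(size, count_draw_calls(chunked_order(calls, size)))
```

    For this input the chunk size trades overdraw against calls: 8 calls with a chunk size of 0, 4 with a chunk size of 4, and 2 once everything fits in one chunk.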

    2. Using property blocks causes Unity to ignore the render queue specified by the material and fall back to the render queue specified by the shader instead. While in our case we relied on the material's queue as a hack for performance boosts, this behaviour could unexpectedly break other people's visuals. Sounds like a bug as well.
     
    Last edited: Feb 9, 2019
    Martin_H, jvo3dc and sharkapps like this.
  2. jvo3dc

    Joined: Oct 11, 2013; Posts: 1,520
    Maybe this should be settable per platform. I'm guessing minimizing overdraw is generally a good choice for mobile platforms, while for desktop you generally want to minimize draw calls.

    I haven't noticed this for opaque geometry, but I have seen it with transparent geometry. Unity will break hardware instancing in favor of draw order, which can cause a major performance hit. (It will also break hardware instancing in favor of batching, which doesn't really seem to make sense in any situation.)

    For opaque geometry I'd prefer hardware instancing in any case. For transparent geometry I'd still prefer it, and I'd order the render queue according to the visual impact of drawing in the wrong order. So there would be two per-platform settings:
    - Opaque geometry scheme (optimize fill rate, optimize draw calls)
    - Transparent geometry scheme (optimize order, optimize draw calls)
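    The two settings proposed above could look roughly like this. It is a hypothetical configuration sketch, not an existing Unity API: the enum names, the per-platform defaults and the sort-key function are all assumptions, and `state_key` stands in for whatever mesh/material criteria decide batching.

```python
from dataclasses import dataclass
from enum import Enum


class OpaqueScheme(Enum):
    OPTIMIZE_FILL_RATE = "fill_rate"    # near-to-far, minimizes overdraw
    OPTIMIZE_DRAW_CALLS = "draw_calls"  # group by batching state first


class TransparentScheme(Enum):
    OPTIMIZE_ORDER = "order"            # strict far-to-near for correct blending
    OPTIMIZE_DRAW_CALLS = "draw_calls"  # group by batching state first


@dataclass(frozen=True)
class PlatformSortSettings:
    opaque: OpaqueScheme
    transparent: TransparentScheme


# Hypothetical per-platform defaults following the suggestion above.
SETTINGS = {
    "mobile": PlatformSortSettings(OpaqueScheme.OPTIMIZE_FILL_RATE, TransparentScheme.OPTIMIZE_ORDER),
    "desktop": PlatformSortSettings(OpaqueScheme.OPTIMIZE_DRAW_CALLS, TransparentScheme.OPTIMIZE_DRAW_CALLS),
}


def sort_key(settings, transparent, state_key, distance):
    """Sort key for one draw under the platform's chosen schemes."""
    if transparent:
        if settings.transparent is TransparentScheme.OPTIMIZE_ORDER:
            return (-distance,)            # farthest first
        return (state_key, -distance)      # batch first, then far-to-near
    if settings.opaque is OpaqueScheme.OPTIMIZE_FILL_RATE:
        return (distance,)                 # nearest first
    return (state_key, distance)           # batch first, then near-to-far


draws = [("A", 1.0), ("B", 2.0), ("A", 3.0)]  # (state_key, distance)
for platform, settings in SETTINGS.items():
    order = sorted(draws, key=lambda d: sort_key(settings, False, d[0], d[1]))
    print(platform, [name for name, _ in order])
```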

    Good to know about the property block issue. We also generally use that for instance colors and I'm not that impressed with the performance. Now I know it can't be fixed by changing the queue.
     
  3. Zergling103

    Joined: Aug 16, 2011; Posts: 392
    By the way, we found that using Renderer.sortingOrder AND render queues (Opaque, Transparent, Overlay, etc.) together was required to get the intended effect.

    This is because renderers with multiple materials had each material drawn consecutively, so they could not be batched. Using the render queue seemed to fix this, though property blocks don't seem to affect it when using sorting order? I'm not entirely sure I understand wtf Unity is doing at this point, as I'm not working on the workaround for this problem myself.
     
    Martin_H likes this.
  4. TeKniKo64

    Joined: Oct 7, 2014; Posts: 30
    I am pretty sure Unity just leaves stuff like this in so that at the next release they can fix it and say, "Look guys, it's a little bit faster now." It's exactly what I would do.
     
  5. WiktorCal

    Joined: Mar 1, 2014; Posts: 4
    My dear Ford. I had been battling with the back-to-front rendering order for days.
    I couldn't understand why Unity couldn't draw objects with the same material in one batch.

    Setting the Render Queue makes so much sense.
     
    Zergling103 likes this.
  6. BakeMyCake

    Joined: May 8, 2017; Posts: 175
    @Zergling103 I'm confused about some of the things you've written, but am very interested in any wisdom you could share. Could you clarify?

    You wrote:
    How does rendering opaque objects back to front reduce overdraw? If anything this is the definition of overdraw, since nothing will ever be discarded during depth tests. I thought Unity tries to do the opposite. If you know of some cases where back to front is better please share them. In my personal observations dynamic batching does break occasionally and I'm trying to identify if the nature of your issues is the same as mine.

    Then you wrote:
    Can you explain how you used sortOrder to solve the problem? Isn't it only relevant to sprite renderers or am I wrong?
     
    dadude123 likes this.
  7. dadude123

    Joined: Feb 26, 2014; Posts: 789
    This!
    You obviously draw the stuff closest to the camera first, and then continue in order of increasing distance.
    Then (in an idealized case) the pixels of the later objects will all be instantly discarded!
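    The idealized case described here can be checked with a toy model. The sketch below is a simplification: each object is a hypothetical set of covered pixels at a single depth, and an early depth test skips shading whenever something nearer has already been written.

```python
def shaded_fragments(objects, order):
    """Count fragment-shader invocations with an early depth test.

    objects: list of (depth, pixels) pairs, where pixels is a set of pixel
    coordinates the object covers (a toy, hypothetical scene).
    order: indices into objects giving the draw order.
    """
    depth_buffer = {}  # pixel -> nearest depth written so far
    shaded = 0
    for index in order:
        depth, pixels = objects[index]
        for pixel in pixels:
            # Early-z: shade only if nothing nearer has been written yet.
            if depth < depth_buffer.get(pixel, float("inf")):
                depth_buffer[pixel] = depth
                shaded += 1
    return shaded


# Three full-screen layers of a 4x4 target, at depths 1 (near) to 3 (far).
screen = {(x, y) for x in range(4) for y in range(4)}
layers = [(1.0, screen), (2.0, screen), (3.0, screen)]

near_to_far = sorted(range(len(layers)), key=lambda i: layers[i][0])
far_to_near = list(reversed(near_to_far))

print(shaded_fragments(layers, near_to_far))  # 16: only the nearest layer shades
print(shaded_fragments(layers, far_to_near))  # 48: every layer shades every pixel
```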


    Maybe he meant that and it was just a typo. But it happened multiple times, so maybe there's some misunderstanding here somewhere.
     
  8. Zergling103

    Joined: Aug 16, 2011; Posts: 392
    Whenever I say "back to front" I misspoke and meant "front to back" - or more clearly "near to far".
     
  9. hippocoder

    Digital Ape
    Joined: Apr 11, 2010; Posts: 29,723
    I'm quite confused by your comment that it increases draw calls... Unity doesn't actually change draw calls once rendering is in progress.

    So whether you sorted front to back or back to front, the draw calls would remain the same, because the CPU knows nothing about which pixels are discarded.

    In any case, we don't use static or dynamic batching, because both are slower than just instancing or regular rendering, especially with DrawMeshInstancedIndirect. Also, you would probably want to look at ECS rendering, which is the fastest (the Megacity demo, for example).
     
  10. drcrck

    Joined: May 23, 2017; Posts: 328
    I can confirm Zergling103 is right.
    Example:

    [Screenshot of the scene]

    There is only one type of house in this scene, and it could be rendered with one draw call.
    Instead, Unity splits it into many calls, starting with the closest one, then batching the others in small groups, rendering other objects in between, and doing the other things described in the first post.
     
    WildStyle69 likes this.
  11. hippocoder

    Digital Ape
    Joined: Apr 11, 2010; Posts: 29,723
    I see, so this is with dynamic/static batching in mind. Makes sense that the draw calls might change there, and explains why I wouldn't see any change.
     
  12. drcrck

    Joined: May 23, 2017; Posts: 328
    The problem is that overdraw is not really an issue for modern gaming GPUs; it's definitely not worth the time spent preparing, sending and processing all these extra draw calls.
     
  13. hippocoder

    Digital Ape
    Joined: Apr 11, 2010; Posts: 29,723
    I think overdraw's expense depends rather on the bandwidth and shader complexity of that fragment. I would be keen to avoid overdraw with a full HDRP material.
     
  14. drcrck

    Joined: May 23, 2017; Posts: 328
    HDRP uses a depth prepass, so there is no overdraw with "full" materials.
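    A toy model shows why a depth prepass makes draw order matter much less for shading cost. In this simplified sketch (hypothetical scene and names), a first pass writes only depth, and the expensive material shading in the second pass runs only where a fragment's depth equals the stored nearest depth. The prepass itself is not free: it still costs geometry processing and depth-only fragment work.

```python
def shaded_with_prepass(objects, order):
    """Count full-material shading invocations when a depth prepass runs first.

    objects: list of (depth, pixels) pairs (toy, hypothetical scene).
    Pass 1 writes depth only; pass 2 shades a fragment only if its depth
    equals the stored nearest depth (like a depth test of EQUAL).
    """
    depth_buffer = {}
    # Pass 1: depth only, in any order; keeps the nearest depth per pixel.
    for index in order:
        depth, pixels = objects[index]
        for pixel in pixels:
            if depth < depth_buffer.get(pixel, float("inf")):
                depth_buffer[pixel] = depth
    # Pass 2: expensive shading, only for the visible surface.
    shaded = 0
    for index in order:
        depth, pixels = objects[index]
        for pixel in pixels:
            if depth == depth_buffer[pixel]:
                shaded += 1
    return shaded


# Three full-screen layers of a 4x4 target, at depths 1 (near) to 3 (far).
screen = {(x, y) for x in range(4) for y in range(4)}
layers = [(1.0, screen), (2.0, screen), (3.0, screen)]

print(shaded_with_prepass(layers, [0, 1, 2]))  # 16 (near-to-far)
print(shaded_with_prepass(layers, [2, 1, 0]))  # 16 (far-to-near, same cost)
```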
     
  15. LazloBonin

    Joined: Mar 6, 2015; Posts: 809
    For anyone looking into this: from early tests, it looks like this can be fixed entirely by setting Camera.opaqueSortMode to OpaqueSortMode.NoDistanceSort! This wildly reduces the number of draw calls when using proper GPU instancing, down to a guaranteed one call per mesh/material combo. :)
     
    PutridEx and Martin_H like this.
  16. Zergling103

    Joined: Aug 16, 2011; Posts: 392
    This is great news!

    I still think it'd be nice to have a configurable trade-off. That is, batchable draw calls would be batched in chunks ordered from near to far, e.g. in groups of N draw calls.
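    The count-based variant described here can be sketched as follows, assuming draws are first sorted near to far, split into consecutive groups of N, and re-sorted by batching key inside each group. The function names and the `(state_key, distance)` tuples are hypothetical, and merging only consecutive draws with the same key is a simplification.

```python
from itertools import groupby


def chunked_by_count(draws, n):
    """Sort near-to-far, then regroup by batching key inside each run of n draws.

    draws: list of (state_key, distance) pairs; n is the hypothetical chunk
    size in draw calls. n == 1 reproduces strict near-to-far order.
    """
    if n < 1:
        raise ValueError("chunk size must be at least 1")
    near_to_far = sorted(draws, key=lambda d: d[1])
    ordered = []
    for start in range(0, len(near_to_far), n):
        chunk = near_to_far[start:start + n]
        ordered.extend(sorted(chunk, key=lambda d: (d[0], d[1])))
    return ordered


def count_batches(ordered):
    # Consecutive draws with the same state key share one call.
    return sum(1 for _ in groupby(ordered, key=lambda d: d[0]))


# Eight draws alternating between state keys A and B at distances 0 .. 7.
draws = [("A" if i % 2 == 0 else "B", float(i)) for i in range(8)]
for n in (1, 4, 8):
    print(n, count_batches(chunked_by_count(draws, n)))
```

    Here N = 1 gives 8 calls, N = 4 gives 4 and N = 8 gives 2, so N acts as the knob between overdraw minimization and draw call count.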