
General questions regarding Early-Z

Discussion in 'General Graphics' started by Dorodo, Feb 27, 2021.

  1. Dorodo


    Mar 8, 2015
    I've been taking a second look at some of the steps in the GPU pipeline, and one thing that's been on my mind is Early-Z, since it seems like a deep topic and one that can be extremely useful for performance. So I decided to raise some questions (which may not make sense at all or may mix unrelated topics, so I apologize in advance for any confusion or lack of knowledge on the matter).

    1) Does Early-Z get disabled for the entire rasterization process if there is a single object altering the Depth buffer or does it only affect the pixels which have the depth modified in the fragment stage?

    2) Does changing the ClipPos.z value break Early-Z?

    3) Are there any operations other than clip() that can discard a pixel on the fragment stage and disable Early-Z optimizations?

    4) Regarding Z-Prepass, the idea seems to imply that it renders additional drawcalls (although extremely lightweight). Does Early-Z depend on it in order to work? Could it actually become a bottleneck in your pipeline due to the initial pre-pass costs?

    5) Is there any way to check if Early-Z is working on a device?

    6) Does Stencil Masking affect Early-Z?

    7) I've seen some examples of games utilizing dithering on transparent objects, such as Super Mario Odyssey (below). If this is an alpha test, wouldn't this kill Early-Z optimizations, or is it not as big of a deal on console architectures? (I imagine the Switch utilizes a tile-based renderer similar to mobile GPUs, so that adds an extra layer of confusion for me.)

    Thank you in advance :) .
  2. bgolus


    Dec 7, 2012
    Early Z only gets disabled for that object. However on some (mainly older mobile) GPUs it also disables some depth buffer optimizations making all subsequent uses of the depth buffer slightly slower. That is something you can basically ignore for modern GPUs.

    No. Early Z happens after the vertex shader. Nothing you do in the vertex shader stage will have any effect on whether early Z happens or not. Only render state and the fragment shader can affect that.
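    For illustration, a vertex function that pushes geometry away in clip space (a hypothetical sketch, not from this thread) still gets full early Z, because the rasterizer just interpolates whatever position the vertex stage outputs:

```hlsl
// Hypothetical Unity vertex function: offsetting clip-space Z here only
// changes the depth the rasterizer interpolates. Early Z still runs
// against that final depth; nothing in the vertex stage disables it.
float4 VertPushedBack(float4 vertex : POSITION) : SV_Position
{
    float4 clipPos = UnityObjectToClipPos(vertex);
    clipPos.z += 0.01 * clipPos.w; // arbitrary offset, scaled by w for perspective
    return clipPos;
}
```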


    discard. But that’s not really fair since clip() is just a wrapper around
    if (x < 0) discard;

    SV_Depth and its variants, if you set the output to a value greater than 1.0 or less than 0.0.
    edit: The use of SV_Depth always disables early Z, regardless of the output value. SV_DepthGreaterEqual and SV_DepthLessEqual can be used to output depth and keep early Z.

    SV_Coverage, which lets you control the coverage bits for MSAA to do manual alpha to coverage.

    SV_StencilRef, since early Z is done at the same time as early stencil rejection, so both get disabled if the fragment shader modifies the reference value.

    There’s probably some DX12 ones I’m missing.
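    As a sketch of the difference between the depth-output semantics (hypothetical fragment shaders, assuming a conventional depth buffer):

```hlsl
// Writing plain SV_Depth: the GPU can't assume anything about the output,
// so early Z is disabled for draws that use this shader.
float4 FragAnyDepth(float4 pos : SV_Position, out float depth : SV_Depth) : SV_Target
{
    depth = pos.z * 0.5; // output could be anything, so no early rejection
    return float4(1, 0, 0, 1);
}

// Writing SV_DepthGreaterEqual promises the output is never closer than the
// rasterized depth, so the hardware can keep a conservative early Z test.
float4 FragConservative(float4 pos : SV_Position, out float depth : SV_DepthGreaterEqual) : SV_Target
{
    depth = pos.z; // contract: the value written must satisfy depth >= pos.z
    return float4(0, 1, 0, 1);
}
```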

    The idea behind a Z prepass is to fill in the depth buffer with cheap draws so that you avoid over shading (leveraging early z on the subsequent “full fat” draws). There was a frame breakdown of Cyberpunk 2077 that showed they used a Z prepass that only rendered a handful of visible meshes, presumably only those that are fully opaque and were considered big enough to block a lot. Other renderers, like Unity’s HDRP, ended up adding a Z prepass to make grass rendering more efficient, so they render everything that uses ZWrite into the Z prepass. And yes, there’s no perfect solution. You can eventually spend more time rendering the Z prepass than the savings you get from having one.
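    In Unity's built-in pipeline, a per-shader Z prepass can be sketched in ShaderLab like this (illustrative shader name and structure, not HDRP's actual implementation):

```shaderlab
Shader "Examples/ZPrepassSketch"
{
    SubShader
    {
        // Depth-only pass: writes Z but no color. This is the cheap draw
        // that fills in the depth buffer.
        Pass
        {
            ZWrite On
            ColorMask 0
        }
        // "Full fat" pass: with ZTest Equal, only the fragments that
        // actually survive in the final image get shaded.
        Pass
        {
            ZWrite Off
            ZTest Equal
            // ... expensive lighting / CGPROGRAM block goes here ...
        }
    }
}
```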

    GPUs that use tile based deferred rendering, like Mali and Adreno, are basically doing a hardware level Z prepass btw. They basically made the choice to always do a Z prepass for everything ... except they don’t always, because they can also swap to immediate mode rendering per tile if the GPU doesn’t think TBDR will be faster for that tile. Like if there’s only one or two triangles visible, or none of the triangles write to depth.

    Within Unity? Not really. RenderDoc also doesn’t seem to have a way to check. I recently asked the author of RenderDoc about this exact topic and he said he had no idea how he’d be able to add it. I think Nvidia Nsight can show it, and I would expect tools like Snapdragon Profiler to as well. I’ve been meaning to check if Nsight does or not but I seem to remember seeing a flag for earlydepthstencil in there someplace.
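    That flag corresponds to the HLSL [earlydepthstencil] attribute (Shader Model 5), which forces the depth/stencil test to run before the fragment shader even when the shader could discard. A hypothetical use:

```hlsl
// Forces early depth/stencil for this shader. Useful when discard is used
// for something that doesn't affect visibility (e.g. alongside UAV writes),
// with the caveat that depth is written even for pixels the shader discards.
[earlydepthstencil]
float4 FragForcedEarly(float4 pos : SV_Position) : SV_Target
{
    return float4(0, 0, 1, 1);
}
```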

    As noted above, stencil masking happens at the same time as early Z (ignoring the SV_StencilRef case). The depth buffer is more accurately the depth-stencil buffer, and when the hardware checks the depth value and comparison state for early rejection, it simultaneously checks the stencil value and comparison state.
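    In ShaderLab the stencil reference is normally set as render state, which is the case where early rejection keeps working (a minimal sketch):

```shaderlab
// The reference value is fixed render state, known before shading runs,
// so the hardware can reject fragments during the early depth/stencil test.
Pass
{
    Stencil
    {
        Ref 1
        Comp Equal // fragments where the stencil buffer != 1 are rejected early
        Pass Keep
    }
    // ... normal CGPROGRAM block here ...
}
```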

    Dithering can often be cheaper than alpha blending even on some mobile hardware. The Tegra X1 in the Switch is based on the desktop Maxwell GPU family (desktop Nvidia GTX 700 series). It is tile based, and all Nvidia GPUs have been since Maxwell, but it is not a tile based deferred renderer. It’s a tile based immediate mode renderer. As are more recent AMD GPUs (from Vega on). So it does not have HSR (hidden surface removal), aka the hardware Z prepass. Because Maxwell is based on a desktop class GPU, it doesn’t really have any problems with alpha testing.
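    The dither trick itself is just an alpha test against a screen-space threshold pattern, i.e. the clip()/discard case from earlier in this post. A minimal sketch (hypothetical, using a 4x4 Bayer matrix):

```hlsl
// Threshold alpha against a 4x4 Bayer pattern tiled across the screen.
// Because this uses clip(), it's subject to the same early Z caveats as any
// alpha-tested shader, but unlike alpha blending it can still write depth
// and needs no back-to-front sorting.
static const float bayer4x4[16] = {
     0.0/16.0,  8.0/16.0,  2.0/16.0, 10.0/16.0,
    12.0/16.0,  4.0/16.0, 14.0/16.0,  6.0/16.0,
     3.0/16.0, 11.0/16.0,  1.0/16.0,  9.0/16.0,
    15.0/16.0,  7.0/16.0, 13.0/16.0,  5.0/16.0
};

float4 FragDithered(float4 screenPos : SV_Position, float alpha : TEXCOORD1) : SV_Target
{
    uint2 p = uint2(screenPos.xy) % 4;
    clip(alpha - bayer4x4[p.y * 4u + p.x]); // discard pixels below the threshold
    return float4(1, 1, 1, alpha);
}
```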
    Last edited: Feb 28, 2021
    Desoxi, Dorodo and Shinyclef like this.