General questions regarding Early-Z

Discussion in 'General Graphics' started by Dorodo, Feb 27, 2021.

  1. Dorodo
    Joined: Mar 8, 2015 | Posts: 44

    I've been taking a second look at some of the steps in the GPU pipeline, and one thing that has been on my mind is Early-Z, since it seems like a deep topic and one that can be extremely useful for performance. So I decided to raise some questions (which may not make any sense at all or may mix unrelated topics, so I apologize in advance for any confusion or lack of knowledge on the matter).


    1) Does Early-Z get disabled for the entire rasterization process if there is a single object altering the Depth buffer or does it only affect the pixels which have the depth modified in the fragment stage?

    2) Does changing the ClipPos.z value break Early-Z?

    3) Are there any operations other than clip() that can discard a pixel in the fragment stage and disable Early-Z optimizations?

    4) Regarding Z-Prepass, the idea seems to imply that it renders additional draw calls (although extremely lightweight ones). Does Early-Z depend on it in order to work? Could it actually become a bottleneck in your pipeline due to the initial pre-pass costs?

    5) Is there any way to check if Early-Z is working on a Device?

    6) Does Stencil Masking affect Early-Z?

    7) I've seen some examples of games utilizing dithering on transparent objects, such as Super Mario Odyssey (below). If this is an alpha-test, wouldn't this kill Early-Z optimizations, or is it not as big of a deal on console architectures? (I imagine the Switch utilizes a tile-based renderer similar to mobile GPUs, so that adds an extra layer of confusion for me.)




    Thank you in advance :) .
     
  2. bgolus
    Joined: Dec 7, 2012 | Posts: 12,352

    Early Z only gets disabled for that object. However, on some (mainly older mobile) GPUs it also disables some depth buffer optimizations, making all subsequent uses of the depth buffer slightly slower. That is something you can basically ignore for modern GPUs.

    No. Early Z happens after the vertex shader. Nothing you do in the vertex shader stage will have any effect on whether early Z happens or not. Only render state and the fragment shader can affect that.

    discard

    But that’s not really fair, since clip(x) is if (x < 0) discard;.

    Also SV_Depth and its variants, if you set the output to a value greater than 1.0 or less than 0.0.
    edit: The use of SV_Depth always disables early Z, regardless of the output value. SV_DepthGreaterEqual and SV_DepthLessEqual can be used to output depth and keep early Z.

    And SV_Coverage, which lets you control the coverage bits for MSAA to do manual alpha to coverage.

    And SV_StencilRef, since early Z is done at the same time as early stencil rejection, so both get disabled if the fragment shader modifies the reference value.

    There’s probably some DX12 ones I’m missing.
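
    As a rough recap of the list above, here is a small Python sketch that records which fragment shader features this post says disable early Z. The feature names are the HLSL keywords and semantics mentioned above; the set names and the function early_z_blockers are made up for illustration, and a real driver makes this decision internally with more conditions than this.

```python
# Toy recap of the rules listed in this post; this is not a driver or Unity API.

# Per the post, using any of these in a fragment shader disables early Z.
DISABLES_EARLY_Z = {"discard", "clip", "SV_Depth", "SV_Coverage", "SV_StencilRef"}

# Per the post, these depth outputs can be used while keeping early Z.
KEEPS_EARLY_Z = {"SV_DepthGreaterEqual", "SV_DepthLessEqual"}


def early_z_blockers(features):
    """Return the used features that are on the 'disables early Z' list.

    An empty result means nothing in this toy list prevents early Z.
    """
    return sorted(set(features) & DISABLES_EARLY_Z)


if __name__ == "__main__":
    for shader in ([], ["clip"], ["SV_DepthLessEqual"], ["SV_Depth", "SV_StencilRef"]):
        blockers = early_z_blockers(shader)
        status = "early Z kept" if not blockers else "early Z disabled by " + ", ".join(blockers)
        print(shader, "->", status)
```

    The point of returning the offending features instead of a bare boolean is that it mirrors how you would audit a shader by hand: look for any of these outputs or statements.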

    The idea behind a Z prepass is to fill in the depth buffer with cheap draws so that you avoid over shading (leveraging early z on the subsequent “full fat” draws). There was a frame breakdown of Cyberpunk 2077 that showed they used a Z prepass that only rendered a handful of visible meshes, presumably only those that are fully opaque and were considered big enough to block a lot. Other renderers, like Unity’s HDRP, ended up adding a Z prepass to make grass rendering more efficient, so they render everything that uses ZWrite into the Z prepass. And yes, there’s no perfect solution. You can eventually spend more time rendering the Z prepass than the savings you get from having one.

    GPUs that use tile based deferred rendering, like Mali and Adreno, are basically doing a hardware level Z prepass, btw. They basically made the choice to always do a Z prepass for everything ... except they don’t, because they can also swap to immediate mode rendering per tile if the GPU doesn’t think TBDR will be faster for that tile. Like if there’s only one or two triangles visible, or none of the triangles write to depth.
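
    To make the Z prepass trade-off described above concrete, here is a toy CPU-side model in Python. It treats the screen as a single row of pixels and counts how many cheap depth-only fragments and how many full-cost fragment shader invocations happen with and without a prepass. The function count_fragments and the span format are hypothetical, and a real GPU works on tiles and quads with Hi-Z on top, so only the relative counts are meaningful.

```python
def count_fragments(draws, width, prepass):
    """Count fragments for opaque draws on a 1D row of pixels.

    draws: list of (x_start, x_end, depth) spans in submission order;
           a smaller depth is closer to the camera.
    Returns (depth_only_fragments, shaded_fragments).
    """
    depth = [float("inf")] * width
    depth_only = 0

    if prepass:
        # Depth-only pass: cheap per fragment, but every draw is rasterized again.
        for x0, x1, z in draws:
            for x in range(x0, x1):
                depth_only += 1
                if z < depth[x]:
                    depth[x] = z

    shaded = 0
    for x0, x1, z in draws:
        for x in range(x0, x1):
            # Early Z: test before shading. LessEqual lets fragments that
            # match the prepass depth through.
            if z <= depth[x]:
                depth[x] = z
                shaded += 1  # the expensive fragment shader would run here
    return depth_only, shaded


if __name__ == "__main__":
    back_to_front = [(0, 4, 3.0), (0, 4, 2.0), (0, 4, 1.0)]
    front_to_back = list(reversed(back_to_front))
    print(count_fragments(back_to_front, 4, prepass=False))
    print(count_fragments(back_to_front, 4, prepass=True))
    print(count_fragments(front_to_back, 4, prepass=False))
    print(count_fragments(front_to_back, 4, prepass=True))
```

    With back-to-front submission the prepass cuts full shading to one layer, while with front-to-back submission early Z already rejects the hidden layers and the prepass is pure extra work, which is the "no perfect solution" case.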

    Within Unity? Not really. RenderDoc also doesn’t seem to have a way to check. I recently asked the author of RenderDoc about this exact topic and he said he had no idea how he’d be able to add it. I think Nvidia Nsight can show it, and I would expect tools like Snapdragon Profiler to as well. I’ve been meaning to check if Nsight does or not but I seem to remember seeing a flag for earlydepthstencil in there someplace.

    As noted above, stencil masking happens at the same time as early Z (ignoring the SV_StencilRef case). The depth buffer is more accurately the depth stencil buffer, and when the GPU checks the depth value and comparison state for early rejection, it simultaneously checks the stencil value and comparison state.
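
    The combined check can be sketched in a few lines of Python. This is only an illustration of "depth and stencil are tested together before shading": the function early_depth_stencil_pass is hypothetical, the comparison names are borrowed from Unity's ShaderLab ZTest and Stencil Comp values, and read/write masks and stencil ops are left out.

```python
import operator

# Comparison names borrowed from Unity ShaderLab (ZTest / Stencil Comp).
COMPARE = {
    "Less": operator.lt,
    "LEqual": operator.le,
    "Equal": operator.eq,
    "GEqual": operator.ge,
    "Greater": operator.gt,
    "NotEqual": operator.ne,
    "Always": lambda a, b: True,
    "Never": lambda a, b: False,
}


def early_depth_stencil_pass(frag_depth, stored_depth, ztest,
                             stencil_ref, stored_stencil, stencil_comp):
    """Toy combined early test: True means the fragment would go on to be shaded."""
    # Stencil: the reference value is compared against the buffer contents.
    stencil_ok = COMPARE[stencil_comp](stencil_ref, stored_stencil)
    # Depth: the incoming fragment depth is compared against the stored depth.
    depth_ok = COMPARE[ztest](frag_depth, stored_depth)
    return stencil_ok and depth_ok


if __name__ == "__main__":
    # Fragment is closer than the stored depth and the stencil matches.
    print(early_depth_stencil_pass(0.5, 0.8, "LEqual", 1, 1, "Equal"))
    # Same depth result, but the pixel is outside the stencil mask.
    print(early_depth_stencil_pass(0.5, 0.8, "LEqual", 1, 0, "Equal"))
```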

    Dithering can often be cheaper than alpha blending, even on some mobile hardware. The Tegra X1 in the Switch is based on the desktop Maxwell GPU family (Nvidia GTX 750 and GTX 900 series). It is tile based, as all Nvidia GPUs have been since Maxwell, but it is not a tile based deferred renderer. It’s a tile based immediate mode renderer, as are more recent AMD GPUs (from Vega on). So it does not have HSR (hidden surface removal), aka the hardware Z prepass. Because the Tegra X1 is based on a desktop class GPU, it doesn’t really have any problems with alpha testing.
     
    Last edited: Feb 28, 2021
  3. HatteFox
    Joined: Jan 20, 2021 | Posts: 12

    Hello, could you please tell me whether the depth is written immediately after the early Z test passes?
     
  4. bgolus
    Joined: Dec 7, 2012 | Posts: 12,352

    For fully opaque shaders that do not fall under the cases listed above, yes. The Z depth is written immediately after passing the early Z test, before or maybe in parallel with the fragment shader invocation.
     
  5. kite3h
    Joined: Aug 27, 2012 | Posts: 197

    Early Z and Hi-Z are tested before the fragment shader runs. They were added to remove unnecessary overhead, because the conventional depth test is performed after fragment shading. However, you need to use a depth prepass to get the full effect. Because the depth prepass uses only position and no additional data, it can be processed at high speed, and high-performance memory is intentionally used as the target for the depth prepass.
    The problem is the case of cutoff materials. For opaque materials, no textures are used in the depth prepass, but for cutoff materials the transparency mask textures have to be sampled. If such a texture gets large, cache misses occur, which slows down the depth prepass that is supposed to be fast.
    So AAA-class games combine the transparency masks of the cutoff materials into one texture and lower its resolution as much as possible, limiting it to a size that can be loaded into the cache at once.
    However, doing so causes jitter along the cutoff edges, so the transparency mask is converted to an SDF.
    Once you have done the cutoff in the depth prepass, early Z is not turned off in the material pass, because with ZTest Equal the material pass does not need the cutoff and does not change the depth value.
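
    A toy Python model of that two-pass setup, for what it's worth. It renders a single row of pixels: the prepass applies the alpha cutoff and writes depth, and the material pass uses an Equal depth test with no discard. The function render_row, the draw tuple layout and the 0.5 cutoff are all made up for illustration; this is a sketch of the idea, not how any engine implements it.

```python
def render_row(draws, width, cutoff=0.5):
    """Depth prepass with alpha cutoff, then a material pass with ZTest Equal.

    draws: list of (name, depth, alpha_mask) covering the whole row;
           alpha_mask is None for opaque draws, else one alpha value per pixel.
    Returns (visible_name_per_pixel, shaded_fragment_count).
    """
    depth = [float("inf")] * width

    # Pass 1: depth prepass. Only cutoff draws sample their mask here.
    for _name, z, alpha in draws:
        for x in range(width):
            if alpha is not None and alpha[x] < cutoff:
                continue  # the discard happens in the cheap pass only
            if z < depth[x]:
                depth[x] = z

    # Pass 2: material pass. ZTest Equal, no depth write, no discard,
    # so nothing in this pass would force early Z off.
    color = [None] * width
    shaded = 0
    for name, z, _alpha in draws:
        for x in range(width):
            if z == depth[x]:
                color[x] = name
                shaded += 1  # full-cost shading, once per visible pixel
    return color, shaded


if __name__ == "__main__":
    draws = [
        ("wall", 5.0, None),                  # opaque background
        ("leaf", 2.0, [1.0, 0.0, 1.0, 0.0]),  # cutoff foreground with holes
    ]
    print(render_row(draws, 4))
```

    The holes in the leaf never get a depth value from the leaf, so in the material pass the leaf fails the Equal test there and the wall shows through, without the leaf's material shader ever calling discard.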