Question Is there a way to control back-face culling per-instance during GPU instancing?

Discussion in 'General Graphics' started by NightElfik, Aug 20, 2022.

  1. NightElfik

    NightElfik

    Joined:
    Oct 27, 2014
    Posts:
    27
    I am drawing a lot of meshes using `DrawMeshInstancedIndirect`. However, some have a negative scale (flipped models), and these get rendered inside-out due to back-face culling.

    Is there a way to control per-instance back-face culling (say in vertex shader)? If not, what is the best way to solve this?

    Do I need to draw flipped meshes with a second call to `DrawMeshInstancedIndirect` with a different shader that has culling inverted? I am trying to avoid this since it adds a lot of complexity in bookkeeping buffers.

    I found this issue, sadly marked as "Won't fix", and I believe that would be useful to avoid the need for copy-pasting a shader just to invert its culling: https://issuetracker.unity3d.com/is...sh-drawn-using-graphics-dot-drawnmesh-to-flip

    I was also considering disabling back-face culling and using the `facing : VFACE` parameter of the fragment shader, but that would still add extra rasterization cost compared to true back-face culling done by the hardware. Would this even work in surface shaders?

    Any suggestions/tricks are welcome, thanks!
     
  2. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    I'm using it on surface shaders just fine.
    Just put
    Code (CSharp):
    half vface : VFACE;
    inside your surface/fragment input struct.
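
    A minimal sketch of how that could look in a complete surface shader, assuming `Cull Off` so that both sides of each triangle reach the surface function. The shader name and the normal-flipping line are illustrative assumptions, not something taken from this thread, and the sketch has not been checked against a specific Unity version:

```hlsl
Shader "Hypothetical/TwoSidedVFace" {
    Properties {
        _MainTex("Albedo (RGB)", 2D) = "white" {}
    }
    SubShader {
        Tags { "RenderType" = "Opaque" }
        Cull Off // rasterize both sides; VFACE tells them apart

        CGPROGRAM
        #pragma surface surf Standard fullforwardshadows
        #pragma target 3.0 // VFACE needs shader model 3.0 or later

        sampler2D _MainTex;

        struct Input {
            float2 uv_MainTex;
            half vface : VFACE; // positive on front faces, negative on back faces
        };

        void surf (Input IN, inout SurfaceOutputStandard o) {
            o.Albedo = tex2D(_MainTex, IN.uv_MainTex).rgb;
            // Flip the tangent-space normal on back faces so they are lit
            // from the side that is actually visible.
            o.Normal = float3(0.0, 0.0, IN.vface > 0 ? 1.0 : -1.0);
        }
        ENDCG
    }
    FallBack "Diffuse"
}
```

    Note that this shades both sides of every triangle, so it does pay the extra rasterization cost mentioned in the question.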

    As for the culling direction, you can use a material property to drive it, along with other things:
    Code (CSharp):
    Shader "shadername"
    {
        Properties
        {
            // CullMode: 0 = Off, 1 = Front, 2 = Back
            [Enum(UnityEngine.Rendering.CullMode)] _Cull ("Cull", Float) = 2
            // BlendMode: 0 = Zero, 1 = One
            [Enum(UnityEngine.Rendering.BlendMode)] _SrcBlend ("Source Blend", Float) = 1
            [Enum(UnityEngine.Rendering.BlendMode)] _DstBlend ("Dest Blend", Float) = 0
            // CompareFunction: 2 = Less, 4 = LessEqual (ShaderLab's default ZTest)
            [Enum(UnityEngine.Rendering.CompareFunction)] _ZTest ("Z Test", Float) = 2
            [Enum(Off, 0, On, 1)] _ZWrite ("Z Write", Float) = 1
        }

        SubShader
        {
            // Render state is read from the material properties above.
            Cull [_Cull]
            ZTest [_ZTest]
            ZWrite [_ZWrite]
            Blend [_SrcBlend] [_DstBlend]

            CGPROGRAM

            ......

    You can then use Material Property Blocks to modify that without breaking GPU instancing.

    But maybe this breaks instancing, since it's a render state change?
    I don't know for sure, I have never actually verified it.

    Using Material Property Blocks does break batching though. (different thing)

    Correction after several edits: Material Property Blocks DO break batching, but they DO NOT break instancing, as long as your material's shader properly supports GPU instancing and you have checked the "Enable GPU Instancing" option in the material inspector.
    (this topic is quite confusing... :p )
     
    Last edited: Aug 20, 2022
  3. c0d3_m0nk3y

    c0d3_m0nk3y

    Joined:
    Oct 21, 2021
    Posts:
    669
    You could do the backface culling in a geometry shader.

    However, be aware that geometry shaders are frowned upon because they can be slow. That is probably still better than doing it in a pixel shader, because using clip() there disables the early-z optimization.

    Another idea would be to double the vertex and index buffers so that the mesh contains both front- and back-facing polygons. You'd also have to add a vertex attribute to indicate whether a vertex belongs to a clockwise or counter-clockwise triangle. This would allow you to reject triangles in the vertex shader by setting SV_CullDistance to -1 on their vertices. This is not entirely free either, though, because the GPU would have to run the vertex shader twice as often.
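
    A rough sketch of the vertex-shader side of that idea in plain HLSL. The names `_ViewProj`, `windingSign`, `VSInput` and `VSOutput` are hypothetical, and the `InstanceData` layout is only assumed to hold a position and a +1/-1 X scale. It also assumes the mesh stores every triangle twice (once as-is and once with reversed winding) and that the usual `Cull Back` state stays enabled. Whether SV_CullDistance survives Unity's cross-compilation on every graphics API would need to be verified:

```hlsl
struct InstanceData {
    float3 position;
    float scaleX; // +1 for normal instances, -1 for mirrored ones
};

StructuredBuffer<InstanceData> _InstanceData;
float4x4 _ViewProj; // hypothetical combined view-projection matrix

struct VSInput {
    float3 position : POSITION;
    float windingSign : TEXCOORD1; // +1 = original triangle copy, -1 = reversed copy
    uint instanceID : SV_InstanceID;
};

struct VSOutput {
    float4 position : SV_Position;
    // A triangle is discarded when this is negative on all three vertices.
    float cullDistance : SV_CullDistance;
};

VSOutput vert(VSInput v) {
    InstanceData data = _InstanceData[v.instanceID];

    // Mirror along X if requested, then translate to the instance position.
    float3 worldPos = float3(v.position.x * data.scaleX, v.position.yz) + data.position;

    VSOutput o;
    o.position = mul(_ViewProj, float4(worldPos, 1.0));
    // Keep the original copy on normal instances and the reversed copy on
    // mirrored instances; the product is -1 for the copy that must be dropped.
    o.cullDistance = v.windingSign * data.scaleX;
    return o;
}
```

    All three vertices of a triangle share the same `windingSign` and the same instance, so each triangle is either kept whole or culled whole, and regular back-face culling then works on the surviving copy.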

    I'm wondering if you could use a compute shader to fill a compute buffer with the arguments for the indirect draw call. You'd still need two submeshes with different winding because you can't change the culling mode within a single draw call.
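
    One possible reading of the compute-shader idea, sketched as a Unity compute kernel that sorts instances into a normal and a flipped list. The kernel and buffer names are hypothetical. On the C# side, something like `ComputeBuffer.SetCounterValue(0)` would reset both append buffers each frame, and `ComputeBuffer.CopyCount` could then copy each counter into the instance-count slot (the second uint, byte offset 4) of the matching indirect-arguments buffer. None of this has been benchmarked:

```hlsl
#pragma kernel ClassifyInstances

struct InstanceData {
    float3 position;
    float scaleX; // +1 for normal instances, -1 for mirrored ones
};

StructuredBuffer<InstanceData> _AllInstances;             // every instance, unsorted
AppendStructuredBuffer<InstanceData> _NormalInstances;    // drawn with the original winding
AppendStructuredBuffer<InstanceData> _FlippedInstances;   // drawn with the reversed winding
uint _InstanceCount;

[numthreads(64, 1, 1)]
void ClassifyInstances(uint3 id : SV_DispatchThreadID) {
    // The dispatch is rounded up to a multiple of 64, so skip the excess threads.
    if (id.x >= _InstanceCount) {
        return;
    }

    InstanceData data = _AllInstances[id.x];
    if (data.scaleX < 0.0) {
        _FlippedInstances.Append(data);
    } else {
        _NormalInstances.Append(data);
    }
}
```

    The order of instances inside each append buffer is not deterministic, which is fine as long as the shader only needs per-instance data and not a stable index.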
     
    Last edited: Aug 20, 2022
  4. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    I would stay away from geometry shaders if you want to publish for multi-platform, as they are poorly supported and are slow in general. It's a pity but vendors don't seem to like them much, hence the poor support.
    Apple's Metal is infamous for not supporting them.
     
  5. NightElfik

    NightElfik

    Joined:
    Oct 27, 2014
    Posts:
    27
    Thanks for the reply! This is useful for making one shader that can support both culling modes, great tip!

    Now this is an interesting idea! However, AFAIK material property blocks are unable to change anything that changes the render state. You can change the value, but that does not affect the render state.

    And even if it did work, I don't think there is a way of applying material properties per-instance, is there? One call of `DrawMeshInstancedIndirect` will get material + property block and that will be applied to all rendered instances.

    Property blocks are useful when you need to do multiple calls with the same material but some minor changes. Even if this worked with the culling, I'd have to separate the normal and flipped instances to separate `DrawMeshInstancedIndirect` calls.

    When using `DrawMeshInstancedIndirect`, I don't think that batching is supported/considered. I also believe that "enable GPU instancing" on material has nothing to do with instancing using `DrawMeshInstancedIndirect`. All my materials have this off and it works fine.

    This is also interesting, thanks for the ideas! It just seems even more complex than doing the two draw calls from an infrastructure point of view. The question is performance: two draw calls with fewer instances per call, or one draw call with more instances that does extra work? Any intuition on this? This is really hard to compare without a benchmark, and I bet that performance will vary across GPU vendors and generations.

    However, you mentioned that you can discard vertices in the vertex shader by using SV_CullDistance? Can you elaborate on how this works? I have many shaders that reject geometry in the vertex phase, and what I do is just set the vertices to zero so that the resulting triangles have zero area and are discarded. Which one is better?
     
    Last edited: Aug 20, 2022
  6. atomicjoe

    atomicjoe

    Joined:
    Apr 10, 2013
    Posts:
    1,869
    Yeah, I meant that you have to program the shader to support it using CBUFFERs (UNITY_INSTANCING_BUFFER) for the properties. The "enable GPU instancing" toggle in the material just enables the instancing keyword to activate that shader variant. I guess it's implicit with DrawMeshInstancedIndirect.
    And yeah, batching is only a thing when using Unity's Mesh Renderers.
    I just wanted to clarify things because I had edited my answer like 3 times :p
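
    For reference, a minimal sketch of what per-instance properties through the instancing buffer macros could look like in a surface shader. The shader name is hypothetical, and `_Color` is just an example property. This applies to instancing driven by Material Property Blocks, not to the procedural `DrawMeshInstancedIndirect` path, and it cannot change render state such as culling:

```hlsl
Shader "Hypothetical/InstancedColor" {
    Properties {
        _MainTex("Albedo (RGB)", 2D) = "white" {}
        _Color("Color", Color) = (1, 1, 1, 1)
    }
    SubShader {
        Tags { "RenderType" = "Opaque" }

        CGPROGRAM
        // Surface shaders generate the instancing variants themselves,
        // so no explicit multi_compile_instancing pragma is used here.
        #pragma surface surf Standard fullforwardshadows

        sampler2D _MainTex;

        struct Input {
            float2 uv_MainTex;
        };

        // Properties declared here can differ per instance.
        UNITY_INSTANCING_BUFFER_START(Props)
            UNITY_DEFINE_INSTANCED_PROP(fixed4, _Color)
        UNITY_INSTANCING_BUFFER_END(Props)

        void surf (Input IN, inout SurfaceOutputStandard o) {
            fixed4 c = tex2D(_MainTex, IN.uv_MainTex)
                     * UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
            o.Albedo = c.rgb;
        }
        ENDCG
    }
    FallBack "Diffuse"
}
```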
     
  7. NightElfik

    NightElfik

    Joined:
    Oct 27, 2014
    Posts:
    27
    Oh, I probably should have posted the shader for reference, here is a simplified but functional one:

    Code (csharp):
    Shader "InstancedShader" {
        Properties {
            _MainTex("Albedo (RGB)", 2D) = "black" {}
        }
        SubShader {
            Tags {"Queue" = "Geometry" "RenderType" = "Opaque" }

            CGPROGRAM
            #pragma surface surf Standard fullforwardshadows addshadow
            #pragma multi_compile_instancing
            // setupInstancing() fills in the per-instance matrix.
            #pragma instancing_options procedural:setupInstancing assumeuniformscaling

            sampler2D _MainTex;

            struct Input {
                float2 uv_MainTex;
            };

            struct InstanceData {
                float3 position;
                float scaleX; // either +1 or -1
            };

    #ifdef UNITY_PROCEDURAL_INSTANCING_ENABLED
            StructuredBuffer<InstanceData> _InstanceData;

            void setupInstancing() {
                InstanceData data = _InstanceData[unity_InstanceID];
                // Build the object-to-world matrix column by column:
                // X axis (possibly mirrored), Y axis, Z axis, translation.
                unity_ObjectToWorld._11_21_31_41 = float4(data.scaleX, 0.0, 0.0, 0.0);
                unity_ObjectToWorld._12_22_32_42 = float4(0.0, 1.0, 0.0, 0.0);
                unity_ObjectToWorld._13_23_33_43 = float4(0.0, 0.0, 1.0, 0.0);
                unity_ObjectToWorld._14_24_34_44 = float4(data.position, 1.0);
            }
    #endif

            void surf (Input IN, inout SurfaceOutputStandard o) {
                o.Albedo = tex2D(_MainTex, IN.uv_MainTex).rgb;
            }

            ENDCG
        }
    }
     
    Last edited: Aug 21, 2022
  8. c0d3_m0nk3y

    c0d3_m0nk3y

    Joined:
    Oct 21, 2021
    Posts:
    669
    If you have, let's say, 1000 draw calls but only one particular mesh has both positive and negative scale, you'd end up with only 1001 draw calls. However, if you have 1000 instanced draw calls and all of them have positive and negative scale, you'd end up with 2000 draw calls. Usually, the first case is much more likely, so a more complex single-draw-call solution is probably not worth doing.

    SV_CullDistance and SV_ClipDistance are special shader semantics for custom cull/clip planes that are used in addition to the frustum clip planes. Each value is the signed distance of a vertex to the plane. If all 3 vertices of a triangle have a negative cull distance, the triangle is culled. If some but not all vertices have a negative clip distance, the triangle is clipped against the plane rather than discarded. A cull distance never clips, so a triangle with only some negative cull distances is drawn whole.

    Planes are usually given in the form dot(n, v) + d = 0, where n is the normal of the plane, v is a vertex position to test and d is the negative distance of the plane from the origin along the normal. dot(n, v) is the distance of v along the normal vector n, so dot(n, v) + d will be the desired distance of v from the plane.
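
    As a small illustration of that formula, here is a plain HLSL sketch that feeds the plane distance into SV_ClipDistance. The uniforms `_ViewProj` and `_ClipPlane` and the struct and function names are hypothetical, and the plane normal is assumed to be normalized:

```hlsl
float4x4 _ViewProj;  // hypothetical combined view-projection matrix
float4 _ClipPlane;   // hypothetical plane: xyz = normal n, w = d

// Signed distance of point v from the plane dot(n, v) + d = 0.
// Positive results lie on the side the normal points to.
float PlaneDistance(float3 n, float d, float3 v) {
    return dot(n, v) + d;
}

// Worked example for the horizontal plane y = 2 facing up:
//   n = (0, 1, 0), d = -2
//   v = (0, 5, 0): dot(n, v) + d = 5 - 2 =  3 (above the plane, kept)
//   v = (0, 1, 0): dot(n, v) + d = 1 - 2 = -1 (below the plane, clipped)

struct VSOutput {
    float4 position : SV_Position;
    float clipDistance : SV_ClipDistance;
};

VSOutput vert(float3 worldPos : POSITION) {
    VSOutput o;
    o.position = mul(_ViewProj, float4(worldPos, 1.0));
    // The rasterizer interpolates this value and drops the negative part.
    o.clipDistance = PlaneDistance(_ClipPlane.xyz, _ClipPlane.w, worldPos);
    return o;
}
```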

    But in your case, you wouldn't have actual clip planes; you'd simply set the value to a positive or negative number. Performance-wise, that is probably the same as making zero-area triangles. Alternatively, you can move the triangle to the negative side of the near plane in clip space, where it is culled as well (moving its vertices to the view-space origin should do the trick for a perspective projection). Personally, I'd prefer a culled triangle over a degenerate one, but performance-wise it's probably the same. There are many ways to skin a cat ;)

    https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/user-clip-planes-on-10level9
     