GPU driven rendering with SRP: No DrawProceduralIndirectNow for CommandBuffers?

Discussion in 'General Graphics' started by dotmos, Jun 28, 2022.

  1. dotmos (Joined: Jan 2, 2011, Posts: 39)
    I am currently writing a custom GPU-driven SRP and am struggling to reduce SetPass calls, as there seems to be no method for drawing meshes via a CommandBuffer without triggering a SetPass each time.

    TL;DR: I need DrawProceduralIndirectNow for CommandBuffers to render lots of different meshes with only one SetPass call.

    The algorithm:
    I combine all my meshes into one big mesh ("mesh-atlas") and send it to the GPU via GraphicsBuffers. Rendering is then done by creating an args buffer with the correct vertex/index offsets and calling CommandBuffer.DrawProceduralIndirect with the matching args offset for each mesh inside the "mesh-atlas", always supplying the same material and pass.
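A minimal sketch of the per-mesh loop described above, assuming the atlas index data lives in a GraphicsBuffer and the args buffer holds one tightly packed GraphicsBuffer.IndirectDrawIndexedArgs entry per mesh. The class, method and parameter names (MeshAtlasDrawer, RecordDraws, atlasIndexBuffer, argsBuffer, meshCount) are hypothetical, and the vertex data is assumed to be fetched by the shader from buffers already bound on the material.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper; names are illustrative and not taken from the thread.
public static class MeshAtlasDrawer
{
    // Records one indirect draw per mesh stored in the atlas.
    // argsBuffer is assumed to hold meshCount tightly packed
    // GraphicsBuffer.IndirectDrawIndexedArgs entries.
    public static void RecordDraws(
        CommandBuffer cmd,
        GraphicsBuffer atlasIndexBuffer,
        GraphicsBuffer argsBuffer,
        Material material,
        int shaderPass,
        int meshCount)
    {
        for (int i = 0; i < meshCount; i++)
        {
            // Byte offset of the i-th argument entry inside argsBuffer.
            int argsOffset = i * GraphicsBuffer.IndirectDrawIndexedArgs.size;

            // Same material and pass for every draw; only the args offset changes.
            cmd.DrawProceduralIndirect(
                atlasIndexBuffer,
                Matrix4x4.identity,
                material,
                shaderPass,
                MeshTopology.Triangles,
                argsBuffer,
                argsOffset);
        }
    }
}
```

Each iteration is still a separate draw call on the GPU; the loop only keeps the material and pass constant so that nothing but the argument offset differs between draws.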

    The problem:
    At the moment, whenever I call CommandBuffer.DrawProceduralIndirect, the total SetPass count increases by one, even if the material/mesh combination and pass index stay the same. I clearly do not want that; I want to render most of my scene with one SetPass call. This seems like a bug to me, as there is no need to call SetPass again.

    The workaround(?)
    I work around this issue by not calling CommandBuffer.DrawProceduralIndirect in my SRP's Render() function, and calling Graphics.DrawProceduralIndirectNow in RenderPipelineManager.endCameraRendering instead. This works, and I can now render everything with only one SetPass call.
    But I have the strong feeling that this is not the correct way to do it and that I will get into trouble in the long run when trying to draw the shadow pass or transparent objects, and when trying to dispatch compute shaders and async GPU fences in between to handle everything else (GPU culling, light binning, decals, tiled rendering, etc.).
    From my understanding of Unity, this should all be done in a CommandBuffer to make sure everything runs in the correct order. "Manually" calling Graphics.DrawProceduralIndirectNow in between sounds very risky to me.
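The workaround could look roughly like the sketch below. It is an illustration under assumptions, not the poster's actual code: the component name, fields and SetBuffers method are hypothetical, the buffers are assumed to be created and filled elsewhere, and pass index 0 is assumed to be the pass to draw with.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical component; names are illustrative and not taken from the thread.
public class ImmediateIndirectDrawer : MonoBehaviour
{
    public Material material;          // shared by every mesh in the atlas

    GraphicsBuffer atlasIndexBuffer;   // assumed to be created and filled elsewhere
    GraphicsBuffer argsBuffer;         // one IndirectDrawIndexedArgs entry per mesh
    int meshCount;

    public void SetBuffers(GraphicsBuffer indices, GraphicsBuffer args, int count)
    {
        atlasIndexBuffer = indices;
        argsBuffer = args;
        meshCount = count;
    }

    void OnEnable()  { RenderPipelineManager.endCameraRendering += OnEndCameraRendering; }
    void OnDisable() { RenderPipelineManager.endCameraRendering -= OnEndCameraRendering; }

    void OnEndCameraRendering(ScriptableRenderContext context, Camera camera)
    {
        if (material == null || atlasIndexBuffer == null || argsBuffer == null) return;

        // Activate the material pass once; the immediate draws below reuse it.
        if (!material.SetPass(0)) return;

        for (int i = 0; i < meshCount; i++)
        {
            // Byte offset of the i-th argument entry inside argsBuffer.
            int argsOffset = i * GraphicsBuffer.IndirectDrawIndexedArgs.size;
            Graphics.DrawProceduralIndirectNow(
                MeshTopology.Triangles, atlasIndexBuffer, argsBuffer, argsOffset);
        }
    }
}
```

Because these draws execute immediately rather than being recorded, their ordering relative to work recorded in CommandBuffers is exactly the concern raised above.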

    Am I overlooking something? Is there a way to draw meshes with a GPU-driven/indirect approach via CommandBuffers that does not call SetPass every time?
     
    Last edited: Jul 22, 2022
  2. joshuacwilde (Joined: Feb 4, 2018, Posts: 648)
    If you are doing GPU driven rendering, why do you need to call DrawProceduralIndirect more than once per render pass with the same material?
     
  3. dotmos (Joined: Jan 2, 2011, Posts: 39)
    Thanks for your reply:)

    Afaik Unity does not expose MultiDrawIndirect, so I have to emulate it by manually calling DrawProceduralIndirect for each mesh.
    This results in driver overhead, but I have to do that anyway, as Intel GPUs do not support MultiDrawIndirect (afaik).
    Once I have that working, I can think about creating native plugins for MultiDrawIndirect access on PCs with Nvidia/AMD GPUs as well as consoles.
    Or am i missing something here? :)
     
  4. joshuacwilde (Joined: Feb 4, 2018, Posts: 648)
    Ah ok, if you are only culling at a per-object level, then yes makes sense. I was assuming you were wanting to do finer culling (cluster, triangle), in which case you would only need 1 draw call per material, but that's a different strategy.

    As for why 1 set pass per draw call (instead of per material), I have experienced similar, iirc, and would also be interested to hear what's going on.
     
  5. dotmos (Joined: Jan 2, 2011, Posts: 39)
    Last edited: Jul 22, 2022
  6. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 329)
  7. Neto_Kokku (Joined: Feb 15, 2018, Posts: 1,609)
    Each DrawProceduralIndirect call is one GPU draw call. Unity doesn't (and can't) batch them together, it's up to you to make sure you draw as much stuff as possible in a single DrawProceduralIndirect call.
     
  8. dotmos (Joined: Jan 2, 2011, Posts: 39)
    Thanks for posting these :)
    I had a quick look at BRG (BatchRendererGroup) in the past, and it seems to be just an abstraction layer on top of DrawMeshInstanced that removes the need for GameObjects when rendering.
    We are not using GameObjects (except for UI). We are also not using DOTS, for that matter, but a custom ECS solution.
    We also created our own abstraction layer for DrawMeshInstancedIndirect in the past, but with our next project we want to go one step further by using DrawProceduralIndirect, or MultiDrawIndirect / ExecuteIndirect if Unity implements it.

    I will have another look at BRG and will also read the threads you posted closely. Maybe I missed something. Thanks again. :)



    Thanks for your answer.
    Please note that I am talking about SetPass calls and not batches. :)
    When using CommandBuffer.DrawProceduralIndirect, I get one SetPass call each time I use it. Note that the material is always the same, so there should be no need to call SetPass again.
    If I use Graphics.DrawProceduralIndirectNow, I only get one SetPass call for everything, which is what I want, but there is no DrawProceduralIndirectNow for CommandBuffers.
    Please have a look at the attached screenshots to see what I mean :)

    Graphics.DrawProceduralIndirectNow.JPG
    First screenshot shows Graphics.DrawProceduralIndirectNow with only 1 SetPass call for all meshes. (Another one for the sky)

    CommandBuffer.JPG
    Second screenshot shows CommandBuffer.DrawProceduralIndirect with 67 SetPass calls for all meshes and another one for the sky.
     
    Last edited: Jul 22, 2022
  9. Neto_Kokku (Joined: Feb 15, 2018, Posts: 1,609)
    Ah, I see. SetPass is basically Unity's way of saying "send the material parameters to the GPU before a draw call". I'm actually surprised it only does one for multiple Graphics.DrawProceduralIndirectNow calls.
     
  10. dotmos (Joined: Jan 2, 2011, Posts: 39)
    Yes, this is a major advantage of that function, as you save the SetPass call if the material/pass does not change between calls. You basically have to tell Unity manually which material and pass it should use, which saves a lot of CPU time if you have many different meshes. :)
    MultiDrawIndirect / ExecuteIndirect would be even faster.

    From the DrawProceduralIndirectNow documentation

     
  11. dotmos (Joined: Jan 2, 2011, Posts: 39)
    I had a look at BRG again, and it is not what I am looking for, as it is a CPU-driven approach (at least at the moment).

    HOWEVER, while reading through the second thread, I stumbled upon Graphics.RenderMeshIndirect, which is mentioned there by user Jes28, and it seems to be Unity's wrapper for MultiDrawIndirect/ExecuteIndirect. If this is the case, it is a big step forward! I would then still need a version for CommandBuffers, but this would already be great news.

    I will give this a try now and report back :)
     
  12. dotmos (Joined: Jan 2, 2011, Posts: 39)
    Just tried it, and it seems to do what MultiDrawIndirect/ExecuteIndirect is supposed to do :)

    On DX11 it uses a fallback/software emulation, which is to be expected, as DX11 does not officially support it (afaik). There are vendor-specific DX11 extensions for MDI, though.
    But since we are not planning to use DX11, this is OK for us.

    I have not yet checked my sample on PS5 and Series X, but I expect them to support this in hardware as well, so no software emulation should be needed.

    The only thing missing now is a version of RenderMeshIndirect for CommandBuffers.
    @Unity: pretty please? :)

    DX12:
    DX12.JPG

    Vulkan:
    Vulkan.JPG

    DX11:
    DX11.JPG

    Frame Debugger (Vulkan):
    FrameDebugger.JPG


    All in all, this is pretty cool and will probably save us from having to implement Mesh Cluster Rendering.
     
  13. c0d3_m0nk3y (Joined: Oct 21, 2021, Posts: 329)
    I don't think that RenderMeshIndirect is the same as MDI. RenderMeshIndirect can only render one mesh multiple times, whereas MDI can render different meshes with one draw call.

    The only difference from RenderMeshInstanced is that baseVertexIndex, indexCountPerInstance, instanceCount, startIndex and startInstance come from a compute buffer. You can have multiple instanced draw calls in the compute buffer, but they all use the same mesh and shader.

    This should be possible in DX11 as well.
    https://docs.microsoft.com/en-us/wi...d11devicecontext-drawindexedinstancedindirect

    This method also exists for CommandBuffers; it just has a slightly different name:
    https://docs.unity3d.com/ScriptReference/Rendering.CommandBuffer.DrawMeshInstancedIndirect.html
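As a rough sketch of what using the CommandBuffer variant might look like: CommandBuffer.DrawMeshInstancedIndirect takes an args buffer and a byte offset but no command count, so a buffer holding several commands needs one call per command. The helper name and parameters below are hypothetical, and the args buffer is assumed to contain tightly packed GraphicsBuffer.IndirectDrawIndexedArgs entries.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical helper; names are illustrative and not taken from the thread.
public static class InstancedIndirectRecorder
{
    // Records one DrawMeshInstancedIndirect call per command stored in argsBuffer.
    public static void Record(
        CommandBuffer cmd,
        Mesh mesh,
        Material material,
        int shaderPass,
        GraphicsBuffer argsBuffer,
        int commandCount)
    {
        for (int i = 0; i < commandCount; i++)
        {
            // Byte offset of the i-th argument entry inside argsBuffer.
            int argsOffset = i * GraphicsBuffer.IndirectDrawIndexedArgs.size;
            cmd.DrawMeshInstancedIndirect(mesh, 0, material, shaderPass, argsBuffer, argsOffset);
        }
    }
}
```

Whether recording the draws this way avoids the per-call SetPass cost discussed earlier is not established in this thread.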
     
    Last edited: Jul 26, 2022
  14. dotmos (Joined: Jan 2, 2011, Posts: 39)
    You render multiple meshes with RenderMeshIndirect by supplying a single "mega-mesh"/"mesh-atlas" that contains all the meshes you want to render. You then pick the correct mesh from the mega-mesh by writing the correct offsets into the GraphicsBuffer and setting commandCount to the number of different meshes you want to render.
    So basically you bind all the mesh data once and then pick the individual meshes via the GraphicsBuffer and commandCount supplied to RenderMeshIndirect.

    Quick and dirty example for that:
    Code (CSharp):
    // One IndirectDrawIndexedArgs entry per source mesh
    multiDrawCommandsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments, meshes.Count, GraphicsBuffer.IndirectDrawIndexedArgs.size);
    multiDrawCommands = new GraphicsBuffer.IndirectDrawIndexedArgs[meshes.Count];

    // Destroy the previous merged mesh, if any, before creating a new one
    if (mergedMesh != null)
    {
        UnityEngine.Object.Destroy(mergedMesh);
    }
    mergedMesh = new Mesh();

    // Count the total number of vertices and indices
    int vertexCount = 0;
    int indexCount = 0;
    foreach (Mesh m in meshes)
    {
        vertexCount += m.vertexCount;
        indexCount += m.triangles.Length;
    }

    // Fill the merged arrays and multiDrawCommands
    Vector3[] vertices = new Vector3[vertexCount];
    int[] indices = new int[indexCount];
    Vector3[] normals = new Vector3[vertexCount];
    int currentVertexCount = 0;
    int currentIndexCount = 0;
    for (int i = 0; i < meshes.Count; i++)
    {
        Mesh m = meshes[i];
        int[] meshTriangles = m.triangles; // the property returns a copy, so fetch it once
        Array.Copy(m.vertices, 0, vertices, currentVertexCount, m.vertexCount);
        Array.Copy(meshTriangles, 0, indices, currentIndexCount, meshTriangles.Length);
        Array.Copy(m.normals, 0, normals, currentVertexCount, m.vertexCount);

        // Indices stay relative to each source mesh; baseVertexIndex shifts them into the merged mesh
        multiDrawCommands[i].baseVertexIndex = (uint)currentVertexCount;
        multiDrawCommands[i].indexCountPerInstance = (uint)meshTriangles.Length;
        multiDrawCommands[i].instanceCount = instanceCount;
        multiDrawCommands[i].startIndex = (uint)currentIndexCount;
        multiDrawCommands[i].startInstance = (uint)(i * instanceCount);

        currentVertexCount += m.vertexCount;
        currentIndexCount += meshTriangles.Length;
    }
    //mergedMesh.SetVertices(vertices);
    //mergedMesh.SetIndices(indices, MeshTopology.Triangles, 0);
    //mergedMesh.SetNormals(normals);
    mergedMesh.vertices = vertices;
    mergedMesh.triangles = indices;
    mergedMesh.normals = normals;
    mergedMesh.RecalculateTangents();

    // Upload the draw commands to the GPU
    multiDrawCommandsBuffer.SetData(multiDrawCommands);

    You can also fill the GraphicsBuffer on the GPU via a ComputeShader if you want to do culling on the GPU.


    And then call this somewhere else to actually render everything in one go:
    Code (CSharp):
    RenderParams rp = new RenderParams(material);
    rp.worldBounds = new Bounds(Vector3.zero, 10000 * Vector3.one); // use tighter bounds for better FOV culling
    rp.matProps = new MaterialPropertyBlock();

    Graphics.RenderMeshIndirect(rp, mergedMesh, multiDrawCommandsBuffer, meshes.Count);

    And a quick and dirty test shader to render everything, which is basically the one from the documentation. Please note that this is still using the old CG syntax and not the "new" HLSL syntax which should be used for the SRP:
    Code (CSharp):
    Shader "Custom/UberTweaked"
    {
        Properties
        {

        }

        SubShader
        {
            Tags {
                "RenderType" = "Opaque"
                "LightMode" = "SRPDefaultUnlit"
            }

            Pass
            {
                CGPROGRAM
                #pragma target 4.5
                #pragma vertex vert
                #pragma fragment frag

                #define UNITY_INDIRECT_DRAW_ARGS IndirectDrawIndexedArgs
                #include "UnityIndirect.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float3 normals : NORMAL;
                    uint svInstanceID : SV_InstanceID;
                    //uint svVertexID : SV_VertexID;
                };

                struct v2f
                {
                    float4 pos : SV_POSITION;
                    float4 color : COLOR0;
                    float3 worldNormal : TEXCOORD0;
                };

                v2f vert(appdata v)
                {
                    InitIndirectDrawArgs(0);
                    v2f o;
                    uint cmdID = GetCommandID(0);
                    uint instanceID = GetIndirectInstanceID(v.svInstanceID);
                    float4 wpos = mul(unity_ObjectToWorld, v.vertex + float4((instanceID % 10) * 15, cmdID * 8, (int)(instanceID / 10) * 15, 0));

                    o.pos = mul(UNITY_MATRIX_VP, wpos);
                    o.color = v.vertex / 10;

                    o.worldNormal = mul((float3x3)unity_ObjectToWorld, v.normals.xyz).xyz;
                    return o;
                }

                float4 frag(v2f i) : SV_Target
                {
                    float4 color = 1;
                    color.xyz = dot(normalize(i.worldNormal.xyz), normalize(float3(1, 1, 0)));

                    return color;
                }
                ENDCG
            }
        }
    }


    I also had a look at the documentation again and chances are pretty high that it uses MDI under the hood on platforms that support it.
    From the documentation:
     
    Last edited: Aug 8, 2022
  15. kyriew (Joined: Sep 10, 2019, Posts: 7)
    Good job! It would be even better if it could render the entire static scene with one draw call, though I'm not sure how different materials would be handled.
     
  16. TumTumTree (Joined: Mar 16, 2015, Posts: 5)
    Hi, very interesting thread :)
    Have you found a way to implement InitIndirectDrawArgs(), GetCommandID() and GetIndirectInstanceID() in HLSL? I'm using URP, but I could not find anything in the documentation that suggests how to implement this properly. Everything seems to work fine if I include "UnityIndirect.cginc" in my HLSL code, but that does not seem like the right thing to do.
     
  17. kyriew (Joined: Sep 10, 2019, Posts: 7)
    Maybe you can create a "UnityIndirect.hlsl" and copy the contents of "UnityIndirect.cginc" into it.
     
  18. TumTumTree (Joined: Mar 16, 2015, Posts: 5)
    I suppose I could do that. I was hoping there would be a more officially supported way of doing it. But it'll do for now. Thanks :)
     
  19. Goularou (Joined: Oct 19, 2018, Posts: 46)
    Wow, probably the most useful post ever in this forum. Thank you so much, Sir dotmos, for the incredible help!
    I still need to figure out how to include a Matrix4x4 for the individual items, but this should be manageable. Not the first time I regret not having invested time in understanding shaders...
    Again, super mega useful stuff!!!
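
One possible way to supply a Matrix4x4 per item, sketched under assumptions: upload the matrices to a structured GraphicsBuffer and attach it to the MaterialPropertyBlock that is passed via RenderParams.matProps. The class name and the shader property "_InstanceMatrices" are hypothetical; the shader would need to declare a matching StructuredBuffer<float4x4>, and how it indexes that buffer (for example from the command ID and instance ID it already computes) is left open here.

```csharp
using UnityEngine;

// Hypothetical helper; class and property names are illustrative and not taken from the thread.
public sealed class InstanceTransforms : System.IDisposable
{
    // Assumed shader-side declaration: StructuredBuffer<float4x4> _InstanceMatrices;
    static readonly int MatricesId = Shader.PropertyToID("_InstanceMatrices");

    readonly GraphicsBuffer buffer;

    // matrices must contain at least one element.
    public InstanceTransforms(Matrix4x4[] matrices)
    {
        // One float4x4 (16 floats, 64 bytes) per instance.
        buffer = new GraphicsBuffer(GraphicsBuffer.Target.Structured, matrices.Length, 16 * sizeof(float));
        buffer.SetData(matrices);
    }

    // Attach the buffer to the property block used for the indirect draw.
    public void Bind(MaterialPropertyBlock props)
    {
        props.SetBuffer(MatricesId, buffer);
    }

    // Release the GPU buffer when it is no longer needed.
    public void Dispose()
    {
        buffer.Dispose();
    }
}
```

With the earlier example, this would be used as `transforms.Bind(rp.matProps)` before calling Graphics.RenderMeshIndirect.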