GPU driven rendering with SRP: No DrawProceduralIndirectNow for CommandBuffers?

Discussion in 'General Graphics' started by dotmos, Jun 28, 2022.

  1. dotmos


    Jan 2, 2011
    I am currently in the process of writing a custom GPU driven SRP and have a problem with reducing SetPass calls as there seem to be no methods for drawing meshes via a CommandBuffer without calling SetPass each time.

    TL;DR; I need DrawProceduralIndirectNow for CommandBuffers to render lots of different meshes with only one SetPass call.

    The algorithm:
    I combine all my meshes into one big mesh ("mesh-atlas") and send it to the GPU via GraphicsBuffers. Rendering is then done by creating an args buffer with the correct vertex/index offsets and calling CommandBuffer.DrawProceduralIndirect with the correct args offset for each mesh inside the "mesh-atlas", always supplying the same material and pass.
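As a rough illustration of the args-buffer layout described above (not dotmos's actual code), the sketch below computes one indexed-draw args entry per mesh in the atlas, plus the byte offset at which each draw would read its arguments. `DrawIndexedArgs` is a stand-in that mirrors the five `uint` fields of Unity's `GraphicsBuffer.IndirectDrawIndexedArgs`; `MeshAtlasArgs`, `Build` and `ByteOffset` are hypothetical names introduced only for this example.

```csharp
using System;

// Stand-in for Unity's GraphicsBuffer.IndirectDrawIndexedArgs (five uints, 20 bytes).
public struct DrawIndexedArgs
{
    public uint indexCountPerInstance;
    public uint instanceCount;
    public uint startIndex;
    public uint baseVertexIndex;
    public uint startInstance;
}

// Hypothetical helper: builds one args entry per mesh stored in the atlas.
public static class MeshAtlasArgs
{
    // Size of one args entry in bytes.
    public const int ArgsStride = 5 * sizeof(uint);

    public static DrawIndexedArgs[] Build(int[] vertexCounts, int[] indexCounts, uint instancesPerMesh)
    {
        if (vertexCounts.Length != indexCounts.Length)
            throw new ArgumentException("vertexCounts and indexCounts must have the same length");

        var args = new DrawIndexedArgs[vertexCounts.Length];
        uint vertexOffset = 0;
        uint indexOffset = 0;
        for (int i = 0; i < args.Length; i++)
        {
            args[i] = new DrawIndexedArgs
            {
                indexCountPerInstance = (uint)indexCounts[i],
                instanceCount = instancesPerMesh,
                startIndex = indexOffset,       // where this mesh's indices begin in the atlas
                baseVertexIndex = vertexOffset, // added to every index of this mesh
                startInstance = 0               // assumption: no per-draw instance offset
            };
            vertexOffset += (uint)vertexCounts[i];
            indexOffset += (uint)indexCounts[i];
        }
        return args;
    }

    // Byte offset of draw i inside the shared args buffer.
    public static int ByteOffset(int drawIndex) => drawIndex * ArgsStride;
}
```

Draw `i` would then read its arguments at `MeshAtlasArgs.ByteOffset(i)` within the shared args buffer.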

    The problem:
    At the moment, whenever i call CommandBuffer.DrawProceduralIndirect the total SetPass count is increased by one, even if the material<->mesh combination and pass index stays the same. I clearly do not want that. I want to render most of my scene with one SetPass call. This seems like a bug to me, as there is no need to call SetPass again.

    The workaround(?)
    I work around this issue by not calling CommandBuffer.DrawProceduralIndirect in my SRP's Render() function, but calling Graphics.DrawProceduralIndirectNow in RenderPipelineManager.endCameraRendering instead. This works and i can now render everything with only one SetPass call.
    But i have the strong feeling that this is not the correct way to do it and that i will get into trouble in the long run when trying to draw the shadow pass or transparent objects, and when firing compute shaders and async GPU fences in between to handle everything else (GPU culling, light binning, decals, tiled rendering, etc.).
    From my understanding of Unity, this should all be done in a CommandBuffer to make sure everything is in correct order. "Manually" calling Graphics.DrawProceduralIndirectNow in between sounds very risky to me.

    Am i overlooking something? Is there a method to draw meshes with a GPU driven/indirect approach with CommandBuffers which will not call SetPass every time?
    Last edited: Jul 22, 2022
    Thomas-Mountainborn likes this.
  2. joshuacwilde


    Feb 4, 2018
    If you are doing GPU driven rendering, why do you need to call DrawProceduralIndirect more than once per render pass with the same material?
  3. dotmos


    Jan 2, 2011
    Thanks for your reply:)

    Afaik Unity does not expose MultiDrawIndirect so i have to emulate it by manually calling DrawProceduralIndirect for each mesh.
    This results in driver overhead but i have to do that anyways, as Intel GPUs do not support MultiDrawIndirect (afaik).
    Once i have that working, i can think about creating native plugins for MultiDrawIndirect access on PCs with Nvidia/AMD GPUs as well as consoles.
    Or am i missing something here? :)
  4. joshuacwilde


    Feb 4, 2018
    Ah ok, if you are only culling at a per-object level, then yes makes sense. I was assuming you were wanting to do finer culling (cluster, triangle), in which case you would only need 1 draw call per material, but that's a different strategy.

    As for why 1 set pass per draw call (instead of per material), I have experienced similar, iirc, and would also be interested to hear what's going on.
  5. dotmos


    Jan 2, 2011
    Last edited: Jul 22, 2022
  6. c0d3_m0nk3y


    Oct 21, 2021
  7. Neto_Kokku


    Feb 15, 2018
    Each DrawProceduralIndirect call is one GPU draw call. Unity doesn't (and can't) batch them together, it's up to you to make sure you draw as much stuff as possible in a single DrawProceduralIndirect call.
  8. dotmos


    Jan 2, 2011
    Thanks for posting these :)
    I had a quick look at BRG in the past and it seems this is just an abstraction layer on top of DrawMeshInstanced and to get rid of GameObjects for rendering.
    We are not using GameObjects (except for UI). We are also not using DOTS for that matter but a custom ecs solution.
    We also created our own abstraction layer for DrawMeshInstancedIndirect in the past but with our next project we want to go one step further by using DrawProceduralIndirect or MultiDrawIndirect / ExecuteIndirect if Unity implements it.

    I will have another look at BRG again and will also look closely at the threads you posted. Maybe i missed something. Thanks again. :)

    Thanks for your answer.
    Please note that i am talking about SetPass calls and not batches. :)
    When using CommandBuffer.DrawProceduralIndirect i get one SetPass call each time i use it. Note that the material is always the same and there should be no need to call SetPass again.
    If i use Graphics.DrawProceduralIndirectNow i only get 1 SetPass call for everything, which is what i want but there is no DrawProceduralIndirectNow for CommandBuffers.
    Please have a look at the attached screenshots to see what i mean :)

    First screenshot shows Graphics.DrawProceduralIndirectNow with only 1 SetPass call for all meshes. (Another one for the sky)

    Second screenshot shows CommandBuffer.DrawProceduralIndirect with 67 SetPass calls for all meshes and another one for the sky.
    Last edited: Jul 22, 2022
  9. Neto_Kokku


    Feb 15, 2018
    Ah, I see. SetPass is basically Unity's way of saying "send the material parameters to the GPU before a draw call". I'm actually surprised it does only one for multiple Graphics.DrawProceduralIndirectNow calls.
  10. dotmos


    Jan 2, 2011
    Yes, this is a major advantage of that function as you save the SetPass call if the material/pass does not change between calls. You basically have to manually tell Unity which material and pass it should use, saving you a lot of cpu time if you have many different meshes. :)
    MultiDrawIndirect / ExecuteIndirect would be even faster.

    From the DrawProceduralIndirectNow documentation

    joshuacwilde likes this.
  11. dotmos


    Jan 2, 2011
    I had a look at BRG again and it is not what i am looking for as it is a CPU driven approach (at least at the moment).

    HOWEVER, while reading through the second thread, i stumbled upon Graphics.RenderMeshIndirect which is mentioned there by user Jes28 and it seems this is Unity's wrapper for MultiDrawIndirect/ExecuteIndirect. If this is the case, this is a big step forward! I would then still need a version for CommandBuffers, but this would already be great news.

    I will give this a try now and report back :)
  12. dotmos


    Jan 2, 2011
    Just tried it and it seems to do what MultiDrawIndirect/ExecuteIndirect is supposed to do :)

    On DX11 it uses a fallback/software emulation, which is to be expected, as DX11 does not officially support it (afaik). There are vendor-specific DX11 extensions for MDI though.
    But since we are not planning to use DX11, this is ok for us.

    I have not yet checked my sample on PS5 and SeriesX, but i expect them to also support this in hardware and no software-emulation should be needed.

    The only thing missing now is to implement RenderMeshIndirect for CommandBuffers.
    @Unity: pretty please? :)




    Frame Debugger (Vulkan):

    All in all, this is pretty cool and will probably save us from having to implement Mesh Cluster Rendering
  13. c0d3_m0nk3y


    Oct 21, 2021
    I don't think that RenderMeshIndirect is the same as MDI. RenderMeshIndirect can only render one mesh multiple times whereas MDI can render different meshes with one draw call.

    The only difference to RenderMeshInstanced is that baseVertexIndex, indexCountPerInstance, instanceCount, startIndex and startInstance come from a compute buffer. You can have multiple instanced draw calls in the compute buffer but they all use the same mesh and shader.

    This should be possible in DX11 as well.

    This method also exists for CommandBuffers, it's just called slightly different:
    Last edited: Jul 26, 2022
    joshuacwilde likes this.
  14. dotmos


    Jan 2, 2011
    You render multiple meshes with RenderMeshIndirect by supplying a single "mega-mesh"/"mesh-atlas" which contains all the meshes you want to render. You then pick the correct mesh from the mega-mesh by supplying the correct offsets to the graphicsBuffer and setting commandCount to the amount of different meshes you want to render.
    So basically bind all the mesh data once and then pick the individual mesh via the GraphicsBuffer and commandCount supplied to RenderMeshIndirect.

    Quick and dirty example for that:
    Code (CSharp):
    multiDrawCommandsBuffer = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments, meshes.Count, GraphicsBuffer.IndirectDrawIndexedArgs.size);
    multiDrawCommands = new GraphicsBuffer.IndirectDrawIndexedArgs[meshes.Count];

    //Create merged mesh
    if (mergedMesh != null)
    {
        UnityEngine.Object.Destroy(mergedMesh);
    }
    mergedMesh = new Mesh();

    int vertexCount = 0;
    int indexCount = 0;
    foreach (Mesh m in meshes)
    {
        vertexCount += m.vertexCount;
        indexCount += (int)m.triangles.Length;
    }

    //Create merged mesh and multDrawCommands
    Vector3[] vertices = new Vector3[vertexCount];
    int[] indices = new int[indexCount];
    Vector3[] normals = new Vector3[vertexCount];
    int currentVertexCount = 0;
    int currentIndexCount = 0;
    for (int i = 0; i < meshes.Count; i++)
    {
        Mesh m = meshes[i];
        Array.Copy(m.vertices, 0, vertices, currentVertexCount, m.vertexCount);
        Array.Copy(m.triangles, 0, indices, currentIndexCount, m.triangles.Length);
        Array.Copy(m.normals, 0, normals, currentVertexCount, m.vertexCount);

        multiDrawCommands[i].baseVertexIndex = (uint)currentVertexCount;
        multiDrawCommands[i].indexCountPerInstance = (uint)m.triangles.Length;
        multiDrawCommands[i].instanceCount = instanceCount;
        multiDrawCommands[i].startIndex = (uint)currentIndexCount;
        multiDrawCommands[i].startInstance = (uint)(i * instanceCount);

        currentVertexCount += m.vertexCount;
        currentIndexCount += (int)m.triangles.Length;
    }
    //mergedMesh.SetVertices(vertices);
    //mergedMesh.SetIndices(indices, MeshTopology.Triangles, 0);
    //mergedMesh.SetNormals(normals);
    mergedMesh.vertices = vertices;
    mergedMesh.triangles = indices;
    mergedMesh.normals = normals;
    mergedMesh.RecalculateTangents();

    multiDrawCommandsBuffer.SetData(multiDrawCommands);
    You can also fill the GraphicsBuffer on the GPU via a ComputeShader if you want to do culling on the GPU.

    And then call this somewhere else to actually render everything in one go:
    Code (CSharp):
    RenderParams rp = new RenderParams(material);
    rp.worldBounds = new Bounds(Vector3.zero, 10000 * Vector3.one); // use tighter bounds for better FOV culling
    rp.matProps = new MaterialPropertyBlock();

    Graphics.RenderMeshIndirect(rp, mergedMesh, multiDrawCommandsBuffer, meshes.Count);

    And a quick and dirty test shader to render everything, which is basically the one from the documentation. Please note that this is still using the old CG syntax and not the "new" HLSL syntax which should be used for the SRP:
    Code (CSharp):
    Shader "Custom/UberTweaked"
    {
        Properties
        {

        }

        SubShader
        {
            Tags {
                "RenderType" = "Opaque"
                "LightMode" = "SRPDefaultUnlit"
            }

            Pass
            {
                CGPROGRAM
                #pragma target 4.5
                #pragma vertex vert
                #pragma fragment frag

                #define UNITY_INDIRECT_DRAW_ARGS IndirectDrawIndexedArgs
                #include "UnityIndirect.cginc"

                struct appdata
                {
                    float4 vertex : POSITION;
                    float3 normals : NORMAL;
                    uint svInstanceID : SV_InstanceID;
                    //uint svVertexID : SV_VertexID;
                };

                struct v2f
                {
                    float4 pos : SV_POSITION;
                    float4 color : COLOR0;
                    float3 worldNormal : TEXCOORD0;
                };

                v2f vert(appdata v)
                {
                    InitIndirectDrawArgs(0);
                    v2f o;
                    uint cmdID = GetCommandID(0);
                    uint instanceID = GetIndirectInstanceID(v.svInstanceID);
                    float4 wpos = mul(unity_ObjectToWorld, v.vertex + float4( (instanceID%10) * 15, cmdID * 8, (int)(instanceID / 10) * 15, 0));

                    o.pos = mul(UNITY_MATRIX_VP, wpos);
                    o.color = v.vertex / 10;// v.svInstanceID;// float4(cmdID & 1 ? 0.0f : 1.0f, cmdID & 1 ? 1.0f : 0.0f, instanceID), 0.0f);

                    o.worldNormal = mul((float3x3)unity_ObjectToWorld, v.normals);
                    return o;
                }

                float4 frag(v2f i) : SV_Target
                {
                    float4 color = 1;
                    color.rgb = dot(normalize(i.worldNormal), normalize(float3(1, 1, 0)));

                    return color;
                }
                ENDCG
            }
        }
    }

    I also had a look at the documentation again and chances are pretty high that it uses MDI under the hood on platforms that support it.
    From the documentation:
    Last edited: Aug 8, 2022
  15. kyriew


    Sep 10, 2019
    Good job! But it would be better if it can render the entire static scene with 1 drawcall, so I'm confused about different materials.
  16. TumTumTree


    Mar 16, 2015
    Hi, very interesting thread :)
    Have you found a way to implement InitIndirectDrawArgs(), GetCommandID() and GetIndirectInstanceID() in HLSL? I'm using URP but I could not find anything in the documentation that would suggest how to properly implement this. Everything seems to work fine if I include "UnityIndirect.cginc" in my HLSL code, but that does not seem like the right thing to do.
    TJHeuvel-net likes this.
  17. kyriew


    Sep 10, 2019
    Maybe you can create a "UnityIndirect.hlsl" and copy into it.
  18. TumTumTree


    Mar 16, 2015
    I suppose I could do that. I was hoping there would be a more officially supported way of doing it. But it'll do for now. Thanks :)
  19. Goularou


    Oct 19, 2018
    Wow: probably the most useful post ever in this forum / thank you so much Sir dotmos for the incredible help !
    Still need to figure out how to include Matrix4x4 for the individual items, but this should be manageable. Not the first time I regret not to have invested time in understanding shaders...
    Again, super mega useful stuff!!!
    dotmos likes this.
  20. MrDaveSh


    Apr 15, 2020
    Any chance to have that working in hdrp and shadergraph?
  21. TJHeuvel-net


    Jul 31, 2012
    Thank you for this great example! I wanted to ask; here you mention startInstance etc being passed but i cannot get this to happen at all.

    Like most indirect approaches, i have a single buffer with perInstanceData (e.g. local to world matrices). I need to offset the instanceId for each draw to index into the right element.

    However it seems impossible to me to get the correct draw id, and startInstance is entirely ignored for me on DirectX platforms, just like you observed. Did you manage to work around this problem, or did i misunderstand your comment?

    i.e indexing like this does not work for the second draw:
  22. n3b


    Nov 16, 2014
    @TJHeuvel-net indirect draw call accepts MaterialPropertyBlock as an argument, where you can pass a starting offset (if you have it on CPU side) or a draw call id.
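A minimal sketch of the CPU-side starting offset mentioned above, assuming all draws share one per-instance buffer laid out draw after draw. The offsets are an exclusive prefix sum over the per-draw instance counts; `InstanceOffsets`, `Compute` and `GlobalIndex` are hypothetical names, and in Unity the resulting offset would still have to be handed to the shader, for example through the MaterialPropertyBlock.

```csharp
// Hypothetical helper: per-draw starting offsets into a shared per-instance buffer.
public static class InstanceOffsets
{
    // Exclusive prefix sum: offsets[i] = number of instances in draws 0..i-1.
    public static int[] Compute(int[] instanceCounts)
    {
        var offsets = new int[instanceCounts.Length];
        int running = 0;
        for (int i = 0; i < instanceCounts.Length; i++)
        {
            offsets[i] = running;
            running += instanceCounts[i];
        }
        return offsets;
    }

    // Index of an instance in the shared buffer, given its draw and its
    // local instance ID (which restarts at 0 for every draw).
    public static int GlobalIndex(int[] offsets, int drawIndex, int localInstanceId)
    {
        return offsets[drawIndex] + localInstanceId;
    }
}
```

With instance counts of 3, 5 and 2, the second draw starts at offset 3, so its local instance 4 reads element 7 of the shared buffer.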
  23. TJHeuvel-net


    Jul 31, 2012
    Thanks! That would work great for d3d11, where DrawIndirect is used once per IndirectArg.
    However in d3d12 ExecuteIndirect is used, which could mean a single draw call for all of our objects.

    I've got a workaround i'm entirely unhappy with, which is to add the draw-id to the mesh buffer using a VertexAttributeDescriptor.

    However i've found another way which might be better. If you leave the `baseVertexIndex` at 0 when constructing the indirectArgs, the SV_VertexID passed to the vertex shader can be used to figure out which draw it was.

    Rather wonky, and i'd really like to use ExecuteIndirect to its fullest potential, but it does the trick!
  24. MrDaveSh


    Apr 15, 2020
    @TJHeuvel-net Would you mind explaining your workaround a bit more?
    I'm trying to optimise my project and it would be amazing if i could get that working with shadergraph/hdrp.
    I have many instanced meshes with different instance counts and resolutions, but the same material. Drawing them in one draw call would really help me.
  25. TJHeuvel-net


    Jul 31, 2012
    We combine all meshes into a giant buffer, and make one IndirectDrawArg for every submesh. Let's say we are drawing a cube, with 36 indices, and a sphere with ~3000.

    We can now reason that if SV_VertexID is less than 36, we must be drawing our first cube object.
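The range test above can be sketched on the CPU as follows. This is only an illustration of the lookup, assuming `baseVertexIndex` stays 0 so that the vertex ID is the position in the merged vertex buffer, and assuming the cube is stored with 36 vertices; in practice the same search would run in the vertex shader over a small buffer of range ends. `DrawIdLookup`, `BuildRangeEnds` and `FindDraw` are hypothetical names.

```csharp
// Hypothetical helper: maps a global vertex ID back to the draw (mesh) it belongs to.
public static class DrawIdLookup
{
    // rangeEnds[i] = total number of vertices in meshes 0..i (exclusive end of mesh i).
    public static uint[] BuildRangeEnds(int[] vertexCounts)
    {
        var ends = new uint[vertexCounts.Length];
        uint running = 0;
        for (int i = 0; i < vertexCounts.Length; i++)
        {
            running += (uint)vertexCounts[i];
            ends[i] = running;
        }
        return ends;
    }

    // Binary search for the first range whose end lies beyond vertexId.
    // Returns -1 if vertexId is outside the merged buffer.
    public static int FindDraw(uint[] rangeEnds, uint vertexId)
    {
        int lo = 0;
        int hi = rangeEnds.Length;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (rangeEnds[mid] > vertexId)
                hi = mid;
            else
                lo = mid + 1;
        }
        return lo < rangeEnds.Length ? lo : -1;
    }
}
```

With a 36-vertex cube followed by a 3000-vertex sphere, vertex IDs 0 to 35 map to draw 0 and IDs 36 to 3035 map to draw 1.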
  26. MrDaveSh


    Apr 15, 2020
    @TJHeuvel-net Thank you, i managed to get everything working. I thought using that technique would save me some draw calls, but it didn't: it is still one draw call per mesh, unfortunately. I'm not sure if i did something wrong or if that's just the way it works. Before that i was calling Graphics.RenderMeshIndirect in a loop for all meshes, but i can't see any performance improvement with the multi-command version.
  27. TJHeuvel-net


    Jul 31, 2012
    Not sure if you are doing this already, but try combining your meshes! If you make them all into a big mesh, and make a new IndirectArgument per submesh you can use one RenderMeshIndirect call for all of them. SubMeshDescriptor maps really well to the indirect args.

    Mind you in d3d11 this will still translate to one-draw-per-submesh, only on d3d12 and vulkan will that become a single draw call.
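To illustrate how closely a submesh description maps to the indirect args, here is a small sketch. `SubMeshInfo` is a stand-in for the three fields of Unity's `SubMeshDescriptor` used here (`indexStart`, `indexCount`, `baseVertex`), `DrawIndexedArgs` mirrors `GraphicsBuffer.IndirectDrawIndexedArgs`, and `SubMeshArgs.FromSubMeshes` is a hypothetical helper. Placing instances back to back via `startInstance` is an assumption of this example.

```csharp
using System;

// Stand-in for the SubMeshDescriptor fields that matter for indirect args.
public struct SubMeshInfo
{
    public int indexStart;
    public int indexCount;
    public int baseVertex;
}

// Stand-in for Unity's GraphicsBuffer.IndirectDrawIndexedArgs.
public struct DrawIndexedArgs
{
    public uint indexCountPerInstance;
    public uint instanceCount;
    public uint startIndex;
    public uint baseVertexIndex;
    public uint startInstance;
}

public static class SubMeshArgs
{
    // One args entry per submesh of the combined mesh.
    public static DrawIndexedArgs[] FromSubMeshes(SubMeshInfo[] subMeshes, uint[] instanceCounts)
    {
        if (subMeshes.Length != instanceCounts.Length)
            throw new ArgumentException("one instance count per submesh is required");

        var args = new DrawIndexedArgs[subMeshes.Length];
        uint startInstance = 0;
        for (int i = 0; i < subMeshes.Length; i++)
        {
            args[i] = new DrawIndexedArgs
            {
                indexCountPerInstance = (uint)subMeshes[i].indexCount,
                instanceCount = instanceCounts[i],
                startIndex = (uint)subMeshes[i].indexStart,
                baseVertexIndex = (uint)subMeshes[i].baseVertex,
                startInstance = startInstance // assumption: instances are laid out back to back
            };
            startInstance += instanceCounts[i];
        }
        return args;
    }
}
```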
  28. MrDaveSh


    Apr 15, 2020
    I do create a big mesh here is my code:

    Code (CSharp):
    using System;
    using UnityEngine;
    using UnityEngine.Rendering;

    public class MultiIndirectMeshRenderer : IDisposable
    {
        public RenderParams rp;
        public GraphicsBuffer commandBuf;
        public GraphicsBuffer.IndirectDrawIndexedArgs[] multiDrawCommands;
        public Mesh[] mesh { get; private set; }
        public int[] meshInstanceCount;
        public int InstanceCount { get; set; }
        public bool castshadow;
        private Mesh mergedMesh;
        public int totalCount = 0;

        public void IndirectMeshRenderer(Mesh[] _Mesh, Material _Material, int[] _InstanceCount, bool _CastShadow)
        {
            mesh = _Mesh;
            meshInstanceCount = _InstanceCount;
            commandBuf = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments, mesh.Length, GraphicsBuffer.IndirectDrawIndexedArgs.size);
            multiDrawCommands = new GraphicsBuffer.IndirectDrawIndexedArgs[mesh.Length];

            if (mergedMesh != null)
            {
                UnityEngine.Object.Destroy(mergedMesh);
            }
            mergedMesh = new Mesh();

            int vertexCount = 0;
            int indexCount = 0;
            foreach (Mesh m in mesh)
            {
                vertexCount += m.vertexCount;
                indexCount += (int)m.triangles.Length;
            }

            castshadow = _CastShadow;

            int currentVertexCount = 0;
            int currentIndexCount = 0;
            int Startinstance = 0;
            totalCount = 0;

            Vector3[] vertices = new Vector3[vertexCount];
            int[] indices = new int[indexCount];
            Vector3[] normals = new Vector3[vertexCount];
            Vector2[] uv = new Vector2[vertexCount];
            Vector2[] uv2 = new Vector2[vertexCount];
            Vector2[] uv3 = new Vector2[vertexCount];
            Vector2[] uv4 = new Vector2[vertexCount];
            for (int i = 0; i < mesh.Length; i++)
            {
                // Copy of the mesh with (mesh id, start instance) stored in uv2.
                Mesh m = generateVertexId(mesh[i], i, Startinstance);
                Array.Copy(m.vertices, 0, vertices, currentVertexCount, m.vertexCount);
                Array.Copy(m.triangles, 0, indices, currentIndexCount, m.triangles.Length);
                Array.Copy(m.normals, 0, normals, currentVertexCount, m.vertexCount);
                Array.Copy(m.uv, 0, uv, currentVertexCount, m.vertexCount);
                Array.Copy(m.uv2, 0, uv2, currentVertexCount, m.vertexCount);

                multiDrawCommands[i].baseVertexIndex = (uint)currentVertexCount;
                multiDrawCommands[i].indexCountPerInstance = (uint)m.triangles.Length;
                multiDrawCommands[i].instanceCount = (uint)meshInstanceCount[i]; // instances per mesh
                multiDrawCommands[i].startIndex = (uint)currentIndexCount;
                multiDrawCommands[i].startInstance = (uint)Startinstance;

                currentVertexCount += m.vertexCount;
                currentIndexCount += (int)m.triangles.Length;

                Startinstance += meshInstanceCount[i];
                totalCount += meshInstanceCount[i];
            }

            mergedMesh.vertices = vertices;
            mergedMesh.triangles = indices;
            mergedMesh.normals = normals;
            mergedMesh.uv = uv;
            mergedMesh.uv2 = uv2;
            mergedMesh.RecalculateTangents();

            rp = new RenderParams(_Material);
            rp.worldBounds = new Bounds(Vector3.zero, 10000 * Vector3.one); // use tighter bounds for better FOV culling
            if(_CastShadow) rp.shadowCastingMode= ShadowCastingMode.On;
            else rp.shadowCastingMode = ShadowCastingMode.Off;
            rp.receiveShadows= true;
            rp.reflectionProbeUsage= ReflectionProbeUsage.BlendProbesAndSkybox;
            rp.lightProbeUsage = LightProbeUsage.BlendProbes;
            rp.camera = Camera.current;

            commandBuf.SetData(multiDrawCommands);
        }

        public void Render()
        {
            Graphics.RenderMeshIndirect(rp, mergedMesh, commandBuf, mesh.Length, 0);
        }

        public void Dispose()
        {
            if (commandBuf != null)
            {
                commandBuf.Release();
                commandBuf = null;
            }

        }

        Mesh generateVertexId(Mesh m, int mId, int id)
        {
            Mesh mid = new Mesh();
            mid.vertices = m.vertices;
            mid.triangles = m.triangles;
            mid.normals = m.normals;
            mid.tangents = m.tangents;
            mid.uv = m.uv;

            mid.colors = m.colors;
            Vector2[] uvid = new Vector2[m.vertices.Length];
            for (int i = 0; i < m.vertices.Length; i++)
            {
                uvid[i] = new Vector2(mId, id);
            }
            mid.uv2 = uvid;
            return mid;
        }
    }

    i then call that class using :

    Code (CSharp):
    Mr = new MultiIndirectMeshRenderer();
    Mr.IndirectMeshRenderer(MeshArray, Material, InstanceCountArray, true);

    Mr.Render();
    Mesh and instance are rendering properly as expected, i got different vertex count and different instance count.
    I'm using Unity 2022.2.20 with HDRP, and a shadergraph with a custom function to get compute buffer with position...
    Not sure here what i'm doing wrong, also using DX12 in the project.
  29. dotmos


    Jan 2, 2011
    I hope i am not misunderstanding you and you have trouble accessing the instanceID:
    To access the global instance ID (globalInstanceIndex in the code below) we do this (Please also have a look at the attached MultiDrawIndirect.hlsl which was basically ripped from some Unity .cginc file + a few minor things added).

    Use GetIndirectInstanceID_Base(v.svInstanceID); :)

    Code (CSharp):
    struct VertexData
    {
        ...
        uint svInstanceID : SV_InstanceID;
    };

    ....

    InitIndirectDrawArgs(0);

    uint cmdID = GetCommandID(0);
    //"Local" instance index of the "drawcall"/group. Always starts at 0 for each group.
    uint instanceID = GetIndirectInstanceID(v.svInstanceID);

    //Global instance index, if all instances and objects use one large buffer (which is the case most likely). Use this to fetch per instance data (position, rotation, scale, etc.).
    uint globalInstanceIndex = GetIndirectInstanceID_Base(v.svInstanceID);
    //NOTE: An intermediate buffer (uint -> uint) is needed to map from globalInstanceIndex to actual bufferIndex, since ALL objects are in the buffer, but only a handful are drawn due to compute culling.
    // globalInstanceIndex = _MultiDrawIndexToInstance[globalInstanceIndex];
    //DrawID. Use this to fetch per mesh data (materialdata, etc.)
    uint meshID = cmdID;

    //Unpack TRS matrix from 4x3 to 4x4. TODO: Maybe even better compression possible if we only scale uniformly?
    TranslationMatrix4x3 translation = _MultiDrawTranslationBuffer[globalInstanceIndex];
    // float4x4 custom_ObjectToWorld;
    unity_ObjectToWorld._m00 = translation.r0.x;
    unity_ObjectToWorld._m01 = translation.r0.y;
    unity_ObjectToWorld._m02 = translation.r0.z;
    unity_ObjectToWorld._m03 = translation.r0.w;
    unity_ObjectToWorld._m10 = translation.r1.x;
    unity_ObjectToWorld._m11 = translation.r1.y;
    unity_ObjectToWorld._m12 = translation.r1.z;
    unity_ObjectToWorld._m13 = translation.r1.w;
    unity_ObjectToWorld._m20 = translation.r2.x;
    unity_ObjectToWorld._m21 = translation.r2.y;
    unity_ObjectToWorld._m22 = translation.r2.z;
    unity_ObjectToWorld._m23 = translation.r2.w;
    unity_ObjectToWorld._m30 = 0;
    unity_ObjectToWorld._m31 = 0;
    unity_ObjectToWorld._m32 = 0;
    unity_ObjectToWorld._m33 = 1;

    //---------------------------------------------------------------------------
    //Correct WorldToObject. Makes rotation work on normals/tangents/etc.
    // TODO: OPTIMIZE!
    // inverse transform matrix
    float3x3 w2oRotation;
    w2oRotation[0] = unity_ObjectToWorld[1].yzx * unity_ObjectToWorld[2].zxy - unity_ObjectToWorld[1].zxy * unity_ObjectToWorld[2].yzx;
    w2oRotation[1] = unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[2].yzx - unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[2].zxy;
    w2oRotation[2] = unity_ObjectToWorld[0].yzx * unity_ObjectToWorld[1].zxy - unity_ObjectToWorld[0].zxy * unity_ObjectToWorld[1].yzx;
    float det = dot(unity_ObjectToWorld[0].xyz, w2oRotation[0]);
    w2oRotation = transpose(w2oRotation);
    w2oRotation *= rcp(det);
    float3 w2oPosition = mul(w2oRotation, -unity_ObjectToWorld._14_24_34);
    unity_WorldToObject._11_21_31_41 = float4(w2oRotation._11_21_31, 0.0f);
    unity_WorldToObject._12_22_32_42 = float4(w2oRotation._12_22_32, 0.0f);
    unity_WorldToObject._13_23_33_43 = float4(w2oRotation._13_23_33, 0.0f);
    unity_WorldToObject._14_24_34_44 = float4(w2oPosition, 1.0f);
    //----------------------------------------------------------------------------
    On a sidenote:
    How are you guys handling textures when using indirect drawing?
    We are currently using a single mega texture2d array and then index into that. Indices are supplied per vertex so every triangle can potentially use a different texture from the array (doing that is not a good idea of course ;) ).
    We tried to use Unity's virtual texture solution but quickly dropped the idea since it is still in beta (alpha?).
    Afaik bindless is also not supported by Unity?

    Attached Files:

    Last edited: Jun 15, 2023
    Leonidas85 likes this.
  30. TJHeuvel-net


    Jul 31, 2012
    Thanks for your answer! It appears when using d3d12 Unity uses the same workaround as in d3d11. That is, split each indirect call up in a unique one, and supply a different id.


    Here you see the second draw, which has a different BaseCommandID. In an ideal world, they wouldn't do this at all, but would instead expose the more advanced features of ExecuteIndirect so it could remain the same draw call.

    No. It's also not possible to do draw call compaction with the current C# Graphics API. ExecuteIndirect has another argument for a count buffer, which you need if you want to remove empty draws.
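For readers unfamiliar with the term, draw call compaction can be pictured with the CPU-side sketch below: empty draws are removed from the front-packed command list, and the number of remaining draws is what a count buffer would hold. On the GPU this would typically be done by a compute pass; the code only illustrates the idea. `DrawIndexedArgs` mirrors Unity's `GraphicsBuffer.IndirectDrawIndexedArgs`, and `DrawCompaction.Compact` is a hypothetical helper.

```csharp
// Stand-in for Unity's GraphicsBuffer.IndirectDrawIndexedArgs.
public struct DrawIndexedArgs
{
    public uint indexCountPerInstance;
    public uint instanceCount;
    public uint startIndex;
    public uint baseVertexIndex;
    public uint startInstance;
}

public static class DrawCompaction
{
    // Moves all non-empty draws to the front of the array, preserving their order,
    // and returns how many remain. Entries past the returned count are stale.
    public static int Compact(DrawIndexedArgs[] commands)
    {
        int write = 0;
        for (int read = 0; read < commands.Length; read++)
        {
            bool empty = commands[read].instanceCount == 0
                      || commands[read].indexCountPerInstance == 0;
            if (empty)
                continue; // e.g. everything in this draw was culled

            commands[write] = commands[read];
            write++;
        }
        return write;
    }
}
```

Without a count buffer, the number of commands has to be supplied from the CPU, which is why empty draws cannot simply be dropped after GPU culling.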

    If this becomes a problem i'm considering using native plugins, which could also pave a way to supporting texture arrays.
    Last edited: Jun 15, 2023
  31. dotmos


    Jan 2, 2011
    Hey thanks for your answer! :)

    What you say about d3d12 pretty much fits what we were told by Unity a few months ago. I was hoping that it might have changed in the meantime. It is not a showstopper for us, but would have been nice. But then again, we still have time till we release our next game. If Unity manages to fully implement ExecuteIndirect/MDI by then: Cool. If not: Still ok (for us).

    Just out of curiosity: Have you tried implementing meshlets/mesh clusters instead? That would also somewhat work around the draw call compaction issue. I thought about it a few months ago, but never gave it a try. Actually i am not even 100% sure if it can be done in Unity without native plugins. From what I know, it should probably be possible, but the devil is in the details as always :)

    I might misunderstand you here, but texture arrays are possible without native plugins.
    But maybe you mean an array of textures? (or vice versa, i always get the naming of these two mixed up, sorry).
    If all else fails, you could still emulate them with branching + texture2DArray. But i guess that is not really a valid alternative for your use case and the GPU will probably quit its job and jump out of the window. ;)
  32. TJHeuvel-net


    Jul 31, 2012
    Yeah sorry, i did mean an array of textures, rather than a single (supported) TextureArray.