Search Unity

  1. Unity Asset Manager is now available in public beta. Try it out now and join the conversation here in the forums.
    Dismiss Notice

Bug DrawProceduralIndirect broken in 2021.2 with DX12/Vulkan

Discussion in '2021.2 Beta' started by Lecks, Jul 11, 2021.

  1. Lecks

    Lecks

    Joined:
    May 13, 2013
    Posts:
    18
    As the topic says I'm having trouble with DrawProceduralIndirect with DX12/Vulkan in 2021.2.0b3. Basically it just doesn't work (doesn't draw anything). It works in 2021.1, but then I can't use the new VertexID node in Shadergraph. It's an issue with at least Built-in and HDRP pipelines (likely URP too).

    Here's a minimal reproduction:
    MonoBehaviour:
    Code (CSharp):
    1. using UnityEngine;
    2.  
    3. public class DrawProceduralIndirectTest : MonoBehaviour
    4. {
    5.     public Material material;
    6.     private GraphicsBuffer indexBuffer;
    7.     private GraphicsBuffer vertexBuffer;
    8.     private ComputeBuffer args;
    9.  
    10.     void OnEnable()
    11.     {
    12.         indexBuffer =
    13.             new GraphicsBuffer(GraphicsBuffer.Target.Index, 6, 4);
    14.         indexBuffer.SetData(new uint[] { 0, 1, 2, 2, 3, 0 });
    15.         vertexBuffer =
    16.             new GraphicsBuffer( GraphicsBuffer.Target.Structured, 4,4 * 3);
    17.         vertexBuffer.SetData(new Vector3[] { new Vector3(-1, -1,0), new Vector3(-1, 1,0), new Vector3(1, 1,0), new Vector3(1, -1,0) });
    18.         args =
    19.             new ComputeBuffer(5, 4, ComputeBufferType.IndirectArguments);
    20.         args.SetData(new int[]{6,1,0,0,0});
    21.     }
    22.  
    23.     private void OnDisable()
    24.     {
    25.         indexBuffer.Dispose();
    26.         vertexBuffer.Dispose();
    27.         args.Dispose();
    28.     }
    29.  
    30.     void LateUpdate()
    31.     {
    32.         material.SetBuffer("_Vertices", vertexBuffer);
    33.         Graphics.DrawProceduralIndirect(material,
    34.             new Bounds(Vector3.zero,new Vector3(100,100,100)),
    35.             MeshTopology.Triangles,
    36.             indexBuffer,
    37.             args);
    38.     }
    39. }
    40.  
    and a basic unlit shader:
    Code (CSharp):
    1. Shader "Test/DrawProceduralShader"
    2. {
    3.     Properties
    4.     {
    5.     }
    6.     SubShader
    7.     {
    8.         Tags { "RenderType"="Opaque" }
    9.         LOD 100
    10.  
    11.         Pass
    12.         {
    13.             CGPROGRAM
    14.             #pragma vertex vert
    15.             #pragma fragment frag
    16.             #pragma target 4.5
    17.  
    18.             #include "UnityCG.cginc"
    19.  
    20.             struct appdata
    21.             {
    22.                 float4 vertex : POSITION;
    23.                 uint vertexID : SV_VertexID;
    24.             };
    25.  
    26.             StructuredBuffer<float3> _Vertices;
    27.             float4 vert (appdata v) : SV_POSITION
    28.             {
    29.                 return UnityObjectToClipPos( _Vertices[v.vertexID]);
    30.             }
    31.  
    32.             fixed4 frag () : SV_Target
    33.             {
    34.                 return float4(1,1,1,1);
    35.             }
    36.             ENDCG
    37.         }
    38.     }
    39. }
    40.  
    A project using these to show the error in it is attached.
    DX11 shows:

    DX12 and vulkan show:
     

    Attached Files:

    Last edited: Jul 11, 2021
    EmmetOT likes this.
  2. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    In my experience this is the least tested graphics feature in all of Unity. Actually, I suspect they don't even test it at all as part of their standardized testing.

    I always have to code a non-procedural fallback because it breaks in one or another platform/API. Once had it outright crash on Xbox One, for example.

    It's a shame because this is the most performant way of rendering grass and other similar effects.
     
    PutridEx likes this.
  3. PutridEx

    PutridEx

    Joined:
    Feb 3, 2021
    Posts:
    1,136
    My vegetation wouldn't be feasible at all without this feature. I hope they look into this.
     
  4. EmmetOT

    EmmetOT

    Joined:
    Apr 25, 2016
    Posts:
    44
    Wow, I just came here to report the same thing! I've already filed a bug report but here is the text of it:

    I hope they fix this soon as it's pretty crucial for me.

    Also note, this is a URP project so I can confirm it doesn't work in URP either!

    EDIT - On closer inspection this may be a distinct bug, but very similar.
     
    Last edited: Jul 11, 2021
    richardkettlewell likes this.
  5. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    It's not as performant and uses more memory, but it's possible to write a non-procedural fallback by pre-allocating a Mesh with the maximum amount of vertices per instance you're going have, then just don't read anything from it in the VS and just keep using the VertexID as usual.
     
    Lecks likes this.
  6. LeonhardP

    LeonhardP

    Unity Technologies

    Joined:
    Jul 4, 2016
    Posts:
    3,136
    Thanks for the report, we'll look into it.
     
    EmmetOT likes this.
  7. EmmetOT

    EmmetOT

    Joined:
    Apr 25, 2016
    Posts:
    44
    I should note that in the bug report (1349624) title I mistakenly refer to the node as Vertex ID and not Instance ID. Hope it doesn’t cause too much confusion.
     
    LeonhardP likes this.
  8. Lecks

    Lecks

    Joined:
    May 13, 2013
    Posts:
    18
    Should I report a bug through the issue tracker for the bug in the original post of this thread? I can't see Emmet's one (but I can't seem to search by ID, so I might be missing it).
     
  9. LeonhardP

    LeonhardP

    Unity Technologies

    Joined:
    Jul 4, 2016
    Posts:
    3,136
    Hi @Lecks,

    If you want to make sure that your case is evaluated distinctly (in case it turns out to be a separate issue), please submit a report with the contents of your initial post in this thread. Please use the Editor's bug reporter for this. There's a guide in my signature.
     
  10. LeonhardP

    LeonhardP

    Unity Technologies

    Joined:
    Jul 4, 2016
    Posts:
    3,136
    Some updates:

    We processed @EmmetOT's bug report and resolved it as Won't Fix since this is currently not expected to work.

    DrawIndirect and DrawProcedural are not supported yet by ShaderGraph so using the Instance ID node with DrawMeshInstancedIndirect is not supported either.

    There is an item for this on the team's product roadmap page and if this is important to you, please vote for it.


    Regarding the issue described by @Lecks, we were not able to reproduce the described behaviour with the reproducible that you shared in your original message (neither with the scripts in a new project nor with your attached sample).

    Please submit a bug report if you are still affected by this.
     
  11. EmmetOT

    EmmetOT

    Joined:
    Apr 25, 2016
    Posts:
    44
    Thanks for the response! So am I to understand that this will be working by the official release of 2021.2?
     
  12. LeonhardP

    LeonhardP

    Unity Technologies

    Joined:
    Jul 4, 2016
    Posts:
    3,136
    No, this is not being worked on at the moment but rather on the backlog of the team. There is an item for this on the team's product roadmap page and if this is important to you, please vote for it.
     
  13. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Aka: stay in built-in if you need advanced features.
     
  14. EmmetOT

    EmmetOT

    Joined:
    Apr 25, 2016
    Posts:
    44
    Thanks, I voted there.

    That said, this is really odd to me. You’re saying the Unity team has added an InstanceID node which literally does not work with DrawMeshInstancedIndirect. It’s not bugged. It simply does not function as standard, and does not throw any errors or warning messages.

    I spent a lot of time trying to figure out why this didn’t work before, assuming it was an issue on my end. I eventually gave up and switched to using the built-in renderer after some refactoring. I doubt I’ll be the only one this happens to after the official release.
     
  15. aras-p

    aras-p

    Joined:
    Feb 17, 2022
    Posts:
    75
    I don't quite understand how this can be closed with a "by design". Right now I'm looking at what feels like exactly the same issue: I have a shader that uses VertexID (via shader graph). Within HDRP (all Unity versions between 2021.3 and 2023.2), when using DX12, Graphics.DrawProcedural calls are simply "silently dropped". DX11 is fine; DX12 is also fine when doing Graphics.DrawMesh with the same shader (which then proceeds to ignore the mesh, and uses just vertex ID number).

    This does sound like an issue with Unity's DX12 backend implementation to me, where for some reason it decides to drop DrawProcedural calls without logging any errors or warnings. Unity's own frame debugger thinks they are happening (they are shown in frame debugger), which to me sounds like the "frontend" (GfxDeviceClient) accepts and receives them just fine, but the "backend" (GfxDeviceD3D12) just silently ignores them. How is that "by design"?!
     
    Pr0x1d and Lecks like this.
  16. aras-p

    aras-p

    Joined:
    Feb 17, 2022
    Posts:
    75
    The issue seems to be that Unity's DX12 backend has got very strict checking for what inputs does the vertex shader used by DrawProcedural needs, and if it uses anything besides instanceID / vertexID then on DX12 the DrawProcedural call is simply skipped (no warnings or errors are logged, just silently skipped).

    The validation feels "wrong" to me, as it considers vertex inputs to be "needed" if they are not actually even read. E.g. you can have vertex shader input function like "vert (uint vid : SV_VertexID, float3 completelyUnusedNormal : NORMAL)", and then completely not use the "completelyUnusedNormal" variable, and Unity will still decide to silently skip that procedural draw call.

    "But why would you have inputs that you never use?!", you ask. Because if you happen to use ShaderGraph, that's what it does; it emits a whole bunch of code with input attributes and whatever as entry point to the vertex shader, even if all these inputs are later on not used or overwritten by other calculations of the shader graph.

    What would have saved me two days of digging into this: if Unity had not done this overzealous validation (which makes DX12 behavior different from DX11). If DX12 API "needs" vertex shader input layout to match, Unity could provide "dummy" values similar to how it already does when trying to render a regular mesh where shader needs some data that the mesh does not have. Or, if Unity absolutely considers this to be "the correct behavior", then at least print some warning about the skipped draw call. And put that down into DrawProcedural documentation.
     
  17. tvirolai

    tvirolai

    Unity Technologies

    Joined:
    Jan 13, 2020
    Posts:
    79
    Hi @aras-p ,

    Here is first some history on the functionality based on more thorough digging. Before 2019.3 DX11 also skipped drawcalls on DrawProcedural when there were no vertexbuffers available and DX12 issued errors to console. A bugfix to DX12 introduced full emulation to both of them, changing the API fundamentally from what it has been since Unity 5.2. No documentation changes were made and that still explicitly states "DrawProcedural2 does a draw call on the GPU, without any vertex or index buffers.". The correct fix would have been to just silence the DX12 error or make DX11 also issue a real error. This was our first mistake.

    Second mistake we made was in 2020 to remove the DX12 emulation as it regressed performance in cases where it was not needed, such as the particle system, and making it behave like DX11 pre 2019.3, as it's substantially heavier to do this in DX12, without noticing the change in DX11.

    So we have unfortunately introduced new behaviour implicitly as part of a bugfix where it was not really needed. We cannot let accidental changes to api edgecase behaviour dictate how it works, as the emulation is incredibly heavy so then users who do not need it will suffer untowardly. Alternatively we could just introduce DrawProcedural2 with documentation that says "DrawProcedural2 does a draw call on the GPU, without any vertex or index buffers, and this time we really mean it!".

    On a more serious note it does not mean that the user experience should be just "You're on your own". Here is what we are thinking of doing for this:

    - DX11 emulation is reverted so that there is no API where it happens by accident. Vulkan is a question mark as it still allows some cases as the spir-v optimizer handles that. But in case of DirectX shaders our hands are unfortunately mostly tied, and so in the other cases where we have several compilers outside of our control that consume HLSL directly.

    - We'll look into enabling warning if it's used. It can be rough for a short while. We also need to look if we can add a warning if non read ones are used in Vulkan.

    - Shadergraph needs a way to explicitly strip the unused attributes because this is an important usecase. It likely does this because due to the accidental change there has been no need to even consider it might be erroneous usage.

    - Improve our documentation on DrawProcedural to state that any vertexinputs in shaders are not allowed, not even those that are statically not used. This has been the case before 2019.3 so it is more about making it even more clear due to this change.

    Had the DX11 behaviour been there for longer it would've been more solidified as the de facto behaviour. But thankfully it is relatively new. If it had been there for longer we would likely be forced to go with the DrawProcedural2 route for the actually performant draws that don't require vertexbuffers. And then we'd also have a long road ahead of us in actually making every single platform perform this emulation, which most currently don't do.
     
    Last edited: Mar 28, 2023
  18. aras-p

    aras-p

    Joined:
    Feb 17, 2022
    Posts:
    75
    Thanks for the explanation, @tvirolai! If you are taking votes for which behavior to pick, then my vote would be on making the current DX11 behavior everywhere, as in I don't see that as "a mistake" -- it made things easier for the user, which is a good thing.

    (and yeah I guess before ShaderGraph existed, the fact that DrawProcedural required to only declare vertex/instance ID inputs was much less of a problem, since you were much more likely to have written the shader yourself -- but now that it's almost impossible to write a shader for HDRP/URP manually, one has to use ShaderGraph)

    I'm not sure I understand where the performance issue of providing dummy/fake vertex buffer for DrawProcedural comes from. Unity already does this, on graphics APIs, for all regular mesh rendering. So the code to do that already exists; all the vertex shaders in Unity already know what vertex input attributes they need; and the equivalent of "if ((shaderNeedsAttributesMask & ~meshHasAttributesMask) != 0) BindMissingAttributes();" code already exists. In DrawProcedural case, it's conceptually "just" a draw call with a mesh that provides no attributes at all. So why this code is not a performance issue for regular meshes on DX12, of which there's typically 1-10 thousand in a frame, but would be a performance issue for DrawProcedural, of which there's like a dozen per frame?

    By the way, my interpretation of "does a draw call on the GPU, without any vertex or index buffers" documentation is very much not that "this function makes sure that to the underlying graphics API no vertex buffers are bound". As a user, I don't care! My understanding of the docs is that they say "this function does a draw call, and you don't need to provide a vertex or index buffer for it". But if for whatever reason, on whatever graphics API, Unity needs to provide a dummy/fake/any buffers underneath, then let it do so. I, the user, would be glad if Unity code does that if that makes my life easier.
     
    JesOb likes this.
  19. tvirolai

    tvirolai

    Unity Technologies

    Joined:
    Jan 13, 2020
    Posts:
    79
    Let me have an additional investigation if it's reasonably possible for this to happen. We do not want to change the internal per backend call to do that because at that point we don't have concept of material anymore, just raw shader bytecode. And it's used directly from our internal particle systems etc. However how the internal C++ call works doesn't dictate what the user visible C# call does.

    If we can do this in a reasonable fashion in the C# level, just like how I'd assume the normal drawcalls do it, then it is possible. But we don't want to either have a massive refactor for all backends or do emulation at a level where we really don't know anything about any attributes anymore, which is the reason the emulation the bugfix provided was so slow, it literally decompiled the DXBC shaders per single drawcall to sniff it's input layout.

    Ideally I agree that what you suggested is the best, or alternatively shader compiler being able to always strip the unused attributes. But as there are also many other deficiencies we must try to optimize the amount of benefit for the users vs amount of work bringing that benefit takes. The part where the DX11 behaviour change was a mistake was changing it only on a single backend without notifying others or having an accompanying documentation change. While the change is good for the user we can't just make these unannounced per backend and then have everyone scrambling to emulate whatever API change was pushed in.

    If you interpreted the documentation that way that means many others will too. So whatever will be the final solution the documentation does need to be clarified.
     
  20. aras-p

    aras-p

    Joined:
    Feb 17, 2022
    Posts:
    75
    I don't have the source code obviously, and I don't remember much of it right now, but IIRC a GpuProgram for a vertex shader still knows the input vertex attributes it needs. So if at GfxDevice DrawNullGeometry function you have the GpuProgram, then you should be able to get to that information, no need to do DXBC input layout parsing on each draw call.
     
  21. tvirolai

    tvirolai

    Unity Technologies

    Joined:
    Jan 13, 2020
    Posts:
    79
    The mesh emulation for C# seems to be performed at the C# script level or right below it. Backends don't do that, they can always assume the things they are given are at least somewhat valid.

    I'm going to see if this solution works: C# level checks if there are vertexattributes needed, if yes it will essentially do a normal drawcall with mesh where the whole mesh is emulated. If not it goes to the current function. And documentation is changed to explain with a performance warning that if your shaders do require vertexattributes there is a potential performance penalty and for the fast path user should make sure no vertexattributes are required.

    The biggest part about performing this in platform neutral level is that it guarantees it works the same everywhere, meaning users can actually rely on the behaviour. And it also keeps the potential for bugs low as there is only one place performing emulation.
     
    Last edited: Mar 29, 2023
    Nexusmaster and aras-p like this.
  22. Nexusmaster

    Nexusmaster

    Joined:
    Jun 13, 2015
    Posts:
    365
    For me it would be fine if the relevant shader graph parts get fixed for this, e.g. creating a #define if DrawProcedural or RenderPrimitives is used. Could be a toggle or local KEYWORD set by the user!
     
    Last edited: Apr 14, 2023
  23. Nexusmaster

    Nexusmaster

    Joined:
    Jun 13, 2015
    Posts:
    365
    I basically got the shader graph working with DX12 and DrawProcedural:

    I also replaced the vertex attributes like this:


    Code (CSharp):
    1.     PackedVaryingsType Vert2(
    2. #if UNITY_ANY_INSTANCING_ENABLED
    3.     uint instanceID : INSTANCEID_SEMANTIC,
    4. #endif
    5.     uint vertexID : VERTEXID_SEMANTIC
    6.     )
    7.     {
    8.         VaryingsType varyingsType;
    9.  
    10. #if defined(HAVE_RECURSIVE_RENDERING)
    11.         // If we have a recursive raytrace object, we will not render it.
    12.         // As we don't want to rely on renderqueue to exclude the object from the list,
    13.         // we cull it by settings position to NaN value.
    14.         // TODO: provide a solution to filter dyanmically recursive raytrace object in the DrawRenderer
    15.         if (_EnableRecursiveRayTracing && _RayTracing > 0.0)
    16.         {
    17.             ZERO_INITIALIZE(VaryingsType, varyingsType); // Divide by 0 should produce a NaN and thus cull the primitive.
    18.         }
    19.         else
    20. #endif
    21.         {
    22.             AttributesMesh inputMesh = (AttributesMesh)0;
    23.             inputMesh.vertexID = vertexID;
    24. #if UNITY_ANY_INSTANCING_ENABLED
    25.             inputMesh.instanceID = instanceID;
    26. #endif
    27.             varyingsType.vmesh = VertMesh(inputMesh);
    28.         }
    29.  
    30.         return PackVaryingsType(varyingsType);
    31.     }
    32.  
    This works for the GBuffer, of course this has to be done for the other passes too, but I think this could be done in one work day easily, adding a shader define which let the user swap between Vert and Vert2.

    So I think this would be an easy fix and hope the Unity team can implement this! Btw. adding SV_Depth as an optional frag output would be also awesome! You did great work there with HDRP and URP just don't give up!
     
  24. Nexusmaster

    Nexusmaster

    Joined:
    Jun 13, 2015
    Posts:
    365
  25. tvirolai

    tvirolai

    Unity Technologies

    Joined:
    Jan 13, 2020
    Posts:
    79
    It took a bit of time but it's in on the latest alpha. The drawcalls that do require vertexattributes are actually being converted into normal drawcalls as the path to DrawNullGeometry didn't have any of the required information. For maximum performance one should make sure that there are no vertexattributes required, but at least it works.
     
  26. Nexusmaster

    Nexusmaster

    Joined:
    Jun 13, 2015
    Posts:
    365
    Thanks for the update!
    So if I use custom vertexattributes like vertex ID it will use normal drawcalls?
    Sorry, but I still don't get your approach, so instead of using DrawNullGeometry you convert it to a normal drawcall and that conversion impacts the performance, right?
    Would my modified shadergraph, as I mentioned it above, now also gets passed to that slower rendering?

    Would my approach of "just" modifying the shader not result in better performance?
     
    Last edited: Jun 18, 2023
  27. tvirolai

    tvirolai

    Unity Technologies

    Joined:
    Jan 13, 2020
    Posts:
    79
    If you modify the shader then it's better performance. Reason for this conversion is that there are bunch of places in the engine that actually handle providing the vertexattributes, and they were not exactly compatible with the passthrough kind of way DrawNullGeometry operated. As an example the knowledge of what kind of vertexattributes it needs was lost way up the chain that eventually ended up in the backend, which means backends like DX12 couldn't emulate them without decompiling the shader and sniffing the contents, leading to a massive performance penalty on the CPU side. The reason this information was lost because the specification of the function is that no vertexattributes are needed.

    So to reiterate. You will only get the normal drawcall if your shader actually needs it. This guarantees that it will work on all backends exactly the same with no surprises. If your shader doesn't need it then it's the fast path as usual.

    The conversion itself is not "that" heavy. So you don't really need to care about it too much. It's not actually even a conversion. It's just check if the shader requires vertex inputs -> pass it through to the normal drawcall path that then spends some, thankfully small, time giving you some builtin vertexbuffers that return zero when read from.