
Official Unity is adding a new "DXC" HLSL compiler backend option

Discussion in 'Shaders' started by Aras, Apr 2, 2021.

  1. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    Yes, I'll add this request
     
    Invertex likes this.
  2. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    28
    Hi, I'm not sure whether this is a bug or if I have configured the editor incorrectly, but
    #pragma require Int64
    #pragma require Int64BufferAtomics
    do not seem to work for compute shaders. I have
    #pragma use_dxc
    enabled, and this does enable the use of wave/warp intrinsics, but I get the following error:

    Kernel at index (0) requires features which are unavailable on the current platform - 'int64bufferatomics'. Make sure '#pragma require' is used correctly as unnecessary arguments can restrict platform reach

    Attempting to declare any variable with a 64-bit type, like
    uint64_t x;
    results in the following error:

    Shader error in 'CDAtomic': opcode '64-bit atomic operations' should only be used in 'Shader Model 6.6+'. at kernel ChainedDecoupledScanAtomic at CDAtomic.compute(64) (on d3d11)

    Which is strange, because the compiled code lists
    **** Platform Direct3D 12:
    . I am running Unity Editor 2023.1.0a26, with only Direct3D12 enabled in the graphics APIs.
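    For reference, a minimal compute shader exercising these pragmas might look like this (a sketch assuming the platform actually reports SM 6.6; the buffer and kernel names are hypothetical):
    Code (CSharp):
    #pragma kernel CSMain
    #pragma use_dxc
    #pragma require Int64
    #pragma require Int64BufferAtomics

    // Hypothetical buffer name; 64-bit buffer atomics require SM 6.6.
    RWStructuredBuffer<uint64_t> _Counters;

    [numthreads(64, 1, 1)]
    void CSMain(uint3 id : SV_DispatchThreadID)
    {
        // The 64-bit atomic is what triggers the SM 6.6 requirement.
        InterlockedAdd(_Counters[0], (uint64_t)1);
    }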
     
  3. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,019
    I think this is the culprit. It seems your system doesn't support SM 6.6, which is required here.
     
    b0nes123 likes this.
  4. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    28
    I see, thank you. I didn't realize that I needed to grab the Agility SDK and whatnot. I assumed that my stock DX12 install would be enough.
     
  5. KillHour

    KillHour

    Joined:
    Oct 25, 2015
    Posts:
    49
    This is kind of buried and it took me a while to figure out, so I thought I'd mention that DXC doesn't support interfaces at all. I'm using them to abstract multiple SDF functions, so that's preventing me from trying out the feature. It's a shame too, because the exponential nature of function composition is really limiting me in terms of how many different functions I could build, and the speedup in compilation times would certainly help.

    Although, properly supporting dynamic class linking (https://learn.microsoft.com/en-us/w...alizing-interface-instances-in-an-application) would be the real solution, I think.
     
    burningmime likes this.
  6. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    The way Unity internally does abstraction is basically just via the preprocessor (eg every HDRP shader pass includes some set of files in some specific order, maybe with a couple
    #define
    s thrown in). Which was the height of code quality circa 1977.
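    To illustrate the pattern (a rough sketch with hypothetical file and macro names, not actual HDRP code):
    Code (CSharp):
    // Each pass re-includes the shared code with pass-specific switches.
    #define SHADERPASS SHADERPASS_FORWARD   // hypothetical pass selector
    #define SURFACE_NEEDS_NORMAL            // hypothetical feature toggle
    #include "LitPassCommon.hlsl"           // hypothetical include that reads the #defines above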

    The impression I get is that most people who write shaders eschew OOP concepts entirely. I don't think there's a single member function in any struct throughout like 30k+ lines of HLSL in HDRP; it's all just a combination of preprocessor and loose functions.

    HLSL 2021 has templates, although that's not really a substitute, IMO. They're better in some ways, worse in others (eg no IDE support for generating function prototypes, lack of API definitions, etc). But it's a step in some sort of direction towards abstraction and code quality in HLSL. Of course, interfaces (fully inlined ones -- no dynamic linking) were also a step in that direction and not enough people used them. So we'll see how long templates last before Microsoft takes them out.
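    For the curious, an HLSL 2021 template looks roughly like this (a minimal sketch, assuming DXC with the 2021 language version; the Remap function is purely illustrative):
    Code (CSharp):
    // Instantiated per call site and fully inlined; no dynamic dispatch.
    template<typename T>
    T Remap(T v, T inMin, T inMax, T outMin, T outMax)
    {
        return outMin + (v - inMin) * (outMax - outMin) / (inMax - inMin);
    }

    // float  a = Remap(x, 0.0, 1.0, -1.0, 1.0);
    // float3 b = Remap(p, (float3)0.0, (float3)1.0, (float3)-1.0, (float3)1.0);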

    As for dynamic linking, that's a concept that requires API (and possibly driver?) support. It's not in DX12, Vulkan, or Metal AFAIK.
     
    Kmsxkuse and b0nes123 like this.
  7. KillHour

    KillHour

    Joined:
    Oct 25, 2015
    Posts:
    49
    Right. I'm using interfaces with branching to get around this. The reason is that I'm doing things that would result in probably tens of millions of shader combinations otherwise. Let's say I have 3 SDF functions - a cube, a sphere and a torus - and 3 different functions to combine them - intersect, union and average. Just representing all the ways you could combine two shapes would require 27 different shader combinations. And that's just for a trivial example.

    Instead, I have an iBaseSDF interface with a getDistance function that takes a float3 and returns a float, an iBaseCombine interface with an interpolate function that takes two iBaseSDF classes and returns a single iBaseSDF class, and a computeSDF function that loops over everything. I have to be very careful with the code so it doesn't try to unroll everything and spit out a compilation error, but it does work.
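    A minimal sketch of that pattern (the interface and method names follow the description above; the concrete classes are illustrative, and the combiner is simplified to merge distances rather than return a new iBaseSDF). These are FXC-era HLSL interfaces - exactly the construct DXC rejects:
    Code (CSharp):
    interface iBaseSDF
    {
        float getDistance(float3 p);
    };

    class SphereSDF : iBaseSDF
    {
        float radius;
        float getDistance(float3 p) { return length(p) - radius; }
    };

    class CubeSDF : iBaseSDF
    {
        float3 extents;
        float getDistance(float3 p)
        {
            // Standard box SDF.
            float3 q = abs(p) - extents;
            return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0);
        }
    };

    interface iBaseCombine
    {
        float interpolate(float a, float b);
    };

    class UnionOp : iBaseCombine
    {
        float interpolate(float a, float b) { return min(a, b); }
    };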

    Edit: I'm skipping over some implementation details. I based my code off of this: https://code4k.blogspot.com/2011/11/advanced-hlsl-using-closures-and.html
     
    Last edited: Mar 18, 2023
  8. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    Yup; that seems like an excellent way to structure your code and keep it clean/readable/maintainable. Microsoft, Apple, AMD, NVIDIA, Khronos, Sony, Nintendo, Unity, Epic, Ubisoft, Square-Enix, and EA apparently disagree, though, which is why it's not in any new versions of shader languages.
     
  9. luyu2020

    luyu2020

    Joined:
    Sep 1, 2021
    Posts:
    5
    Same problem here. I'm working on URP and there's weird clearcoat shader code. If they just inherited BRDFData and made a BRDFDataClearCoat structure that implements "IClearCoatBRDF" or something like that, the code would be very clean and simple…
    And I'm quite appalled to find out that DXC doesn't support interfaces.
     
  10. Kolyasisan

    Kolyasisan

    Joined:
    Feb 2, 2015
    Posts:
    397
    Just plugging this straight into an HDRP/Lit shader, I see a performance degradation of about 1.5 ms in most of the shaders (HDRP test scene, RTX 3060 Ti hardware, both DX12 and Vulkan, Forward+ pass used for profiling). That is simply with enabling DXC and requesting no additional features.

    Profiling with NSight reveals considerably lower occupancy of SMs with pixel warps (0.75x of what was on FXC, and that's on F+ which is already super VGPR hungry). Strangely, the VGPR pressure and vertex ISBE allocations are lower (0.93x and 0.28x respectively), but the TRAM is higher (1.42x).

    LSU throughput for Writeback and Data-Stage is much higher though (1.5x for each). I wonder why such drastic changes even happen between FXC and DXC. Are there differences in how data is being loaded between those? My compiled shader-fu is rather cranky, but I would like to investigate it a little.

    DXC sounds pretty promising to us considering that we would like to try DX12, so we'd get to experiment with async compute and SM6.0 wave intrinsics.
     
  11. Error-md1

    Error-md1

    Joined:
    Nov 9, 2022
    Posts:
    13
    Last edited: Sep 29, 2023
  12. Supuo

    Supuo

    Joined:
    Jul 8, 2023
    Posts:
    1
    Hi, I found a problem when using DXC in compute shaders.
    Shader.SetGlobalTexture only works with Texture2D<float4> in shaders; it does not work with a plain Texture2D.
    ComputeShader.SetTexture works with both.
    FXC does not have this problem.

    I work in the Built-in pipeline on DX12. The Unity version is 2022.3.9f1.
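    A minimal repro sketch of the two declarations in question (hypothetical texture names; "reportedly" because this reflects the behavior described above, not documented behavior):
    Code (CSharp):
    // Explicitly templated: reportedly receives Shader.SetGlobalTexture data under DXC.
    Texture2D<float4> _GlobalTexExplicit;

    // Untemplated (defaults to float4): reportedly not bound under DXC.
    Texture2D _GlobalTexImplicit;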
     
    stonstad likes this.
  13. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    Has anyone tried using the DXC compiler and
    #pragma require Native16Bit
    with half variables declared in the cbuffer?

    For me, the presence of explicit half-precision floats & ints in the cbuffer seems to break my variables.
    Perhaps an alignment issue?
     
  14. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    After trying to control alignment using packoffset, it seems that may not be the issue. I suspect an issue with material properties being declared as "vector" (float4) not mapping correctly to their half-precision counterparts in the cbuffer.
     
  15. joao_maia_u3d

    joao_maia_u3d

    Unity Technologies

    Joined:
    Dec 15, 2021
    Posts:
    6
    Hi! I was not able to find issues myself when using Unity 2022.3 and targeting Metal. I used a simple shader that I added as an attachment.

    HalfShader.shader:
    Code (CSharp):
    cbuffer MyConstantBuffer
    {
        int4 param0;
        float4 param1;
        int16_t4 param2;
        float16_t4 param3;
    };
    Compiled shader on Metal:

    Code (CSharp):
    struct type_MyConstantBuffer
    {
        int4 param0;
        float4 param1;
        short4 param2;
        half4 param3;
    };
    Would it be possible to provide a few more details?
    - The Unity version you are using
    - The target platform (e.g. Metal, DirectX 12)
    - The shader you have issues with
    - Any other information or reproduction steps you find relevant

    You can also report a bug through Help -> Report a Bug... and attach the shader file you have issues with.

    It's also good to keep in mind the following (quoting from "Using buffers with GPU buffers" section of Writing shaders for different graphics APIs):
    • Use float4 and float4x4 instead of float3 and float3x3, because float4 variables are the same size on all graphics APIs, while float3 variables can become a different size on some graphics APIs.
    • Declare variables in decreasing size order, for example float4 then float2 then float, so all graphics APIs structure the data in the same way.
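    For example, a layout following both guidelines might look like this (hypothetical buffer and member names):
    Code (CSharp):
    cbuffer ExampleBuffer      // hypothetical
    {
        float4x4 _Transform;   // largest members first
        float4 _ColorA;        // float4 rather than float3
        float2 _TilingXY;
        float _Intensity;      // smallest members last
    };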
     

    Attached Files:

    Last edited: Dec 4, 2023
    SeanR10Chambers likes this.
  16. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    Hi! Thanks for testing it out. I should probably have provided some more details in my initial post:

    - I'm using Unity 2022.3.6f1
    - The target platform is DirectX 12
    - The shader I have issues with is a custom shader built with Better Shaders, I won't be able to share it here as my work is under NDA.
    - When using Better Shaders, material properties have to be declared both in the "BEGIN_PROPERTIES" block (similar to ShaderLab: https://docs.unity3d.com/Manual/SL-Properties.html) and in the constant buffer.

    While I can declare the variables as half/float16_t, int16_t, etc. in the cbuffer, in the BEGIN_PROPERTIES block I'm relegated to the available Unity material property datatypes (color, int, float, texture & vector).

    So perhaps the conversion from full precision material property to half precision cbuffer variable could be creating issues? Converting to 32bit integers seems to be fine, so it would seem that it's only affecting 16bit variables if that's the issue.

    Or possibly the alignment is still posing an issue, it might be that packoffset isn't working as expected with 16bit variables.


    I have tried the attached HalfShader.shader, but it crashes my project.

    I will try again tomorrow in an empty project.
     
  17. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    I just tried the HalfShader in an empty project to test if it was working, but it crashes an empty 2023.3.6f1 project using DX12 as well. I've submitted a bug report. The case number is: CASE IN-62556
     
  18. joao_maia_u3d

    joao_maia_u3d

    Unity Technologies

    Joined:
    Dec 15, 2021
    Posts:
    6
    Thank you for testing and reporting the issue!

    Before attaching the shader to a material (and causing the crash), would it be possible to click the "Compile and show code" button? I would expect to see `float16_t` in the generated constant buffer.
     

    Attached Files:

    SeanR10Chambers likes this.
  19. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    There doesn't seem to be an explicit mention of float16_t, but the buffer is constructed with halves, and 8 half variables are being packed into a single row/column (as expected with the 16-bit floats).

    I changed all the variables to float16_t in this example to see if a new row/column in the cbuffer is accessed when more than 8 float16_t values are needed. Everything seems to be working as intended in that regard:



    Code (CSharp):
    %dx.types.Handle = type { i8* }
    %dx.types.CBufRet.f16.8 = type { half, half, half, half, half, half, half, half }
    %MyConstantBuffer = type { <4 x half>, <4 x half>, <4 x half>, <4 x half>, <4 x half> }

    define void @frag() {
      %1 = call %dx.types.Handle @dx.op.createHandle(i32 57, i8 2, i32 0, i32 0, i1 false)  ; CreateHandle(resourceClass,rangeId,index,nonUniformIndex)
      %2 = call %dx.types.CBufRet.f16.8 @dx.op.cbufferLoadLegacy.f16(i32 59, %dx.types.Handle %1, i32 0)  ; CBufferLoadLegacy(handle,regIndex)
      %3 = extractvalue %dx.types.CBufRet.f16.8 %2, 0
      %4 = extractvalue %dx.types.CBufRet.f16.8 %2, 1
      %5 = extractvalue %dx.types.CBufRet.f16.8 %2, 2
      %6 = extractvalue %dx.types.CBufRet.f16.8 %2, 3
      %7 = extractvalue %dx.types.CBufRet.f16.8 %2, 4
      %8 = extractvalue %dx.types.CBufRet.f16.8 %2, 5
      %9 = extractvalue %dx.types.CBufRet.f16.8 %2, 6
      %10 = extractvalue %dx.types.CBufRet.f16.8 %2, 7
      %11 = call %dx.types.CBufRet.f16.8 @dx.op.cbufferLoadLegacy.f16(i32 59, %dx.types.Handle %1, i32 1)  ; CBufferLoadLegacy(handle,regIndex)
      %12 = extractvalue %dx.types.CBufRet.f16.8 %11, 0
      %13 = extractvalue %dx.types.CBufRet.f16.8 %11, 1
      %14 = extractvalue %dx.types.CBufRet.f16.8 %11, 2
      %15 = extractvalue %dx.types.CBufRet.f16.8 %11, 3
      %16 = extractvalue %dx.types.CBufRet.f16.8 %11, 4
      %17 = extractvalue %dx.types.CBufRet.f16.8 %11, 5
      %18 = extractvalue %dx.types.CBufRet.f16.8 %11, 6
      %19 = extractvalue %dx.types.CBufRet.f16.8 %11, 7
     
  20. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    I did some more testing.

    In the examples below I declared my material color as a Color property in the "Properties" section.
    Then I declared that property in the constant buffer as a float4 in the first example, and as a float16_t4 in the second one.

    This is the property & cbuffer declaration for the float4 version:
    Code (CSharp):
    Properties
    {
        _Color("Color", Color) = (1,1,1,1)
    }
    Code (CSharp):
    CBUFFER_START(UnityPerMaterial)
        float4 _Color;
    CBUFFER_END
    And this is the property & cbuffer declaration for the float16_t4 version:
    Code (CSharp):
    Properties
    {
        _Color("Color", Color) = (1,1,1,1)
        _TestFloat("TestFloat", float) = 1.0
    }
    Code (CSharp):
    CBUFFER_START(UnityPerMaterial)
        float16_t4 _Color;
        float _TestFloat;
    CBUFFER_END

    As you can see, when the material color property is declared as a float16_t4 in the constant buffer, it breaks.
    This goes for any material property declared as an explicit 16-bit data type in the cbuffer.

    From looking at the compiled shader, any properties declared as an explicit 16-bit data type are completely stripped from the "UnityPerMaterial" constant buffer in the fragment shader.

    This renders the current implementation of
    Native16Bit
    rather useless, as none of the data savings of tightly packed half-precision variables in the cbuffer can be used for material properties.

    Declaring the properties at full precision in the cbuffer is also not a good solution, as the overhead incurred by converting all those 32-bit properties to 16-bit negates any performance benefit fp16 math might have, often even resulting in worse performance.

    Either mapping 32-bit material properties to their 16-bit cbuffer equivalents, or supporting explicit 16-bit material properties, would be required for this to work.

    I've attached both shaders and their compiled versions below. As you can see in
    Compiled-Renderers-HalfCustomF16T.shader
    at line 260, the color property is entirely missing from the per-material constant buffer.
     

    Attached Files:

  21. joao_maia_u3d

    joao_maia_u3d

    Unity Technologies

    Joined:
    Dec 15, 2021
    Posts:
    6
    Hi! Sorry for the late reply. I created a project with the shaders you provided and I was able to reproduce the issue. I also created a bug report about it.

    Thank you for providing all this information, including the shaders and videos!
     
    Last edited: Jan 24, 2024
    SeanR10Chambers likes this.
  22. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    Hi, thanks for following up! Great that you managed to reproduce the issue and create a bug report.
    I've upvoted it and added a comment linking back to this thread for future reference.
     
    joao_maia_u3d likes this.
  23. KillHour

    KillHour

    Joined:
    Oct 25, 2015
    Posts:
    49
    When using an unsized array, Unity automatically adds an
    exclude_renderers d3d11 gles
    pragma. But that doesn't seem to work correctly with DXC, because I get
    Unlit/Raymarch Surface Shader shader is not supported on this GPU (none of subshaders/fallbacks are suitable)
    , even though I have the renderer explicitly set to DX12.
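    For context, a sketch of the kind of declaration involved (hypothetical names; unbounded resource arrays require SM 5.1+, which is why Unity excludes d3d11 and gles):
    Code (CSharp):
    // Unity reportedly injects "#pragma exclude_renderers d3d11 gles"
    // when it sees an unsized array like this one.
    Texture2D _SdfTextures[];          // unbounded array, size bound at runtime
    SamplerState my_linear_clamp_sampler;

    float4 SampleSdf(uint index, float2 uv)
    {
        // NonUniformResourceIndex marks the index as potentially divergent.
        return _SdfTextures[NonUniformResourceIndex(index)].SampleLevel(my_linear_clamp_sampler, uv, 0);
    }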
     
  24. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    I don't know if you figured out the issue, but in case you haven't: is your GPU DXR compatible? There's a decent chunk of GPUs that support only the base version of DX12, not DX12 Ultimate, which has all the fancy ray-tracing features.

    From what I could find online, support for DXR is associated with support for Shader Model 6.3, so checking whether your specific GPU supports SM6.3 would be the quickest way to see if DXR is going to work.
     
    mandisaw likes this.
  25. KillHour

    KillHour

    Joined:
    Oct 25, 2015
    Posts:
    49
    It's a 3090, so it definitely has DXR, but I'm not using DXR at all. The name of the shader might be confusing here - it's not using any DXR features. It's just a standard old-school SDF raymarcher. The only thing that is messing it up is the unbounded array, which was introduced in SM 5.1:

    https://learn.microsoft.com/en-us/windows/win32/direct3d12/dynamic-indexing-using-hlsl-5-1

    The shader works fine when I remove the unbounded array.
     
    SeanR10Chambers likes this.
  26. SeanR10Chambers

    SeanR10Chambers

    Joined:
    Jan 20, 2022
    Posts:
    12
    Hmm, could this be because Unity doesn't differentiate between DX11 & DX12 in its #pragmas?

    From a quick Google, perhaps a "#pragma NonUniformResourceIndexing" could be implemented to manually make sure the feature is enabled?