
Wave Intrinsics in HDRP

Discussion in 'High Definition Render Pipeline' started by JiangBaiShi, Feb 10, 2020.

  1. JiangBaiShi

    JiangBaiShi

    Joined:
    Aug 3, 2019
    Posts:
    27
    Hi, I was told that Unity doesn't support wave intrinsics in compute shaders, for this reason:
    "WaveInstrisics require DXC compiler and SM 6.0. Unity currently only support FXC path for DX11 and DX12."
    https://forum.unity.com/threads/wave-intrinsics-support-for-compute-shader.824916/

    But after reading some shader code in the current HDRP package, I found plenty of places where wave/cross-lane operations are used. For example:
    • MotionBlur.compute
      Code (CSharp):
      [numthreads(16, 16, 1)]
      void MotionBlurCS(uint3 dispatchThreadId : SV_DispatchThreadID)
      {
          // .......
      #if defined(PLATFORM_SUPPORTS_WAVE_INTRINSICS)
          earlyOut = WaveActiveAllTrue(earlyOut);
          fastPath = WaveActiveAllTrue(fastPath);
      #endif
          // .......
      }
    • DecalUtilities.compute
    Code (CSharp):
    #if SCALARIZE_LIGHT_LOOP
            if (!fastPath)
            {
                // If we are not in fast path, v_lightIdx is not scalar, so we need to query the Min value across the wave.
                s_decalIdx = WaveActiveMin(v_decalIdx);
                // If WaveActiveMin returns 0xffffffff it means that all lanes are actually dead, so we can safely ignore the loop and move forward.
                // This could happen as an helper lane could reach this point, hence having a valid v_lightIdx, but their values will be ignored by the WaveActiveMin
                if (s_decalIdx == -1)
                {
                    break;
                }
            }
            // Note that the WaveReadLaneFirst should not be needed, but the compiler might insist in putting the result in VGPR.
            // However, we are certain at this point that the index is scalar.
            s_decalIdx = WaveReadLaneFirst(s_decalIdx);
    #endif // SCALARIZE_LIGHT_LOOP


    I also found that the macro PLATFORM_SUPPORTS_WAVE_INTRINSICS basically controls whether wave intrinsics can be used in the code. Could I make this work in my own code, or is it just a useless macro for now, even in the official HDRP package? I don't have much experience with shader compilation, so hopefully you can give me some help. Thanks!
     
  2. SebLagarde

    SebLagarde

    Unity Technologies

    Joined:
    Dec 30, 2015
    Posts:
    934
    Hi,
    PLATFORM_SUPPORTS_WAVE_INTRINSICS is defined for consoles (PS4 / Xbox) in HDRP.
    So yes, on PC it requires DXC, which currently isn't available in Unity except for ray tracing.
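
    To illustrate how such a guard behaves, here is a minimal sketch of a compute kernel that uses a wave intrinsic only when a macro is defined and otherwise falls back to a plain atomic. It is not taken from HDRP: the macro MY_SUPPORTS_WAVE_INTRINSICS, the kernel name, and the buffer names are all hypothetical, and the sketch assumes the dispatch covers exactly the input buffer.

    ```hlsl
    // Sketch only. MY_SUPPORTS_WAVE_INTRINSICS is a hypothetical stand-in for
    // HDRP's PLATFORM_SUPPORTS_WAVE_INTRINSICS; define it only when the compiler
    // and target platform actually support wave operations.
    #pragma kernel ReduceMin

    #define GROUP_SIZE 64

    StructuredBuffer<uint>   _Input;   // hypothetical: one value per thread
    RWStructuredBuffer<uint> _Output;  // hypothetical: one minimum per thread group

    groupshared uint gs_min;

    [numthreads(GROUP_SIZE, 1, 1)]
    void ReduceMin(uint3 dispatchThreadId : SV_DispatchThreadID,
                   uint3 groupId          : SV_GroupID,
                   uint  groupIndex       : SV_GroupIndex)
    {
        // One thread initializes the shared minimum for the group.
        if (groupIndex == 0)
            gs_min = 0xffffffff;
        GroupMemoryBarrierWithGroupSync();

        uint value = _Input[dispatchThreadId.x];

    #if defined(MY_SUPPORTS_WAVE_INTRINSICS)
        // Reduce within the wave first, then issue a single atomic per wave.
        uint waveMin = WaveActiveMin(value);
        if (WaveIsFirstLane())
            InterlockedMin(gs_min, waveMin);
    #else
        // Fallback without wave intrinsics: every thread issues the atomic.
        InterlockedMin(gs_min, value);
    #endif

        GroupMemoryBarrierWithGroupSync();

        // One thread writes the group's result.
        if (groupIndex == 0)
            _Output[groupId.x] = gs_min;
    }
    ```

    Both branches should produce the same result. When the macro is undefined, the wave calls are removed by the preprocessor, so a compiler without SM 6.0 support never sees them.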
     
  3. JiangBaiShi

    JiangBaiShi

    Joined:
    Aug 3, 2019
    Posts:
    27
    Thanks for the reply! I wonder when Unity will be able to support these new optimization possibilities for compute shaders on PC?
     
  4. SebLagarde

    SebLagarde

    Unity Technologies

    Joined:
    Dec 30, 2015
    Posts:
    934
    There is an ongoing effort to move shader compilation from FXC to DXC. There is no ETA, but it is in progress. This will help.
     
  5. SLGSimon

    SLGSimon

    Joined:
    Jul 23, 2019
    Posts:
    80
    Is there somewhere we can follow this? I'm getting an infinite loop in the FXC-based compiler that doesn't happen with DXC.
     
  6. rz_0lento

    rz_0lento

    Joined:
    Oct 8, 2013
    Posts:
    2,361
  7. JiangBaiShi

    JiangBaiShi

    Joined:
    Aug 3, 2019
    Posts:
    27
    For anyone struggling to use SM 6.0 features in Unity,
    I suggest updating Unity to the newest version and using the DX12 backend, as described here:
    https://forum.unity.com/threads/shader-model-6-0-or-vendor-intrinsics.977700/

    However, I noticed that compute shaders run slower on the DX12 backend than on DX11, due to its poorly designed fence and barrier strategies. Even worse, RenderDoc support is poor (performance counters are invalid for indirect dispatches, and no kernel names are displayed).
    So I suggest thinking twice before switching your project to SM 6.0.
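
    For reference, a minimal sketch of what opting a single compute shader into DXC might look like. It assumes a Unity version whose shader compiler accepts the `#pragma use_dxc` directive, the DX12 backend, and a GPU that supports SM 6.0; further pragmas may be needed depending on the Unity version. The kernel and resource names are hypothetical and only mirror the wave-vote pattern from MotionBlur.compute.

    ```hlsl
    // Sketch only. Assumption: this Unity version accepts "#pragma use_dxc"
    // and the project runs on DX12 with an SM 6.0 capable GPU.
    #pragma kernel EarlyOutCS
    #pragma use_dxc

    Texture2D<float2>   _Velocity;  // hypothetical input: per-pixel velocity
    RWTexture2D<float4> _Result;    // hypothetical output

    [numthreads(16, 16, 1)]
    void EarlyOutCS(uint3 dispatchThreadId : SV_DispatchThreadID)
    {
        float2 velocity = _Velocity[dispatchThreadId.xy];

        // Per-lane decision: this pixel has (almost) no motion.
        bool earlyOut = dot(velocity, velocity) < 1e-6;

        // Wave vote: true only if every active lane in the wave agrees,
        // so the branch below is uniform across the wave.
        earlyOut = WaveActiveAllTrue(earlyOut);

        if (earlyOut)
        {
            _Result[dispatchThreadId.xy] = float4(0.0, 0.0, 0.0, 0.0);
            return;
        }

        _Result[dispatchThreadId.xy] = float4(velocity, 0.0, 1.0);
    }
    ```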
     
  8. merpheus

    merpheus

    Joined:
    Mar 5, 2013
    Posts:
    202
    I really hope there isn't some poor D3D11On12-style implementation in place.