
Question Shader compiler breaks Kahan summation / error compensating summation

Discussion in 'Editor & General Support' started by jg482, Aug 10, 2022.

  1. jg482

    Joined:
    Apr 5, 2018
    Posts:
    8
    I'm trying to do an error-compensated sum of two floats.

    Code (CSharp):
    #pragma kernel CSMain
    #pragma disable_fastmath
    #pragma skip_optimizations metal

    // One element of the buffer; the layout must match the struct on the C# side!
    struct BufferType {
        float a;
        float b;
    };
    RWStructuredBuffer<BufferType> _Buffer;

    [numthreads(512, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        float a = _Buffer[(uint)id.x].a;
        float b = _Buffer[(uint)id.x].b;

        float sum = a + b;
        float t = sum - a;
        float err = (a - (sum - t)) + (b - t);

        _Buffer[(uint)id.x].a = sum;
        _Buffer[(uint)id.x].b = err;
    }
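    In exact arithmetic err is always 0 by construction, which is exactly the identity the optimizer exploits; in float32 it recovers the rounding error of the addition. A quick illustration with my own example numbers:

    Code (CSharp):
    // float32 example: b is far below a's rounding precision (eps ~ 1.2e-7)
    float a = 1.0f;
    float b = 1e-8f;

    float sum = a + b;                      // rounds to 1.0f, b is lost
    float t = sum - a;                      // 0.0f
    float err = (a - (sum - t)) + (b - t);  // recovers 1e-8f
    // sum + err equals a + b in exact arithmetic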
    I thought re-association due to fast math would be the problem. But looking at the compiled shader, I can see that the compiler simply optimizes the compensation away aggressively by setting u_xlat0.y = 0.0;. That is algebraically correct, but it breaks my code and is exactly the problem I'm facing.

    Code (CSharp):
    **** Platform Metal:
    Compiled code for kernel CSMain
    keywords: <none>
    #define UNITY_DISABLE_FASTMATH
    #include <metal_stdlib>
    #include <metal_texture>
    using namespace metal;
    struct _Buffer_Type
    {
        uint value[2];
    };

    kernel void computeMain(
        device _Buffer_Type *_Buffer [[ buffer(0) ]],
        uint3 mtl_ThreadID [[ thread_position_in_grid ]])
    {
        float2 u_xlat0;
        u_xlat0.xy = float2(as_type<float>(_Buffer[mtl_ThreadID.x].value[(0x0 >> 2) + 0]), as_type<float>(_Buffer[mtl_ThreadID.x].value[(0x0 >> 2) + 1]));
        u_xlat0.x = u_xlat0.y + u_xlat0.x;
        u_xlat0.y = 0.0;
        _Buffer[mtl_ThreadID.x].value[(0x0 >> 2)] = as_type<uint>(u_xlat0.x);
        _Buffer[mtl_ThreadID.x].value[(0x0 >> 2) + 1] = as_type<uint>(u_xlat0.y);
        return;
    }
    Is there any way to influence the compiler and prevent it from optimizing this aggressively? Is there a preprocessor directive or compiler argument I can set?
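    One thing I have not tried yet that might be relevant: HLSL has a precise modifier that forces IEEE-strict evaluation of every operation contributing to the marked variable, which should forbid exactly this kind of algebraic simplification. A sketch (untested, and I don't know whether it survives Unity's HLSL-to-Metal translation):

    Code (CSharp):
    #pragma kernel CSMain

    struct BufferType {
        float a;
        float b;
    };
    RWStructuredBuffer<BufferType> _Buffer;

    [numthreads(512, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID)
    {
        float a = _Buffer[id.x].a;
        float b = _Buffer[id.x].b;

        // 'precise' disallows re-association and algebraic rewrites for
        // every operation that feeds these variables, so the compiler may
        // no longer fold the compensation term to 0.
        precise float sum = a + b;
        precise float t = sum - a;
        precise float err = (a - (sum - t)) + (b - t);

        _Buffer[id.x].a = sum;
        _Buffer[id.x].b = err;
    }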
     
  2. jg482

    Joined:
    Apr 5, 2018
    Posts:
    8
    I gave DXC a try and it does not seem to have this problem, but I still have to test it more with my full codebase.

    Code (CSharp):
    **** Platform Metal:
    Compiled code for kernel CSMain
    keywords: <none>
    #include <metal_stdlib>
    #include <simd/simd.h>

    using namespace metal;

    struct BufferType
    {
        float a;
        float b;
    };

    struct type_RWStructuredBuffer_BufferType
    {
        BufferType _m0[1];
    };

    kernel void computeMain(device type_RWStructuredBuffer_BufferType& _Buffer [[buffer(0)]], uint3 gl_GlobalInvocationID [[thread_position_in_grid]])
    {
        float _28 = _Buffer._m0[gl_GlobalInvocationID.x].a;
        float _30 = _Buffer._m0[gl_GlobalInvocationID.x].b;
        float _7 = _28 + _30;
        float _8 = _7 - _28;
        _Buffer._m0[gl_GlobalInvocationID.x].a = _7;
        _Buffer._m0[gl_GlobalInvocationID.x].b = (_28 - (_7 - _8)) + (_30 - _8);
    }

    #define UNITY_DISABLE_FASTMATH
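    For reference, switching Unity's shader compiler to DXC is a per-shader pragma (assuming a Unity version where DXC support is available for your target):

    Code (CSharp):
    #pragma use_dxc
    #pragma kernel CSMain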
     
  3. Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    I'm not sure you can disable algebraic optimizations in the shader compiler. Forcing the shader to compile in debug mode might help, but even then it's not guaranteed.

    Maybe you could break your algorithm up into two passes: first write the sum into the buffer, then read it back in a second compute shader to calculate the error.

    You could also try messing around with groupshared memory to try and make this possible in the same kernel. Basically you need to find a way to fool the compiler into not realizing "sum" equals "a + b".
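    A rough sketch of that groupshared idea (untested; whether it actually defeats the optimizer depends on the compiler):

    Code (CSharp):
    #pragma kernel CSMain

    struct BufferType {
        float a;
        float b;
    };
    RWStructuredBuffer<BufferType> _Buffer;

    groupshared float gs_sum[512];

    [numthreads(512, 1, 1)]
    void CSMain (uint3 id : SV_DispatchThreadID, uint gi : SV_GroupIndex)
    {
        float a = _Buffer[id.x].a;
        float b = _Buffer[id.x].b;

        // Route the sum through groupshared memory with a barrier in
        // between, so the compiler can no longer prove that 'sum' is
        // exactly a + b when simplifying the compensation term.
        gs_sum[gi] = a + b;
        GroupMemoryBarrierWithGroupSync();
        float sum = gs_sum[gi];

        float t = sum - a;
        float err = (a - (sum - t)) + (b - t);

        _Buffer[id.x].a = sum;
        _Buffer[id.x].b = err;
    }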