EDIT: Compute on Metal, GroupMemoryBarrier() not working?

Discussion in 'Shaders' started by ecurtz, Mar 31, 2019.

  1. ecurtz

    I'm working on a compute shader that does skinning, and I get different results on OS X/Metal depending on seemingly arbitrary code changes, such as altering the order of unrelated lines. Are there known rules to follow here, or is the HLSL-to-Metal translation just broken?

    The first version below works and the second doesn't, even though the only difference is the order of two lines near the end. parents[] is a read-only StructuredBuffer<int>.

    Code (CSharp):
    [numthreads(128,1,1)]
    void PoseToMatrices (uint3 id : SV_DispatchThreadID)
    {
        uint bone;
        int parent;
        half4x4 localMatrix;
        half4x4 fullMatrix;

        bone = id.x;
        if (bone >= boneCount)
        {
            return;
        }

        localMatrix = CalcLocalMatrix(bone);
        localMatrices[bone] = localMatrix;

        // Can't use GroupMemoryBarrierWithGroupSync with early return.
        GroupMemoryBarrier();

        parent = parents[bone];
        fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
        WriteMatrix(bone, fullMatrix);

        bone = bone + 128;
        if (bone < boneCount)
        {
            parent = parents[bone];
            localMatrix = CalcLocalMatrix(bone);

            fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
            WriteMatrix(bone, fullMatrix);
        }
    }
    Code (CSharp):
    [numthreads(128,1,1)]
    void PoseToMatrices (uint3 id : SV_DispatchThreadID)
    {
        uint bone;
        int parent;
        half4x4 localMatrix;
        half4x4 fullMatrix;

        bone = id.x;
        if (bone >= boneCount)
        {
            return;
        }

        localMatrix = CalcLocalMatrix(bone);
        localMatrices[bone] = localMatrix;

        // Can't use GroupMemoryBarrierWithGroupSync with early return.
        GroupMemoryBarrier();

        parent = parents[bone];
        fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
        WriteMatrix(bone, fullMatrix);

        bone = bone + 128;
        if (bone < boneCount)
        {
            localMatrix = CalcLocalMatrix(bone);
            parent = parents[bone];

            fullMatrix = CalcFullMatrix(bone, parent, localMatrix);
            WriteMatrix(bone, fullMatrix);
        }
    }
     
  2. ecurtz

    I've done some more experimenting and it seems like GroupMemoryBarrier(); isn't actually doing anything. I have groupshared data which I thought should be protected by that, but maybe I'm misunderstanding something because I'm new to compute shaders? Or maybe it's a bug in the translation to Metal.
     
  3. ecurtz

    #if __HAVE_SIMDGROUP_BARRIER__
    simdgroup_barrier(mem_flags::mem_threadgroup);
    #else
    threadgroup_barrier(mem_flags::mem_threadgroup);
    #endif

    As I said, I'm a compute shader novice, but this bit from the "compiled" shader just seems wrong. Shouldn't GroupMemoryBarrier() always translate to something that spans the entire thread group and not just the SIMD group?

    https://github.com/Unity-Technologi...68af9f9befb97ded5b/src/toMetalInstruction.cpp
    line 3055 - seems like SYNC_THREADS_IN_GROUP should be set and isn't
     
    Last edited: Apr 3, 2019
  4. slime73

    Just a guess, but maybe the GroupMemoryBarrierWithGroupSync() function will work where GroupMemoryBarrier() won't?

    I believe threadgroup_barrier(mem_flags::mem_threadgroup) is exactly equivalent to GroupMemoryBarrierWithGroupSync(), and that simdgroup_barrier(mem_flags::mem_threadgroup) is roughly equivalent to GroupMemoryBarrier(), because neither of those two guarantees execution sync across the thread group, only group-shared memory read/write sync.

    If you aren't able to use GroupMemoryBarrierWithGroupSync() you'll probably have to restructure your code to allow it.
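
    A minimal sketch of that kind of restructuring, under stated assumptions: the early return is replaced by range-checked loops so that every thread in the group reaches GroupMemoryBarrierWithGroupSync(). The buffer names localPose and skinMatrices, the MAX_BONES limit, the single-thread-group dispatch, and the bodies of the three helper functions are all hypothetical stand-ins, since the original helpers are not shown in this thread. Whether this changes the Metal behaviour described here is not established.

    ```hlsl
    #pragma kernel PoseToMatrices

    #define GROUP_SIZE 128
    #define MAX_BONES  256   // assumption: one thread group covers every bone

    uint boneCount;
    StructuredBuffer<int> parents;                // parent index per bone, negative for a root (assumed)
    StructuredBuffer<float4x4> localPose;         // hypothetical input buffer
    RWStructuredBuffer<float4x4> skinMatrices;    // hypothetical output buffer

    groupshared half4x4 localMatrices[MAX_BONES];

    // Hypothetical stand-in for the helper used in the original post.
    half4x4 CalcLocalMatrix(uint bone)
    {
        return (half4x4)localPose[bone];
    }

    // Hypothetical stand-in: walks up the hierarchy through group-shared memory.
    // The multiplication order is an assumed convention.
    half4x4 CalcFullMatrix(uint bone, int parent, half4x4 localMatrix)
    {
        half4x4 result = localMatrix;
        for (uint depth = 0; depth < MAX_BONES && parent >= 0; depth++)
        {
            result = mul(localMatrices[parent], result);
            parent = parents[parent];
        }
        return result;
    }

    // Hypothetical stand-in for the helper used in the original post.
    void WriteMatrix(uint bone, half4x4 m)
    {
        skinMatrices[bone] = (float4x4)m;
    }

    [numthreads(GROUP_SIZE, 1, 1)]
    void PoseToMatrices (uint3 id : SV_DispatchThreadID)
    {
        uint count = min(boneCount, (uint)MAX_BONES);
        uint bone;

        // Phase 1: publish local matrices. Out-of-range threads skip the work
        // but do not return, so they still arrive at the barrier below.
        for (bone = id.x; bone < count; bone += GROUP_SIZE)
        {
            localMatrices[bone] = CalcLocalMatrix(bone);
        }

        // The barrier sits outside any thread-dependent branch.
        GroupMemoryBarrierWithGroupSync();

        // Phase 2: all local matrices written in phase 1 are now visible.
        for (bone = id.x; bone < count; bone += GROUP_SIZE)
        {
            half4x4 fullMatrix = CalcFullMatrix(bone, parents[bone], localMatrices[bone]);
            WriteMatrix(bone, fullMatrix);
        }
    }
    ```

    Unlike the original kernel, this sketch also stores the local matrices for bones 128 and up before the barrier, so that a parent in that range can be read from group-shared memory afterwards.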
     
    Last edited: Apr 3, 2019
  5. ecurtz

    No, unfortunately that doesn't help. I've tried it both ways.
    On the other hand, it does produce the code that I expected from the original version:
    threadgroup_barrier(mem_flags::mem_threadgroup);
    So I guess the actual bug is more complicated than I was thinking.

    The error is related to group-shared memory read/write sync: I'm getting invalid data read from a groupshared array after the barrier.
     
    Last edited: Apr 3, 2019
  6. cecarlsen

    @ecurtz old thread, I know. Does this mean group shared memory is working on iOS now? I tested it way back, but haven’t lately.
     
  7. ecurtz

    Unfortunately I gave up on that method and rewrote the shader so I don't know if this was ever fixed (or even if it was working as intended and I misinterpreted something in the documentation.) If you do try it, please let us know your results!
     