Metal Compute Shader error: Requesting blit command encoder while having another active encoder

Discussion in 'Shaders' started by trepan, Jun 21, 2018.

  1. trepan (Joined: Feb 11, 2011, Posts: 113)

    I'm trying to get some compute shader code working, but I'm seeing this error emitted for every thread:

    Assertion failed: Requesting blit command encoder while having another active encoder


    Since the same code works fine under DX11, I assume there are some additional constraints for Metal. The problem seems to come down to a conflict caused by using InterlockedAdd to increment a counter within a structured buffer while other structured buffers are also being accessed in the same thread. That is clearly ridiculous: it must be possible to use simple atomics under Metal, right?

    Code (CSharp):
    #pragma kernel CSMain

    StructuredBuffer<float4> _ChunkDensities;
    RWStructuredBuffer<uint> _EdgeResults;
    RWBuffer<uint> _Counter;

    uint3 _ChunkMultipliers;        // Per axis sizes to create a flattened index.

    groupshared uint groupEdgeCount[9 * 9 * 3];

    void ReadCornerDensities(uint fi, inout float2 v[2][2][2])
    {
        // Read 8 values from _ChunkDensities, assign to input array.
        // NOTE: If these reads are commented out the error goes away.
    }

    uint EvaluateSurface(float2 v[2][2][2], uint groupIndex)
    {
        uint res = ...            // Build a bit-field of edge states.
        uint c = countbits(res);            // How many edge bits were just set?
        groupEdgeCount[groupIndex] = c;
        return res | (c << 12);
    }

    [numthreads(9, 9, 3)]
    void CSMain(uint3 id: SV_DispatchThreadID, uint groupIndex : SV_GroupIndex)
    {
        uint3 scId = id * _ChunkMultipliers;
        uint fi = scId.x + scId.y + scId.z;
        float2 v[2][2][2];
        ReadCornerDensities(fi, v);
        _EdgeResults[fi] = EvaluateSurface(v, groupIndex);

        GroupMemoryBarrierWithGroupSync();

        if (groupIndex == 0) {
            uint sum = 0;
            for (int i = 0; i < (9 * 9 * 3); i++) {
                sum += groupEdgeCount[i];
            }
            InterlockedAdd(_Counter[0], sum);    // Increment edge pt count
        }
    }

    I've tried rewriting this a few different ways. This version uses groupshared memory so that the atomic add is issued only once per thread group, and it reads the densities from a StructuredBuffer rather than the Texture3D they were in initially. No change. :( What's especially driving me crazy is that I have a couple of other sample projects that do exactly the same kinds of things I'm trying to do, and they work just fine!
     
  2. trepan (Joined: Feb 11, 2011, Posts: 113)

    It seems I've finally found the culprit: this innocuous-looking line.

     uint c = countbits(res); 


    If I don't use 'countbits', all is well. o_O

    So whatever that intrinsic expands to on Metal appears to make it basically unusable in combination with any other meaningful compute work. Since I never got this code running with it, I don't know whether it was actually returning a valid result.
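
    One possible workaround, offered as an untested sketch rather than a confirmed fix: replace the intrinsic with a hand-written bit count so the shader never calls countbits. The helper name CountBitsManual is hypothetical, and whether this actually avoids the assertion on Metal is an assumption.

```hlsl
// Hypothetical replacement for countbits(): a branch-free population count
// for a 32-bit value using the standard parallel (SWAR) bit-summing trick.
// ASSUMPTION: avoiding the intrinsic sidesteps the Metal assertion; this has
// not been verified.
uint CountBitsManual(uint x)
{
    // Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x55555555u);
    // Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    // Sum adjacent 4-bit fields into 8-bit fields.
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    // Add the four byte sums together; the total lands in the top byte.
    return (x * 0x01010101u) >> 24;
}
```

    In EvaluateSurface, the call would then read uint c = CountBitsManual(res); with the rest of the kernel unchanged.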