
ComputeShaders With Multiple Kernels

Discussion in 'Scripting' started by knchaffin, Jan 15, 2014.

  1. knchaffin

    Joined:
    May 18, 2013
    Posts:
    58
    Is it possible to bind a single ComputeBuffer to two ComputeShader kernels? So far I cannot read from the RWStructuredBuffer in the second kernel; I just get 0s for the elements I wrote in the first kernel. From reading MS DirectCompute threads, there are mentions of unbinding a structured buffer from one kernel before binding it to a second kernel, or at least that is my interpretation. I'm assuming that ComputeShader.SetBuffer() does the binding, but how do I unbind? Or am I misinterpreting what I am seeing?

    Thanks for any advice anyone can give.

    [Edit] I found the problem. It was not related to binding the same buffer to two kernels; it was a problem with SetBuffer, because I had mistyped the compute shader's RWStructuredBuffer name. Interestingly, even though I had not successfully bound the buffer to the shader kernel, I was still able to write to it and read from it within the first kernel. When I tried to read from it in a second kernel, I just got 0s. Things now seem to be working fine with two kernels, and with a single ComputeBuffer and RenderTexture bound to both kernels.
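Because the root cause here turned out to be a name mismatch, one cheap safeguard is to cross-check the names passed to `SetBuffer` against the buffer declarations in the `.compute` source. Below is a minimal sketch in plain C# with no UnityEngine dependency; `ShaderNameCheck` and its methods are hypothetical helpers, and the regular expression only recognizes simple one-line `StructuredBuffer`/`RWStructuredBuffer` declarations, so treat it as a debugging aid, not a shader parser.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: compares C#-side buffer names with the names declared in HLSL source.
public static class ShaderNameCheck
{
    // Matches simple declarations such as "RWStructuredBuffer<TemporaryStruct> TemporaryBuffer;"
    // Assumption: one declaration per statement, no arrays, no register() annotations.
    private static readonly Regex BufferDecl = new Regex(
        @"\b(?:RW)?StructuredBuffer\s*<[^>]+>\s+(\w+)\s*;");

    // Returns the set of buffer names declared in the compute shader source.
    public static HashSet<string> DeclaredBuffers(string shaderSource)
    {
        return new HashSet<string>(
            BufferDecl.Matches(shaderSource).Cast<Match>().Select(m => m.Groups[1].Value));
    }

    // Returns every name the C# side intends to bind that the shader does not declare.
    public static List<string> FindUndeclared(string shaderSource, IEnumerable<string> boundNames)
    {
        var declared = DeclaredBuffers(shaderSource);
        return boundNames.Where(n => !declared.Contains(n)).ToList();
    }
}
```

For example, `ShaderNameCheck.FindUndeclared(source, new[] { "outptu" })` returns the misspelled name. Inside Unity, caching the IDs from `Shader.PropertyToID` in one place and passing them to the `SetBuffer(int kernelIndex, int nameID, ComputeBuffer buffer)` overload also keeps a typo from being repeated across calls, although on its own it cannot detect a mismatch with the shader.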
     
    Last edited: Jan 17, 2014
  2. laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    6,221
    What does the code look like?
    I'm trying to swap buffers in a compute shader, but I might be running into the same problem:
    Unity can't find the CSMain kernel.

    Code (HLSL):
    #pragma kernel CSMain
    #pragma kernel SwapBuffer

    RWStructuredBuffer<float> output;
    StructuredBuffer<float> buffer;

    int width;
    int height;
    StructuredBuffer<float> emitters;
    StructuredBuffer<float> obstacles;

    [numthreads(8,8,1)]
    void CSMain (uint2 id : SV_DispatchThreadID)
    {
        int em = emitters[id.x + width * id.y];
        float p = em
            + buffer[id.x + 1 + width * id.y]
            + buffer[id.x - 1 + width * id.y]
            + buffer[id.x + width * (id.y + 1)]
            + buffer[id.x + width * (id.y - 1)]
            ;

        output[id.x] = p * obstacles[id.x + width * id.y];
    }

    [numthreads(64,1,1)]
    void SwapBuffer (uint2 id : SV_DispatchThreadID)
    {
        buffer[id.x] = output[id.x];
    }
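Two things in the shader above may be worth checking. `SwapBuffer` assigns to `buffer`, which is declared as a read-only `StructuredBuffer`; that is likely a compile error, and a compute shader that fails to compile exposes no kernels, which would fit the "can't find CSMain" symptom. Also, `CSMain` writes `output[id.x]`, so every row of threads writes into the same first `width` elements. A common alternative to a copy kernel is ping-ponging: keep two buffers and swap which one is read and which is written each step (in Unity, by swapping the two `ComputeBuffer` references and calling `SetBuffer` again before the next `Dispatch`). The sketch below shows the idea on the CPU, where it can also serve as a reference to compare GPU output against. `PingPong` is a hypothetical name; it uses the flattened index `x + width * y` for both reads and writes and skips border cells, which is an assumption since the original kernel does not handle borders.

```csharp
// Hypothetical CPU reference for the 4-neighbour stencil above, using ping-pong buffers.
public static class PingPong
{
    // One step: reads from 'src', writes to 'dst'. All arrays hold width * height values,
    // indexed x + width * y. Border cells are left untouched (assumption, see lead-in).
    public static void Step(float[] src, float[] dst, float[] emitters, float[] obstacles,
                            int width, int height)
    {
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int i = x + width * y;
                float p = emitters[i]
                        + src[i + 1] + src[i - 1]          // east, west
                        + src[i + width] + src[i - width]; // next row, previous row
                dst[i] = p * obstacles[i];
            }
        }
    }

    // Runs several steps, swapping the read/write arrays instead of copying between them.
    // Returns whichever array holds the most recent result.
    public static float[] Run(float[] a, float[] b, float[] emitters, float[] obstacles,
                              int width, int height, int steps)
    {
        for (int s = 0; s < steps; s++)
        {
            Step(a, b, emitters, obstacles, width, height);
            (a, b) = (b, a); // last output becomes the next input
        }
        return a;
    }
}
```

Swapping references costs nothing per step, whereas a separate copy pass (like `SwapBuffer`) needs an extra dispatch and a writable binding for the buffer it copies into.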
     
  3. joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    I'm currently struggling with exactly the same issue. However, the cause you described (a mistyped buffer name) does not seem to apply here. I am sure both kernels are running: I can successfully read out the results of both using AsyncGPUReadback.Request(...), and the results confirm that both kernels run.
    However, the results that kernel 1 writes into TemporaryBuffer, even though they contain values when I read them with AsyncGPUReadback.Request(...), are not available in kernel 2 (it sees only 0s). So it's the same issue knchaffin had.

    This is how I set the buffers and dispatch to the two kernels.

    Code (CSharp):
    // Output Texture
    patchData.normalMapGPU = new RenderTexture(patchConstants.nPixelsPerEdge, patchConstants.nPixelsPerEdge, 24, RenderTextureFormat.ARGBHalf);
    patchData.normalMapGPU.enableRandomWrite = true;
    patchData.normalMapGPU.Create();

    // Output Buffers
    patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);   // Temporary Buffer

    // Set Buffers for kernel [1]
    this.shader.SetBuffer(this.kernel[1], "PatchConstantsBuffer", patchConstantsBuffer);
    this.shader.SetBuffer(this.kernel[1], "BodyConstantsBuffer", bodyConstantsBuffer);
    this.shader.SetBuffer(this.kernel[1], "TemporaryBuffer", patchData.temporaryBuffer);

    // Dispatch kernel [1]
    this.shader.Dispatch(this.kernel[1], patchConstants.nPixelsPerEdgeWithSkirt, patchConstants.nPixelsPerEdgeWithSkirt, 1);   // (Typically 258,258,1)

    // Set Buffers for kernel [2]
    this.shader.SetBuffer(this.kernel[2], "PatchConstantsBuffer", patchConstantsBuffer);
    this.shader.SetBuffer(this.kernel[2], "TemporaryBuffer", patchData.temporaryBuffer);
    this.shader.SetTexture(this.kernel[2], "NormalMapTexture", patchData.normalMapGPU);

    // Dispatch kernel [2]
    this.shader.Dispatch(this.kernel[2], patchConstants.nPixelsPerEdge, patchConstants.nPixelsPerEdge, 1);   // (Typically 256,256,1)

    // Request the result
    patchData.request = AsyncGPUReadback.Request(patchData.normalMapGPU);
    //patchData.request = AsyncGPUReadback.Request(patchData.temporaryBuffer); // For test purposes
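The first dispatch above launches `nPixelsPerEdgeWithSkirt` × `nPixelsPerEdgeWithSkirt` thread groups; with `[numthreads(1, 1, 1)]`, as declared in the shader that follows, that is one thread per pixel, and each of those threads writes its own `TemporaryBuffer` element. A plain-C# sketch of the related arithmetic (group counts, threads per axis, and the stride of the buffer's element type): `DispatchMath` and `TemporaryData` are hypothetical names, and `TemporaryData` is assumed to mirror the HLSL `TemporaryStruct` (a `float3` plus a `float`) with sequential layout.

```csharp
using System.Runtime.InteropServices;

// Hypothetical C# mirror of the HLSL TemporaryStruct { float3 position; float noise; }.
[StructLayout(LayoutKind.Sequential)]
public struct TemporaryData
{
    public float PositionX, PositionY, PositionZ; // float3 position
    public float Noise;                           // float noise
}

public static class DispatchMath
{
    // Thread groups needed so that groups * threadsPerGroup >= items (rounds up).
    public static int GroupCount(int items, int threadsPerGroup)
    {
        return (items + threadsPerGroup - 1) / threadsPerGroup;
    }

    // Total threads launched along one axis by a Dispatch with 'groups' groups.
    public static int Threads(int groups, int threadsPerGroup)
    {
        return groups * threadsPerGroup;
    }

    // Stride in bytes of one TemporaryData element (the second ComputeBuffer constructor argument).
    public static int Stride()
    {
        return Marshal.SizeOf(typeof(TemporaryData));
    }
}
```

With the typical values, `GroupCount(258, 1)` is 258, so 258 × 258 = 66,564 threads run, and `Stride()` returns 16, matching the `12 + 4` passed to the `ComputeBuffer` constructor. If the kernel were switched to `[numthreads(8, 8, 1)]`, `GroupCount(258, 8)` would be 33 groups, i.e. 264 threads per axis, so the kernel would then need an early-out for `id.x` or `id.y` beyond 257.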
    And this is how I work with it in the two kernels.

    Code (HLSL):
    // The structure of the temporary buffer used to move the data between the kernels during the high-res calculations
    struct TemporaryStruct
    {
        float3 position;
        float noise;
    };

    // Various input buffers, and an output buffer that is written to by the kernel
    StructuredBuffer<PatchConstantsStruct> PatchConstantsBuffer;
    StructuredBuffer<BodyConstantsStruct> BodyConstantsBuffer;
    RWStructuredBuffer<TemporaryStruct> TemporaryBuffer;
    RWStructuredBuffer<OutputStruct> OutputBuffer;
    RWTexture2D<float4> NormalMapTexture;

    // Second kernel to create the position grid for the pixels. We need a separate kernel because we want to write the result to the temporary buffer instead of the OutputBuffer
    #pragma kernel CSMain2

    [numthreads(1, 1, 1)]
    void CSMain2(uint3 id : SV_DispatchThreadID)
    {
        // Get the constants
        PatchConstantsStruct patchConstants = PatchConstantsBuffer[0];
        BodyConstantsStruct bodyConstants = BodyConstantsBuffer[0];

        // Get outBuffOffset
        int outBuffOffset = id.x + id.y * patchConstants.nPerEdgeWithSkirt;

        // Get the PatchNormalizedCoord
        float3 patchNormalizedCoord = PatchNormalizedCoord(id.x, id.y, patchConstants.nPerEdge, patchConstants.spacing, patchConstants.eastDirection, patchConstants.northDirection, patchConstants.centerVector);

        // Calculate its 'real world' size
        float3 patchCoord = patchNormalizedCoord * bodyConstants.radiusMeter;

        // We determine the 'planet-space' value for the patchCubeCenter.
        // Note we won't be using this variable for the time being, but in case we need the center coordinate we have it here
        float3 patchCenter = normalize(patchConstants.centerVector) * bodyConstants.radiusMeter; // patchCenter now sits on the surface of a planet-sized sphere

        // Next we generate the noise value using the patch's 'real-world' coordinate (patchCoord)
        int octaves = bodyConstants.octaves + patchConstants.level;
        octaves = clamp(octaves, 0, 10);
        float noise = FBM(patchCoord, octaves, bodyConstants.frequency, bodyConstants.amplitude, bodyConstants.lacunarity, bodyConstants.persistence);

        // We create the height value, taking max height into account
        float height = (noise * 2) - 1;             // height now ranges from -1 to +1
        height = clamp(height, -1, +1);             // clamp to make sure it does not overshoot -1 or +1
        height *= bodyConstants.maxHeightMeter;     // height now ranges from -maxHeightMeter to +maxHeightMeter

        // This final step adds (or subtracts) the real terrain height from the real-world-sized (but centered) patch.
        // Note we apply this to both variables, the normal one and the centered one.
        patchCoord += patchNormalizedCoord * height;

        // Result
        TemporaryBuffer[outBuffOffset].position = patchCoord;
        TemporaryBuffer[outBuffOffset].noise = noise;
    }

    // Third kernel to create the normal map and the slope
    #pragma kernel CSMain3

    [numthreads(1, 1, 1)]
    void CSMain3(uint3 id : SV_DispatchThreadID)
    {
        // Get the constants
        PatchConstantsStruct patchConstants = PatchConstantsBuffer[0];

        // Get offsets
        int inBuffOffset = (id.x + 1) + (id.y + 1) * patchConstants.nPerEdgeWithSkirt;
        int outBuffOffset = id.x + id.y * patchConstants.nPerEdge;

        // Create Normals (Indexes)
        // Create the necessary indexes of surrounding vertices
        int inBuffOffsetNorth = inBuffOffset + 1 * patchConstants.nPerEdgeWithSkirt;
        int inBuffOffsetEast = inBuffOffset + 1;
        int inBuffOffsetSouth = inBuffOffset - 1 * patchConstants.nPerEdgeWithSkirt;
        int inBuffOffsetWest = inBuffOffset - 1;

        // Method normals
        float3 sideA, sideB, sideC, sideD;
        float3 normalForward, normalBackward, normal;
        sideA = TemporaryBuffer[inBuffOffsetNorth].position - TemporaryBuffer[inBuffOffset].position;
        sideB = TemporaryBuffer[inBuffOffsetEast].position - TemporaryBuffer[inBuffOffset].position;
        normalForward = cross(sideA, sideB);
        sideC = TemporaryBuffer[inBuffOffsetSouth].position - TemporaryBuffer[inBuffOffset].position;
        sideD = TemporaryBuffer[inBuffOffsetWest].position - TemporaryBuffer[inBuffOffset].position;
        normalBackward = cross(sideC, sideD);
        normal = normalBackward + normalForward;
        //normal = normalize(normal);

        // Create Texture
        float3 normalRGB = float3(normal.x, normal.z, normal.y) / 2 + float3(0.5f, 0.5f, 0.5f);
        uint2 textureID = uint2(id.x, id.y);
        NormalMapTexture[textureID] = float4(normalRGB, 1);
    }
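When a second kernel reads nothing but zeros, an undersized buffer is worth ruling out: work out the largest flattened index each kernel can touch and compare it with the element count the buffer was created with. The sketch below does this for `CSMain2` and `CSMain3`, assuming `nPerEdge = 256` and `nPerEdgeWithSkirt = 258` (the typical values in the dispatch comments) and that the C# `nPixelsPerEdgeWithSkirt` equals the shader's `nPerEdgeWithSkirt`; `IndexCheck` is a hypothetical helper.

```csharp
// Hypothetical helper: largest TemporaryBuffer index touched by each kernel above.
public static class IndexCheck
{
    // CSMain2 writes TemporaryBuffer[id.x + id.y * skirt] for id.x, id.y in [0, skirt).
    public static int MaxWriteIndexKernel2(int skirt)
    {
        return (skirt - 1) + (skirt - 1) * skirt;
    }

    // CSMain3 reads the centre cell (id.x + 1) + (id.y + 1) * skirt for id.x, id.y in [0, edge),
    // plus its neighbours; the north neighbour (centre + skirt) is the largest index read.
    public static int MaxReadIndexKernel3(int edge, int skirt)
    {
        int centre = edge + edge * skirt; // centre cell at id.x = id.y = edge - 1
        return centre + skirt;
    }

    // True if every index in [0, maxIndex] lies inside a buffer with 'count' elements.
    public static bool Fits(int maxIndex, int count)
    {
        return maxIndex < count;
    }
}
```

Here `MaxWriteIndexKernel2(258)` is 66,563 and `MaxReadIndexKernel3(256, 258)` is 66,562, so the buffer needs at least 258 * 258 = 66,564 elements, and `Fits(66563, 258)` is false.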
     
    Last edited: Jul 26, 2020
  4. grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    There are limits to what can be written and read on the GPU, but it's unusual to see this with buffers. Try binding the buffer to a separately named variable for each kernel and updating your code accordingly:
    Code (CSharp):
    // Compute shader:
    RWStructuredBuffer<TemporaryStruct> TemporaryBufferWrite;
    RWStructuredBuffer<TemporaryStruct> TemporaryBufferRead;

    // C#:
    this.shader.SetBuffer(this.kernel[1], "TemporaryBufferWrite", patchData.temporaryBuffer);
    this.shader.SetBuffer(this.kernel[2], "TemporaryBufferRead", patchData.temporaryBuffer);
     
  5. joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    Thanks for the advice, grizzly. Unfortunately the result is the same: the data does not reach kernel 2.
    For testing purposes I deliberately added typos to the names passed to SetBuffer() in C#, to check whether a name that doesn't match the ComputeShader would be silently accepted, which could then be the cause.
    But the typo was recognized for each of the ComputeBuffer names, so this is definitely not the reason.

    I'm beginning to wonder whether the AsyncGPUReadback usage might be the reason. Is a ComputeBuffer wiped in a new frame?
    I issue the AsyncGPUReadback request after dispatching kernels 1 and 2, to asynchronously read the normal map created in kernel 2. Could the data that kernel 1 writes to TemporaryBuffer be wiped before kernel 2 starts?

    Just a guess, but it's the only idea I have left...
     
  6. grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    No, ComputeBuffer data will persist until it is explicitly wiped or destroyed.
    It's not clear from the snippet you've provided how the request is handled. Depending on its size, data can take a few frames to become available. For testing purposes I would suggest using GetData to retrieve the data immediately.

    If you can provide more code, or better, a working example to test I'll happily have a look.
     
  7. Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    I use the same buffer between two kernels and it works fine. First kernel clears the buffer, the second writes to it. However, I use the same variable name for both kernels (the variable is declared just once in the shader). Try that instead of binding the same buffer to two variables to see if it works.
     
  8. grizzly

    Joined:
    Dec 5, 2012
    Posts:
    357
    Yes, it should work, but apparently it doesn't. Read/write limitations on certain resources (textures) produce similar results, which is why I initially suggested the workaround above: it is a known working solution to that problem.
     
  9. joergzdarsky

    Joined:
    Sep 25, 2013
    Posts:
    56
    For documentation purposes, in case someone stumbles across this problem: grizzly was kind enough to look at a working example and found the issue. It was the size of "TemporaryBuffer", which had been allocated as one-dimensional (a single row of nPixelsPerEdgeWithSkirt elements) rather than two-dimensional (nPixelsPerEdgeWithSkirt * nPixelsPerEdgeWithSkirt elements) as it should have been.
    That's really nasty, as the compiler does not complain about it. When hunting for an issue like this, buffer sizes are well worth checking twice ;)

    Code (CSharp):
    patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);
    has to become

    Code (CSharp):
    patchData.temporaryBuffer = new ComputeBuffer(patchConstants.nPixelsPerEdgeWithSkirt * patchConstants.nPixelsPerEdgeWithSkirt, 12 + 4, ComputeBufferType.Default);
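As for why this shows up as zeros rather than as an error: Direct3D 11 defines out-of-range reads from a buffer view to return 0 and out-of-range writes to be discarded (other graphics APIs may leave such accesses undefined). The plain-C# model below is a sketch under that Direct3D 11 assumption, not Unity or GPU code, and `BoundsCheckedBuffer` and `OutOfBoundsDemo` are hypothetical names. It replays the thread grid of the two kernels: with a 258-element buffer, only the first row of writes lands, and every centre-cell read in the second kernel (index 259 and above) returns 0.

```csharp
// Hypothetical model of a bounds-checked GPU buffer (assumed Direct3D 11-style behaviour).
public sealed class BoundsCheckedBuffer
{
    private readonly float[] data;

    public BoundsCheckedBuffer(int count) { data = new float[count]; }

    public int Count => data.Length;

    // Out-of-range reads return 0 instead of failing.
    public float Read(int index) =>
        (index >= 0 && index < data.Length) ? data[index] : 0f;

    // Out-of-range writes are silently dropped.
    public void Write(int index, float value)
    {
        if (index >= 0 && index < data.Length) data[index] = value;
    }
}

public static class OutOfBoundsDemo
{
    // "Kernel 1" writes 1 to x + y * skirt for every thread of a skirt x skirt grid;
    // "kernel 2" reads the centre cell (x + 1) + (y + 1) * skirt for an edge x edge grid.
    // Returns how many of those reads see non-zero data.
    public static int NonZeroCentreReads(int bufferCount, int edge, int skirt)
    {
        var buffer = new BoundsCheckedBuffer(bufferCount);
        for (int y = 0; y < skirt; y++)
            for (int x = 0; x < skirt; x++)
                buffer.Write(x + y * skirt, 1f);

        int nonZero = 0;
        for (int y = 0; y < edge; y++)
            for (int x = 0; x < edge; x++)
                if (buffer.Read((x + 1) + (y + 1) * skirt) != 0f) nonZero++;
        return nonZero;
    }
}
```

`NonZeroCentreReads(258, 256, 258)` returns 0, while `NonZeroCentreReads(258 * 258, 256, 258)` returns 65,536, one for every pixel of the 256 × 256 normal map.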