

[Resolved] Confusion with compute shader and SV_GroupIndex

Discussion in 'Shaders' started by flogelz, Mar 7, 2023.

  1. flogelz

    Joined: Aug 10, 2018
    Posts: 141
    I'm finally diving into compute shaders and stumbled over a problem where I'm not sure whether I made a mistake in my code or misunderstood the concept (or maybe both? Who knows!)

    Let's say I dispatch a compute shader with the goal of writing into a 3D texture. If I dispatch the compute shader from C# with the same number of thread groups as the texture size (x, y, z), this should give me one thread group per pixel of the 3D texture. As far as I understand it, the [numthreads(x,y,z)] attribute in the compute shader, on the other hand, defines how many threads each thread group contains. So with [numthreads(1,1,1)], each thread group would have one thread to work with.
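A minimal CPU-side sketch of that relationship, written in Python only to make the arithmetic concrete. Nothing here is Unity or HLSL API. `total_threads` is a hypothetical helper, and the tuples stand in for the Dispatch arguments and the numthreads attribute.

```python
from math import prod

# CPU-side Python sketch, not Unity or HLSL API: `groups` mirrors the counts
# passed to compute.Dispatch, `numthreads` mirrors [numthreads(x,y,z)].
def total_threads(groups, numthreads):
    """Threads launched = (number of thread groups) * (threads per group)."""
    return prod(groups) * prod(numthreads)

groups = (3, 3, 3)                        # compute.Dispatch(0, 3, 3, 3)
print(prod(groups))                       # 27 groups, one per texel of a 3x3x3 texture
print(total_threads(groups, (1, 1, 1)))   # 27 threads, one per group
print(total_threads(groups, (3, 1, 1)))   # 81 threads, three per group
```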

    My confusion comes from how the threads are distributed across the [numthreads(x,y,z)] dimensions and what effect that has. What I wanted to do in my shader was to have multiple threads in a group share one loop, with each thread calculating a different element of that loop. Using SV_GroupIndex, which returns the flattened index of each thread within its group, this should be doable. I've seen it done in light clustering examples, where each thread loops over a different subset of the lights using that index.
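The loop-splitting idea can be sketched on the CPU as follows. This is a Python illustration, not shader code. `elements_per_thread` is a hypothetical helper, and the element count of 10 and group size of 4 are arbitrary example values.

```python
# CPU-side Python sketch of the strided-loop pattern: each thread starts at its
# own flat index (SV_GroupIndex) and steps by the number of threads in the
# group, so the threads split the loop between them. The element count and
# group size used below are made-up example values.
def elements_per_thread(num_elements, group_size):
    """Map each flat thread index to the loop iterations that thread executes."""
    return {
        # Same shape as: for (uint i = threadIndex; i < numElements; i += groupSize)
        thread_index: list(range(thread_index, num_elements, group_size))
        for thread_index in range(group_size)
    }

work = elements_per_thread(num_elements=10, group_size=4)
for thread_index, items in work.items():
    print(thread_index, items)   # thread 0 gets [0, 4, 8], thread 3 gets [3, 7]

# Every element is handled by exactly one thread.
handled = sorted(i for items in work.values() for i in items)
print(handled == list(range(10)))  # True
```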

    The problem I have is that my loop works fine with [numthreads(1,1,1)] (so 1 thread), but half of the pixels in my texture are calculated incorrectly when I bump it up to [numthreads(2,1,1)] (so 2 threads). What confuses me even more is that if I switch dimensions (to [numthreads(1,2,1)], for example), the pattern of incorrectly calculated pixels changes too. That happens even though I'm using the linear index of the threads, so the layout shouldn't affect anything at all. As far as I understand it, when I use that linear thread index, it shouldn't matter whether I put the 2 threads into the x, y or z of [numthreads()], because I'll always have two threads to work with. Am I missing something here?
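One way to see what changes between [numthreads(2,1,1)] and [numthreads(1,2,1)] is to evaluate the system-value formulas from Microsoft's HLSL documentation on the CPU. The Python below is a sketch, and the helper `threads_of_group` is hypothetical. SV_GroupIndex comes out as 0 and 1 in both layouts. SV_DispatchThreadID, and therefore the texel each thread touches, differs.

```python
from itertools import product

# CPU-side Python sketch of the HLSL system values for each thread of a group:
#   SV_GroupIndex       = z*dimx*dimy + y*dimx + x   (x, y, z = SV_GroupThreadID)
#   SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID (per component)
def threads_of_group(group_id, numthreads):
    """Yield (SV_GroupIndex, SV_DispatchThreadID) for every thread in one group."""
    dx, dy, dz = numthreads
    for z, y, x in product(range(dz), range(dy), range(dx)):
        group_index = z * dx * dy + y * dx + x
        dispatch_id = (group_id[0] * dx + x,
                       group_id[1] * dy + y,
                       group_id[2] * dz + z)
        yield group_index, dispatch_id

group = (1, 0, 0)  # the second group along x
print(list(threads_of_group(group, (2, 1, 1))))  # [(0, (2, 0, 0)), (1, (3, 0, 0))]
print(list(threads_of_group(group, (1, 2, 1))))  # [(0, (1, 0, 0)), (1, (1, 1, 0))]
```

The flat index is indeed layout-independent. A pattern change therefore has to come from a value derived from SV_DispatchThreadID (or SV_GroupThreadID), such as the texel coordinate used for the write.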

    (Sorry for all the rambling; I'm new to compute shaders and the concept is still a bit abstract for me.)
     
  2. flogelz

    As a better visualization (and for my own testing), I threw together this quick example.
    I've created a 3D texture (3x3x3 in size) and dispatch this compute shader from my script:
    Code (CSharp):
    compute.Dispatch(0, 3, 3, 3);
    And here is the shader code:
    Code (HLSL):
    #pragma kernel CSMain

    RWTexture3D<float4> _TexRW;
    groupshared int count; // one counter shared by all threads of a thread group

    #define THREADS_X 3

    [numthreads(THREADS_X,1,1)]
    void CSMain (uint3 id : SV_DispatchThreadID, uint threadIndex : SV_GroupIndex)
    {
        // The first thread of each group resets the shared counter
        if(threadIndex == 0) { count = 0; }
        GroupMemoryBarrierWithGroupSync();

        // Strided loop: with this bound, every thread runs exactly one iteration
        for(uint i = threadIndex; i < THREADS_X; i += THREADS_X)
        {
            InterlockedAdd(count, 1);
        }
        GroupMemoryBarrierWithGroupSync();

        // id is SV_DispatchThreadID, so id.x = SV_GroupID.x * THREADS_X + threadIndex
        _TexRW[id] = count;
    }
    First, I set a groupshared counter to 0 with the first thread and then wait until all threads in the group have reached the barrier. Then I go through the loop, which should work so that each thread in this group goes through it once and adds 1 to the groupshared counter. The last line just outputs that result.
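Assuming the barriers behave as described, the groupshared counter can be modelled on the CPU by running each phase for all threads before the next phase starts. The Python sketch below uses a hypothetical name, `run_group`. It shows that every thread in the group ends up reading the same value, THREADS_X, so thread 0 does hold the correct count.

```python
# CPU-side Python sketch of ONE thread group running the kernel above, with
# each GroupMemoryBarrierWithGroupSync() modelled as "every thread finishes
# this phase before any thread starts the next". On the GPU the threads run
# concurrently, which is why the kernel uses InterlockedAdd for the increment.
THREADS_X = 3

def run_group(threads_x):
    count = None  # stands in for `groupshared int count`, undefined until reset

    # Phase 1: only the thread with SV_GroupIndex == 0 resets the counter.
    for thread_index in range(threads_x):
        if thread_index == 0:
            count = 0

    # Barrier. Phase 2: the strided loop, one iteration per thread here.
    for thread_index in range(threads_x):
        for _ in range(thread_index, threads_x, threads_x):
            count += 1  # InterlockedAdd(count, 1)

    # Barrier. Phase 3: every thread reads the same final value.
    return [count] * threads_x

print(run_group(THREADS_X))  # [3, 3, 3]
```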

    UPDATE: The code above originally contained a typo; as written now, it actually works as intended! In the process, though, I discovered where my actual problem comes from. Instead of having all threads write the count to the texture as above, in my other code file I wrapped the write in an if, so that only the first thread does the job. That again introduces the weird streaks. Shouldn't the first thread have all the data it needs to write the correct value into the texture?

    Code (HLSL):
    if(threadIndex == 0)
    {
        _TexRW[id] = count;
    }
    (Attached image: numthreads.png)
     
    Last edited: Mar 8, 2023
  3. flogelz

    OK, so to boil my problem down to the important part: why does a compute shader write to the whole texture correctly when I write into it like this:
    Code (HLSL):
    [numthreads(3,1,1)]
    void CSMain (uint3 id : SV_DispatchThreadID, uint threadIndex : SV_GroupIndex)
    {
        _TexRW[id] = 1;
    }
    but only to a third of the texture when I let only thread 0 write to the 3D texture:
    Code (HLSL):
    [numthreads(3,1,1)]
    void CSMain (uint3 id : SV_DispatchThreadID, uint threadIndex : SV_GroupIndex)
    {
        if(threadIndex == 0)
        {
            _TexRW[id] = 1;
        }
    }
    Since I thought that each thread group has its own thread 0, this line should still color the whole texture and not just one strip of it.
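The difference between the two kernels can be reproduced on the CPU by enumerating every thread of the dispatch. This Python sketch uses hypothetical names. It also assumes that writes landing outside the 3x3x3 texture have no effect, which is consistent with the result in the post.

When all threads write, the threads of the groups with SV_GroupID.x == 0 cover x = 0 to 2. When only thread 0 writes, its SV_DispatchThreadID.x is 0, 3 or 6, so only the x = 0 slice stays in bounds.

```python
from itertools import product

# CPU-side Python sketch of the two kernels above: Dispatch(0, 3, 3, 3) with
# [numthreads(3,1,1)], each writing at SV_DispatchThreadID into a 3x3x3 texture.
# Assumption: writes that land outside the texture have no effect.
TEX_SIZE = (3, 3, 3)
GROUPS = (3, 3, 3)
NUMTHREADS = (3, 1, 1)

def written_texels(only_thread_zero):
    """Return the set of in-bounds texel coordinates that receive a write."""
    dx, dy, dz = NUMTHREADS
    written = set()
    for gx, gy, gz in product(*(range(n) for n in GROUPS)):        # SV_GroupID
        for z, y, x in product(range(dz), range(dy), range(dx)):   # SV_GroupThreadID
            group_index = z * dx * dy + y * dx + x                  # SV_GroupIndex
            tid = (gx * dx + x, gy * dy + y, gz * dz + z)           # SV_DispatchThreadID
            if only_thread_zero and group_index != 0:
                continue                                            # if(threadIndex == 0)
            if all(c < s for c, s in zip(tid, TEX_SIZE)):
                written.add(tid)                                    # _TexRW[id] = 1
    return written

print(len(written_texels(only_thread_zero=False)))  # 27: the whole texture
print(len(written_texels(only_thread_zero=True)))   # 9: one third
print(sorted({x for x, _, _ in written_texels(only_thread_zero=True)}))  # [0]
```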
     
  4. flogelz

    I think I found my problem after going back to basics and reading up on each of the system-value semantics (SV_*) that are provided. SV_DispatchThreadID isn't what I thought it was: it is computed per component as SV_GroupID * numthreads + SV_GroupThreadID, so it depends on the numthreads values, which is not what I wanted in my code. That's also why changing numthreads affected everything by offsetting it. Good that this is finally cleared up. The more you know!

    The Microsoft documentation is a gem for understanding these: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/sv-dispatchthreadid
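One way to get the behavior the original question was after (one thread group per texel) is to address the texture with SV_GroupID instead of SV_DispatchThreadID. For example, CSMain could take an extra `uint3 groupId : SV_GroupID` parameter; the name `groupId` is hypothetical.

The Python below is a CPU-side sketch of that assumption, not a fix confirmed in the thread. It shows that every texel receives exactly one write, regardless of the numthreads layout.

```python
from collections import Counter
from itertools import product

# CPU-side Python sketch of one possible fix, assuming the goal is one thread
# group per texel: address the texture with SV_GroupID (one value per group)
# instead of SV_DispatchThreadID, and let only SV_GroupIndex == 0 write.
TEX_SIZE = (3, 3, 3)

def writes_per_texel(groups, numthreads):
    """Count how many writes each texel receives when indexing by SV_GroupID."""
    threads_per_group = numthreads[0] * numthreads[1] * numthreads[2]
    writes = Counter()
    for group_id in product(*(range(n) for n in groups)):   # SV_GroupID
        for group_index in range(threads_per_group):         # SV_GroupIndex
            if group_index == 0:
                writes[group_id] += 1                         # _TexRW[groupId] = count
    return writes

w = writes_per_texel(TEX_SIZE, (3, 1, 1))   # Dispatch(0, 3, 3, 3), [numthreads(3,1,1)]
print(len(w), set(w.values()))               # 27 {1}: every texel written once
print(w == writes_per_texel(TEX_SIZE, (1, 3, 1)))  # True: layout no longer matters
```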

    (I'm keeping this thread up in case someone else falls into this trap)
     
  5. b0nes123

    Joined: Nov 6, 2019
    Posts: 24
    If I understand your question correctly, you are asking: why does threadIndex == 0 not color the entire texture?

    The issue is that you are misunderstanding how the parallelism operates. All that happens when your kernel is dispatched is this: each thread in the dispatch checks whether its SV_GroupIndex (which I will refer to as GI from here on) equals 0. If it does, the thread colors the texture at its SV_DispatchThreadID (DID).

    The GI is the flattened index of a thread within its own thread group; it is not the index of the group. The group's index is SV_GroupID. Given that your dispatch line is compute.Dispatch(0, 3, 3, 3);, you are dispatching 3 * 3 * 3 = 27 thread groups, with SV_GroupID values from (0, 0, 0) to (2, 2, 2). A DID is the index of an individual thread over the entire dispatch, computed per component as SV_GroupID * numthreads + SV_GroupThreadID. With [numthreads(3,1,1)], that gives 27 * 3 = 81 threads, with DID.x ranging from 0 to 8 and DID.y and DID.z from 0 to 2.

    The reason the second code block does not work is that your if statement (GI == 0) selects the first thread of every group, and that thread's DID.x is SV_GroupID.x * 3, i.e. 0, 3 or 6. Only the groups with SV_GroupID.x == 0 therefore write inside your 3x3x3 texture; the writes at x = 3 and x = 6 land outside it and have no effect. The result is the single x = 0 slice, one third of the texture. In the first code block, the three threads of each of those groups cover x = 0 to 2, which is why the whole texture gets colored there. Essentially, you are idling two of the three threads in every group, and the thread that remains writes at a stride of 3 along x.
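The distinction between the two kinds of filter can be checked by counting threads on the CPU. This Python sketch uses a hypothetical helper, `all_threads`, and enumerates the 81 threads of Dispatch(0, 3, 3, 3) with [numthreads(3,1,1)]. The filter SV_GroupIndex == 0 keeps one thread in each of the 27 groups. A filter on SV_GroupID == (0, 0, 0) would instead keep only the three threads of the first group.

```python
from itertools import product

# CPU-side Python sketch comparing two filters over all threads of
# Dispatch(0, 3, 3, 3) with [numthreads(3,1,1)].
GROUPS = (3, 3, 3)
NUMTHREADS = (3, 1, 1)

def all_threads():
    """Yield (SV_GroupID, SV_GroupIndex) for every thread in the dispatch."""
    dx, dy, dz = NUMTHREADS
    for group_id in product(*(range(n) for n in GROUPS)):
        for z, y, x in product(range(dz), range(dy), range(dx)):
            yield group_id, z * dx * dy + y * dx + x

threads = list(all_threads())
print(len(threads))                                       # 81 threads in total
print(sum(1 for _, gi in threads if gi == 0))             # 27: one per group
print(sum(1 for gid, _ in threads if gid == (0, 0, 0)))   # 3: all of the first group
```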