Resolved Need help understanding compute shaders

Buttermilch · Dec 11, 2022

Using Unity 2021.3.6f1 and URP.
I'm currently generating a lot of positions (50k) with a compute shader. I'm sending the world position of a transform to the shader, and it then just calculates some positions relative to that world position.

My problem is that after I've iterated over 2500 positions, all the others are Vector4.zero. I don't really understand how that happens.

Here's my compute shader code:

Code (CSharp):

int _range, _size;

float4 _worldPos;

RWStructuredBuffer<float4> _positionBuffer;

[numthreads(8,8,1)]

void CalcPositions(uint3 id : SV_DispatchThreadID) {

if (id.x < 0 || id.x >= _range || id.y < 0 || id.y >= _range) { return; }

float4 pos = _worldPos;

pos.xz += (id.xy - float(_range) * 0.5) * _size;

pos.y += 10;

_positionBuffer[id.x + id.y * _size] = pos;

}

And the C# code:

Code (CSharp):

//_instances = 50000

//_range.x = 50

_positionShader.SetInt("_range", (int)_range.x);

_positionShader.SetVector("_worldPos", transform.position);

_positionShader.SetInt("_size", 2);

//4*sizeof(float) = Vector4

_positionBuffer = new ComputeBuffer(_instances, 4 * sizeof(float));

_positionShader.SetBuffer(0, "_positionBuffer", _positionBuffer);

int threadSize = 8; //correct ??

_positionShader.Dispatch(0, threadSize, threadSize, 1);

Buttermilch · Dec 12, 2022

I eventually found a tutorial by Ronja (https://github.com/ronja-tutorials/ShaderTutorials/blob/master/Assets/050_Compute_Shader/). Now I can calculate thread groups like this:

Code (CSharp):

Shader.GetKernelThreadGroupSizes(kernel, out threadGroupSize, out _, out _);

int threadGroups = (int)((_instances + (threadGroupSize - 1)) / threadGroupSize);

And the compute shader is configured with [numthreads(32, 1, 1)].
Still don't really understand how that works but I'm going to find out someday.

burningmime · Dec 12, 2022


If your compute shader has [numthreads(X, Y, Z)] and you call Dispatch(A, B, C), it dispatches A*X threads in the X dimension, B*Y threads in the Y dimension, and C*Z threads in the Z dimension, for a total of A*X*B*Y*C*Z invocations of your kernel function. Basically, when you call Dispatch on the CPU, it sends out A*B*C groups, and [numthreads(X, Y, Z)] tells you how many individual threads are in each group. So for your above code...

Code (CSharp):

// threadGroupSize = 32, because that's what your shader has in the [numthreads()]

Shader.GetKernelThreadGroupSizes(kernel, out threadGroupSize, out _, out _);

// threadGroups = (_instances + (32 - 1))/32

// threadGroups = floor((_instances + 31)/32)

// threadGroups = ceil(_instances/32)

int threadGroups = (int)((_instances + (threadGroupSize - 1)) / threadGroupSize);

So if _instances is 50, you dispatch 2 groups, which ends up launching 2*32=64 threads.
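
To make the arithmetic concrete, here is a minimal sketch in plain C# (no Unity types). DispatchMath, GroupCount and TotalThreads are made-up names for illustration, not part of Unity's API; they only reproduce the integer math described above.

```csharp
using System;

// Hypothetical helper, not part of Unity's API: plain integer math only.
public static class DispatchMath
{
    // Smallest number of groups whose combined threads cover itemCount.
    // Same result as ceil(itemCount / groupSize), using integer arithmetic.
    public static int GroupCount(int itemCount, int groupSize)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));
        return (itemCount + groupSize - 1) / groupSize;
    }

    // Total kernel invocations for Dispatch(a, b, c) with [numthreads(x, y, z)].
    public static long TotalThreads(int a, int b, int c, int x, int y, int z)
    {
        return (long)a * x * b * y * c * z;
    }
}
```

With these, GroupCount(50, 32) is 2 and TotalThreads(2, 1, 1, 32, 1, 1) is 64, matching the numbers above. For the dispatch in the original question, TotalThreads(8, 8, 1, 8, 8, 1) is 4096, and a _range of 50 lets only 50 * 50 = 2500 of those threads past the bounds check, which would be consistent with only 2500 positions being filled.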

But you only have 50 instances! So what about the other 14 threads? That's why in your shader you have the check if(id.x < 0 || id.x >= _range) { return; } so that when it gives you index (53, 0, 0) or whatever, you're not going to crash.
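
If it helps to see the guard in action, here is a rough CPU-side simulation of a 1D dispatch in plain C#. GuardedDispatchSim and Run are hypothetical names, and the "work" written to the buffer is just a stand-in; nothing here touches the GPU or Unity, it only mimics how the thread index is formed and checked.

```csharp
using System;

// Hypothetical CPU-side simulation of a 1D dispatch; it only mimics the
// index math and does not call any GPU or Unity API.
public static class GuardedDispatchSim
{
    // Runs groups * groupSize "threads" and writes only where the guard passes.
    // Returns the number of threads that returned early.
    public static int Run(float[] buffer, int groups, int groupSize)
    {
        int skipped = 0;
        for (int group = 0; group < groups; group++)
        {
            for (int local = 0; local < groupSize; local++)
            {
                // Same value SV_DispatchThreadID.x would hold for this thread.
                int id = group * groupSize + local;

                // The guard: surplus threads in the last group do nothing.
                if (id >= buffer.Length) { skipped++; continue; }

                buffer[id] = id * 2.0f; // stand-in for the real per-item work
            }
        }
        return skipped;
    }
}
```

Calling Run(new float[50], 2, 32) simulates 64 threads over 50 items: 50 of them write and the remaining 14 return early.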

Yes, it's confusing and probably a bad design, but we're stuck with it.

By the way (to make it even more confusing), it's best practice to make the number of threads per group, that is X * Y * Z in [numthreads(X, Y, Z)], a multiple of 64. NVIDIA GPUs mostly dispatch things in waves of 32, but on AMD, they dispatch things in waves of 64. So if you have [numthreads(32, 1, 1)], your AMD GPU will only use 32 of the 64 threads in each wave, and it can be half as fast as it could be. The tl;dr here is to just always use numthreads(64,1,1), numthreads(8,8,1), or numthreads(4,4,4) unless you are using groupshared memory or your kernel uses big arrays.
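
As a back-of-the-envelope illustration of the wave argument, here is a small plain C# sketch. WaveMath and LaneUtilization are hypothetical names, and the model is deliberately simplified: it assumes each group is split into whole waves of a fixed size and ignores everything else that affects real GPU performance.

```csharp
using System;

// Hypothetical, simplified model: each thread group occupies whole waves,
// and any lanes left over in the last wave sit idle.
public static class WaveMath
{
    // Fraction of wave lanes doing useful work for one thread group.
    public static double LaneUtilization(int threadsPerGroup, int waveSize)
    {
        if (threadsPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));
        if (waveSize <= 0) throw new ArgumentOutOfRangeException(nameof(waveSize));

        // Number of whole waves needed to hold the group (ceiling division).
        int waves = (threadsPerGroup + waveSize - 1) / waveSize;
        return (double)threadsPerGroup / (waves * waveSize);
    }
}
```

Under this model, LaneUtilization(32, 64) is 0.5, while a group of 64 threads (64,1,1 or 8,8,1 or 4,4,4) gives 1.0 for both a wave size of 32 and a wave size of 64, which is why 64 is a safe default.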

Buttermilch · Dec 13, 2022

@burningmime Thank you for this great explanation!
