3 questions about compute shader performance

Discussion in 'General Graphics' started by burningmime, Oct 8, 2021.

  1. burningmime

    burningmime

    Joined:
    Jan 25, 2014
    Posts:
    845
    1. I've read (as recently as 2020) that raw byte buffers perform better than structured buffers. Is this true?

    2. One article I read (from NVIDIA) suggested not splitting data across cache lines, while another (from AMD) said you should use as little data as possible. Would it be better to write to a buffer of float3s from multiple threads, or to a buffer of float4s where I ignore the W?

    3. And now for the big one... WTF is up with all the state changes and API calls when dispatching a compute shader? In URP, each dispatch of the same kernel through a CommandBuffer generates 13 state changes in RenderDoc:

    [Attached image: upload_2021-10-7_21-3-46.png]

    After the first one, the command buffer code between dispatches looks like this:

    Code (CSharp):
    _cmd.SetComputeMatrixParam(_csExtrude, ID_OBJECT_TO_WORLD, transform);
    _cmd.SetComputeIntParam(_csExtrude, ID_VERTEX_STRIDE, vertices.stride);
    _cmd.SetComputeBufferParam(_csExtrude, kEdges, ID_EDGE_ADJACENCY, edges);
    _cmd.SetComputeBufferParam(_csExtrude, kEdges, ID_VERTICES_IN, vertices);
    _cmd.DispatchCompute(_csExtrude, kEdges, threadGroupsX, batchSize, 1);
    That definitely isn't doing much with hull shaders. Isn't that bad?
     
  2. Shane_Michael

    Shane_Michael

    Joined:
    Jul 8, 2013
    Posts:
    158
  3. burningmime

    Thanks! So if I'm reading that correctly, that answers #1: structured buffer loads are the same as or better than raw loads on PC/PS4/Xbone GPUs that support them (except on some Intel integrated ones), because the compiler/driver can prove alignment. I assume stores aren't too different. And that's probably a reasonable answer for #2 as well: just use 16-byte-aligned data for everything where possible.
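
    To put rough numbers on the float3 vs. float4 trade-off from #2, here is a minimal C# sketch. It assumes a 64-byte cache line (actual sizes vary by GPU) and a buffer that starts on a line boundary. `Float3`, `Float4`, and `StrideSketch` are hypothetical names for illustration, not Unity types.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for float3/float4 buffer elements (not Unity types).
[StructLayout(LayoutKind.Sequential)]
struct Float3 { public float X, Y, Z; }

[StructLayout(LayoutKind.Sequential)]
struct Float4 { public float X, Y, Z, W; } // W is unused padding

static class StrideSketch
{
    // ASSUMPTION: 64-byte cache lines; real hardware differs.
    public const int CacheLineBytes = 64;

    // Counts elements whose bytes cross a cache line boundary,
    // assuming the buffer itself starts on a boundary.
    public static int CountStraddling(int stride, int count)
    {
        int straddling = 0;
        for (int i = 0; i < count; i++)
        {
            int firstByte = i * stride;
            int lastByte = firstByte + stride - 1;
            if (firstByte / CacheLineBytes != lastByte / CacheLineBytes)
                straddling++;
        }
        return straddling;
    }

    public static void Main()
    {
        int stride3 = Marshal.SizeOf<Float3>(); // 12 bytes
        int stride4 = Marshal.SizeOf<Float4>(); // 16 bytes
        const int count = 16;

        Console.WriteLine($"float3: stride {stride3}, straddling {CountStraddling(stride3, count)} of {count}");
        Console.WriteLine($"float4: stride {stride4}, straddling {CountStraddling(stride4, count)} of {count}");
    }
}
```

    Under these assumptions, 2 of every 16 float3 elements cross a line boundary and no float4 element does, at the cost of 4 extra bytes per element. Whether that matters in practice depends on the GPU and the access pattern.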

    The massive numbers of pointless draw calls on the CPU side still concern me, though.
     
  4. aleksandrk

    aleksandrk

    Unity Technologies

    Joined:
    Jul 3, 2017
    Posts:
    3,025
    @burningmime these are not draw calls; they are commands to set up the data for the dispatch.
    Looking at the code, the extra calls there are intended.
     
  5. burningmime

    Even the stuff like VSSetShaderResources, etc? It seems like at least those 5 (VS/PS/GS/HS/DS) could be skipped if dispatching multiple compute shaders in a row, right? And the map/unmap of the same constant buffer?
     
  6. aleksandrk

    Perhaps. I don't think that's expensive, though.
    Are you updating uniform data between dispatches, using these `SetComputeXXXParam` calls?
     
  7. burningmime

    Yes, I am updating uniforms, so I guess it needs to be reuploaded. It also seems to be setting the same buffers even if I do not change them.

    If the API calls are nothing to worry about, then cool. Thanks for your help.
     
  8. aleksandrk

    I don't have hard data to back this up, but I suppose that's the case.