Say I compile a short list of items in a RWStructuredBuffer, perhaps by running a prefix sum to remove irrelevant items from a larger list. How do I run a compute shader (CS) kernel on only those items when the CPU has no idea beforehand how many items the list will hold? I.e., how do I vary the thread count of a kernel using GPU data? The short list might be anywhere from 0 to 5000 items long after the prefix sum, but everything I've seen online says the thread count of a kernel must be defined CPU-side. The CPU won't know how many items there are until long after the dispatch, if ever. If I can't do this, the kernel will have to run 5000 threads every frame, and on some frames none of those threads will actually be used. (This is for collisions in a physics engine: I need to resolve any clipping points the system finds.)
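
To make the setup concrete, here is a minimal CPU-side Python sketch of the compaction step I have in mind. It is only a reference model, not my actual shader: the function names, the exclusive-scan formulation, and the group size of 64 are all assumptions for illustration. The point is that the item count only falls out at the very end of the scan, and the number of thread groups the follow-up kernel needs depends on that count.

```python
from typing import List, Sequence, Tuple

# Assumed group size, i.e. something like [numthreads(64, 1, 1)] on the GPU side.
# This value is hypothetical and not taken from a real shader.
THREADS_PER_GROUP = 64


def exclusive_prefix_sum(flags: Sequence[int]) -> List[int]:
    """Return the exclusive scan: out[i] is the sum of flags[0..i-1]."""
    out = []
    running = 0
    for f in flags:
        out.append(running)
        running += f
    return out


def compact(items: Sequence, keep: Sequence[bool]) -> Tuple[list, int]:
    """Scatter the kept items into a dense short list and return (list, count)."""
    flags = [1 if k else 0 for k in keep]
    offsets = exclusive_prefix_sum(flags)
    # The count is only known once the scan has finished.
    count = offsets[-1] + flags[-1] if flags else 0
    short_list = [None] * count
    for i, item in enumerate(items):
        if flags[i]:
            short_list[offsets[i]] = item
    return short_list, count


def groups_needed(count: int, threads_per_group: int = THREADS_PER_GROUP) -> int:
    """Thread groups required to cover `count` items (rounded up)."""
    return (count + threads_per_group - 1) // threads_per_group


short_list, count = compact([10, 11, 12, 13, 14], [True, False, True, True, False])
print(short_list, count, groups_needed(count))
```

With the worst case of 5000 items and 64 threads per group, `groups_needed` gives 79 groups; with an empty list it gives 0, which is the case I would like to cost nothing.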