Search Unity

  1. Unity 6 Preview is now available. To find out what's new, have a look at our Unity 6 Preview blog post.
    Dismiss Notice
  2. Unity is excited to announce that we will be collaborating with TheXPlace for a summer game jam from June 13 - June 19. Learn more.
    Dismiss Notice
  3. Dismiss Notice

Showcase State of the Art Parallel-Sorting in Compute Shaders, Updated for Portability!

Discussion in 'Shaders' started by b0nes123, Aug 18, 2023.

  1. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    29
    GPUSorting vs CUB.png

    Hi all, this project has been extensively overhauled and is now available as a package GPUSorting. As a measure of the quality of the code, I've implemented my code in CUDA and benchmarked it against Nvidia's implementations in the CUB library, all of which can be found in the GPUSorting repository.

    GPUSorting is theoretically portable to all wave sizes [4, 128], that is to say any graphics device, however it has only been tested on wave sizes 4, 16, 32, and 64. You have been warned!

    All wave size/SIMD width specialization is handled logically, instead of through shader compilation defines. This significantly cuts down on shader permutations, and enables GPUSorting to be theoretically compatible to all devices down to SM capability 6.0 and at least Unity version 2021.3.35f1. However it has only been tested at a minimum SM capability of 6.2.

    GPUSorting now supports:
    • keys only sorting
    • key-value pair supporting
    • ascending order sorting
    • descending order sorting
    • direct dispatching
    • dispatching through a Unity CommandBuffer
    • a maximum sorting size of 2^31
    GPUSorting now supports the following data types for keys and values:
    • uint32_t
    • int32_t
    • float
    To use this project:
    1. Add the package through Unity's package manager with the following git url:
      https://github.com/b0nes164/GPUSorting.git?path=/GPUSortingUnity
    2. Example usage, and more detailed instructions can be found on the repository wiki here:
      https://github.com/b0nes164/GPUSorting
    Note for previous users: this is a new repository, and I will be deprecating the previous repository shortly.

    This work is based off of research by:
    Andy Adinets and Duane Merrill. Onesweep: A Faster Least Significant Digit Radix Sort for GPUs. 2022. url: https://arxiv.org/abs/2206.01784

    Duane Merrill and Michael Garland. “Single-pass Parallel Prefix Scan with De-coupled Lookback”. In: 2016. url: https://research.nvidia.com/publication/2016-03_single-pass-parallel-prefix-scan-decoupled-look-back

    Saman Ashkiani et al. “GPU Multisplit”. In: SIGPLAN Not. 51.8 (Feb. 2016). issn: 0362-1340. doi: 10.1145/3016078.2851169. url: https://doi.org/10.1145/3016078.2851169.

    For more information, see the repository. To the best of my knowledge all algorithms included in the project have been released under open-source licenses, and are free to use, as is the project itself. This is not legal advice.
     
    Last edited: Mar 2, 2024
  2. Maareliic

    Maareliic

    Joined:
    Aug 1, 2016
    Posts:
    1
    Hey, @b0nes123, great work implementing the Radix sort algorithms on compute shaders!
    Do you have any progress or a possible timeframe for a release of the implementation of the broad-phase collision detection?
    I am currently struggling with the performance of Unity's built-in physics colliders and doing collision detection on GPU seems like the only reasonable way. Unfortunately, the learning curve and time involved to actually fully implement a working broad-phase collision detection is quite steep.
     
    Last edited: Nov 8, 2023
  3. Invertex

    Invertex

    Joined:
    Nov 7, 2013
    Posts:
    1,558
    You don't need to do it on the GPU. Look up the Unity DOTS physics system if you need to process large amounts of collisions. Or Havok to a lesser extent.
     
    Maareliic likes this.
  4. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    29
    Hi @Maareliic, unfortunately I've gotten so busy with other projects that I haven't had time to work on the collision algorithm. However, as Invertex has said you're probably much better off using the built in DOTS system which is far easier integrate into a project. I was only planning on making a toy demo that would return a bitfield of detected collisions--building out everything else around that on the CPU side would take considerable work, and is probably not worth your time.

    However, if you are hell bent on GPU driven collision detection you can follow this chapter from GPU-Gems. Good luck!
     
    Last edited: Nov 10, 2023
    Maareliic likes this.
  5. unity_Aerw3PU9eiDtOw

    unity_Aerw3PU9eiDtOw

    Joined:
    Apr 3, 2023
    Posts:
    2
    Is this project still active? I'm trying to set up a GPU accelerated radix sort for a project I'm working on, and it's a problem that's a little above my level. Finding an existing implementation that's generalizable would be awesome.
     
  6. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    29
    See the updated post.
     
  7. MyeongseongKim

    MyeongseongKim

    Joined:
    Sep 14, 2020
    Posts:
    1
    Is there any other way to run it in direct3d11 env?
     
  8. b0nes123

    b0nes123

    Joined:
    Nov 6, 2019
    Posts:
    29
    Unfortunately, no.