Compute shader question

Discussion in 'Shaders' started by DaveHoskins, Aug 22, 2015.

  1. DaveHoskins

    Hi, is it possible to add up all the pixel values of a RenderTexture and output the single total value of all those pixels?
    It's a specific and distinctly non-parallel task, but probably still faster than ReadPixels/GetPixels followed by a loop in C#?
     
  2. Plutoman

    Doubtful; it would be a singular task. I can't think of any way to parallelize that into compute shader form. It'd be run through 1,1,1, which would be quite slow on the GPU, since a single GPU core is fairly slow compared to a CPU core.

    Someone might be able to chime in with a way to use shared memory or somesuch, but I would imagine GetPixels + a multi-threaded loop would be many times faster for this purpose.
     
  3. mholub

    You can probably use mipmaps for this task. Just create a RenderTexture, make sure it has mipmaps on, then Graphics.Blit from it to a temporary 1x1 texture and ReadPixels (just one pixel) from that texture. That gives you the average value of the texture, and if you multiply it by the number of pixels in the original texture, you get an approximate value of the sum.
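
A rough CPU-side sketch of the arithmetic behind the mipmap route, in plain Python rather than Unity code. It box-filters 2x2 blocks down to 1x1 and scales the average by the pixel count. The function names and the `quantize` parameter are hypothetical; `quantize=255` is an assumption that mimics storing every mip level in an 8-bit channel, which is one reason the result is only approximate.

```python
def mip_average(pixels, quantize=None):
    """Average a square, power-of-two image by repeated 2x2 box filtering."""
    level = [row[:] for row in pixels]
    while len(level) > 1:
        half = len(level) // 2
        next_level = []
        for y in range(half):
            row = []
            for x in range(half):
                avg = (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
                       + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
                if quantize is not None:
                    # Assumption: round to the nearest representable step,
                    # as an 8-bit render target would.
                    avg = round(avg * quantize) / quantize
                row.append(avg)
            next_level.append(row)
        level = next_level
    return level[0][0]


def approximate_sum(pixels, quantize=None):
    """Scale the 1x1 mip value back up by the number of source pixels."""
    return mip_average(pixels, quantize) * len(pixels) * len(pixels[0])


# Hypothetical 8x8 test image with values in [0, 1].
image = [[((x * 7 + y * 13) % 256) / 255.0 for x in range(8)] for y in range(8)]
exact = sum(sum(row) for row in image)
float_estimate = approximate_sum(image)                # float mip chain
byte_estimate = approximate_sum(image, quantize=255)   # 8-bit mip chain
```

With a float mip chain the estimate matches the exact sum up to rounding error; with quantized levels the error grows with the pixel count, because every step of error in the 1x1 value is multiplied by width times height.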

    You can also sum up pixel values in a compute shader with a scan algorithm: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
    It is pretty complex to implement optimally. If you want to learn how to write parallel algorithms, I really recommend this free Udacity course: https://www.udacity.com/course/intro-to-parallel-programming--cs344
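
Since only the total is needed, a plain tree reduction is enough; the sketch below is that simpler technique, not the full scan from the linked chapter. It is a CPU simulation in Python with a hypothetical function name: each iteration of the inner loop is independent of the others, so on a GPU each one could be handled by its own thread, with a barrier between passes.

```python
def tree_reduce(values):
    """Sum values in log2(n) passes; returns (total, number_of_passes)."""
    data = list(values)
    # Pad with zeros up to the next power of two so every pass halves cleanly.
    size = 1
    while size < len(data):
        size *= 2
    data += [0.0] * (size - len(data))

    stride = size // 2
    passes = 0
    while stride > 0:
        # All additions in this loop are independent (one per GPU thread).
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
        passes += 1
    return data[0], passes


total, passes = tree_reduce(range(1, 1025))
small_total, small_passes = tree_reduce([1.0, 2.0, 3.0])
```

For 1024 inputs this takes 10 passes instead of 1023 sequential additions, which is where the parallel speedup would come from.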
     
    Last edited: Aug 22, 2015
  4. Plutoman

    That algo is quite interesting, I hadn't seen that before. Thanks for the link there.
     
  5. jbooth

    You could also do this in a standard pixel shader using multiple passes. Basically, write a downsampling shader that adds instead of averages a 2x2 block of pixel samples into a single output pixel, then run it until you get down to a 1x1 texture.
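
A minimal CPU sketch of that multi-pass idea, in Python rather than shader code. The function names are hypothetical, and treating out-of-bounds texels as zero for odd sizes is an assumption of this sketch. On a GPU the intermediate targets would presumably need a float format, since the sums quickly exceed the 0 to 1 range of an 8-bit channel.

```python
def add_downsample(level):
    """One pass: each output texel is the sum of a 2x2 block of inputs."""
    height, width = len(level), len(level[0])

    def texel(x, y):
        # Assumption: reads outside the image contribute zero.
        return level[y][x] if x < width and y < height else 0.0

    return [
        [texel(2 * x, 2 * y) + texel(2 * x + 1, 2 * y)
         + texel(2 * x, 2 * y + 1) + texel(2 * x + 1, 2 * y + 1)
         for x in range((width + 1) // 2)]
        for y in range((height + 1) // 2)
    ]


def sum_by_downsampling(pixels):
    """Repeat the additive downsample until a single texel remains."""
    level = pixels
    passes = 0
    while len(level) > 1 or len(level[0]) > 1:
        level = add_downsample(level)
        passes += 1
    return level[0][0], passes


# Hypothetical 5x3 image: 5x3 -> 3x2 -> 2x1 -> 1x1.
image = [[x + 10 * y for x in range(5)] for y in range(3)]
total, passes = sum_by_downsampling(image)
```

Unlike the averaging mip chain, nothing is divided along the way, so the final texel holds the sum itself rather than a value that has to be scaled back up.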
     
  6. jistyles

    Yes, you can do this; yes, you can abuse parallelism to quite a high degree with it; and yes, it is a common operation in DX11-era compute work, used at varying degrees of complexity for a number of use cases, e.g.:
    - real time histograms
    - dynamic exposure component of tone mapping
    - summed area tables
    - dynamic depth analysis (useful for dynamic depth of field focus)

    There are a number of common ways to adapt this type of operation to compute shaders, and although the methods can be quite different, they all share the same read bottleneck, so they have fairly similar solutions for that part.

    For instance, you can use atomic adds to sum up everything in a group, then write that group result out in one of several ways: to an image that you multi-pass until you reach your desired final step, to an append buffer that you decimate in 1D, or even to a float RWTexture so you can keep it all in one pass and iterate across thread groups. (BIG TIP: if you can manually guarantee the lock-step order of your ops, you don't need to stall with memory barriers, since no sync is required. This is an abuse of the hardware, but it works amazingly well.)
    More advanced schemes exist for managing dispatch, bandwidth, and cache concerns. For example, waterfalling minimises reads across groups while keeping dispatches as tightly bunched as possible; on most hardware this keeps everything cache coherent and saves a lot of read time, since all your operations are spatially coherent.
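
The group-then-multi-pass variant can be sketched on the CPU as follows. This is a Python simulation of the data flow only, not shader code: `GROUP_SIZE`, `reduce_groups`, and `multipass_sum` are hypothetical names, and the group size of 64 is an assumption. The running total inside each group stands in for a shared accumulator that the group's threads would add to atomically.

```python
GROUP_SIZE = 64  # assumed number of threads per group


def reduce_groups(values, group_size=GROUP_SIZE):
    """One simulated dispatch: every group collapses its slice to one value."""
    partials = []
    for start in range(0, len(values), group_size):
        group_total = 0.0  # stands in for a per-group shared accumulator
        for value in values[start:start + group_size]:
            group_total += value  # the atomic add, serialised in this sketch
        partials.append(group_total)
    return partials


def multipass_sum(values, group_size=GROUP_SIZE):
    """Re-dispatch over the partial results until a single value is left."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 to make progress")
    current = list(values)
    dispatches = 0
    while len(current) > 1:
        current = reduce_groups(current, group_size)
        dispatches += 1
    return (current[0] if current else 0.0), dispatches


# A hypothetical 256x256 single-channel texture, flattened, with every texel 1.0.
total, dispatches = multipass_sum([1.0] * (256 * 256))
```

With groups of 64, the 65,536 texels shrink to 1,024 partial sums, then 16, then 1, so three dispatches are enough in this sketch.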

    One common issue with compute shader info is there's not a lot of good public examples out in the wild. It's either all academic, amateur, non-existent, or plain bad.

    I'd recommend grokking some ShaderX/GPU Gems style literature on the topic :D
    In the meantime, here's an old reduction example:
    http://s08.idav.ucdavis.edu/boyd-dx11-compute-shader.pdf

    Here's a SAT example:
    http://docs.nvidia.com/gameworks/co...amples/d3d_samples/d3dcomputefiltersample.htm
     
  7. DaveHoskins

    I'm not looking for the average value, but the actual sum.
    Thanks for the answers and links guys, that Nvidia page looks complex!
    Dividing the texture up into smaller groups seems like a good idea. This is a research thing, so I have the full range of my own GPU to play with.