
Will a compute shader improve anything in this case?

Discussion in 'Shaders' started by 8bit4life, Aug 13, 2015.

  1. 8bit4life

    8bit4life

    Joined:
    Sep 30, 2014
    Posts:
    9
    Hi fellow Unity devs,

    I've implemented a full screen image effect that consists of two passes:
    1. Generate a texture with shader #1 based on shader parameters (not based on the source screen contents)
    2. Use the output texture from shader #1 in shader #2 to create the destination screen contents based on the source screen contents

    I accomplish this using two Graphics.Blit operations. One for each phase.
    It works nicely (compared to the performance of other image effects I tested).
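    The two-Blit setup described above could look roughly like this (a sketch only; the material names and the `_GeneratedTex` property are hypothetical placeholders, not from the original post):

    ```csharp
    using UnityEngine;

    [RequireComponent(typeof(Camera))]
    public class TwoPassEffect : MonoBehaviour
    {
        public Material generatorMat; // shader #1: generates the small texture
        public Material composeMat;   // shader #2: combines it with the screen

        void OnRenderImage(RenderTexture src, RenderTexture dst)
        {
            // Phase 1: render the tiny texture (1/16th of the screen dimensions).
            // The source is null because shader #1 doesn't read the screen.
            var tiny = RenderTexture.GetTemporary(src.width / 16, src.height / 16);
            Graphics.Blit(null, tiny, generatorMat);

            // Phase 2: use the generated texture while transforming src -> dst.
            composeMat.SetTexture("_GeneratedTex", tiny);
            Graphics.Blit(src, dst, composeMat);

            RenderTexture.ReleaseTemporary(tiny);
        }
    }
    ```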

    However, I noticed that even if I don't really do anything in the shaders used in Graphics.Blit, the framerate drop from enabling the effect is still comparable to the drop created by the full/meaningful version of my image effect. So the chaining of source/destination screen contents is itself already killing performance.

    Since shader #1 that generates the texture for shader #2 is not actually using the source screen contents to generate its output texture, I'm wondering if a compute shader would be even more performant for phase #1.

    The output texture of phase #1 is 1/16th of the screen dimensions. (pretty tiny)

    Do you think there would be a noticeable performance gain between a Graphics.Blit that takes null as its source and a compute shader?

    Thank you for all suggestions.
     
  2. Dolkar

    Dolkar

    Joined:
    Jun 8, 2013
    Posts:
    576
    If you use the exact same logic in your compute shader as in the pixel shader, the compute shader is almost certainly going to be a bit slower. The point of using compute shaders is the freedom of data manipulation. It only makes sense to use a compute shader when an algorithm allows specific optimizations that aren't possible in the vert/frag pipeline.

    Since the output of pass #1 is so tiny, even expensive operations there won't have much of an impact. A target at 1/16th of the texture dimensions means 1/256 as many shader invocations, i.e. a 256-times-faster blit than at full size. That suggests the vast majority of time is spent in pass #2, which, by what you're saying, probably doesn't do anything expensive anyway, because the performance seems to be bound by the input/output bandwidth... You can't do much about that in an image effect, I fear. At the very least, you need to copy the source to the destination render texture, which requires a full-screen texture read and write.
     
    8bit4life likes this.
  3. 8bit4life

    8bit4life

    Joined:
    Sep 30, 2014
    Posts:
    9
    Thank you for your informative reply, Dolkar.
    Based on that conclusion, am I right in thinking that if, in theory, I appended the shader code of all the image effects I use (built-in bloom, anti-aliasing and my custom shader) into one giant shader, the net effect would be a much shorter "chain" => much better performance?
    If this is true, isn't there already a solution for this? (Like "bake image effects": take everything you have and produce a static run-time version combined into one single pass. Even rendering all possible combinations to support enabling/disabling individual effects would be perfectly feasible, I presume.)
     
  4. Dolkar

    Dolkar

    Joined:
    Jun 8, 2013
    Posts:
    576
    Yes, that is very much true and is the reason post processing packs like Scion perform better than if the same effects were applied individually. It's not so simple, though, and doesn't work for all effects. If you combine bloom and anti-aliasing in the same pass, then to make it look correct you'd have to apply bloom separately to every pixel sampled by the anti-aliasing step, instead of just once. That's why it's best to only merge effects that use a single texel from the source image, like tonemapping, color correction, vignetting effects, the final AO pass, etc...
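    A sketch of the kind of merged single-texel pass described here: several effects folded into one fragment shader so the screen is read and written only once. The specific formulas and the `_Tint` / `_VignetteStrength` parameters are illustrative assumptions, not from any particular package:

    ```hlsl
    fixed4 frag(v2f i) : SV_Target
    {
        fixed4 col = tex2D(_MainTex, i.uv);   // single read of the source

        // Tonemapping (simple Reinhard curve, illustrative)
        col.rgb = col.rgb / (1.0 + col.rgb);

        // Color correction (hypothetical _Tint parameter)
        col.rgb *= _Tint.rgb;

        // Vignette: darken toward the corners
        float2 d = i.uv - 0.5;
        col.rgb *= 1.0 - dot(d, d) * _VignetteStrength;

        return col;                           // single write to the target
    }
    ```

    Each effect touches only the one texel already in `col`, which is exactly what makes the merge safe; bloom or AA would need neighboring samples and so can't be folded in this way.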
     
    8bit4life likes this.
  5. jistyles

    jistyles

    Joined:
    Nov 6, 2013
    Posts:
    34
    In practice, compute shaders for image/post effects are fantastic for performance if you adopt 16x16 xy thread groups, make sure to exploit per-group classified conditionals to early out where possible, and expand/contract things like sample counts, or use dispatch indirects for dependent passes.

    Do those three things, and post effects written as compute shaders can trivially outperform pixel shader throughput.
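    A minimal sketch of the 16x16 thread-group layout mentioned above. The early-out condition is a hypothetical illustration, and a real per-group classification would reduce a flag over groupshared memory rather than branch per thread as this simplification does:

    ```hlsl
    #pragma kernel CSMain

    Texture2D<float4> Source;
    RWTexture2D<float4> Result;

    [numthreads(16, 16, 1)]   // one 16x16 bucket of pixels per thread group
    void CSMain(uint3 id : SV_DispatchThreadID)
    {
        float4 col = Source[id.xy];

        // Early out: skip the expensive work for near-black pixels
        // (condition is illustrative; classify per group in practice).
        if (dot(col.rgb, col.rgb) < 0.0001)
        {
            Result[id.xy] = col;
            return;
        }

        // ... expensive per-pixel work would go here ...
        Result[id.xy] = col;
    }
    ```

    Dispatched from C# with `computeShader.Dispatch(kernel, width / 16, height / 16, 1)` so each group covers one 16x16 tile of the target.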
     
    8bit4life likes this.