Using if-else statements in shaders; how complex is bad?

Discussion in 'Shaders' started by Mr_Admirals, Dec 24, 2018.

  1. Mr_Admirals

    Joined:
    May 13, 2017
    Posts:
    86
    Hey guys,

    So I think I found a good solution on how to do dynamic tessellation on my snow shader. The issue is that it likely involves an if-else statement.

    I know branching/jumps aren't performant for the GPU, so I have two questions:

    1. Is the below if-else statement acceptable in a shader?

    2. Is there any way to make this simpler or even not an if-else statement?

    Code (CSharp):
    if (factor > 0.0) {
        f.edge[0] = _TessellationUniform;
        f.edge[1] = _TessellationUniform;
        f.edge[2] = _TessellationUniform;
        f.inside = _TessellationUniform;
    }
    else {
        f.edge[0] = 1;
        f.edge[1] = 1;
        f.edge[2] = 1;
        f.inside = 1;
    }
    Thanks.
     
  2. kripto289

    Joined:
    Feb 21, 2013
    Posts:
    508
    You can avoid the if-else statement by using step().
    For example, you can change this code

    Code (CSharp):
    if (factor > 0.0) {
        f.edge[0] = x;
    }
    else {
        f.edge[0] = y;
    }

    to this:

    Code (CSharp):
    // step(0.0, factor) returns 1 when factor >= 0.0, otherwise 0
    fixed stepFactor = step(0.0, factor);
    // lerp returns its first argument at weight 0, so y goes first
    f.edge[0] = lerp(y, x, stepFactor);
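For comparison, here is a rough sketch of the two forms side by side as standalone functions. The helper names are hypothetical and x and y simply stand for the two candidate values. Two details are easy to miss: lerp returns its first argument when the weight is 0, and step(0.0, factor) tests factor >= 0.0 rather than factor > 0.0, so the two functions disagree when factor is exactly 0.0.

```hlsl
// Hypothetical helpers for illustration only.

// Branch-style form: returns x when factor > 0.0, otherwise y.
float SelectWithIf(float factor, float x, float y)
{
    float result;
    if (factor > 0.0)
        result = x;
    else
        result = y;
    return result;
}

// step/lerp form: step(0.0, factor) is 1.0 when factor >= 0.0 and 0.0 otherwise,
// so lerp(y, x, s) returns x for factor >= 0.0 and y for factor < 0.0.
float SelectWithStep(float factor, float x, float y)
{
    float s = step(0.0, factor);
    return lerp(y, x, s);
}
```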
     
  3. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,363
    No, you can't; step() is implemented as an if-else.

    However, the difference is that one is written assuming a branch where only one side of the code runs, which isn't how GPUs really work, while the other is written so that both sides are always executed and only one result is used, which is how both examples actually end up running on the GPU.
     
  4. dadude123

    Joined:
    Feb 26, 2014
    Posts:
    789
    step won't save you any performance.
    Internally it's computing both results.
    Keep in mind that GPUs work in a completely different way than CPUs.
    A GPU is not really able to do much logic stuff, and it shows everywhere, from how simple if()s work, to how texels are grouped into blocks, ...

    edit: oh S*** a new post; seems bgolus was typing a bit faster :p
     
  5. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,363
    Try this instead:

    float tess = factor > 0.0 ? _TessellationFactor : 1.0;
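As a minimal sketch of how that single select could feed all four factors from the original question: the struct layout and the MakeFactors helper below are assumptions (the thread never shows the patch constant function), and the property name _TessellationUniform is taken from the first post.

```hlsl
// Assumed layout for a triangle patch, using the standard D3D11 semantics.
struct TessellationFactors
{
    float edge[3] : SV_TessFactor;
    float inside  : SV_InsideTessFactor;
};

// Material property named as in the original post.
float _TessellationUniform;

// Hypothetical helper: 'factor' is whatever per-patch value decides
// whether the patch should be tessellated.
TessellationFactors MakeFactors(float factor)
{
    // One compare-and-select instead of an if-else with eight assignments.
    float tess = factor > 0.0 ? _TessellationUniform : 1.0;

    TessellationFactors f;
    f.edge[0] = tess;
    f.edge[1] = tess;
    f.edge[2] = tess;
    f.inside  = tess;
    return f;
}
```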
     
  6. Mr_Admirals

    Joined:
    May 13, 2017
    Posts:
    86
    Wouldn't this compile the same as an if-else statement? Or is the improvement just from having only a single if-else statement (which I honestly have no idea why I didn't do to begin with, haha)?
     
  7. kripto289

    Joined:
    Feb 21, 2013
    Posts:
    508
    I've just checked what it compiles to in GLSL, and you're right:
    step(0, factor) is just compiled to float(factor >= 0).
    I've been using this function everywhere instead of if-else...

    Anyway, he can use something like this, for example:

    Code (CSharp):
    f.edge[0] = lerp(y, x, saturate((factor - 0.0001) * 10000));

    If factor is ~0.0001 or less, then "saturate((factor - 0.0001) * 10000)" will be 0, otherwise 1.
    In some cases it should be faster than if-else?
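To make the threshold behaviour concrete, here is a small sketch with hypothetical helper names. The expression is a steep ramp rather than a hard comparison: it is 0 at and below 0.0001, 1 at and above 0.0002, and fractional in between (for example, 0.00015 gives a weight of 0.5), so values inside that narrow band produce a blend of x and y rather than one or the other.

```hlsl
// Hypothetical helper illustrating the saturate-based weight.
// (factor - 0.0001) * 10000 maps factor = 0.0001 to 0 and factor = 0.0002 to 1;
// saturate clamps everything outside that range to 0 or 1.
float ThresholdWeight(float factor)
{
    return saturate((factor - 0.0001) * 10000.0);
}

// Hypothetical helper: weight 0 selects y, weight 1 selects x,
// and anything in between blends the two.
float SelectWithSaturate(float factor, float x, float y)
{
    return lerp(y, x, ThresholdWeight(factor));
}
```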
     
  8. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,363
    That was a suggestion in contrast to @kripto289's earlier suggestion. His is an if and a lerp, whereas mine is just an if.

    There are also two kinds of if-else statements on the GPU: real branches and fast comparisons, for lack of a better term. Branches in shaders really do exist on most modern hardware, but there's a big cost to the branch existing, and a lot of gotchas that prevent them from actually acting as branches most of the time, so both paths still execute. Most of the time, unless the thing you're trying to avoid is really expensive (>20 instructions), a real branch is unlikely to be worthwhile. Fast compares, on the other hand, can do single simple tests like less than, or equal to, and return one of two values. On most hardware this is a single instruction, but on some older mobile hardware a lerp can be faster.

    Luckily shader compilers know this too, and will (usually) compile code to not do a real branch.

    So in the best case the shader compiles your code “flattened”, where both sides always execute, and it then chooses the results from one side and throws away the rest using the fast comparison. The worst case is that it does a real branch for something that doesn't need it, or flattens it in a non-optimal way.

    It's often best to write your shader in the form that will be optimal rather than rely on the compiler to do it for you.
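One way to state the intent explicitly in HLSL is with the [flatten] and [branch] attributes on an if statement. The sketch below is illustrative only: ExpensiveValue is a made-up stand-in for costly work, and the attributes are hints, so how they are honored, and whether a real branch actually skips anything, depends on the compiler, the target platform, and whether neighbouring threads take the same path.

```hlsl
// Hypothetical expensive function standing in for work you might want to skip.
float ExpensiveValue(float3 p)
{
    float v = 0.0;
    for (int i = 0; i < 16; i++)
        v += sin(p.x * (i + 1)) * cos(p.y * (i + 1));
    return v;
}

float WithFlatten(float factor, float3 p)
{
    float result = 1.0;
    // Hint: evaluate both sides and select the result afterwards.
    [flatten]
    if (factor > 0.0)
        result = ExpensiveValue(p);
    return result;
}

float WithBranch(float factor, float3 p)
{
    float result = 1.0;
    // Hint: emit a real dynamic branch so the expensive path can be skipped
    // when the condition is false.
    [branch]
    if (factor > 0.0)
        result = ExpensiveValue(p);
    return result;
}
```

For something as cheap as picking between two tessellation factors, the flattened form (or a plain ternary) is the one the discussion above points toward.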
     
  9. Mr_Admirals

    Joined:
    May 13, 2017
    Posts:
    86
    Ah okay. Thank you all for the explanations and discussion! I've learned a lot today!

    This is what I have now:
    Code (CSharp):
    f.edge[0] = factor > 0.0 ? TessellationEdgeFactor(p1, p2) : 1.0;
    f.edge[1] = factor > 0.0 ? TessellationEdgeFactor(p2, p0) : 1.0;
    f.edge[2] = factor > 0.0 ? TessellationEdgeFactor(p0, p1) : 1.0;
    f.inside = factor > 0.0 ? (TessellationEdgeFactor(p1, p2) +
                               TessellationEdgeFactor(p2, p0) +
                               TessellationEdgeFactor(p0, p1)) * (1 / 3.0) : 1.0;
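A possible tidy-up, sketched under assumptions: each edge factor is computed once and reused for the inside factor. The compiler may well remove the duplicate calls on its own, so this is mostly about readability. The body of TessellationEdgeFactor below is a placeholder (the real function is not shown in the thread), and the struct layout and the MakeFactors name are assumed.

```hlsl
// Assumed layout for a triangle patch, using the standard D3D11 semantics.
struct TessellationFactors
{
    float edge[3] : SV_TessFactor;
    float inside  : SV_InsideTessFactor;
};

// Placeholder with an assumed signature; illustrative only.
float TessellationEdgeFactor(float3 a, float3 b)
{
    return max(1.0, distance(a, b));
}

// Hypothetical helper wrapping the selects from the post above.
TessellationFactors MakeFactors(float factor, float3 p0, float3 p1, float3 p2)
{
    // Evaluate each edge once and reuse it for the inside factor.
    float e0 = TessellationEdgeFactor(p1, p2);
    float e1 = TessellationEdgeFactor(p2, p0);
    float e2 = TessellationEdgeFactor(p0, p1);
    bool tessellate = factor > 0.0;

    TessellationFactors f;
    f.edge[0] = tessellate ? e0 : 1.0;
    f.edge[1] = tessellate ? e1 : 1.0;
    f.edge[2] = tessellate ? e2 : 1.0;
    f.inside  = tessellate ? (e0 + e1 + e2) * (1.0 / 3.0) : 1.0;
    return f;
}
```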
     
  10. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,363
    I wanted to touch on the lerp(y, x, saturate((factor - 0.0001) * 10000)) suggestion.

    That particular line will almost never be faster than the ? line. Let’s go over the two real quick.

    float val = lerp(x, y, factor);

    A lerp is generally implemented as x + factor * (y - x). This is two instructions on all GPUs, a SUB (y - x) and a MAD (factor * SUB + x).

    float val = factor > 0.0 ? y : x;

    On modern hardware this is one instruction to do the compare, and a second to swap the values around in memory. On older hardware this may have appeared as one instruction, but may have actually taken two or more clocks to complete, effectively “costing” two or more instructions.

    So, lerp or “> ? :” are effectively equivalent, but the lerp case assumes the factor is guaranteed to be 0.0 or 1.0. That extra saturate((factor - 0.0001) * 10000) is one more instruction. It’s a single MAD (the saturate is free), but that line is now three instructions. Some old OpenGL ES 2.0 hardware may take 3 or 4 instructions to do the comparison versus 3 for the lerp plus the saturate MAD, and in that case the lerp will be faster. On pretty much any recent hardware (OpenGL ES 3.0 or better) the comparison is going to have the same 2 instruction cost as everywhere else.
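Written out as code, the accounting above looks roughly like this. The function names are hypothetical, and the instruction comments follow the description in the post rather than the output of any particular compiler.

```hlsl
// lerp expanded by hand: x + t * (y - x)
float LerpExpanded(float x, float y, float t)
{
    float d = y - x;   // SUB
    return t * d + x;  // MAD
}

// Compare-and-select: one compare plus one select/move on modern hardware.
float SelectGreater(float factor, float x, float y)
{
    return factor > 0.0 ? y : x;
}

// Thresholded lerp: one extra MAD (with the saturate folded in) before the
// SUB and MAD of the lerp itself, for three instructions in total.
float LerpThresholded(float factor, float x, float y)
{
    float t = saturate((factor - 0.0001) * 10000.0);
    return LerpExpanded(x, y, t);
}
```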
     
  11. kripto289

    Joined:
    Feb 21, 2013
    Posts:
    508
    Big thanks for the explanation!
    Is there a detailed explanation somewhere of how shader functions such as pow / sin / step work internally?
     
  12. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,363
    The Cg documentation is a decent start, but almost guaranteed to be wrong for the more complex functions. AMD has some documentation for various GPU families, but not necessarily everything. Nvidia posts almost nothing. Much of it is considered trade secrets, so it's more about using analysis tools and benchmarks to suss it out yourself.