Using if-else statements in shaders; how complex is bad?

Mr_Admirals · Dec 24, 2018

Hey guys,

So I think I found a good solution on how to do dynamic tessellation on my snow shader. The issue is that it likely involves an if-else statement.

I know branching/jumps aren't performant for the GPU, so I have two questions:

1. Is the below if-else statement acceptable in a shader?

2. Is there any way to make this simpler or even not an if-else statement?

Code (CSharp):

if (factor > 0.0) {
    f.edge[0] = _TessellationUniform;
    f.edge[1] = _TessellationUniform;
    f.edge[2] = _TessellationUniform;
    f.inside = _TessellationUniform;
}
else {
    f.edge[0] = 1;
    f.edge[1] = 1;
    f.edge[2] = 1;
    f.inside = 1;
}

Thanks.

kripto289 · Dec 24, 2018

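You can avoid the if-else statement by using "step".
For example, you can change this code

Code (CSharp):

if (factor > 0.0) {
    f.edge[0] = x;
}
else {
    f.edge[0] = y;
}

to this

Code (CSharp):

// step(0.0, factor) is 1 when factor >= 0.0 and 0 otherwise
// (note: >=, so factor == 0.0 takes the x side, unlike the if above)
fixed stepFactor = step(0.0, factor);
// lerp(y, x, 1) returns x, matching the "if" side above
f.edge[0] = lerp(y, x, stepFactor);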

bgolus · Dec 24, 2018

kripto289 said: ↑

You can avoid if-else statement using "step".

No you can't, step() is implemented as an if else.

However the difference is one is written assuming a branch where only one side of the code runs, which isn't how GPUs really work, and the other is written where both branches are always executed and only one side is used, which is how both examples actually end up running on the GPU.

dadude123 · Dec 24, 2018

kripto289 said: ↑

You can avoid if-else statement using "step".

step won't save you any performance.
Internally it's computing both results.
Keep in mind that GPUs work in a completely different way than CPUs.
A GPU is not really able to do much logic stuff, and it shows everywhere, from how simple if()s work, to how texels are grouped into blocks, ...

edit: oh S*** a new post; seems bgolus was typing a bit faster

bgolus · Dec 24, 2018

Try this instead:

float tess = factor > 0.0 ? _TessellationFactor : 1.0;
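
For illustration, here is a minimal sketch of how that single select could feed all four factors from the original post. The struct name `TessellationFactors` and the helper `MakeFactors` are hypothetical; only the fields `edge` and `inside` and the property `_TessellationUniform` come from the thread.

```hlsl
// Hypothetical output struct; the thread only shows the fields f.edge and f.inside.
struct TessellationFactors
{
    float edge[3] : SV_TessFactor;
    float inside  : SV_InsideTessFactor;
};

// Material property named as in the original post.
float _TessellationUniform;

// Hypothetical helper: fills every factor from one selected value.
TessellationFactors MakeFactors(float factor)
{
    // One compare-and-select instead of an if-else with eight assignments.
    float tess = factor > 0.0 ? _TessellationUniform : 1.0;

    TessellationFactors f;
    f.edge[0] = tess;
    f.edge[1] = tess;
    f.edge[2] = tess;
    f.inside  = tess;
    return f;
}
```

Both candidate values are already available here, so nothing is skipped by branching and a select is all that is needed.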

Mr_Admirals · Dec 24, 2018

bgolus said: ↑

Try this instead:

float tess = factor > 0.0 ? _TessellationFactor : 1.0;

Wouldn't this compile the same as an if-else statement? Or is the improvement just by creating only a single if-else statement (which I honestly have no idea why I didn't do to begin with, haha)?

kripto289 · Dec 24, 2018

bgolus said: ↑

No you can't, step() is implemented as an if else.

However the difference is one is written assuming a branch where only one side of the code runs, which isn't how GPUs really work, and the other is written where both branches are always executed and only one side is used, which is how both examples actually end up running on the GPU.

I've just checked the compiled GLSL and you're right,
step(0, factor) just compiles to float(factor >= 0).
I used this function everywhere instead of if-else...

Anyway, he can use something like this, for example:

Code (CSharp):

f.edge[0] = lerp(y, x, saturate((factor - 0.0001) * 10000));

If factor is "~0.0001" or less, then "saturate((factor - 0.0001) * 10000)" will be 0, else 1.
In some cases it should be faster than if-else?

bgolus · Dec 24, 2018

That was an alternative to @kripto289's earlier suggestion. His is an if plus a lerp, whereas mine is just an if.

There are also two kinds of if-else statements on the GPU: real branches and, for lack of a better term, fast comparisons. Real branches do exist in shaders on most modern hardware, but there’s a big cost to the branch existing, and a lot of gotchas prevent them from actually acting as branches most of the time, so both paths still execute. Unless the thing you're trying to avoid is really expensive (>20 instructions), a real branch is unlikely to be worthwhile. Fast compares, on the other hand, can do single simple tests like less-than or equal-to, and return one of two values. On most hardware this is a single instruction, but on some older mobile hardware a lerp can be faster.

Luckily shader compilers know this too, and will (usually) compile code to not do a real branch.

So in the best case the shader compiles your code “flattened”: both sides always execute, and it then chooses the results from one side and throws away the rest using the fast comparison. The worst case is that it does a real branch for something that doesn't need it, or flattens it in a non-optimal way.

It's often best to write your shader in the form that will be optimal rather than rely on the compiler to do it for you.
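
As a sketch of making that intent explicit, HLSL lets you put a `[flatten]` or `[branch]` attribute in front of an `if`. These are requests to the compiler rather than guarantees, and how they carry over to other targets is not covered in this thread. The function names and the `tessOn` parameter below are hypothetical.

```hlsl
// Ask for the flattened form: both sides are evaluated, one result is kept.
float SelectFlattened(float factor, float tessOn)
{
    float tess = 1.0;
    [flatten]
    if (factor > 0.0)
    {
        tess = tessOn;
    }
    return tess;
}

// Ask for a real dynamic branch. Likely only worth trying when the
// skipped side is expensive, which is not the case for this assignment.
float SelectBranched(float factor, float tessOn)
{
    float tess = 1.0;
    [branch]
    if (factor > 0.0)
    {
        tess = tessOn;
    }
    return tess;
}
```

For something as cheap as picking between two values, the flattened form (or simply the `? :` line) is the one that matches the advice above.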

Mr_Admirals · Dec 25, 2018

Ah okay. Thank you all for the explanations and discussion! I've learned a lot today!

This is what I have now:

Code (CSharp):

f.edge[0] = factor > 0.0 ? TessellationEdgeFactor(p1, p2) : 1.0;
f.edge[1] = factor > 0.0 ? TessellationEdgeFactor(p2, p0) : 1.0;
f.edge[2] = factor > 0.0 ? TessellationEdgeFactor(p0, p1) : 1.0;
f.inside = factor > 0.0 ? (TessellationEdgeFactor(p1, p2) +
                           TessellationEdgeFactor(p2, p0) +
                           TessellationEdgeFactor(p0, p1)) * (1 / 3.0) : 1.0;
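
One possible tidy-up of the code above, offered as a sketch: evaluate each edge factor once and reuse it for the inside factor. The thread never shows the body or signature of `TessellationEdgeFactor`, so the version below is a placeholder that exists only to make the example compile. The `float3` parameters, the struct name, and the helper `MakeEdgeFactors` are likewise assumptions.

```hlsl
// Hypothetical output struct; the thread only shows the fields f.edge and f.inside.
struct TessellationFactors
{
    float edge[3] : SV_TessFactor;
    float inside  : SV_InsideTessFactor;
};

// PLACEHOLDER body and signature, not the poster's implementation.
float TessellationEdgeFactor(float3 a, float3 b)
{
    return max(1.0, distance(a, b));
}

// Hypothetical helper wrapping the lines from the post.
TessellationFactors MakeEdgeFactors(float factor, float3 p0, float3 p1, float3 p2)
{
    // Evaluate each edge once. With flattened selects both sides are
    // computed anyway, so doing this unconditionally costs nothing extra.
    float e0 = TessellationEdgeFactor(p1, p2);
    float e1 = TessellationEdgeFactor(p2, p0);
    float e2 = TessellationEdgeFactor(p0, p1);

    bool tessellate = factor > 0.0;

    TessellationFactors f;
    f.edge[0] = tessellate ? e0 : 1.0;
    f.edge[1] = tessellate ? e1 : 1.0;
    f.edge[2] = tessellate ? e2 : 1.0;
    f.inside  = tessellate ? (e0 + e1 + e2) * (1.0 / 3.0) : 1.0;
    return f;
}
```

The compiler may well merge the repeated calls on its own, so this is mostly about readability and about writing the shader in the form you want it to run.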

bgolus · Dec 25, 2018

kripto289 said: ↑

If factor is "~0.0001" or less, then "saturate((factor - 0.0001) * 10000)" will be 0, else 1.
In some cases it should be faster than if-else?

I wanted to touch on this.

That particular line will almost never be faster than the ? line. Let’s go over the two real quick.

float val = lerp(x, y, factor);

A lerp is generally implemented as x + factor * (y - x). This is two instructions on all GPUs, a SUB (y - x) and a MAD (factor * SUB + x).

float val = factor > 0.0 ? y : x;

On modern hardware this is one instruction to do the compare, and a second to swap the values around in memory. In older hardware this may have appeared as one instruction, but may have actually taken two or more clocks to complete, effectively “costing” two or more instructions.

So lerp and “> ? :” are effectively equivalent, but the lerp case assumes the factor is guaranteed to be 0.0 or 1.0. That extra saturate((factor - 0.0001) * 10000) is one more instruction. It’s a single MAD (the saturate is free), but that line is now three instructions. Some old OpenGL ES 2.0 hardware may take 3 or 4 instructions to do the comparison versus 3 for the lerp and saturate MAD, and in that case the lerp version will be faster. On pretty much any recent hardware (OpenGL ES 3.0 or better) the comparison is going to have the same 2 instruction cost as everywhere else.
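
To put the three variants side by side, here is a small sketch. The function names are hypothetical, and the instruction counts in the comments are the ones quoted in this post, not measurements.

```hlsl
// Compare-and-select: roughly 2 instructions on modern hardware.
// Works for any value of factor.
float SelectTernary(float factor, float x, float y)
{
    return factor > 0.0 ? y : x;
}

// Plain lerp: SUB + MAD, i.e. x + factor01 * (y - x).
// Only behaves like a select if factor01 is exactly 0.0 or 1.0.
float SelectLerp(float factor01, float x, float y)
{
    return lerp(x, y, factor01);
}

// Remapped lerp: one extra MAD (the saturate is free), 3 instructions total.
// t is 0 for factor <= 0.0001 and 1 for factor >= 0.0002.
float SelectRemapped(float factor, float x, float y)
{
    float t = saturate((factor - 0.0001) * 10000.0);
    return lerp(x, y, t);
}
```

One behavioural difference worth keeping in mind: the remapped version is not a hard switch. For factor between 0.0001 and 0.0002 it blends x and y (at 0.00015, t is about 0.5), whereas the `? :` line switches exactly at zero.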

kripto289 · Dec 25, 2018

Big thanks for the explanation!
Is there a detailed explanation somewhere of how shader functions such as pow / sin / step work internally?

bgolus · Dec 25, 2018

The Cg documentation is a decent start, but almost guaranteed to be wrong for the more complex functions. AMD has some documentation for various GPU families, but not necessarily everything. Nvidia posts almost nothing. Much of it is considered trade secrets, so it's more about using analysis tools and benchmarks to suss it out yourself.
