Question Branching in Shaders

CortiWins · Jan 28, 2022

I found a blog post about branching, and why and how it should be avoided.
https://exiin.com/blog/unity-shadergraph-how-to-properly-use-booleans-in-a-shader/
which is backed up by this other blog post:
http://xdpixel.com/how-to-avoid-branching-on-the-gpu/

Both recommend avoiding if/else and replacing it with lerp.

In Shader Graph 6.9, the recommended lerp is shown as the "possible outcome":
https://docs.unity3d.com/Packages/com.unity.shadergraph@6.9/manual/Branch-Node.html

In the newer Shader Graph 12.1, however, the less optimal predicate ? a : b is shown as the "possible outcome":
https://docs.unity3d.com/Packages/com.unity.shadergraph@12.1/manual/Branch-Node.html

So did Unity replace a better version of the node's code, or how far can the "possible outcome" be trusted?

hippocoder · Jan 28, 2022

It did matter - on ES 2.0 mobiles a long time ago in my tests. I think times have changed a bit though.

https://twitter.com/bgolus/status/1235254923819802626

@bgolus and @jbooth both don't seem to mind conditionals used correctly. I don't think shadergraph would be any different in this respect.

CortiWins · Jan 28, 2022

Sweet, thanks!

AcidArrow · Jan 28, 2022

CortiWins said:

I found a blog post about branching, and why and how it should be avoided.
https://exiin.com/blog/unity-shadergraph-how-to-properly-use-booleans-in-a-shader/

Maybe I'm missing something, but this blog post seems flat out wrong to me?

It replaces a conditional with a lerp and then claims it's better now?

The problem with conditionals like that is that people expected the code in them to be executed only if the conditions are fulfilled. GPUs, however, tend to execute everything anyway and simply discard the results they don't end up using.

So the actual problem is this: if you were hiding expensive functions inside conditionals and expecting them not to be evaluated for every pixel, then nope, you were wrong. Somehow people concluded that conditionals are expensive, and that is the wrong conclusion.
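Here is a toy Python emulation of AcidArrow's point. It is neither real shader code nor a model of how any particular GPU schedules work. On the "flatten" path both sides are evaluated and one result is selected. On the "branch" path only the side that is actually taken is evaluated. The `expensive` and `cheap` functions are hypothetical stand-ins.

```python
def shade(conds, mode):
    """Shade one 'pixel' per entry in conds and count how often each side ran.

    mode == "flatten": both sides are computed, then one result is selected
    mode == "branch":  only the side the condition picks is computed
    """
    counts = {"expensive": 0, "cheap": 0}

    def expensive(x):
        counts["expensive"] += 1
        return x * x  # stand-in for costly math (e.g. lighting)

    def cheap(x):
        counts["cheap"] += 1
        return 0.0

    results = []
    for i, cond in enumerate(conds):
        x = float(i + 1)
        if mode == "flatten":
            a = expensive(x)  # runs even when cond is False
            b = cheap(x)      # runs even when cond is True
            results.append(a if cond else b)
        else:
            results.append(expensive(x) if cond else cheap(x))
    return results, counts


flat_results, flat_counts = shade([True, False, False, False], "flatten")
branch_results, branch_counts = shade([True, False, False, False], "branch")
print(flat_counts)    # {'expensive': 4, 'cheap': 4}
print(branch_counts)  # {'expensive': 1, 'cheap': 3}
```

Both modes give identical results; only the amount of work differs. When a conditional compiles to the flattened form, the expensive side costs the same as it would with no conditional at all.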

Qriva · Jan 28, 2022

I might be completely wrong, so please don't trust what I say too much, but so far my understanding of branching is that it all depends on the context. For example, if I use a constant like this:

Code (HLSL):

void SomeFunction(float Predicate, float4 TrueValue, float4 FalseValue, out float4 Out)
{
    Out = Predicate ? TrueValue : FalseValue;
}

// [...] Somewhere in the code
float4 outVar;
SomeFunction(1, 0.5, 0.8, outVar); // predicate is a literal 1, so always true

The compiler knows it's always true, so there will be no "if" at all in the compiled code. If the predicate came from a uniform variable, though, the result can't be determined during compilation. It is still uniform for all pipes, so it only has to be checked once, and that is why such a simple if (a.k.a. a static branch?) is super fast. As @AcidArrow said, both sides are going to be executed anyway. If I am correct, in HLSL the ternary operator enforces that, and the branch is never dynamic.

There is also dynamic branching, which happens when you test a dynamic value that can be different per pixel, for example a texture sample or the world space position. From what I know, if you had a huge chunk of code inside the branch it could make sense (it will try to process one branch, and it will still be worth it after reverting), but often that is not the case. I think you can enforce the type of branch in HLSL with [branch] or [flatten]. However, I don't know how the compiler picks the type of branch without a hint (I know it's flatten by default).
My guess is one of two things. Either a short if can be optimized by the compiler, and it's better to let it do the work, or they changed lerp to 'if' because that makes sense: the node name is "Branch", not "Lerp".

I hope I haven't written too many heresies

bgolus · Jan 28, 2022

Unfortunately both of those links are flat out wrong.

Quick bullet points:
- Branches on GPUs are not a huge performance problem when used smartly on modern GPUs.
- A conditional statement like an "if" or "?" in a shader does not mean it'll be a branch! Shader compilers will try to "smartly" choose whether to use a branch or a swap based on the code.
- Most conditionals will end up not being a branch because the compiler won't choose to make it one in most cases.
- Using a lerp(b, a, step(y, x)) is always slower than (x >= y ? a : b), because the step is already doing (x >= y ? 1 : 0), so you're just adding two more instructions to do a lerp that could be skipped.
- Any conditional can become a branch if the shader compiler decides to make it one! That includes step, because it's a conditional statement! (Though this is highly unlikely.)
- A discard or clip() is also a branch!
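This Python sketch emulates the HLSL intrinsics on the CPU to check the lerp/step bullet. The definitions follow the documented formulas: step(y, x) is 1 when x >= y, and lerp(a, b, t) = a + t * (b - a). Both forms return the same value. The lerp form simply does extra arithmetic after a comparison that has already happened.

```python
def step(y, x):
    # HLSL step(y, x): 1 when x >= y, else 0 (a comparison plus a select)
    return 1.0 if x >= y else 0.0

def lerp(a, b, t):
    # HLSL lerp(a, b, t) = a + t * (b - a): a subtract and a multiply-add on top
    return a + t * (b - a)

def select_lerp(x, y, a, b):
    # the "branchless" version from the blog posts
    return lerp(b, a, step(y, x))

def select_ternary(x, y, a, b):
    # what the comparison already gives you directly
    return a if x >= y else b

for x, y in [(0.25, 0.5), (0.5, 0.5), (0.75, 0.5)]:
    print(x, y, select_lerp(x, y, 2.0, 5.0), select_ternary(x, y, 2.0, 5.0))
```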
Let's focus on the xdpixel link, since the exiin link is based on its information. That post talks about branch prediction and how missed predictions hurt CPU performance due to pipeline stalls. This is true.* It then goes on to say that on GPUs the issue isn't really missed prediction but potentially bad ALU utilization when more than one branch path needs to run at the same time. This is also true, but we'll come back to that in a moment.

* Though it ignores the fact that this isn't really a major problem on modern CPUs, as the pipeline is relatively short and memory access is almost always the limiting performance factor for modern PCs.

The example he then gives is using the stereo eye index to branch between two possible color values and goes "see, branch bad!"

There's a big problem with that conclusion. In that example there will never be more than one branch ever executing at one time. It is the perfect use case for using branching on a GPU. Indeed the whole reason why the compiler chose to use a branch there is because it knows it's a good use case.

The big thing missing in the discussion about GPUs and ALU utilization in branches is what the phrase "at the same time" means. In the example given, it'd be a SIMD core with 8 ALU threads. On modern GPUs it's more like 32 or 64 ALU threads which different GPU manufacturers refer to as "waves" or "warps". When rendering pixels, each warp is rendering an 8x4 or 8x8 square of pixels on screen, and only on a single triangle at a time. If the value the branch is dependent on does not change across the entire triangle, or just within that group of pixels, then there's perfect utilization of those ALU threads!
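The utilization argument can be sketched with a toy cost model in Python. The 32-lane group size and the cycle counts are arbitrary assumptions. Real hardware also adds some per-branch overhead, which this model ignores.

```python
def branch_cost(predicates, cost_if, cost_else):
    """Toy cost of an if/else compiled as a real branch, for one warp/wave.

    All lanes execute in lockstep, so the group pays for every path that at
    least one of its lanes needs; lanes that don't need a path are masked off.
    """
    needs_if = any(predicates)
    needs_else = not all(predicates)
    return (cost_if if needs_if else 0) + (cost_else if needs_else else 0)


WARP = 32
uniform_true = [True] * WARP           # e.g. stereo eye index, material property
uniform_false = [False] * WARP
divergent = [i % 2 == 0 for i in range(WARP)]  # e.g. a noisy per-pixel value

print(branch_cost(uniform_true, 100, 10))   # 100: only the "if" path runs
print(branch_cost(uniform_false, 100, 10))  # 10: only the "else" path runs
print(branch_cost(divergent, 100, 10))      # 110: both paths run, same as no branch
```

A value that is constant across the whole group always falls into one of the first two cases. The worst case, a fully divergent group, costs the same as not branching at all.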

In the example case of the stereo eye index, that is a value coming from the GPU that is guaranteed to be constant for the entire triangle. So the branch will always only ever take one path for all threads in the group. It should be a branch!

Other types of values that are great for branching on:
- Material properties
- Instance ID or instanced properties
- Primitive ID
- Values passed from the vertex to the fragment shader using a nointerpolation modifier.

All of those will be constant across the entire triangle when rendering the fragment shader, so there will never be any issue with both branches running. The GPU can guarantee that ahead of time.
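This small Python sketch contrasts nointerpolation values, which are always safe, with ordinary interpolated values, which are only "usually" coherent. It makes two simplifications. It uses plain barycentric interpolation with no perspective correction, and it assumes the flat value comes from vertex 0, although which vertex supplies it depends on the API.

```python
def interpolated(v, w):
    # Default interpolation: a weighted average of the three vertex values.
    # Perspective correction is ignored; the weights are >= 0 and sum to 1.
    return w[0] * v[0] + w[1] * v[1] + w[2] * v[2]

def flat(v, w):
    # nointerpolation: every fragment receives a single vertex's value.
    # Assumed here to be vertex 0; which vertex is used depends on the API.
    return v[0]

def barycentric_grid(n=10):
    # Sample points covering the whole triangle.
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield (i / n, j / n, (n - i - j) / n)

def paths_taken(shade_value, v, threshold=0.5):
    # Which side of "if (value > threshold)" each fragment of the triangle takes.
    return {shade_value(v, w) > threshold for w in barycentric_grid()}

print(paths_taken(interpolated, (0.7, 0.8, 0.9)))  # {True}
print(paths_taken(interpolated, (0.2, 0.8, 0.9)))  # {False, True}
print(paths_taken(flat, (0.2, 0.8, 0.9)))          # {False}
```

If all three vertex values of an interpolated input sit on the same side of the threshold, the whole triangle still takes one path. Unlike with a nointerpolation value, though, the GPU has no way to know that ahead of time.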

That all said, it's still not necessarily a bad thing to use a real branch in other cases where you know the value will often be consistent for many of those "warp" groups. For example branching on a texture mask or regular interpolated value passed to the fragment shader. If it means avoiding some expensive calculations for a good portion of the screen, then great. And think about it this way: if you don't use a branch you're just doing the expensive calculation all of the time anyway.

Modern GPUs are also just shockingly fast these days, even on mobile, and can do an amazing amount of math without any problem. Memory bandwidth is often the bigger limiting factor. So the things that make a shader slow are more likely these: using a lot of interpolators to pass data between the vertex and fragment shader (which can be much slower than recalculating the same data in the fragment shader), sampling from a lot of textures, or sampling a lot of random positions within a single texture. Whether or not your shader uses a branch probably isn't going to be what limits your performance.

AcidArrow said:

The problem with conditionals like that is that people expected the code in them to be executed only if the conditions are fulfilled. GPUs, however, tend to execute everything anyway and simply discard the results they don't end up using.

So the actual problem is this: if you were hiding expensive functions inside conditionals and expecting them not to be evaluated for every pixel, then nope, you were wrong. Somehow people concluded that conditionals are expensive, and that is the wrong conclusion.

This was especially true in the days before Direct3D 10.0 and OpenGL ES 3.1, when a lot of GPUs did not actually support branches at all! All code was always run no matter what. So, as @AcidArrow described, people would write an if statement and get terrible performance. They would then blame the if statement, not realizing it never actually branched. As mentioned above, this can still happen today, because most of the time the shader compiler won't compile a conditional as a branch, so the GPU never actually branches. And you end up with the same misdiagnosis: "I used an if statement and it's slow, so it must be the if statement's fault."

Qriva said:

The compiler knows it's always true, so there will be no "if" at all in the compiled code

If you're branching based on hardcoded values within the shader code itself, those will never end up in the compiled shader as a branch because the shader compiler will calculate the result and use that value instead. But compiler code stripping is kind of a different topic.

Qriva said:

If I am correct, in HLSL the ternary operator enforces that, and the branch is never dynamic.

If you read that Twitter thread you'll see I was wrong about that. A ternary (x > y ? a : b) can end up being a branch if the shader compiler decides to make it one. It's just very rare. Most of the time the compiler won't make any conditional a branch unless you explicitly ask it to, or unless you're testing against a value it knows will be constant.

bgolus · Jan 28, 2022

To answer a specific question in the original post

CortiWins said:

how much can the "possible outcome" be trusted?

All of Unity's Shader Graph documentation has that "possible outcome" phrase. It seems to be used as a catch-all for cases where the code generation might have multiple options, or where the version of Shader Graph you're using doesn't match the version of the documentation you're reading. For the Branch node, though, it can be trusted 100% to be those outcomes for those versions of Shader Graph, since that's the only code they generate.

Here's the old version of the node, which you can see at the bottom only has one code snippet doing a lerp.
https://github.com/Unity-Technologi...Editor/Data/Nodes/Utility/Logic/BranchNode.cs

And here's the latest version, which also has only one code snippet, but doing the ternary.
https://github.com/Unity-Technologi...Editor/Data/Nodes/Utility/Logic/BranchNode.cs

Other nodes often have a lot more code snippets, or call a function in an external file that might have different functions depending on the platform.

Also they changed those nodes because I and several others complained about the Branch node being a lerp instead of a ternary, and Unity verified internally that it was slower and changed it. The funny thing being, as discussed above, it probably won't actually be a branch!

Qriva · Jan 28, 2022

I am always amazed at how extensive your answers are.

I want to make something clear: does it matter to the compiler whether the tested value (in a branch) comes from a consistent source? In other words, do I pay the cost only in situations where different branches are taken, or does the compiler structure the code differently? Is a branch like if (materialProp > 0) ... exactly the same type as if (randomVal > 0) ..., with the second one being bad only because, by definition, its result will differ very often?

bgolus said:

If you read that Twitter thread you'll see I was wrong about that. A ternary (x > y ? a : b) can end up being a branch if the shader compiler decides to make it one.

I think initially I based this knowledge on this answer: https://stackoverflow.com/a/41871876

bgolus said:

If you're branching based on hardcoded values within the shader code itself, those will never end up in the compiled shader as a branch because the shader compiler will calculate the result and use that value instead. But compiler code stripping is kind of a different topic.

Yeah, I meant code stripping here, but I'd like to ask something related to that topic.
Here is the code from the URP Lit shader (source link):

Code (HLSL):

#ifndef _SPECULARHIGHLIGHTS_OFF
[branch] if (!specularHighlightsOff)
{
    brdf += brdfData.specular * DirectBRDFSpecular(brdfData, normalWS, lightDirectionWS, viewDirectionWS);
    // [...]
}
#endif // _SPECULARHIGHLIGHTS_OFF

A constant "variable", specularHighlightsOff, is passed to this function based on the _SPECULARHIGHLIGHTS_OFF keyword. What is the reason for creating a branch here if the result will always be the same? Does it serve some special purpose, or is it just a mistake? Is the compiler going to strip it anyway?

AcidArrow · Jan 28, 2022

Qriva said:

Is a branch like if (materialProp > 0) ... exactly the same type as if (randomVal > 0) ..., with the second one being bad only because its result will differ very often

Those are not branches, those are just conditionals which most of the time will not become branches.

Unless one of those variables is a constant, those two conditionals are the same performance wise.

bgolus · Jan 28, 2022

Qriva said:

I want to make something clear: does it matter to the compiler whether the tested value (in a branch) comes from a consistent source? In other words, do I pay the cost only in situations where different branches are taken, or does the compiler structure the code differently? Is a branch like if (materialProp > 0) ... exactly the same type as if (randomVal > 0) ..., with the second one being bad only because, by definition, its result will differ very often?

AcidArrow said:

Unless one of those variables is a constant, those two conditionals are the same.

It does actually matter.

If it decides to compile as a branch, and the branch is coming from a source the compiler can guarantee is constant per warp, it's a "fast" dynamic branch. If it can't be guaranteed to be constant, then it's a "slow" dynamic branch. This distinction seems to matter more on mobile and AMD GPUs than it does on Nvidia GPUs.

(edit: To clarify "fast" dynamic branches are basically free on desktop GPUs, where "slow" dynamic branches will cost a few cycles in overhead.)

Some of the latest graphics APIs have complicated ways to tell the GPU that a value will be constant per warp, even when it comes from a source that normally isn't. I've seen these discussed for clustered & tiled lighting setups, as a way to force fast branches. But I'm honestly not up to speed enough with them to explain how they work beyond the very high level of "that's what it's doing".

Qriva said:

A constant "variable", specularHighlightsOff, is passed to this function based on the _SPECULARHIGHLIGHTS_OFF keyword. What is the reason for creating a branch here if the result will always be the same? Does it serve some special purpose, or is it just a mistake? Is the compiler going to strip it anyway?

The answer is yes, it will strip that code, except when it doesn't.

There are multiple declarations of the LightingPhysicallyBased function in that file. Some take a specularHighlightsOff parameter and some don't. The ones that don't hardcode the value based on the define and then call the function that takes it as an input. In that case the if statement and the preceding [branch] will be stripped. However, if the shader code calls the function directly with the bool, and that bool isn't hardcoded, then it'll try to be a branch.

bgolus · Jan 28, 2022

Another wrench to throw into all of this is the "compiled shader" isn't the final form. Even if the "compiled shader" has a branch in it, that shader gets "compiled" again when the drivers take it and convert it to assembly code for the current GPU hardware. That can also make decisions to make something a branch, or not.

hippocoder · Jan 28, 2022

I'll sticky this thread as there's a whole bunch of quality info.

CortiWins · Jan 30, 2022

Wow, this is far more than I expected. Thanks everyone!

tutorial4unity · Dec 24, 2022

Late to the party, but can I just say this is one amazing thread.

AshwinMods · Jan 31, 2023

I can't express how blessed I feel reading the detailed and sharp answers from bgolus. Thank you, sir.

Found another good explanation here:
https://solidpixel.github.io/2021/12/09/branches_in_shaders.html

About "DFC best used on spatially-coherent branches":
https://developer.amd.com/wordpress/media/2012/10/03_Clever_Shader_Tricks.pdf

An interesting post for visualizing branch and sample counts:
https://medium.com/@jasonbooth_86226/branching-on-a-gpu-18bfc83694f2

One of the highlights for me is *it's okay if the same branch is being used for the entire triangle*.
What can be passed from the vertex shader that is consistent for all three vertices of a triangle,
so that after interpolation the value stays above/below a certain threshold and the fragment shader takes a common branch?
Or will the compiler not understand what I am trying to do, so that the output ends up very different?

vlery · Mar 10, 2023

bgolus said:

Another wrench to throw into all of this is the "compiled shader" isn't the final form. Even if the "compiled shader" has a branch in it, that shader gets "compiled" again when the drivers take it and convert it to assembly code for the current GPU hardware. That can also make decisions to make something a branch, or not.

Hey, I just ran into a weird issue in a repro case. In RenderDoc, the HLSL logic looks like this:

Code (HLSL):
[branch]
if (View.HasLight)
{
    // do something, but it may have NaN, Inf or precision issues
}

I was trying to fix an artifact where, from some directions, half the pixels on a model are totally wrong. We fixed some obviously inappropriate code inside the dynamic branch and got the right result. The problem is that View.HasLight = 0, and debugging in RenderDoc confirms the branch is successfully skipped.
So I'm wondering: is there any HLSL compiler strategy that could explain a bug inside a branch affecting the result even though the branch is skipped?

jacketjlzUnity · Jun 6, 2023

This is such a fascinating read! Thank you @bgolus
Could you please elaborate on how branching works in following cases?

1. Branching on multiple material props:

Code (HLSL):

[branch]
if (materialProp1 * materialProp2 > 0)
{
    //some code
}

2. Using a (float) materialProp as an index into another (vector) materialProp:

Code (HLSL):

uint index = (uint)materialFloatProp;
float value = materialVectorProp[index];

[branch]
if (value > 0)
{
    //some code
}

3. And what about nested ifs?

Code (HLSL):

[branch]
if (materialProp1 > 0)
{
    //some code

    [branch]
    if (materialProp2 > 0)
    {
        //some code
    }
}

bgolus · Jun 6, 2023

Assuming the various material properties are never written to by the shader code, these can all be handled as "fast" dynamic branches as the values will be constant across the entire draw.

Again, whether or not they will be depends on the compiler, but they can be.
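"Constant across the entire draw" can be pictured as a property that propagates through expressions. Anything computed only from material properties, or from the other per-draw and per-triangle sources listed earlier in the thread, is itself constant. The Python below is a hypothetical illustration of that idea, not how any real shader compiler implements its analysis. Like the answer above, it assumes the shader never writes to those properties.

```python
# Hypothetical, heavily simplified "uniformity" check. Leaves are (source, name);
# operations are (op, [operands]). A result is uniform only if every input is.
UNIFORM_SOURCES = {"material", "instance", "primitive_id", "nointerpolation", "literal"}

def is_uniform(expr):
    head, body = expr
    if isinstance(body, list):                 # an operation: check every operand
        return all(is_uniform(e) for e in body)
    return head in UNIFORM_SOURCES             # a leaf: depends on where it comes from

# 1. if (materialProp1 * materialProp2 > 0)
case1 = ("gt", [("mul", [("material", "materialProp1"), ("material", "materialProp2")]),
                ("literal", "0")])
# 2. materialVectorProp[(uint)materialFloatProp] > 0
case2 = ("gt", [("index", [("material", "materialVectorProp"), ("material", "materialFloatProp")]),
                ("literal", "0")])
# For contrast: a texture mask sampled with interpolated UVs
mask = ("gt", [("sample", [("material", "maskTexture"), ("interpolated", "uv")]),
               ("literal", "0.5")])

print(is_uniform(case1), is_uniform(case2), is_uniform(mask))  # True True False
```

In the nested case, each condition is uniform on its own. The inner one is only reached by groups that all took the outer path, so it remains a uniform branch too.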

jacketjlzUnity · Jun 7, 2023

bgolus said:

Assuming the various material properties are never written to by the shader code, these can all be handled as "fast" dynamic branches as the values will be constant across the entire draw.

Again, whether or not they will be depends on the compiler, but they can be.

Thank you very much for the quick reply!

EricFFG · Jun 12, 2023

There are also some interesting things in this talk from Jason Booth

(I was also surprised that ChatGPT knew a lot of intricate GPU architecture details I would never have expected it to know, given how niche they are.)

For branches, I do understand that the branch condition itself must not be expensive, as it can quickly end up costing more than it saves.

So I gather that branching on things like texture masks should be heavily avoided,
while things like normal direction Y, vertex color or object position (or, of course, simple statics) are viable for branching, to skip expensive parts of the material when they aren't needed?

bgolus · Jun 12, 2023

Funny you link Jason's video, which shows a use case where heavy use of dynamic branches led to substantially improved performance vs not, and then follow that up with the conclusion that branches should be avoided. Trust Jason, don't trust ChatGPT.

ChatGPT's main power is the ability to tell lies with plausible authority by giving (mostly) accurate information for a bit before it devolves to total bullshot.

bgolus · Jun 12, 2023

To be more targeted in my response:

EricFFG said:

So I gather that branching on things like texture masks should be heavily avoided

It's not that they should be avoided, it's that there is a cost to them that should be kept in mind. The terrain shader example in Jason's above video is doing branching based on texture samples, but it saves enough work that the cost of doing that branch is worth it.

More specifically, you should always test and validate that your changes are improving performance on target platforms! Use branches, see if it helps, try to figure out why if it doesn't.

EricFFG · Jun 12, 2023

bgolus said:

Funny you link Jason's video, which shows a use case where heavy use of dynamic branches led to substantially improved performance vs not, and then follow that up with the conclusion that branches should be avoided. Trust Jason, don't trust ChatGPT.

ChatGPT's main power is the ability to tell lies with plausible authority by giving (mostly) accurate information for a bit before it devolves to total bullshot.

It was phrased as a question.

That was completely unrelated to the chat. Still, it's clear that the savings must exceed the cost of the sample lookup for the branch, and from what I heard in the office, a branch based on a sample can be quite expensive.
If you have a hyper-expensive terrain shader (without branching), it might still very easily be worth it, for sure.

But I must assume that a texture-mask-based branch that cuts off a single sand sample is a net loss, as the sample check is more expensive than the sample. On the other hand, a cutoff based on world height or normal direction could be well worth it for something as simple as a top layer of sand on a mesh, as the check must be faster than the sample lookup.

(It's very hard to test these small optimizations. Recently I was trying to get a measurable performance impact to test some culling. I spent 15 minutes making the most expensive shader possible, with a ton of stacked parallax and whatnot, and the GPUs I have available barely cared. Seeing a branch around a single texture sample in a profiler? No way.)

bgolus · Jun 12, 2023

Make sure you're profiling the GPU, not the CPU. By default Unity's profiler doesn't show GPU times at all. With GPU profiling enabled, you can see the cost of individual draws. Even then, yes, it can still be hard to see a difference. But if you can't see a difference, that tells you the branch isn't that expensive in and of itself. Alternatively, the shader compiler decided it wasn't worth it and didn't compile it as a branch at all.

The "branches are expensive" mentality comes from 20 years ago, when adding 10 instructions to a shader warranted an engineering meeting about whether it was worthwhile. The Standard shader's fragment shader is between 200 and 600 instructions depending on the use case (baked lighting is more expensive!!!), and HDRP shaders are thousands of instructions.

bgolus · Jun 12, 2023

EricFFG said:

(??) I assume it's nastier to use samples for branching. The entire branch is then tied to the sample, which has to recursively know about the sample to check against, basically "infecting" all the linked nodes with the sample lookup where they would previously have been unlinked. Checking something like world position, by contrast, would cost very little. (? Is this how this works?)

Using a texture sample is more costly because you have to sample the texture before running the branch. Sampling a texture is one of the most expensive things you can do in a shader. That's not because the shader itself has to do a lot of work, but the opposite: it has to wait for other bits of hardware on the GPU to do that work. So most of the time the shader compiler does its best to reshuffle when things happen. It puts texture samples at the very start of the shader, followed immediately by anything that doesn't depend on those texture reads, to hide the latency between calling "tex2D(myTexture, uv)" and that data being accessible.

This also makes it very hard sometimes to understand how expensive something is in a shader as adding an extra texture sample, or a dozen, could have negligible impact on the performance in an already complex shader, but massive impact on a more simple one.
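This reordering can be illustrated with a deliberately crude Python model. It assumes a single texture fetch issued first, with perfect overlap between the fetch and any math that doesn't need its result. It also ignores the fact that GPUs hide latency by switching between warps. All the numbers are made up.

```python
def shader_cycles(sample_latency, independent_alu, dependent_alu):
    """Toy latency-hiding model for one texture fetch issued at the very start.

    Work that doesn't need the texel overlaps with the wait; work that does
    must wait for it. With no fetch, pass sample_latency=0.
    """
    return max(sample_latency, independent_alu) + dependent_alu

LATENCY = 100  # made-up number; real latency varies with hardware and cache hits

# A simple shader: almost nothing to do while waiting.
simple_without = shader_cycles(0, 5, 5)         # 10
simple_with = shader_cycles(LATENCY, 5, 5)      # 105
# A complex shader: plenty of independent math to hide the fetch behind.
complex_without = shader_cycles(0, 300, 50)     # 350
complex_with = shader_cycles(LATENCY, 300, 50)  # 350
print(simple_with / simple_without, complex_with / complex_without)  # 10.5 1.0
```

In this model, the same extra fetch makes the simple shader roughly 10x slower but leaves the complex one unchanged. That matches the point above: the measured cost of a sample depends heavily on what surrounds it.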

EricFFG · Jun 12, 2023

Yes, always with the GPU.

Yes, but it's very easy to make mistakes with the branch.
I had made a debugging subgraph that contains 8 texture samplers, for debugging or something, and is then branched off. From what I understand, the GPU still has to allocate the memory to initialize them and also calculate them (as the compilers still compile it all); it just doesn't pass the result on. I read that even with a static value, the compilers still compile the branches into shaders.
So this would be a big mistake, but one might think it's all safe and dandy. I thought "oh, no problem, just branch it". So my 3-sampler shader that I spent weeks optimizing would have 8 hidden ones in memory.

So you can't really say that branching is not expensive. Rather, it is not expensive, and it is greatly beneficial, when done correctly.
But it can also be expensive and wasteful, as in my example above. Branching off 1 texture sampler by using another texture sampler is a net performance loss. You could even branch off a color by using a texture sampler mask, which costs more than a full extra sampler. That would be quite bad, especially if you did it multiple times. Samplers are usually the key performance hog in a typical shader. So if you do the wrong things, your 4-texture-sampler shader could end up costing several times that in samplers, or you'd make a mess for no reason.

On the sampler topic, I've heard that you can actually get ""free"" calculations after the sampler (to a degree). The GPU has to wait for the sample to come through the cache anyway, so those calculations essentially happen in the meantime with no real change in cost.

bgolus · Jun 12, 2023

The topic of samplers inside branches is a deep and complicated one. The short answer is that sampling a texture inside a branch is undefined behavior in the D3D spec. On some GPUs this means any texture sample inside a branch is always sampled anyway and the branch is ignored. On others the branch is respected, but there can be visual anomalies around the edges. Still others handle it all gracefully, sampling the texture where it's needed and not where it's not. Which behavior you get depends on the generation of GPU, and it's not as simple as Nvidia vs AMD vs Mali.
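Here is one well-known reason why implicit-LOD sampling (tex2D / Sample) is tricky inside branches. The mip level is chosen from screen-space UV derivatives, and those derivatives come from differencing values between neighbouring pixels of a 2x2 quad. If some pixels in the quad skipped the code that computed the UV, the difference has no defined value. The Python below sketches a coarse horizontal derivative. The lane layout is an assumption, and treating any inactive lane as "undefined" is a simplification.

```python
def quad_ddx(values, active):
    """Coarse horizontal derivative for one 2x2 pixel quad.

    Lanes are laid out [top-left, top-right, bottom-left, bottom-right].
    `active` says which lanes actually executed the code producing `values`.
    """
    if not all(active):
        # A neighbour skipped the branch, so its value was never computed:
        # the real hardware result is undefined, modelled here as None.
        return None
    return values[1] - values[0]

uv_x = [0.50, 0.75, 0.50, 0.75]
print(quad_ddx(uv_x, [True, True, True, True]))   # 0.25
print(quad_ddx(uv_x, [True, False, True, True]))  # None
```

Sampling with an explicit mip level or explicit gradients (tex2Dlod / SampleLevel, tex2Dgrad / SampleGrad) avoids this particular problem, because no implicit derivatives are needed.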

One thing I'll say about Jason's terrain shader is he understands GPUs quite well and he's doing some extra tricks there. The main one is he's making heavy use of texture arrays, and the branches change which array indices and UVs to use, but the number of samples and texture objects being referenced stay constant. In DX12 and Vulkan there are ways to change the texture itself, but that's not something Unity supports as it requires a very different way of handling asset declaration.

Goularou · Aug 22, 2023

bgolus said:

Funny you link Jason's video, which shows a use case where heavy use of dynamic branches led to substantially improved performance vs not, and then follow that up with the conclusion that branches should be avoided. Trust Jason, don't trust ChatGPT.

The same Jason Booth wrote a good blog post on branching:
https://medium.com/@jasonbooth_86226/branching-on-a-gpu-18bfc83694f2

Thank you so much, @bgolus!
PS: @bgolus, why not write a book on GPU programming, or start a blog? I learnt a bunch from your posts, but they are scattered, by definition...

bgolus · Aug 22, 2023

Goularou said:

PS: @bgolus, why not write a book on GPU programming, or start a blog? I learnt a bunch from your posts, but they are scattered, by definition...

There are a few reasons why I don't write a book. For one, a book usually deals with abstract and artificial use cases. Ones which I would have to come up with and which may not be ones people in the real world actually encounter. I believe addressing specific use cases in which people actually have problems understanding how to achieve some goal or fix some problem or just better insight as to why what they're trying isn't working is more useful than a book that gives general information. My aim is to help people understand shaders, and while I may have answered similar questions multiple times, usually the solutions are different because the goal isn't exactly the same.

Secondly, books cost money. I don't respond on these forums with the intent to make a profit. Unity certainly doesn't pay me to do it. Sure I could release some PDF for free, but no one will read that. Today people find what they need by searching online via Google or some other search engine. And a book isn't going to be what they click on.

Third, and the most important one. I'm lazy. A book sounds like a lot of work.

thang_unity516 · Jan 10, 2024

Hi guys, I came across this topic with a branching problem, and I hope you can give me some help.

As far as I know, some specific cases of branching will make a shader run slower, depending on the GPU. Assume that:
- I want to run a shader on various GPUs (especially mobile)
- The logic in each branch is simple and forces branching. Example:

Code (HLSL):

half4 color = input.u > input.v ? half4(0, 0, 0, 0) : half4(1, 1, 1, 1);

Is turning the branch into a calculation a good solution here (Reference: https://theorangeduck.com/page/avoiding-shader-conditionals)? After reading this article, the approach feels kind of magical to me. Don't functions like max, abs, sign, ... turn into an if under the hood?

Thanks for your help

AshwinMods · Jan 11, 2024

thang_unity516 said:

- The logic in each branch is simple and forces branching. Example:

Code (HLSL):

half4 color = input.u > input.v ? half4(0, 0, 0, 0) : half4(1, 1, 1, 1);

Is turning the branch into a calculation a good solution here

The UVs will be different for every fragment, so a FAST branch is not an option here.
The compiler will do what's required to get the same output, and I don't think much can be saved by writing an equation instead.
(Can't wait to be corrected by someone here.)

thang_unity516 said:

After reading this article, the approach feels kind of magical to me. Don't functions like max, abs, sign, ... turn into an if under the hood?

I also read some similar articles, and since then I don't take functions like Step, Sign, Sin, Abs, etc. for granted.
https://interplayoflight.wordpress....hinking-in-high-level-shading-languages-2023/

But as always, everything must be tested on device. You can also look at the semi-final assembly code,
for example with the HLSL section of online compiler websites like https://godbolt.org/

thang_unity516 · Jan 11, 2024

To be clear, what I intended to do is convert this

Code (HLSL):

half4 color = input.u > input.v ? half4(0, 0, 0, 0) : half4(1, 1, 1, 1);

into something like this

Code (HLSL):

float compareResult = isHigher(input.u, input.v); // isHigher: my own helper, returns 1 if u > v, otherwise 0
half4 color = compareResult * half4(0, 0, 0, 0) + (1 - compareResult) * half4(1, 1, 1, 1);

I don't know whether this performs better in this case.

bgolus · Jan 11, 2024

The "is higher" function is the step() function. It is an if statement.

The second example's "optimized" code is three to four times slower than just using the ternary statement (x > y ? A : B) alone. It still has the ternary hidden behind the step(), followed by what is effectively a lerp, which adds another few instructions.
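This Python sketch shows that the "branchless" blend from the question still contains the comparison (Python's > stands in for whatever compare instruction the hardware uses) and adds arithmetic on top. is_higher is the poster's own hypothetical helper. sign is included because it too can be written with comparisons. The sketch demonstrates equivalence of results only. It makes no claim about which instructions any GPU actually emits.

```python
def is_higher(u, v):
    # The poster's hypothetical helper: 1.0 if u > v, otherwise 0.0.
    # Note: HLSL step(v, u) is u >= v, which differs only when u == v.
    return 1.0 if u > v else 0.0

def blend(u, v, a, b):
    # "Branchless" version: the comparison, then two multiplies, a subtract and an add
    t = is_higher(u, v)
    return t * a + (1.0 - t) * b

def ternary(u, v, a, b):
    # Direct version: the comparison plus a select
    return a if u > v else b

def sign(x):
    # sign() expressed with two comparisons: -1, 0 or 1
    return int(x > 0) - int(x < 0)

for u, v in [(0.2, 0.7), (0.9, 0.1), (0.5, 0.5)]:
    assert blend(u, v, 0.0, 1.0) == ternary(u, v, 0.0, 1.0)
print(sign(-3.0), sign(0.0), sign(2.5))  # -1 0 1
```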

Kobix · Jan 23, 2024

In the picture provided, should that node & branch mean that compilation will 'bake' the conditional, so that the result has no branching/conditional at all?

bgolus · Jan 24, 2024

It shouldn't cause any branching, no. The shader compiler should be smart enough to know the other "branch" can never be executed and that "branch's" code stripped out.

Kobix · Jan 25, 2024

Thank you.
