
Question Branching in Shaders

Discussion in 'Shader Graph' started by CortiWins, Jan 28, 2022.

  1. CortiWins
    Joined: Sep 24, 2018 | Posts: 150
    I found a blog post about branching and why and how it should be avoided:
    https://exiin.com/blog/unity-shadergraph-how-to-properly-use-booleans-in-a-shader/
    which is backed up by this other blog post:
    http://xdpixel.com/how-to-avoid-branching-on-the-gpu/

    Both recommend avoiding if/else and replacing it with lerp.

    In Shader Graph 6.9, the recommended lerp is shown as a "possible outcome":
    https://docs.unity3d.com/Packages/com.unity.shadergraph@6.9/manual/Branch-Node.html

    In the newer Shader Graph 12.1, however, the less optimal predicate ? a : b is shown as a "possible outcome":
    https://docs.unity3d.com/Packages/com.unity.shadergraph@12.1/manual/Branch-Node.html

    So did Unity replace the node's code with a worse version, or how much can the "possible outcome" be trusted?
     
  2. hippocoder
    Digital Ape | Joined: Apr 11, 2010 | Posts: 29,723
  3. CortiWins
    Joined: Sep 24, 2018 | Posts: 150
    Sweet, thanks!
     
  4. AcidArrow
    Joined: May 20, 2010 | Posts: 11,735
    Maybe I'm missing something, but this blog post seems flat out wrong to me?

    It replaces a conditional with a lerp and then claims it's better now?

    The problem with conditionals like that is that people expect the code in them to only be executed if the conditions are fulfilled, but GPUs tend to execute everything anyway and just discard the results they don't end up using.

    So the actual problem is that if you were hiding expensive functions inside conditionals and expecting that they would not be evaluated for every pixel, then nope, you were wrong. Somehow people came to the conclusion that conditionals are expensive, which is the wrong conclusion.
     
  5. Qriva
    Joined: Jun 30, 2019 | Posts: 1,307
    I might be completely wrong, so please do not put too much trust in what I say, but so far my understanding of branching is that it all depends on the context. For example, if I use a constant like this:
    Code (CSharp):
    void SomeFunction(float Predicate, float4 TrueValue, float4 FalseValue, out float4 Out)
    {
        Out = Predicate ? TrueValue : FalseValue;
    }

    // [...] Somewhere in the code
    SomeFunction(1, 0.5, 0.8, outVar);
    The compiler knows it's always true, so there will be no "if" at all in the compiled code. If the predicate came from a uniform variable, it is not possible to determine the result during compilation; however, it is uniform for all pipes, so it only has to be checked once, and this is why such a simple if (aka static branch?) is super fast. As @AcidArrow said, both sides are going to be executed anyway, and if I am correct, in HLSL the ternary operator enforces that, so it is never a dynamic branch.

    There is also dynamic branching, which happens when you test a dynamic value that can be different per pixel, for example the result of a texture sample or the world space position. From what I know, if you had a huge chunk of code inside the branch it could make sense (it will try to process only one branch, and it will still be worth it after reverting), but often that is not the case. I think you can enforce the type of branch in HLSL with [branch] or [flatten]; however, I do not know exactly how the compiler picks the type of branch without a hint (as far as I know it is flatten by default).
    My guess is that a short if can be optimized by the compiler and it's better to let it do the work, or they changed lerp to 'if' because it makes sense: the node name is "Branch", not "Lerp".
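
    A minimal sketch of the two hints mentioned above, for illustration only. ExpensiveEffect, ShadeFlattened and ShadeBranched are hypothetical names; only the [branch] and [flatten] attributes are actual HLSL. Both are requests, so the compiler or the driver may still decide otherwise.

    ```hlsl
    // Hypothetical stand-in for "a huge chunk of code".
    float4 ExpensiveEffect(float2 uv)
    {
        float4 acc = 0;
        [loop]
        for (int i = 0; i < 32; i++)
        {
            acc += sin(uv.xyxy * (i + 1));
        }
        return acc / 32.0;
    }

    float4 ShadeFlattened(float mask, float2 uv)
    {
        float4 result = 0;
        // [flatten]: evaluate the body for every pixel and select the result.
        [flatten]
        if (mask > 0.5)
        {
            result = ExpensiveEffect(uv);
        }
        return result;
    }

    float4 ShadeBranched(float mask, float2 uv)
    {
        float4 result = 0;
        // [branch]: ask for a real dynamic branch so the body can be skipped.
        [branch]
        if (mask > 0.5)
        {
            result = ExpensiveEffect(uv);
        }
        return result;
    }
    ```

    The only difference between the two functions is the attribute, which makes it easy to swap one for the other and profile both on the target device.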

    I hope I haven't written too many heresies :rolleyes:
     
  6. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Unfortunately both of those links are flat out wrong.

    Quick bullet points:
    • Branches on GPUs are not a huge performance problem when used smartly on modern GPUs.
    • A conditional statement like an if or ? in a shader does not mean it'll be a branch! Shader compilers will try to "smartly" choose whether to use a branch or a swap based on the code.
    • Most conditionals will end up not being a branch because the compiler won't choose to make it one in most cases.
    • Using a lerp(b, a, step(y, x)) is always slower than (x >= y ? a : b) because the step is already doing (x >= y ? 1 : 0) so you're just adding two more instructions to do a lerp that could be skipped.
    • Any conditional can become a branch if the shader compiler decides to make it one! That includes step because it's a conditional statement! (Though this is highly unlikely.)
    • A discard or clip() is also a branch!
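
    To make the lerp vs. ternary bullet concrete, here is a small sketch of the two spellings side by side. SelectWithLerp and SelectWithTernary are hypothetical names used only for illustration; both return the same value for the same inputs.

    ```hlsl
    float4 SelectWithLerp(float x, float y, float4 a, float4 b)
    {
        // step(y, x) is (x >= y) ? 1 : 0, so the comparison still happens here,
        // and the lerp math is added on top of it.
        return lerp(b, a, step(y, x));
    }

    float4 SelectWithTernary(float x, float y, float4 a, float4 b)
    {
        // One comparison and one select, nothing else.
        return (x >= y) ? a : b;
    }
    ```
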
    Let's focus on the xdpixel link since the exiin link is based on the information from that. That post talks about branch prediction and how missed predictions are bad for CPU performance due to pipeline stalls. This is true.* Then it goes on to talk about how on GPUs it's not really an issue of missed prediction, but rather potentially bad utilization of the ALU if more than one branch path needs to run at the same time. This is also true, but we'll come back to that in a moment.

    * Though it ignores the fact that it's not really a major problem on modern CPUs, as the pipeline is relatively short and memory access is almost always the limiting performance factor for modern PCs.

    The example he then gives is using the stereo eye index to branch between two possible color values and goes "see, branch bad!"

    There's a big problem with that conclusion. In that example there will never be more than one branch ever executing at one time. It is the perfect use case for using branching on a GPU. Indeed the whole reason why the compiler chose to use a branch there is because it knows it's a good use case.

    The big thing missing in the discussion about GPUs and ALU utilization in branches is what the phrase "at the same time" means. In the example given, it'd be a SIMD core with 8 ALU threads. On modern GPUs it's more like 32 or 64 ALU threads which different GPU manufacturers refer to as "waves" or "warps". When rendering pixels, each warp is rendering an 8x4 or 8x8 square of pixels on screen, and only on a single triangle at a time. If the value the branch is dependent on does not change across the entire triangle, or just within that group of pixels, then there's perfect utilization of those ALU threads!

    In the example case of the stereo eye index, that is a value coming from the GPU that is guaranteed to be constant for the entire triangle. So the branch will always only ever take one path for all threads in the group. It should be a branch!

    Other types of values that are great for branching on:
    • Material properties
    • Instance ID or Instanced properties
    • Primitive ID
    • Values passed from the vertex to the fragment using a nointerpolation modifier.
    All of those will be constant across the entire triangle when rendering the fragment shader, so there will never be any issue with both branches running. The GPU can guarantee that ahead of time.
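
    As a rough sketch of the last item in that list, a value passed with nointerpolation and then branched on might look like this. The Varyings struct, useDetail, DetailColor and Frag are all hypothetical names, and the vertex shader that writes useDetail is assumed rather than shown.

    ```hlsl
    struct Varyings
    {
        float4 positionCS : SV_POSITION;
        float2 uv         : TEXCOORD0;
        // Not interpolated: every fragment of the triangle sees the same value.
        nointerpolation float useDetail : TEXCOORD1;
    };

    // Hypothetical stand-in for more expensive work.
    float4 DetailColor(float2 uv)
    {
        return float4(frac(uv * 8.0), 0.0, 1.0);
    }

    float4 Frag(Varyings input) : SV_Target
    {
        float4 color = float4(input.uv, 0.0, 1.0);
        // The condition is the same for the whole triangle, so all threads
        // covering it take the same path.
        [branch]
        if (input.useDetail > 0.5)
        {
            color *= DetailColor(input.uv);
        }
        return color;
    }
    ```
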


    That all said, it's still not necessarily a bad thing to use a real branch in other cases where you know the value will often be consistent for many of those "warp" groups. For example branching on a texture mask or regular interpolated value passed to the fragment shader. If it means avoiding some expensive calculations for a good portion of the screen, then great. And think about it this way: if you don't use a branch you're just doing the expensive calculation all of the time anyway.
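
    A sketch of that kind of branch, assuming a mask texture that is zero over large parts of the surface. _MaskTex, ExpensiveLighting and ShadeMasked are hypothetical names, and the loop is only a placeholder for costly math. Whether it is a win still has to be measured on the target hardware.

    ```hlsl
    sampler2D _MaskTex; // hypothetical mask texture

    // Placeholder for a costly calculation; no texture samples inside.
    float3 ExpensiveLighting(float3 normalWS, float3 viewDirWS)
    {
        float3 acc = 0;
        [loop]
        for (int i = 1; i <= 16; i++)
        {
            acc += pow(saturate(dot(normalWS, viewDirWS)), (float)i);
        }
        return acc / 16.0;
    }

    float3 ShadeMasked(float2 uv, float3 normalWS, float3 viewDirWS, float3 baseColor)
    {
        // The mask is sampled outside the branch.
        float mask = tex2D(_MaskTex, uv).r;
        float3 color = baseColor;

        // Groups of pixels where the mask is zero everywhere can skip the body.
        [branch]
        if (mask > 0.0)
        {
            color += mask * ExpensiveLighting(normalWS, viewDirWS);
        }
        return color;
    }
    ```
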

    Modern GPUs are also just shockingly fast these days, even on mobile. They can do an amazing amount of math without any problem. Memory bandwidth is often the bigger limiting factor. So using a lot of interpolators to pass data between the vertex and fragment shader can be much slower than recalculating the same data in the fragment shader, and sampling from a lot of textures, or from a lot of random positions within a single texture, can be the thing that makes things slow. Whether your shader uses a branch or not probably isn't going to be the factor that is limiting your performance.

    This was especially true in the days before Direct3D 10.0 and OpenGL ES 3.1, when a lot of GPUs did not actually support branches at all! All code was always run no matter what, so, like @AcidArrow described, people would write an if statement, get terrible performance, and then blame the if statement, not realizing it didn't actually do a branch at all. But, as mentioned above, this can still be the case today, as most of the time the GPU will not actually end up branching because the shader compiler won't compile conditionals as a branch the majority of the time. And you end up with the same "I used an if statement and it's slow, so it must be the if statement's fault" misdiagnosis.

    If you're branching based on hardcoded values within the shader code itself, those will never end up in the compiled shader as a branch because the shader compiler will calculate the result and use that value instead. But compiler code stripping is kind of a different topic.
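
    A tiny sketch of what "hardcoded" means here. USE_TINT and ApplyTint are hypothetical names. Because the condition is known at compile time, the expectation is that the compiled shader contains neither the test nor the multiply.

    ```hlsl
    // Hardcoded in the shader source, so the value is known at compile time.
    static const bool USE_TINT = false;

    float4 ApplyTint(float4 color, float4 tint)
    {
        // With USE_TINT fixed to false, the compiler can resolve this
        // condition itself and drop the body entirely.
        if (USE_TINT)
        {
            color *= tint;
        }
        return color;
    }
    ```
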

    If you read that twitter thread you'll see I was wrong about that. A ternary (x > y ? a : b) can end up being a branch if the shader compiler decides to make it one. It's just very rare because most of the time the compiler won't make any conditional a branch unless you explicitly ask it to make one or you're testing against a value that it knows will be constant.
     
  7. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    To answer a specific question in the original post
    All of Unity's Shader Graph documentation has that "possible outcome" phrase. It seems to be used as a catch all for cases where the code generation might have multiple options, or for cases where the version of Shader Graph you're using and the version of the documentation you're looking at don't match. But in the case of the Branch node it can be trusted 100% to be those outcomes for those versions of Shader Graph since that's the only code they generate.

    Here's the old version of the node, which you can see at the bottom only has one code snippet doing a lerp.
    https://github.com/Unity-Technologi...Editor/Data/Nodes/Utility/Logic/BranchNode.cs

    And here's the latest version, which also has only one code snippet, but doing the ternary.
    https://github.com/Unity-Technologi...Editor/Data/Nodes/Utility/Logic/BranchNode.cs

    Other nodes often have a lot more code snippets, or call a function in an external file that might have different functions depending on the platform.

    Also they changed those nodes because I and several others complained about the Branch node being a lerp instead of a ternary, and Unity verified internally that it was slower and changed it. The funny thing being, as discussed above, it probably won't actually be a branch!
     
  8. Qriva
    Joined: Jun 30, 2019 | Posts: 1,307
    I am always amazed at how extensive your answers are.

    I want to make something clear: does it matter to the compiler whether the tested value (in a branch) comes from a consistent source? I mean, do I pay the cost only when different branches are actually taken, or does the compiler optimize it differently (structurally)? Is a branch like if (materialProp > 0) ... exactly the same type as if (randomVal > 0) ..., with the second one being bad only because the branch result will, by definition, be different very often?

    I think initially I based this knowledge on this answer: https://stackoverflow.com/a/41871876

    Yeah, I meant code stripping here, however I would like to ask something when it comes to this topic.
    Here is the code from URP Lit shader (source link):
    Code (CSharp):
    #ifndef _SPECULARHIGHLIGHTS_OFF
        [branch] if (!specularHighlightsOff)
        {
            brdf += brdfData.specular * DirectBRDFSpecular(brdfData, normalWS, lightDirectionWS, viewDirectionWS);
            // [...]
        }
    #endif // _SPECULARHIGHLIGHTS_OFF
    There is a constant "variable" specularHighlightsOff passed to this function based on the _SPECULARHIGHLIGHTS_OFF keyword. What is the reason to create a branch here if the result will always be the same? Does it serve some special purpose, or is it just a mistake? Is the compiler going to strip this anyway?
     
  9. AcidArrow
    Joined: May 20, 2010 | Posts: 11,735
    Those are not branches, those are just conditionals which most of the time will not become branches.

    Unless one of those variables is a constant, those two conditionals are the same performance wise.
     
  10. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    It does actually matter.

    If it decides to compile as a branch, and the branch is coming from a source the compiler can guarantee is constant per warp, it's a "fast" dynamic branch. If it can't be guaranteed to be constant, then it's a "slow" dynamic branch. This distinction seems to matter more on mobile and AMD GPUs than it does on Nvidia GPUs.

    (edit: To clarify "fast" dynamic branches are basically free on desktop GPUs, where "slow" dynamic branches will cost a few cycles in overhead.)

    There are complicated ways in some of the latest graphics APIs to tell the GPU that a value is going to be constant per warp even if it's not coming from a source that normally is. I've seen them discussed for use with clustered & tiled lighting setups to force fast branches. But I'm honestly not up to speed enough with those to explain how that works apart from the very high level "that's what it's doing".

    The answer is yes, it will strip that code, except when it doesn't.

    There are multiple declarations of the LightingPhysicallyBased function in that file, some of which take a specularHighlightsOff and some that don't. In the ones that don't, the value is hardcoded based on the define and then it calls the function that does take that as an input. In that case the if statement and the preceding [branch] will be stripped. However, if the shader code is directly calling the function with the bool, and that bool isn't hardcoded, then it'll try to be a branch.
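
    The overload pattern described above can be sketched roughly like this. ShadeLight is a hypothetical, heavily simplified stand-in and not the actual URP source; only the _SPECULARHIGHLIGHTS_OFF keyword comes from the quoted code.

    ```hlsl
    // Overload that takes the flag. If a caller passes a bool that is not
    // known at compile time, the [branch] below may survive as a real branch.
    float3 ShadeLight(float3 diffuse, float3 specular, bool specularOff)
    {
        float3 result = diffuse;
        [branch]
        if (!specularOff)
        {
            result += specular;
        }
        return result;
    }

    // Overload without the flag. The value is hardcoded from the keyword,
    // so after inlining the condition above is a compile-time constant.
    float3 ShadeLight(float3 diffuse, float3 specular)
    {
    #ifdef _SPECULARHIGHLIGHTS_OFF
        bool specularOff = true;
    #else
        bool specularOff = false;
    #endif
        return ShadeLight(diffuse, specular, specularOff);
    }
    ```
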
     
    Last edited: Jan 28, 2022
  11. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Another wrench to throw into all of this is the "compiled shader" isn't the final form. Even if the "compiled shader" has a branch in it, that shader gets "compiled" again when the drivers take it and convert it to assembly code for the current GPU hardware. That can also make decisions to make something a branch, or not.
     
  12. hippocoder
    Digital Ape | Joined: Apr 11, 2010 | Posts: 29,723
    I'll sticky this thread as there's a whole bunch of quality info.
     
  13. CortiWins
    Joined: Sep 24, 2018 | Posts: 150
    Wow, this is far more than I expected. Thanks everyone!
     
  14. tutorial4unity
    Joined: Apr 14, 2019 | Posts: 14
    Late to the party, can I just say this is one amazing thread
     
  15. AshwinMods
    Joined: Jul 1, 2014 | Posts: 13
    I can't express how blessed I feel while reading detailed and sharp answers from bgolus. Thank you, sir.

    Found another good explanation here:
    https://solidpixel.github.io/2021/12/09/branches_in_shaders.html

    About "DFC best used on spatially-coherent branches":
    https://developer.amd.com/wordpress/media/2012/10/03_Clever_Shader_Tricks.pdf

    An interesting post on visualizing branch and sample counts:
    https://medium.com/@jasonbooth_86226/branching-on-a-gpu-18bfc83694f2

    One of the highlights for me is *it's okay if the same branch is being used for the entire triangle*.
    What can be passed from the vertex shader that is consistent for all three vertices of a triangle,
    so that after interpolation the value is still above/below a certain threshold and the fragment shader takes a common branch?
    Or will the compiler not understand what I am trying to do, and the output will be very different?
     
    Last edited: Jan 31, 2023
  16. vlery
    Joined: Jan 23, 2017 | Posts: 16
    Hey, I just encountered a weird issue. In a repro case in HLSL, inspected in RenderDoc, the simplified logic looks like this:
    [branch]
    if (View.HasLight)
    {
        // do something that may produce NaN or Inf, or have precision issues
    }

    I was trying to fix an artifact where about half of the pixels on a model are totally wrong from some directions. We fixed some obviously improper code inside the dynamic branch and got the right result. The problem is that View.HasLight is 0, and debugging in RenderDoc verifies that the branch is successfully skipped.
    Just wondering, is there any strategy in the HLSL compiler that could explain this, where the branch is skipped but a bug inside it still affects the result?
     
  17. jacketjlzUnity
    Joined: Jun 21, 2021 | Posts: 4
    This is such a fascinating read! Thank you @bgolus
    Could you please elaborate on how branching works in following cases?

    1. Branching on multiple material props:
    Code (CSharp):
    [branch]
    if(materialProp1 * materialProp2 > 0)
    {
        //some code
    }
    2. On a (float) materialProp as index of another (vector) materialProp:
    Code (CSharp):
    uint index = materialFloatProp;
    float value = materialVectorProp[index];
    [branch]
    if(value > 0)
    {
        //some code
    }
    3. And what about embedded ifs?:
    Code (CSharp):
    [branch]
    if(materialProp1 > 0)
    {
        //some code

        [branch]
        if(materialProp2 > 0)
        {
            //some code
        }
    }
     
  18. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Assuming the various material properties are never written to by the shader code, these can all be handled as "fast" dynamic branches as the values will be constant across the entire draw.

    Again, whether or not they will be depends on the compiler, but they can be.
     
  19. jacketjlzUnity
    Joined: Jun 21, 2021 | Posts: 4
    Thank you very much for the quick reply!
     
  20. EricFFG
    Joined: May 10, 2021 | Posts: 183
    There are also some interesting things in this talk from Jason Booth


    (I was also surprised that ChatGPT knew a lot of intricate GPU architecture details that I would never have expected it to know, given how niche they are.)

    For branches, I understand that an expensive branch condition should be heavily avoided, as it might quickly cost more than the branch saves.

    So I gather that things like branching on texture masks should be heavily avoided,
    and that things like normal direction Y, vertex color, or object position (or of course simple statics) are viable to branch on to remove expensive areas of the material when they are not needed?
     
  21. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Funny that you link Jason's video, which shows a use case where heavy use of dynamic branches led to substantially improved performance vs. not, and then follow that up with the conclusion that branches should be avoided. Trust Jason, don't trust ChatGPT.

    ChatGPT's main power is the ability to tell lies with plausible authority by giving (mostly) accurate information for a bit before it devolves to total bullshot.
     
  22. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    To be more targeted in my response:
    It's not that they should be avoided, it's that there is a cost to them that should be kept in mind. The terrain shader example in Jason's above video is doing branching based on texture samples, but it saves enough work that the cost of doing that branch is worth it.

    More specifically, you should always test and validate that your changes are improving performance on target platforms! Use branches, see if it helps, try to figure out why if it doesn't.
     
  23. EricFFG
    Joined: May 10, 2021 | Posts: 183
    It was phrased as a question

    That was completely unrelated to the chat, but it is clear that the savings must exceed the cost of the sample lookup for the branch, and from what I heard in the office a branch that depends on a sample can be quite expensive.
    If you have a hyper-expensive terrain shader (without branching), that might still very easily be worth it, for sure.

    But doing a texture-mask-based branch to cut off a single sample of sand must, I assume, be a net negative, as the sample needed for the check costs more than the sample it saves. On the other hand, doing a world height or normal direction based cutoff could be well worth it for something as simple as a top layer of sand on a mesh, as the check must be faster than the sample lookup.

    (It's very hard to test these small optimizations. Lately I tried to get some measurable performance impact to test some culling; I spent 15 minutes trying to make the most expensive shader possible, with a ton of stacked parallax and whatnot, and the GPUs I have available barely care. Trying to see a branch around a single texture sample in a profiler? No way.)
     
    Last edited: Jun 12, 2023
  24. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Make sure you're profiling the GPU, not the CPU. By default Unity's profiler isn't showing GPU times at all. When GPU profiling is enabled you can see the cost of individual draws. However, yes, it can still be hard to see a difference. But also, if you can't see a difference, that should tell you that the branch isn't that expensive in and of itself (or the shader compiler decided it wasn't worth it and it didn't compile as a branch at all anyway).

    The "branches are expensive" mentality comes from 20 years ago when the cost of adding 10 instructions to a shader needed to have an engineering meeting to discuss if it was worthwhile. The Standard shader's fragment shader is between 200 and 600 instructions depending on the use case (baked lighting is more expensive!!!), and the HDRP is thousands of instructions.
     
  25. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    Using a texture sample is more costly because you have to sample the texture first before running the branch. Sampling a texture is one of the most expensive things you can do in a shader, not because the shader itself has to do a lot of work, but in fact the opposite, because it has to wait for other bits of hardware on the GPU to do that work. Most of the time the shader compiler will do its best to reshuffle when things happen in the shader, putting texture samples at the very start of the shader and anything that's not dependent on those texture reads immediately afterwards to hide the latency between calling "tex2D(myTexture, uv)" and that data being accessible.

    This also makes it very hard sometimes to understand how expensive something is in a shader as adding an extra texture sample, or a dozen, could have negligible impact on the performance in an already complex shader, but massive impact on a more simple one.
     
  26. EricFFG
    Joined: May 10, 2021 | Posts: 183
    Yes, always with the GPU.

    Yes, but it's very easy to make mistakes with branches.
    I had made a debugging subgraph. This subgraph has 8 texture samplers inside for debugging or something, and is then branched off. From what I understand, the GPU still has to allocate the memory to initialize them and also calculate them (as the compilers still compile it all); it just doesn't pass the result on. I read that even when the branch is static, the compilers still compile both sides of it.
    So this would be a big mistake, even though one might think it's all safe and dandy; I thought "oh, no problem, just branch it". So the 3-sampler shader I spent weeks optimizing would have 8 hidden ones in memory.

    So you can't really say that branching is not expensive; it is not expensive, and greatly beneficial, if done correctly.
    But it can also be expensive and a waste, like in my example before. Branching off 1 texture sampler by using another texture sampler is a net negative for performance. Or you could even branch off a color by using a texture sampler mask, which would cost more than a full extra sampler, which would be quite bad, especially if you do this multiple times. So if you do the wrong things, your 4-texture-sampler shader could cost a multiple of that in samplers (the samplers are usually the key performance hog in a typical shader), or you'd make a mess for no reason.

    On the sampler topic, I've heard that you can actually have "free" calculations after the sampler (to a degree), as the GPU has to wait anyway for the sampler result to come through the cache, so these would essentially be done in the meantime for no real change in cost.
     
    Last edited: Jun 12, 2023
  27. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    The topic of samplers inside branches is a deep and complicated one. The short answer is that sampling a texture inside a branch is undefined behavior in the D3D spec. On some GPUs this means any texture sample inside a branch is always sampled anyway and the branch is ignored. On others the branch is respected, but there can be visual anomalies around the edge. And others handle it all gracefully, sampling the texture where it's needed and not where it's not. Which behavior you get depends on the generation of GPU, and it's not as simple as Nvidia vs AMD vs Mali.
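
    One commonly used way to reduce the edge anomalies, sketched here under assumptions rather than as a guaranteed fix: compute the UV derivatives outside the branch and sample with tex2Dgrad inside it. _SandTex, AddSand and the 0.7 threshold are hypothetical and only for illustration.

    ```hlsl
    sampler2D _SandTex; // hypothetical texture

    float3 AddSand(float3 baseColor, float2 uv, float3 normalWS)
    {
        // Derivatives are computed outside the branch, where every pixel
        // of the 2x2 quad is guaranteed to be running.
        float2 uvDx = ddx(uv);
        float2 uvDy = ddy(uv);

        float3 color = baseColor;
        [branch]
        if (normalWS.y > 0.7)
        {
            // tex2Dgrad takes explicit derivatives, so mip selection does not
            // depend on neighbouring pixels having entered the branch.
            float3 sand = tex2Dgrad(_SandTex, uv, uvDx, uvDy).rgb;
            color = lerp(color, sand, saturate((normalWS.y - 0.7) / 0.3));
        }
        return color;
    }
    ```

    Whether the sample is actually skipped still depends on the GPU and driver, so this needs to be checked on the target hardware.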

    One thing I'll say about Jason's terrain shader is he understands GPUs quite well and he's doing some extra tricks there. The main one is he's making heavy use of texture arrays, and the branches change which array indices and UVs to use, but the number of samples and texture objects being referenced stay constant. In DX12 and Vulkan there are ways to change the texture itself, but that's not something Unity supports as it requires a very different way of handling asset declaration.
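
    A very rough sketch of that idea, and explicitly not Jason's actual code: the branch only picks an array index and UVs, while the single sample stays outside of it. _LayerArray, sampler_LayerArray, SampleTerrainLayer and the height test are hypothetical.

    ```hlsl
    Texture2DArray _LayerArray;      // hypothetical texture array
    SamplerState sampler_LayerArray; // hypothetical sampler

    float4 SampleTerrainLayer(float2 uv, float height)
    {
        float layer = 0.0;
        float2 layerUV = uv;

        // The branch only decides what to sample...
        [branch]
        if (height > 0.5)
        {
            layer = 1.0;
            layerUV = uv * 4.0;
        }

        // ...while the number of samples and the texture object stay constant.
        return _LayerArray.Sample(sampler_LayerArray, float3(layerUV, layer));
    }
    ```
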
     
  28. Goularou
    Joined: Oct 19, 2018 | Posts: 54
    The same Jason Booth made a good blog on branching:
    https://medium.com/@jasonbooth_86226/branching-on-a-gpu-18bfc83694f2

    Thank you @bgolus , so much!
    PS: @bgolus, why not write a book on GPU programming, or start a blog? I have learnt a bunch from your posts, but they are scattered around, by definition...
     
  29. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    There are a few reasons why I don't write a book. For one, a book usually deals with abstract and artificial use cases. Ones which I would have to come up with and which may not be ones people in the real world actually encounter. I believe addressing specific use cases in which people actually have problems understanding how to achieve some goal or fix some problem or just better insight as to why what they're trying isn't working is more useful than a book that gives general information. My aim is to help people understand shaders, and while I may have answered similar questions multiple times, usually the solutions are different because the goal isn't exactly the same.

    Secondly, books cost money. I don't respond on these forums with the intent to make a profit. Unity certainly doesn't pay me to do it. Sure I could release some PDF for free, but no one will read that. Today people find what they need by searching online via Google or some other search engine. And a book isn't going to be what they click on.

    Third, and the most important one. I'm lazy. A book sounds like a lot of work.
     
  30. thang_unity516
    Joined: Oct 29, 2021 | Posts: 43
    Hi guys, I came across this topic while dealing with a branching problem, and I hope you can give me some help.

    As far as I know, some specific cases of branching will make the shader run slower depending on the GPU. Assume that:
    - I want to run a shader on various GPUs (especially mobile)
    - The logic in each branch is simple and forces branching. Example:
    Code (CSharp):
    half4 color = input.u > input.v ? half4(0, 0, 0, 0) : half4(1, 1, 1, 1);
    Is turning the branch into a calculation a good solution for this (reference: https://theorangeduck.com/page/avoiding-shader-conditionals)? After reading this article, this approach feels kind of magical to me: don't functions like max, abs, sign, ... turn into an if under the hood?

    Thanks for your help
     
  31. AshwinMods
    Joined: Jul 1, 2014 | Posts: 13
    As the UVs will be different for every fragment, FAST branching is not an option here, and
    the compiler will do whatever is required to get the same output. I don't think much can be saved by writing an equation here.
    (Can't wait to be corrected by someone here)

    I also read some articles similar to this one, after which I no longer take functions like Step, Sign, Sin, Abs, etc. for granted.
    https://interplayoflight.wordpress....hinking-in-high-level-shading-languages-2023/

    But as always, everything must be tested on device, as well as by looking at the semi-final assembly code,
    or maybe in the HLSL section of online compiler websites: https://godbolt.org/
     
  32. thang_unity516
    Joined: Oct 29, 2021 | Posts: 43
    To be clear, what I intended to do is convert this
    Code (CSharp):
    half4 color = input.u > input.v ? half4(0, 0, 0, 0) : half4(1, 1, 1, 1);
    into something like this
    Code (CSharp):
    float compareResult = isHigher(input.u, input.v);  // returns 1 if true, 0 otherwise
    half4 color = compareResult * half4(0, 0, 0, 0) + (1 - compareResult) * half4(1, 1, 1, 1);
    I don't know if this is more performant in this case.
     
  33. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    The "is higher" function is the step() function. It is an if statement.

    The second example's "optimized" code is three to four times slower than just using the ternary statement (x > y ? A : B) alone, as the second example still has the ternary hidden behind the step(), and then what is effectively a lerp, which is another few instructions.
     
  34. Kobix
    Joined: Jan 23, 2014 | Posts: 146
    In the picture provided, should that node & branch mean that compilation will 'bake' the conditional, i.e. the result has zero branching/conditionals?
     

    Attached Files:

  35. bgolus
    Joined: Dec 7, 2012 | Posts: 12,342
    It shouldn't cause any branching, no. The shader compiler should be smart enough to know the other "branch" can never be executed and that "branch's" code stripped out.
     
  36. Kobix
    Joined: Jan 23, 2014 | Posts: 146
    Thank you :D.