Question Performance Issue With Conditionals

Discussion in 'Shaders' started by liqid, Jan 30, 2022.

  1. liqid

    Hey,

    I'm trying to understand conditionals and branching in shaders.
    I have a shader with many heavy calculations, and I'm trying to toggle them on and off with bools you can select in the shader GUI.
    From what I've read, there is something called "Statically Uniform Branching", and D3D11+ graphics cards should evaluate if-conditionals like the one below without any performance loss. But in my case I get the same performance whether _IsEnabled is true or not, so it seems the code inside the conditional always gets executed.

    Code (CSharp):
    // In the Properties block:
    [HideInInspector] _IsEnabled ("_IsEnabled", Float) = 0

    // In the CGPROGRAM block:
    float _IsEnabled;
    void surf(Input IN, inout SurfaceOutputStandard o)
    {
        // UNITY_BRANCH makes no difference
        if (_IsEnabled) {
            // do heavy computation
        }
    }

    In my case, the only way to get the desired result is to use preprocessor conditionals, which create shader variants and select one based on which keywords are enabled:

    Code (CSharp):
    #pragma shader_feature _IS_ENABLED
    void surf(Input IN, inout SurfaceOutputStandard o)
    {
        #if _IS_ENABLED
            // do heavy computation
        #endif
    }
    If _IS_ENABLED is not enabled, this is effectively the same as doing nothing:

    Code (CSharp):
    void surf(Input IN, inout SurfaceOutputStandard o)
    {
        // do nothing
    }

    Sadly, the first code snippet, the one I actually want to use, is not working as expected: it is as slow as executing everything. Am I missing something? Can you only eliminate code from being executed using conditionals that can be evaluated at compile time?
    I was under the impression that shaders only have problems when the condition changes on a per-pixel basis and so isn't uniform across the whole wavefront.
    Isn't the condition in the first snippet above a dynamically uniform expression? https://www.khronos.org/opengl/wiki/Core_Language_(GLSL)#Dynamically_uniform_expression
     
  2. bgolus

    A preprocessor #if will obviously ensure that the code is skipped, because the code will not be in the shader at all when it's disabled.

    Dynamic branches can be a little more finicky.

    Some questions & thoughts:
    What GPU / OS are you testing on? You mention Direct3D 11, but also linked to OpenGL documentation.

    How are you testing the performance? Unity's in-editor profiling tools aren't amazing for checking GPU performance, even with GPU profiling enabled. You'll generally get better information from an external profiling tool like Nvidia Nsight, Radeon GPU Profiler, or Intel GPA.

    Does anything change if you switch float _IsEnabled; to bool _IsEnabled;? It shouldn't, though the latter option is preferable anyway, as otherwise the float has to be converted to a bool by the GPU.

    Have you looked at the generated shader to confirm that the compiled shader is producing a branch? If you're looking at the D3D compiled shader, you should see an if followed by a few indented lines. Other targets are a little harder to read.

    If you're targeting OpenGL, branching is a little dicey, as it depends entirely on the compiler being nice to you. UNITY_BRANCH or [branch] won't do anything if you're using OpenGL. In that case you can try Vulkan instead: assuming the D3D shader is outputting an if, the compiled Vulkan shader should as well.
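
    As a rough illustration of the points above, here is a minimal HLSL sketch of a uniform-driven branch with an explicit [branch] hint. The names HeavyComputation and Shade are hypothetical stand-ins, not code from this thread, and whether a real branch is emitted still has to be confirmed in the compiled shader.

```hlsl
// Hypothetical uniform; in Unity this would be set from the material.
bool _IsEnabled;

// Stand-in for the expensive work (ALU only, no texture samples).
float3 HeavyComputation(float3 p)
{
    float3 acc = 0;
    for (int i = 0; i < 64; i++)
    {
        acc += sin(p * (i + 1));
    }
    return acc / 64.0;
}

float3 Shade(float3 p)
{
    float3 result = 0;

    // [branch] asks the D3D compiler to emit a real branch instead of
    // evaluating both sides and selecting the result. In Unity shaders,
    // UNITY_BRANCH is the portable way to write this hint.
    [branch]
    if (_IsEnabled)
    {
        result = HeavyComputation(p);
    }
    return result;
}
```

    Because _IsEnabled is a uniform, every pixel in a draw call takes the same path, which is the case where a real branch can skip the work entirely.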
     
  3. liqid

    Thanks for your reply!

    I'm using a 970GTX and a 940MX, both on Windows 10. I linked the OpenGL docs because I thought the concept was the same.

    I used the in-editor profiling tool with GPU profiling enabled and https://assetstore.unity.com/packag...ate-fps-counter-stats-monitor-debugger-105778 to display the average FPS.

    Switching the float to a bool didn't make any difference on the 940MX. Haven't tested it on the 970GTX yet.

    The performance difference is severe: on my 940MX I get 50 FPS with the heavy calculations inside the preprocessor #if (with the keyword disabled).
    With the dynamically uniform if() I get 30 FPS on average, which is the same as what I get with no if() at all and the heavy calculations always executing. :(
     
  4. bgolus

    One very important thing to know about Unity's profiling tools, including any third party tools that run inside of Unity:

    They do not show the GPU side fps, they only show the CPU side fps.

    This is an important distinction because it can often be difficult to gauge the actual cost of something just by looking at the "fps" numbers. Unity itself does actually support getting the raw GPU timings, at least in the editor, but the numbers you can see in the profiler are sometimes weirdly misleading. And they don't work in standalone Windows builds unless you're rendering to a VR HMD, which is extremely frustrating.

    That said the Maxwell GPUs you have (the Nvidia GPU architecture the 940MX and 970GTX use) should be entirely capable of efficient dynamic branching. They were released about the time I started switching to making more frequent use of it in my own shaders as most GPUs people would be using / buying around 2015 could. And yes, both OpenGL and Direct3D can make use of this kind of "statically uniform branching" (again, with the caveat that it's less predictably enabled for OpenGL).

    So the main thing is going to be making sure the compiled shader is actually using an if, because having an if in the written shader is no guarantee the compiled shader will have one as well. There's also no guarantee about what code will be inside the compiled shader's if statement even if it is there, meaning the shader compiler, or sometimes the graphics API itself, may move some parts outside of the if statement. Texture sampling, for example, can often end up getting moved outside of if statements, meaning that if the expensive operations you're doing involve sampling a texture a lot, that might still be happening no matter what.
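
    One commonly suggested way to make it easier for the compiler to keep a texture sample inside a branch is to compute the UV derivatives outside the if and sample with explicit gradients inside it. This is a sketch under that assumption rather than something confirmed in this thread; _DetailTex and SampleDetail are hypothetical names, and the compiled shader still needs to be inspected to see where the sample ended up.

```hlsl
sampler2D _DetailTex; // hypothetical texture
bool _IsEnabled;      // hypothetical uniform toggle

float4 SampleDetail(float2 uv)
{
    // Derivatives are computed outside the branch, where every pixel
    // in the quad executes the same instructions.
    float2 dx = ddx(uv);
    float2 dy = ddy(uv);

    float4 col = 0;

    [branch]
    if (_IsEnabled)
    {
        // tex2Dgrad takes the gradients explicitly, so the sample no
        // longer relies on implicit derivatives inside the branch.
        col = tex2Dgrad(_DetailTex, uv, dx, dy);
    }
    return col;
}
```

    tex2Dlod with a fixed mip level is a similar option when mip selection by gradient isn't needed.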