Step vs. >= operator

Discussion in 'Shaders' started by AlexWige, Dec 29, 2018.

  1. AlexWige

    Joined:
    Mar 30, 2015
    Posts:
    5
    Hi everyone,

    I know that, for performance reasons, we should avoid if statements in Cg. So, as one does, I use the step function quite a lot, but I've also seen the > and < operators used for the same kind of thing.

    Code (Cg):
    // Outputs 0 or 1
    float test1 = (x > y);

    So I was wondering, does this count as branching? In other words, can I use those operators instead of step() and get the same performance? I just think they are so much clearer at first glance, instantly readable, and offer more options, since you can use >, <, <= or >=.
     
  2. Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,620
    If I remember correctly, step() is expanded to >= by Unity's shader generator, so it shouldn't make a difference in Unity whether you use step() or a comparison operator.

    To be sure, fire up a profiler such as RenderDoc, PIX, or Instruments, look at the compiled shader, and compare performance.
     
  3. Przemyslaw_Zaworski

    Joined:
    Jun 9, 2017
    Posts:
    328
    Comparison:

    Code (Cg):
    float4 SetPixelShader (float4 vertex:POSITION, float2 uv:TEXCOORD0) : SV_TARGET
    {
        float k = uv.x >= 0.5;
        return k.xxxx;
    }
    Code (D3D11 disassembly):
    -- Hardware tier variant: Tier 1
    -- Fragment shader for "d3d11":
    // Stats: 2 math, 1 temp registers
    Shader Disassembly:
    //
    // Generated by Microsoft (R) D3D Shader Disassembler
    //
    //
    // Input signature:
    //
    // Name                 Index   Mask Register SysValue  Format   Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Position              0   xyzw        0      POS   float
    // TEXCOORD                 0   xy          1     NONE   float   x
    //
    //
    // Output signature:
    //
    // Name                 Index   Mask Register SysValue  Format   Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_TARGET                0   xyzw        0   TARGET   float   xyzw
    //
          ps_4_0
          dcl_input_ps linear v1.x
          dcl_output o0.xyzw
          dcl_temps 1
       0: ge r0.x, v1.x, l(0.500000)
       1: and o0.xyzw, r0.xxxx, l(0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000)
       2: ret
    // Approximately 0 instruction slots used
    -----------------------------------------------------------------------------------------------------------------

    Code (Cg):
    float4 SetPixelShader (float4 vertex:POSITION, float2 uv:TEXCOORD0) : SV_TARGET
    {
        float k = step (0.5, uv.x);
        return k.xxxx;
    }
    Code (D3D11 disassembly):
    -- Hardware tier variant: Tier 1
    -- Fragment shader for "d3d11":
    // Stats: 2 math, 1 temp registers
    Shader Disassembly:
    //
    // Generated by Microsoft (R) D3D Shader Disassembler
    //
    //
    // Input signature:
    //
    // Name                 Index   Mask Register SysValue  Format   Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Position              0   xyzw        0      POS   float
    // TEXCOORD                 0   xy          1     NONE   float   x
    //
    //
    // Output signature:
    //
    // Name                 Index   Mask Register SysValue  Format   Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_TARGET                0   xyzw        0   TARGET   float   xyzw
    //
          ps_4_0
          dcl_input_ps linear v1.x
          dcl_output o0.xyzw
          dcl_temps 1
       0: ge r0.x, v1.x, l(0.500000)
       1: and o0.xyzw, r0.xxxx, l(0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000)
       2: ret
    // Approximately 0 instruction slots used
     
  4. bgolus

    Joined:
    Dec 7, 2012
    Posts:
    12,352
    step() isn’t expanded to >= by Unity’s shader generator; that’s how step() is implemented by HLSL and GLSL. It would compile to the same code whether or not you use Unity.

    As for whether this counts as branching: not really, no. I often refer to step() and similar inline >= comparisons as “fast branches”, but really they’re just comparisons and aren’t branches at all. There’s no control flow and there are no divergent code paths; the shader is just choosing one value or another. In the vast majority of situations, even more complex if statements get turned into these kinds of comparisons, with both sides of the conditional running 100% of the time and the GPU simply choosing the appropriate result afterwards.

    An actual branch would show an if_z or if_nz in the compiled shader, unlike the above examples from @Przemyslaw_Zaworski which show a ge.
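
    To make the ge/and pair in the disassembly above more concrete, here is a minimal sketch of what those two instructions compute. It is CPU-side C rather than shader code, it assumes IEEE-754 32-bit floats, and the helper names (ge_mask, mask_to_float, step_emulated) are made up for illustration. In Direct3D shader assembly, ge writes 0xFFFFFFFF when the comparison is true and 0 otherwise, and 0x3f800000 is the bit pattern of 1.0f, so the and turns that mask into 1.0 or 0.0.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the D3D `ge` instruction: all bits set when a >= b, else zero.
   On the GPU this is a single comparison instruction, not a branch; the C
   ternary here only models its result. */
static uint32_t ge_mask(float a, float b) {
    return (a >= b) ? 0xFFFFFFFFu : 0u;
}

/* Stand-in for `and dst, mask, l(0x3f800000)`.
   0x3f800000 is the IEEE-754 bit pattern of 1.0f, so the result is
   1.0f for an all-ones mask and 0.0f for a zero mask. */
static float mask_to_float(uint32_t mask) {
    uint32_t bits = mask & 0x3f800000u;
    float result;
    memcpy(&result, &bits, sizeof result); /* reinterpret the bits as a float */
    return result;
}

/* Hypothetical helper: the value that both step(edge, x) and (x >= edge)
   produce in the disassembly shown in this thread. */
static float step_emulated(float edge, float x) {
    return mask_to_float(ge_mask(x, edge));
}
```

    With these definitions, step_emulated(0.5f, 0.25f) returns 0.0f, while step_emulated(0.5f, 0.5f) and step_emulated(0.5f, 0.75f) both return 1.0f, which matches step(0.5, uv.x) and uv.x >= 0.5 in the shaders above.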
     
  5. ReadyPlayGames

    Joined:
    Jan 24, 2015
    Posts:
    49
    Interesting! I didn't know this myself.