Hi everyone,

I know that, for performance reasons, we should avoid if statements in Cg. So, as one does, I use the step() function quite a lot, but I've also seen the > and < operators used for the same kind of thing.

Code (Cg):
// Outputs 0 or 1
float test1 = (x > y);

So I was wondering, does this count as branching? In other words, can I use those operators instead of step() and get the same performance? I just think it is so much clearer at first glance, instantly readable, and it offers many more options, since you can use >, <, <= or >=.
If I remember correctly, step() is expanded to >= by Unity's shader generator, so it doesn't seem to make a difference in Unity whether you use step() or a comparison operator. To be sure, fire up a profiler such as RenderDoc, PIX or Instruments, look at the compiled shader, and compare performance.
Comparison:

Code (CSharp):
float4 SetPixelShader (float4 vertex:POSITION, float2 uv:TEXCOORD0) : SV_TARGET
{
    float k = uv.x >= 0.5;
    return k.xxxx;
}

Code (CSharp):
-- Hardware tier variant: Tier 1
-- Fragment shader for "d3d11":
// Stats: 2 math, 1 temp registers
Shader Disassembly:
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float
// TEXCOORD                 0   xy          1     NONE   float   x
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_TARGET                0   xyzw        0   TARGET   float   xyzw
//
      ps_4_0
      dcl_input_ps linear v1.x
      dcl_output o0.xyzw
      dcl_temps 1
   0: ge r0.x, v1.x, l(0.500000)
   1: and o0.xyzw, r0.xxxx, l(0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000)
   2: ret
// Approximately 0 instruction slots used

-----------------------------------------------------------------------------------------------------------------

Code (CSharp):
float4 SetPixelShader (float4 vertex:POSITION, float2 uv:TEXCOORD0) : SV_TARGET
{
    float k = step (0.5, uv.x);
    return k.xxxx;
}

Code (CSharp):
-- Hardware tier variant: Tier 1
-- Fragment shader for "d3d11":
// Stats: 2 math, 1 temp registers
Shader Disassembly:
//
// Generated by Microsoft (R) D3D Shader Disassembler
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_Position              0   xyzw        0      POS   float
// TEXCOORD                 0   xy          1     NONE   float   x
//
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// SV_TARGET                0   xyzw        0   TARGET   float   xyzw
//
      ps_4_0
      dcl_input_ps linear v1.x
      dcl_output o0.xyzw
      dcl_temps 1
   0: ge r0.x, v1.x, l(0.500000)
   1: and o0.xyzw, r0.xxxx, l(0x3f800000, 0x3f800000, 0x3f800000, 0x3f800000)
   2: ret
// Approximately 0 instruction slots used
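For anyone puzzled by the `and` with 0x3f800000 in both listings: the `ge` instruction writes an all-ones bit mask (0xFFFFFFFF) when the comparison passes and zero otherwise, and 0x3f800000 is the IEEE 754 bit pattern of 1.0f. A rough hand-written equivalent of those two instructions might look like the sketch below. The function name `MaskFromBits` is made up for illustration, and the compiler is of course free to emit something different for it.

```hlsl
// Hypothetical helper: spells out by hand what the ge + and pair computes.
// Assumes Shader Model 4 or later, where asfloat() and uint bit operations are available.
float MaskFromBits(float x)
{
    // ge: every bit set (0xFFFFFFFF) if x >= 0.5, otherwise all bits clear.
    uint allOnesOrZero = (x >= 0.5) ? 0xFFFFFFFFu : 0u;

    // and: masking with 0x3F800000 leaves either the bits of 1.0f or zero (0.0f).
    return asfloat(allOnesOrZero & 0x3F800000u);
}
```

So the 0-or-1 float comes out of a single comparison and a bit mask, with no jump anywhere in the program.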
It’s not expanded by Unity’s shader generator; that’s how step() is implemented by HLSL & GLSL. It would compile to the same code whether or not you use Unity.

As for whether the comparison counts as branching: not really, no. I often refer to step() and similar inline >= comparisons as “fast branches”, but really they’re just comparisons and aren’t branches at all. There’s no control flow and there are no divergent code paths; it’s just choosing one value or another. In the vast majority of situations, even more complex if statements get turned into these kinds of comparisons, with both sides of the conditional running 100% of the time and the GPU simply choosing the appropriate result afterwards. An actual branch would show an if_z or if_nz in the compiled shader, unlike the above examples from @Przemyslaw_Zaworski, which show a ge.
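To see the difference in your own disassembly, one option is to compile a flattened conditional next to one that explicitly requests a branch. The snippet below is only a sketch: the function names are hypothetical, and `[flatten]` and `[branch]` are HLSL attributes that express a preference, so the compiler or driver may still choose differently on a given target.

```hlsl
// Hypothetical helpers for comparing compiled output; not a benchmark.

// Both sides are evaluated and one result is selected afterwards.
// This is what most simple if statements end up as anyway.
float4 PickFlattened(float x, float y, float4 a, float4 b)
{
    float4 result = b;
    [flatten]
    if (x >= y)
    {
        result = a;
    }
    return result;
}

// Requests real control flow; if the compiler honors it, the
// disassembly should contain an if_z / if_nz style instruction.
float4 PickBranched(float x, float y, float4 a, float4 b)
{
    float4 result = b;
    [branch]
    if (x >= y)
    {
        result = a;
    }
    return result;
}
```

With a body this small, the flattened form is usually the one you want. A real branch tends to pay off only when it skips a lot of work and neighbouring pixels mostly take the same path.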