math.select's performance vs. inline-if

5argon · May 20, 2018

It has been highlighted multiple times in talks that math.select is faster than an `if` conditional when Burst-compiled. However, the source code on GitHub reveals that `select` simply maps to an inline if (the `?:` conditional operator).

So if I use an inline if instead of `if`, would I get the same maximum performance? Personally I think an inline if is easier to read. (I am wondering whether the attribute on `select` has any effect that makes `select` preferable.)
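For reference, the two forms being compared can be sketched as follows. This is a minimal, self-contained sketch: `Select` is a local stand-in that mirrors the `c ? b : a` body described in the post, not the Unity.Mathematics method itself, and `SelectSketch`, `ClampWithSelect`, and `ClampWithInlineIf` are hypothetical names chosen for illustration.

```csharp
using System.Runtime.CompilerServices;

public static class SelectSketch
{
    // Local stand-in (assumption): mirrors the "b when c is true, otherwise a"
    // behaviour of math.select(float, float, bool).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static float Select(float a, float b, bool c)
    {
        return c ? b : a;
    }

    // Replaces negative inputs with zero by calling the select-style helper.
    public static float ClampWithSelect(float x)
    {
        return Select(x, 0f, x < 0f);
    }

    // The same logic written directly as an inline if (conditional operator).
    public static float ClampWithInlineIf(float x)
    {
        return x < 0f ? 0f : x;
    }
}
```

Both methods return the same value for every input. Whether they also compile to the same machine code is something only the generated assembly can confirm.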

Freddx · May 21, 2018

Actually, the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute improves performance by telling the compiler that it should inline the method.

Still, you can use Stopwatch to measure the execution time.
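A rough Stopwatch comparison along these lines might look like the sketch below. The names `SelectTiming`, `MakeData`, `SumWithSelect`, `SumWithInlineIf`, and `Run` are hypothetical, and `Select` is a local stand-in rather than the Unity.Mathematics method. Note the assumption baked in: run as plain managed code, this times the Mono/.NET JIT, not Burst, so it only says something about Burst if the loops are moved into a Burst-compiled job.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class SelectTiming
{
    // Local stand-in (assumption) for math.select(float, float, bool).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static float Select(float a, float b, bool c)
    {
        return c ? b : a;
    }

    // Sums the non-negative values, choosing each term via the select-style helper.
    public static float SumWithSelect(float[] data)
    {
        float sum = 0f;
        for (int i = 0; i < data.Length; i++)
        {
            sum += Select(data[i], 0f, data[i] < 0f);
        }
        return sum;
    }

    // Same loop, with the choice written as an inline if.
    public static float SumWithInlineIf(float[] data)
    {
        float sum = 0f;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i] < 0f ? 0f : data[i];
        }
        return sum;
    }

    // Builds deterministic pseudo-random data, roughly in the range [-1, 1].
    public static float[] MakeData(int count, int seed)
    {
        var rng = new Random(seed);
        var data = new float[count];
        for (int i = 0; i < count; i++)
        {
            data[i] = (float)(rng.NextDouble() * 2.0 - 1.0);
        }
        return data;
    }

    public static void Run()
    {
        float[] data = MakeData(1 << 20, 1234);

        // Warm-up calls so JIT compilation is not included in the timings.
        SumWithSelect(data);
        SumWithInlineIf(data);

        Stopwatch sw = Stopwatch.StartNew();
        float a = SumWithSelect(data);
        sw.Stop();
        Console.WriteLine("select:    " + sw.Elapsed.TotalMilliseconds + " ms (sum " + a + ")");

        sw.Restart();
        float b = SumWithInlineIf(data);
        sw.Stop();
        Console.WriteLine("inline-if: " + sw.Elapsed.TotalMilliseconds + " ms (sum " + b + ")");
    }
}
```

Single-run timings like this are noisy, so repeating the measurement several times and comparing the spread is advisable before drawing conclusions.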

rz_0lento · May 21, 2018

You could implement both in a minimal class, then examine the generated assembly for each in the Burst Inspector and diff them yourself.

LennartJohansen · May 21, 2018

Freddx said:

> Actually, the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute improves performance by telling the compiler that it should inline the method.
>
> Still, you can use Stopwatch to measure the execution time.

If you write the inline if directly in the function, isn't it inlined automatically?
Doesn't [MethodImpl(MethodImplOptions.AggressiveInlining)] just make sure the math.select function itself is inlined?

andrzej_cadp · May 22, 2018

Some architectures have a SIMD select instruction, so if you put your code in a job, Burst can vectorize it and avoid branching (even on architectures without one, its behavior can be emulated with a set of vectorized instructions). It's probably easier for Burst to recognize an explicit "select" than to try to understand the code. This convention also helps us avoid mistakes that lead to poor performance. I'm not aware of concrete code examples, but I can easily imagine a scenario where an overcomplicated inline if would create branches, while "select" makes sure this won't happen.

xoofx · May 22, 2018

A single math.select(float, float, bool) compared to a similar if/else should generate similar code.

The main benefit of select is with SIMD types, and more specifically float4, as it maps more naturally to a SIMD register. It will typically result in dedicated instructions being used instead of a conditional move on each component.
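To make the per-component behavior concrete, here is a minimal sketch of what a four-wide select computes. `Float4` and `Bool4` are hypothetical stand-ins defined here so the example compiles without the Unity.Mathematics package; they are not the real float4 and bool4 types, and `Float4SelectSketch` is an illustrative name.

```csharp
// Hypothetical stand-in for Unity.Mathematics' float4 (assumption, for illustration only).
public struct Float4
{
    public float X, Y, Z, W;

    public Float4(float x, float y, float z, float w)
    {
        X = x; Y = y; Z = z; W = w;
    }
}

// Hypothetical stand-in for Unity.Mathematics' bool4 (assumption, for illustration only).
public struct Bool4
{
    public bool X, Y, Z, W;

    public Bool4(bool x, bool y, bool z, bool w)
    {
        X = x; Y = y; Z = z; W = w;
    }
}

public static class Float4SelectSketch
{
    // Per-component select: each lane takes b when its flag is true, otherwise a.
    public static Float4 Select(Float4 a, Float4 b, Bool4 c)
    {
        return new Float4(
            c.X ? b.X : a.X,
            c.Y ? b.Y : a.Y,
            c.Z ? b.Z : a.Z,
            c.W ? b.W : a.W);
    }
}
```

Written this way, the operation is four independent choices. Compiled lane by lane, that means one conditional move per component; with the whole value in a SIMD register, a single blend instruction can cover all four lanes.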

For example, a math.select(float4, float4, bool4) without SIMD would generate the following scalar code (note that you currently can't generate this code in Burst using math.select, as it will always generate the SIMD version shown after it instead):

Code (x86 assembly):

    push rsi
    cmp byte ptr [r8], 0
    lea r9, [rdx + 4]
    lea r11, [rdx + 8]
    lea r10, [rdx + 12]
    cmove rdx, rcx
    lea rax, [rcx + 4]
    lea rsi, [rcx + 8]
    lea rcx, [rcx + 12]
    movss xmm0, dword ptr [rdx]
    cmp byte ptr [r8 + 1], 0
    cmovne rax, r9
    movss xmm1, dword ptr [rax]
    cmp byte ptr [r8 + 2], 0
    cmove r11, rsi
    movss xmm2, dword ptr [r11]
    cmp byte ptr [r8 + 3], 0
    cmove r10, rcx
    fld dword ptr [r10]
    pop rsi
    ret

while float4 with SIMD generates the following code (the default in Burst):

Code (x86 assembly):

    movups xmm2, xmmword ptr [rcx]
    movups xmm1, xmmword ptr [rdx]
    pmovzxbd xmm3, dword ptr [r8]
    movabs rax, .LCPI1_0
    pand xmm3, xmmword ptr [rax]
    movabs rax, .LCPI1_1
    pand xmm3, xmmword ptr [rax]
    pxor xmm0, xmm0
    pcmpeqd xmm0, xmm3
    blendvps xmm1, xmm2, xmm0
    movaps xmm0, xmm1
    ret

But as you can see in the scalar version, it still uses cmove/cmovne for the if/else, so a single select on a scalar will not be more efficient than a traditional if/else.

5argon · May 22, 2018

It is clear now. Thank you so much! I will use the Burst debugger more to see into my jobs from now on.

deplinenoise · May 22, 2018

It's worth pointing out that bool4 performs much worse if you store it in memory somewhere. Normally when you use a select, the mask is available in the correct format for SSE/AVX masking (all ones) and LLVM will replace the entire select with a single blendv instruction.
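The difference between a stored bool and an "all ones" mask can be illustrated with a scalar emulation. The sketch below is conceptual only: it is not what Burst or LLVM emits, and `MaskSelectSketch`, `ToLaneMask`, `Blend`, and `Select` are hypothetical names. It works on raw 32-bit lane values (for example, the bit patterns of four floats) and uses a full bitwise blend to show why a mask of all ones or all zeros is the convenient format.

```csharp
public static class MaskSelectSketch
{
    // Widens a bool into a 32-bit lane mask: all ones when true, all zeros when false.
    // A bool4 loaded from memory needs a conversion step like this before it can drive a blend.
    public static uint ToLaneMask(bool flag)
    {
        return flag ? 0xFFFFFFFFu : 0u;
    }

    // Bitwise blend on one 32-bit lane: takes the bits of b where the mask is set, otherwise a.
    public static uint Blend(uint a, uint b, uint mask)
    {
        return (a & ~mask) | (b & mask);
    }

    // Four-lane select over raw 32-bit lane values.
    public static uint[] Select(uint[] a, uint[] b, bool[] flags)
    {
        var result = new uint[4];
        for (int i = 0; i < 4; i++)
        {
            result[i] = Blend(a[i], b[i], ToLaneMask(flags[i]));
        }
        return result;
    }
}
```

When the condition comes straight from a SIMD comparison, the mask already has this shape and the `ToLaneMask` step disappears, which matches the point above about the select collapsing into a single blend.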

rz_0lento · May 22, 2018

deplinenoise said:

> Normally when you use a select, the mask is available in the correct format for SSE/AVX masking (all ones)

@deplinenoise, since you mentioned SSE/AVX, does Burst compile both into the same executable, or is there a compile-time switch for it? Obviously we'd want the more optimized AVX instructions for most PCs, for example, but there are still a bunch of CPUs that don't support them and would need a fallback. So my real question is: how does Burst handle this?

xoofx · May 23, 2018

rizu said:

> @deplinenoise, since you mentioned SSE/AVX, does Burst compile both into the same executable, or is there a compile-time switch for it? Obviously we'd want the more optimized AVX instructions for most PCs, for example, but there are still a bunch of CPUs that don't support them and would need a fallback. So my real question is: how does Burst handle this?

Yes, the goal is to have a dynamic switch at runtime based on the CPU. This feature is not yet available in our work on Burst AOT, but it will come soon after the first preview for 2018.2.

sebas77 · Jun 10, 2018

Jumping into this thread to ask whether [MethodImpl(MethodImplOptions.AggressiveInlining)] is supposed to actually work with the current Unity compiler. My tests suggest it doesn't. Is it maybe a hint for IL2CPP and Burst only?

5argon · Jun 10, 2018

sebas77 said:

> Jumping into this thread to ask whether [MethodImpl(MethodImplOptions.AggressiveInlining)] is supposed to actually work with the current Unity compiler. My tests suggest it doesn't. Is it maybe a hint for IL2CPP and Burst only?

I think it is for Burst only, not IL2CPP. Burst takes IL (directly?) to optimized assembly code for the target platform. IL2CPP takes IL to C++, which is then compiled to (less optimized?) assembly on whatever platform has a C++ compiler. If both take IL as input, then there should be no overlap in the pipeline?
