My Burst Feature Wish List

Mortuus17 · Jan 7, 2022

Disclaimer: I consider myself a power user of Burst. I use every single feature on a regular basis to optimize small and large functions alike, which lets me abstract them away for reuse while knowing that the code gen will be fantastic. Burst has come a long way, particularly with Intrinsics and, more recently, the Constant.IsConstantExpression<T> compiler features. YOU ARE DOING GREAT WORK!
But there are a few things that I - as a power user - would still like to have, which I have compiled in the following list.

1: TResult ForceCompileTimeEvaluation<TParams, TResult>(Func<TParams, TResult> code, TParams parameterObj)
This comes from a bug report I submitted, where a pure function with constant arguments that contained 4 if statements was constant evaluated, but no longer was once I added a fifth if statement - which was very frustrating. This issue has not been fixed yet, although I was told that it should be fixed with Burst 1.7. I can live with that - maybe it will be fixed with the next one.
But why did it have 4 and then 5 branches? Because it was originally a loop, which I tried to force to be evaluated at compile time, since I know that that particular loop can only run for, let's say, up to 32 iterations, whereas a compiler cannot necessarily know that (the Halting Problem). Please give us a way to pass code and parameters into a Burst specific function that evaluates that code before passing it to LLVM. This would be both a huge performance and productivity gain! This is by far my most requested feature - as the newer C++ versions have it built into the language.
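To make the request concrete, here is a minimal sketch of what a call site might look like. Burst does not provide ForceCompileTimeEvaluation; the managed stand-in below simply invokes the delegate, and CompileTimeSketch and FloorLog2 are hypothetical names chosen only for illustration.

```csharp
using System;

static class CompileTimeSketch
{
    // Hypothetical stand-in: Burst has no such function. In plain C# it just
    // runs the code; the wish is that Burst would evaluate it before LLVM.
    public static TResult ForceCompileTimeEvaluation<TParams, TResult>(
        Func<TParams, TResult> code, TParams parameterObj)
    {
        return code(parameterObj);
    }

    // A pure function whose loop is known to run at most 32 times.
    public static int FloorLog2(uint x)
    {
        int result = 0;
        for (int i = 0; i < 32 && x > 1; i++)
        {
            x >>= 1;
            result++;
        }
        return result;
    }

    // Constant argument: the whole call could in principle fold to a constant.
    public static int Example()
    {
        return ForceCompileTimeEvaluation<uint, int>(FloorLog2, 1024u);
    }
}
```

The point of the sketch is only the shape of the call: a pure function plus constant parameters, handed over as one unit.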

2: Compile time access to FloatMode and FloatPrecision
With Intrinsics we have static readonly properties like IsAvxSupported which always return false in C# land. Can we also have similar properties that return the current FloatMode and FloatPrecision, please?
Why? Let's look at some code: x * math.rsqrt(y) with FloatMode.Fast compiles to what you would expect on X86: an SSE rsqrt instruction followed by some Newton-Raphson and a multiply. At FloatMode.Strict, it is... strict and performs an SSE sqrt FOLLOWED BY A DIVISION AND A MULTIPLICATION, whereas my intent was that in that case it would only divide x by the square root of y.
There is no use in abstracting away anything that calls math.rsqrt or math.rcp in its own function, when the code has to be rewritten for each job, depending on the FloatMode. And maintenance is a nightmare, in case one changes the FloatMode and cares about even minimal changes in performance.
As a little teaser: It only gets worse with stuff like float fourthroot(float x) => rsqrt(rsqrt(x)); vs sqrt(sqrt(x))
And FloatPrecision is less important but can probably be done "while you're at it" (same goes for OptimizeFor). One example I know of that comes to mind is a recently published fast algorithm for the calculation of the inverse cube root, which comes with code for varying levels of precision. Again: writing it only once would be nice, with a fake switch statement in the function itself.
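A minimal sketch of the "write it once" idea, in plain C# so that it runs without Unity. SketchFloatMode and CurrentFloatMode are hypothetical stand-ins for the requested compile-time property; nothing like them is exposed by Burst today. In Burst code the fast paths would call math.rsqrt, which is emulated here with 1 / sqrt.

```csharp
using System;

// Hypothetical stand-in for the Burst FloatMode setting of the current job.
enum SketchFloatMode { Strict, Fast }

static class FloatModeSketch
{
    // Hypothetical: the wish is for a Burst-provided compile-time constant.
    public static SketchFloatMode CurrentFloatMode = SketchFloatMode.Strict;

    static float Sqrt(float x) { return (float)Math.Sqrt(x); }
    static float Rsqrt(float x) { return 1.0f / Sqrt(x); } // emulates math.rsqrt

    // x / sqrt(y), written once for both modes.
    public static float MulRsqrt(float x, float y)
    {
        if (CurrentFloatMode == SketchFloatMode.Fast)
            return x * Rsqrt(y);   // fast: approximate reciprocal sqrt, then multiply
        return x / Sqrt(y);        // strict: one sqrt and one division
    }

    // The fourth root from the teaser, also written once.
    public static float FourthRoot(float x)
    {
        if (CurrentFloatMode == SketchFloatMode.Fast)
            return Rsqrt(Rsqrt(x));
        return Sqrt(Sqrt(x));
    }
}
```

If the mode were a true compile-time constant, the untaken branch would be expected to disappear, which is the same pattern already used with IsAvxSupported.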

3: Support for generics - typeof(T1) == and != typeof(T2)
Another C++ compile time feature: decltype. It would be nice to be able to write generic (job) structs with the ability to compile time evaluate the type of the generic argument. Even in C# land typeof(T) is compile time evaluated.
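For illustration, this is the kind of type switch meant here, as a plain C# sketch. TypeSwitchSketch and PathFor are hypothetical names; whether Burst accepts and folds these comparisons is exactly what this wish is about, so treat it as managed code only.

```csharp
using System;

static class TypeSwitchSketch
{
    // Picks a code path based on the generic argument. For value-type
    // arguments each instantiation is specialized, so only one branch
    // is ever relevant per T.
    public static string PathFor<T>() where T : unmanaged
    {
        if (typeof(T) == typeof(float)) return "float path";
        if (typeof(T) == typeof(int)) return "int path";
        if (typeof(T) != typeof(double)) return "generic fallback";
        return "double path";
    }
}
```

A generic job struct could use the same pattern inside Execute to select a specialized implementation per element type.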

4: [FieldOffset(0)] for unmanaged generic types in structs
Consider the following simple union as an example:

Code (CSharp):

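using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)] // C# requires this for FieldOffset
struct Union<T1, T2> where T1 : unmanaged where T2 : unmanaged
{
    [FieldOffset(0)] T1 Item1; // both fields start at offset 0 and overlap
    [FieldOffset(0)] T2 Item2;
}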

Currently this fails in Burst, where it says that an explicit FieldOffset is not allowed for generic types. Although I don't get why - can we make an exception for FieldOffset(0) specifically? (I see a pattern here.) It would be nice to only write the Union type once using generics.

5: Questions out of interest:
5.1: Are there any plans on exposing more intrinsics? You have umul128 but imul128 would also be nice... And there are other nice... X86 instructions like, let's say, bit test, rdrand/rdseed, but also LLVM intrinsics that read from the flags register (notoriously difficult to force otherwise)... And of course the big one: AVX512 (at least AVX512 foundation)!!! AMD Zen4 is now confirmed to support AVX512.
5.2: Any chance of inline assembly (as a string passed into a Burst function)?
5.3: Is there going to be a software implementation of ARM Neon intrinsics?
5.4: Are there any plans on intrinsics support for IL2CPP builds? - Meaning e.g. IsAvxSupported would appropriately evaluate to true or false during an IL2CPP build? After all, at the very least we know that every single X86-64 CPU supports SSE2.

6: Finally... two rather small questions regarding generated X86 code (which you might or might not have any influence on)
6.1: Division of two non compile time constant Unity.Mathematics (u)int vectors is performed in a scalar fashion. Isn't int -cvt-> double -divide-> -cvt-> int 100% safe? At the very least it's faster, even if you have to convert 2 int4s to 4 double2s, for example.
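A scalar sketch of the conversion chain in question, under two stated assumptions: the divisor is not zero, and the int.MinValue / -1 case is excluded (both are special cases for integer division anyway). IntDivSketch and DivViaDouble are hypothetical names. Every int is exactly representable as a double, and the rounding error of the double quotient is smaller than the distance between a non-integer quotient and its neighbouring integers, so the truncated result should match integer division.

```csharp
using System;

static class IntDivSketch
{
    // int -> double, divide, truncate toward zero back to int.
    // Assumes b != 0 and not (a == int.MinValue && b == -1).
    public static int DivViaDouble(int a, int b)
    {
        return (int)((double)a / (double)b);
    }
}
```

A vectorized version would apply the same three steps per lane, for example by widening two int4s into four double2s as described above.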
6.2: In SIMD we have boolean masks being generated from comparisons - they're either all ones or all zeros. To convert such a mask to a C# boolX vector for storing it in memory, the code I've seen uses a mask of 0x01010101... which gets AND-ed with the SIMD mask. Isn't the "abs" instruction much, much better here? (100% no cache miss, also 1 clock cycle latency) abs(0xFF) is 0x01.
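The equivalence can be checked per byte lane in plain C#. This is only a scalar sketch with hypothetical names (MaskSketch, ViaAnd, ViaAbs); it says nothing about latency or about which instruction Burst should pick.

```csharp
using System;

static class MaskSketch
{
    // What the generated code reportedly does: AND each lane with 0x01.
    public static byte ViaAnd(byte lane)
    {
        return (byte)(lane & 0x01);
    }

    // The suggested alternative: treat the lane as a signed byte and take
    // its absolute value. 0xFF is -1 as a signed byte, and abs(-1) is 1.
    public static byte ViaAbs(byte lane)
    {
        return (byte)Math.Abs(unchecked((sbyte)lane));
    }
}
```

For the only two values a comparison mask lane can hold, 0x00 and 0xFF, both functions return the same bool-compatible byte.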
This is a little too long but I hope that I didn't overwhelm anyone who read this. As a final note I can yet again only say that the Burst team, imo, does great work and is the very foundation DOTS is built upon. Thank you!

DreamingImLatios · Jan 7, 2022

Mortuus17 said: ↑

2: Compile time access to FloatMode and FloatPrecision

Something I've always wondered but haven't gotten around to testing is whether it is possible to override these for a specific function. So if I have a static class function library which needs strict floats to work properly, but I want to allow callers to use weaker constraints outside of the functions, is that possible?

Mortuus17 · Jan 8, 2022

DreamingImLatios said: ↑

[...] is that possible?

I tried...:

Code (CSharp):

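using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using static Unity.Mathematics.math;

[BurstCompile(FloatMode = FloatMode.Fast)]
public struct Test : IJob
{
    public NativeArray<float> output;

    // Attempted per-method override of the job's FloatMode
    [BurstCompile(FloatMode = FloatMode.Strict)]
    float DoIt(float x)
    {
        return rsqrt(x);
    }

    public void Execute()
    {
        output[0] = DoIt(output[0]);
    }
}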

... in hopes that a (legal, after all) BurstCompileAttribute on an IJob method would override that of its parent IJob, but even that is not the case. Looking at the assembly output, it returns the FloatMode.Fast version (you could compare it with the same job without the attribute above the DoIt method).

The only ways I see for it to work are to either make that use of the attribute work for methods (even outside of jobs) or to have that related wish of mine fulfilled PLUS the static readonly property not being... readonly - you being able to set (and reset) the FloatMode and/or FloatPrecision in your methods.
I would prefer the attribute working as I thought it should work, though.

sheredom · Jan 10, 2022

DreamingImLatios said: ↑

Something I've always wondered but haven't gotten around to testing is whether it is possible to override these for a specific function. So if I have a static class function library which needs strict floats to work properly, but I want to allow callers to use weaker constraints outside of the functions, is that possible?

You cannot override them per function - while for some of these we could make that work with LLVM, for things like float precision that is a job-wide override (so we couldn't easily have low precision cos in one function, and high precision cos in another).

Mortuus17 said: ↑

This is a little too long but I hope that I didn't overwhelm anyone who read this. As a final note I can yet again only say that the Burst team, imo, does great work and is the very foundation DOTS is built upon. Thank you!

This is a great list btw. As always with feedback from our users, we've all read it, filed tasks for anything new you've proposed, and linked this post with any previous asks for similar features.

We're hard at work on compile time improvements since 1.5 released (1.6, 1.7, and our next version are all primarily about getting your code compiled faster), but we hope we'll be able to make some time after the next version is released to address exactly these kinds of requests!

Mortuus17 · Jan 10, 2022

Thank you @sheredom, as long as the motivation behind these wishes is understood and agreed upon and they are thus put into your backlog, I couldn't be happier (almost).

And yes, tackling Burst's compilation performance first serves more users than these "advanced" features and should be done before introducing additional complexity. Again - great work!
