
Calling burst code directly from C#? (2021.2)

Discussion in 'Burst' started by Kichang-Kim, Feb 18, 2021.

  1. Kichang-Kim

    Kichang-Kim

    Joined:
    Oct 19, 2010
    Posts:
    1,012
    Hi. I found that Unity 2021.2 supports calling Burst-compiled functions directly from C# without a FunctionPointer.

    Here is part of the release notes:
    https://unity3d.com/jp/unity/alpha/2021.2.0a5
    But I can't find any details about this in Burst's documentation. Any information yet?
     
  2. jasons-novaleaf

    jasons-novaleaf

    Joined:
    Sep 13, 2012
    Posts:
    181
    It is transparent to the developer: you just call the functions normally. The magic (and the difficulty) is making sure the target code is actually compiled with Burst (use the [BurstCompile] attribute, make sure Burst compilation is enabled, etc.).

    However, there is additional marshaling overhead when calling native code from C#, so you should not make high-frequency calls into Burst-compiled code. Just call it once per update, etc.
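To confirm that a direct call really lands in Burst-compiled code, a known trick relies on [BurstDiscard]: calls to a method with that attribute are stripped from Burst-compiled code, so a flag it resets survives only under Burst. A minimal sketch, assuming a Unity project with the Burst package; BurstProbe, RunningUnderBurst and BurstProbeCheck are hypothetical names, not Burst API. It also shows that the call site is plain C#.

```csharp
using Unity.Burst;
using UnityEngine;

// Hypothetical helper class (not part of Burst).
// Burst's direct-call examples tag the containing type as well as the method.
[BurstCompile]
public static class BurstProbe
{
    // Calls to [BurstDiscard] methods are removed from Burst-compiled code,
    // so this reset only happens when the caller runs as managed C#.
    [BurstDiscard]
    static void MarkManaged(ref int flag) => flag = 0;

    // Returns 1 if the call was served by Burst-compiled native code, 0 if it ran as managed C#.
    [BurstCompile]
    public static int RunningUnderBurst()
    {
        int flag = 1;
        MarkManaged(ref flag);
        return flag;
    }
}

public class BurstProbeCheck : MonoBehaviour
{
    void Start()
    {
        // An ordinary static call: no FunctionPointer plumbing at the call site.
        int burst = BurstProbe.RunningUnderBurst();
        Debug.Log(burst == 1 ? "Direct call ran as Burst code" : "Direct call ran as managed C#");
    }
}
```

In the Editor, Burst compiles in the background by default, so the first calls may still report managed code; [BurstCompile(CompileSynchronously = true)] is one way to avoid that while testing.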
     
    Last edited: Feb 18, 2021
  3. sheredom

    sheredom

    Unity Technologies

    Joined:
    Jul 15, 2019
    Posts:
    300
  4. jasons-novaleaf

    jasons-novaleaf

    Joined:
    Sep 13, 2012
    Posts:
    181
    Doc page here: https://docs.unity3d.com/Packages/c...ang.html#directly-calling-burst-compiled-code

    @sheredom The docs don't mention any marshaling or performance costs. Given the math example, is it "fine" to call Burst code like this at high frequency, or is it better to stay in managed land? Or does it not matter in certain circumstances, e.g. when the code is compiled via IL2CPP?
    Code (CSharp):
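    using Unity.Burst;
    using Unity.Mathematics;

    [BurstCompile] // direct calls: tag the containing static class as well as the method
    public static class MyBurstUtilityClass
    {
        [BurstCompile]
        public static void BurstCompiled_MultiplyAdd(in float4 mula, in float4 mulb, in float4 add, out float4 result)
            => result = mula * mulb + add;
    }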
     
  5. jasons-novaleaf

    jasons-novaleaf

    Joined:
    Sep 13, 2012
    Posts:
    181
    I took a look and got some surprising results from running the code example here: https://github.com/keijiro/BurstDirectCallTest

    FYI, I'm new to Unity, but a basic IL2CPP build measured with Unity's built-in profiler shows the following timings:

    NoiseTexture.Update() timings:
    • Burst Enabled (the code in github): 18.5ms
    • Burst Disabled: 97ms
    • Burst Enabled, Inline Unsafe: 16.8ms

    The "Burst Enabled, Inline Unsafe" variant passes the texture data (a NativeArray<uint>) directly as a uint* and inlines the per-pixel work into a single Burst function. The main code was rewritten to:

    Code (CSharp):
    // Member of TextureGenerator; uses Unity.Burst and Unity.Mathematics
    [BurstCompile]
    public unsafe static void FillTexturePtrInline(float time, int _size, uint* buffer)
    {
        var offs = 0;

        // Row-major walk over the whole texture inside one Burst call
        for (var y = 0; y < _size; y++)
            for (var x = 0; x < _size; x++)
            {
                //buffer[offs++] = TextureGenerator.GetPixel(x, y, time);
                // Inlined in place of the GetPixel call above: simplex noise -> gray byte -> opaque pixel
                var pos = math.float3(x, y, time) * math.float3(0.008f, 0.008f, 0.5f);
                var f32 = noise.snoise(pos) * 0.4f + 0.5f;
                var un8 = (uint)(math.saturate(f32) * 255);
                buffer[offs++] = un8 | un8 << 8 | un8 << 16 | 0xff000000;
            }
    }
    with the main code:
    Code (CSharp):
    1.         //optimized
    2.         unsafe
    3.         {
    4.             var temp = _texture.GetRawTextureData<uint>();
    5.             var p_texture = (uint*)temp.GetUnsafePtr();
    6.             TextureGenerator.FillTexturePtrInline(time, _size, p_texture);
    7.             _texture.Apply();
    8.         }
    I am surprised that the "Inline Unsafe" version is only slightly faster (16.8 ms vs 18.5 ms): moving the loop into the Burst function seems to have little effect. Basically, this says: don't bother with unsafe code or with inlining high-frequency calls.

    PS: something I learned: declaring a NativeArray with "using" disposes it at the end of the enclosing scope.
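The last line of the inner loop in FillTexturePtrInline packs one gray level into a 32-bit pixel. Assuming an RGBA32 texture on a little-endian platform, the word reads as 0xAABBGGRR, so the three shifts copy the gray byte into R, G and B, and 0xff000000 makes the pixel fully opaque. Here is the same arithmetic as plain C#, runnable without Unity; Math.Clamp stands in for math.saturate and GrayPacking is a hypothetical name.

```csharp
using System;

public static class GrayPacking
{
    // Mirrors the packing in FillTexturePtrInline: noise value -> gray byte -> 0xAABBGGRR word.
    public static uint PackGray(float f32)
    {
        float clamped = Math.Clamp(f32, 0f, 1f);   // stand-in for math.saturate
        uint un8 = (uint)(clamped * 255);          // truncates toward zero, like the original cast
        // << binds tighter than |, so this is un8 | (un8 << 8) | (un8 << 16) | alpha
        return un8 | un8 << 8 | un8 << 16 | 0xff000000;
    }
}
```

For example, a noise value of 0.5 becomes gray level 127 (0x7F), i.e. the pixel 0xFF7F7F7F.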
     
    Last edited: Feb 19, 2021
  6. Kichang-Kim

    Kichang-Kim

    Joined:
    Oct 19, 2010
    Posts:
    1,012
    Thanks for sharing information! It looks cool.
     
  7. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    285
    I'm not sure that's true. Given that 18 ms is more time than you have for the whole frame (about 16.7 ms at 60 fps), shaving 2 ms off it sounds like a pretty good deal :D That said, the body of the bursted function is quite heavy, so the ~65k bursted calls probably don't add a huge overhead.
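A back-of-envelope check of that last point, assuming the ~65k calls are one per pixel of a 256×256 texture (65,536 calls) and attributing the whole difference between 18.5 ms and 16.8 ms to call overhead. That makes the first figure an upper bound, since the inline version also lets Burst optimize the loop as a whole; the second figure is the average work per pixel in the inline version.

$$
\frac{(18.5 - 16.8)\,\text{ms}}{65\,536\ \text{calls}} = \frac{1.7 \times 10^{6}\,\text{ns}}{65\,536} \approx 26\ \text{ns per call}
\qquad
\frac{16.8\,\text{ms}}{65\,536\ \text{pixels}} \approx 256\ \text{ns per pixel}
$$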

    From my testing, calling bursted functions is fast enough that it's often worth burst-compiling even small math utilities (e.g. float3/quaternion SmoothDamp functions). Obviously, the larger the block you burst-compile, the better, but low-hanging-fruit optimizations are always nice.
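A sketch of the kind of small utility described above: a Burst-compiled float3 SmoothDamp called once per frame through a direct call. The step uses the critically damped spring approximation that Unity's Mathf.SmoothDamp is built on, but omits its maxSpeed clamp and overshoot check. BurstSmoothing and SmoothFollow are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical utility class, not part of Unity's API.
[BurstCompile]
public static class BurstSmoothing
{
    // One critically damped spring step (no maxSpeed clamp, no overshoot check).
    [BurstCompile]
    public static void SmoothDamp(in float3 current, in float3 target, ref float3 velocity,
                                  float smoothTime, float deltaTime, out float3 result)
    {
        smoothTime = math.max(0.0001f, smoothTime);
        float omega = 2f / smoothTime;
        float x = omega * deltaTime;
        // Polynomial approximation of exp(-x)
        float exp = 1f / (1f + x + 0.48f * x * x + 0.235f * x * x * x);
        float3 change = current - target;
        float3 temp = (velocity + omega * change) * deltaTime;
        velocity = (velocity - omega * temp) * exp;
        result = target + (change + temp) * exp;
    }
}

public class SmoothFollow : MonoBehaviour
{
    public Transform target;
    public float smoothTime = 0.3f;
    float3 _velocity;

    void Update()
    {
        // Vector3 converts implicitly to float3 and back; one Burst call per frame.
        BurstSmoothing.SmoothDamp(transform.position, target.position, ref _velocity,
                                  smoothTime, Time.deltaTime, out float3 next);
        transform.position = next;
    }
}
```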
     
  8. sheredom

    sheredom

    Unity Technologies

    Joined:
    Jul 15, 2019
    Posts:
    300
    I think you'd see a perf uplift with nearly all math functions, but as @apkdev says, the bigger the block of work, the better (especially if you are calling a Burst function in a loop: it is always better to do the loop in Burst, because we can potentially vectorize better).
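A minimal sketch of "doing the loop in Burst", assuming unsafe code is enabled for the project or assembly: instead of calling a per-element function such as BurstCompiled_MultiplyAdd from a managed loop, the whole array goes to Burst in one call. BatchMath, MultiplyAddBatch and MultiplyAdd are hypothetical names; the raw-pointer signature follows the uint* approach used for the texture earlier in the thread.

```csharp
using System;
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Mathematics;

// Hypothetical helper class; requires unsafe code to be allowed.
[BurstCompile]
public static unsafe class BatchMath
{
    // The loop lives inside the Burst-compiled method: the whole batch costs one
    // managed-to-native transition, and Burst gets a chance to vectorize the loop body.
    [BurstCompile]
    public static void MultiplyAddBatch(float4* mula, float4* mulb, float4* add, float4* result, int count)
    {
        for (int i = 0; i < count; i++)
            result[i] = mula[i] * mulb[i] + add[i];
    }

    // Managed wrapper: validate lengths, take raw pointers, make a single Burst call.
    public static void MultiplyAdd(NativeArray<float4> mula, NativeArray<float4> mulb,
                                   NativeArray<float4> add, NativeArray<float4> result)
    {
        int count = result.Length;
        if (mula.Length < count || mulb.Length < count || add.Length < count)
            throw new ArgumentException("Input arrays must be at least as long as result.");

        MultiplyAddBatch((float4*)mula.GetUnsafeReadOnlyPtr(),
                         (float4*)mulb.GetUnsafeReadOnlyPtr(),
                         (float4*)add.GetUnsafeReadOnlyPtr(),
                         (float4*)result.GetUnsafePtr(),
                         count);
    }
}
```

Whether the loop is actually vectorized depends on the generated code; the Burst Inspector is the place to check.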
     
  9. jasons-novaleaf

    jasons-novaleaf

    Joined:
    Sep 13, 2012
    Posts:
    181


    Switching to Burst resulted in a drop from 97ms to 18ms and was basically "free".

    The main problem I see is that to shave off those 2 ms, you need to drop into unsafe code and rewrite your code.

    Sure, having unsafe+inline as a tool of last resort is great, but there is almost always another way to vastly improve performance without resorting to unsafe code.

    In this example, time would be better invested in:
    • Updating half of the pixels every frame (a further 50% perf improvement!)
    • Decreasing the resolution / interpolating pixels
    • Doing the work in an IJobParallelFor job
    • Doing all of the above
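One way to combine two of the suggestions above (a parallel job plus refreshing half of the pixels per frame), sketched under assumptions: a 256×256 RGBA32 texture, the noise formula from the FillTexturePtrInline post, and interlacing by row parity. NoiseRowJob and NoiseTextureJobs are hypothetical names. Each job index handles a whole row, so [NativeDisableParallelForRestriction] is needed to let Execute write outside its own index.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

[BurstCompile]
public struct NoiseRowJob : IJobParallelFor
{
    public int Size;          // texture width == height
    public float Seconds;     // animation time
    public int FrameParity;   // 0 or 1: which half of the rows to refresh this frame
    [NativeDisableParallelForRestriction]
    public NativeArray<uint> Pixels;   // Size * Size pixels, row-major

    public void Execute(int y)
    {
        if ((y & 1) != FrameParity) return;   // interlaced: skip the other half of the rows
        int offs = y * Size;
        for (int x = 0; x < Size; x++)
        {
            var pos = math.float3(x, y, Seconds) * math.float3(0.008f, 0.008f, 0.5f);
            var f32 = noise.snoise(pos) * 0.4f + 0.5f;
            var un8 = (uint)(math.saturate(f32) * 255);
            Pixels[offs + x] = un8 | un8 << 8 | un8 << 16 | 0xff000000;
        }
    }
}

public class NoiseTextureJobs : MonoBehaviour
{
    const int _size = 256;   // hypothetical resolution
    Texture2D _texture;

    void Start()
    {
        _texture = new Texture2D(_size, _size, TextureFormat.RGBA32, false);
        GetComponent<Renderer>().material.mainTexture = _texture;
    }

    void Update()
    {
        var job = new NoiseRowJob
        {
            Size = _size,
            Seconds = Time.time,
            FrameParity = Time.frameCount & 1,
            Pixels = _texture.GetRawTextureData<uint>(),
        };
        job.Schedule(_size, 16).Complete();   // rows spread over worker threads in batches of 16
        _texture.Apply();
    }
}
```

Calling Complete() right away keeps the sketch simple; scheduling early in the frame and completing later would let the work overlap with the main thread.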
     