Resolved Using Unity.Mathematics without Burst much slower than Mathf

Discussion in 'Burst' started by vectorized-runner, Oct 11, 2021.

  1. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    Hi, I've been using Unity.Mathematics in non-Burst-compiled code in a project (where using DOTS is not possible). I did some simple benchmarks with matrices (comparing float4x4.TRS against Matrix4x4.TRS, etc.), and float4x4 is around 5 times slower than Matrix4x4. float3 seems to be slower than Vector3 as well. So I wonder whether I should avoid Unity.Mathematics entirely if I can't Burst compile it. Any suggestions on this?

    Also, why is Unity.Mathematics so slow when it is not Burst compiled?
     
  2. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    792
    I read somewhere that Unity.Mathematics is actually slower and should only be used in conjunction with Burst or IL2CPP.
     
  3. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    I'm indeed using IL2CPP. Perhaps I should do some benchmarks on builds.
     
  4. Mortuus17

    Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    There are many things that make it slower.

    First, the methods you mentioned are actually extern functions called via interop. They are essentially "Burst" compiled (actually natively compiled) already, most likely using SIMD code.
    I don't KNOW this, but it may also be the case that when you benchmark a specific function in a loop, the native function is cached, so the interop barrier (~40 assembly instructions) is only crossed once or twice. (Please chime in if anyone knows more...)

    Secondly and more importantly, float arithmetic in Mono runtime is extremely inefficient; all arithmetic is done in double precision, with scalar conversions back and forth. With a latency of 2 clock cycles for float -> double and a throughput of 1 per cycle, the conversion of 16 floats takes 16 + 2 clock cycles at least. double -> float takes 4 cycles, therefore converting back takes 16 + 4 cycles; 38 clock cycles in total. This may already be the latency of the interop barrier alone.
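To make the estimate above concrete, here is a small sketch in C (chosen only because the widening and narrowing conversions are easy to spell out there). `mul_promoted` imitates the behaviour described for Mono: widen both operands to double, multiply, then narrow back. `conversion_cycles` reproduces the rough cost model from the post (count divided by throughput, plus latency). All function names are hypothetical, and the model is an illustration of the post's arithmetic, not a measurement of any real JIT.

```c
#include <assert.h>

/* Single-precision multiply with no widening. */
static float mul_direct(float a, float b) {
    return a * b;
}

/* Assumed picture of the promoted path: two float -> double
   conversions, a double multiply, one double -> float conversion. */
static float mul_promoted(float a, float b) {
    double wide = (double)a * (double)b;
    return (float)wide;
}

/* Rough model from the post: a pipelined unit that accepts
   per_cycle conversions each cycle and has a fixed latency. */
static int conversion_cycles(int count, int latency, int per_cycle) {
    assert(count >= 0 && latency >= 0 && per_cycle > 0);
    return count / per_cycle + latency;
}

/* Widening (latency 2) plus narrowing (latency 4), using the
   latencies quoted in the post, both at 1 conversion per cycle. */
static int round_trip_cycles(int count) {
    return conversion_cycles(count, 2, 1) + conversion_cycles(count, 4, 1);
}
```

With the post's figures, `round_trip_cycles(16)` evaluates to 38. For a single multiplication the two multiply functions return the same value, so in this sketch the promoted path adds conversion work without changing the result.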

    Then there is the fact that SIMD code can multiply/add 4 floats at once, maybe even using fused multiply add instructions.
    Additionally, since 16 float registers are used, this may result in register spilling onto the stack and reloads instead of SIMD vector shuffle instructions in register. Depends on the Mono JIT.
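As a rough picture of what "4 floats at once" means, the C sketch below writes out the four lanes of a multiply-add by hand. The `float4_t` type and `madd4` function are hypothetical names for illustration only; a SIMD instruction would process all four lanes together, whereas scalar code (as written here) handles them one by one. Whether a compiler or JIT actually vectorizes or fuses these operations is not something this sketch shows.

```c
/* Hypothetical 4-lane vector, laid out like a float4. */
typedef struct {
    float x, y, z, w;
} float4_t;

/* Per-lane a * b + c. Scalar code performs four multiplies and
   four adds; a 4-wide SIMD multiply-add covers all lanes at once. */
static float4_t madd4(float4_t a, float4_t b, float4_t c) {
    float4_t r;
    r.x = a.x * b.x + c.x;
    r.y = a.y * b.y + c.y;
    r.z = a.z * b.z + c.z;
    r.w = a.w * b.w + c.w;
    return r;
}
```

A 4x4 matrix has four such rows, so the scalar path does four times this work per row operation, before counting any float/double conversions.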

    If you use IL2CPP, though, the performance will be almost identical I'd assume.
     
  5. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    792
    That would apply to both versions (float3 and Vector3) in Mono without Burst.
    2021.2 uses an updated Mono and should now also treat floats internally as float.
     
  6. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    11,724
    If you haven't tested on builds, you haven't tested IL2CPP, since the editor always uses Mono.

    Mathematics in mono is VERY slow. (in our tests, in editor, it was like 50% slower for super simple stuff)

    Mathematics in IL2CPP is ever so slightly faster than Mathf. (in the same above test, in a build, Mathematics was up to 5% faster, maybe less, practically identical, but it was always slightly faster)
     
  7. Mortuus17

    Mortuus17

    Joined:
    Jan 6, 2020
    Posts:
    105
    Not if the function using the Vector3 is a native DLLImport function.

    Cool! Let's hope 2021 LTS isn't too far away.
     
  8. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    398
    I just made an Android build with IL2CPP, and in the profiler float4x4 now looks a lot faster. Good reminder to benchmark only on builds.
     