In parameters in Burst

Discussion in 'Burst' started by R2-RT, Nov 25, 2020.

  1. R2-RT

    I am opening a discussion about https://blogs.unity3d.com/2020/11/25/in-parameters-in-burst/ and copying the comment I left there.

    I played around with `in` parameters a few months back, and in my case they turned out to add more assembly instructions (probably overhead?) to jobs.
    I tend to have many small utility functions that are used throughout the codebase [e.g. `float CalculateTriangleArea(float4 a, float4 b, float4 c)`].
    It is also common for some of their arguments to be hardcoded in the job.

    I presumed this was a case of the compiler being forced to take the address of a local temporary that holds a compile-time constant, and I did not investigate further.
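
To illustrate the presumption above, here is a minimal plain C# sketch (not Burst code, and not taken from the original jobs). The names `InParameterSketch`, `ScaleByValue` and `ScaleByIn` are hypothetical. It only shows the language-level behavior: a constant passed to an `in` parameter is first stored in a temporary, and the address of that temporary is passed. Whether Burst can fold that away in a given job is exactly what the measurements in this post probe.

```csharp
using System;

public static class InParameterSketch
{
    // By-value parameters: the constant 1000f can be passed directly.
    public static float ScaleByValue(float value, float factor) => value * factor;

    // `in` parameters are read-only references: the callee receives addresses.
    public static float ScaleByIn(in float value, in float factor) => value * factor;

    public static void Main()
    {
        float a = ScaleByValue(2f, 1000f);

        // Passing constants to `in` parameters is legal; the C# compiler
        // stores each constant in a hidden local and passes its address.
        float b = ScaleByIn(2f, 1000f);

        // Hand-written equivalent of what the compiler does for the call above.
        float tmpValue = 2f;
        float tmpFactor = 1000f;
        float c = ScaleByIn(in tmpValue, in tmpFactor);

        // All three calls compute the same value.
        Console.WriteLine(a == b && b == c);
    }
}
```

The results are identical in all three cases; the only difference is how the arguments reach the callee, which is where any extra instructions would come from.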

    Example with inlined function:

    Code (CSharp):
    using Unity.Burst;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public struct MyJob : IJob
    {
        public readonly struct SomeStruct
        {
            public readonly float3 Position;
            public readonly float4x4 Rotation;

            public SomeStruct(float3 position, float4x4 rotation)
            {
                Position = position;
                Rotation = rotation;
            }
        }

        public SomeStruct InDataA;
        public float3 OutData;

        // Both arguments are passed by value here.
        private static float3 DoSomething(SomeStruct a, SomeStruct b)
        {
            return math.rotate(a.Rotation, a.Position) +
                    math.rotate(b.Rotation, b.Position);
        }

        public unsafe void Execute()
        {
            // The second argument is a compile-time constant.
            OutData = DoSomething(InDataA, new SomeStruct(math.float3(1,2,3), float4x4.identity));
        }
    }

    emits:

    Code (CSharp):
    vmovsd        xmm0, qword ptr [rcx]
    vbroadcastss  xmm1, xmm0
    vmulps        xmm1, xmm1, xmmword ptr [rcx + 12]
    vpermilps     xmm0, xmm0, 213
    vmulps        xmm0, xmm0, xmmword ptr [rcx + 28]
    vaddps        xmm0, xmm1, xmm0
    vbroadcastss  xmm1, dword ptr [rcx + 8]
    vmulps        xmm1, xmm1, xmmword ptr [rcx + 44]
    vaddps        xmm0, xmm0, xmm1
    vaddps        xmm0, xmm0, xmmword ptr [rip + __xmm@0000000040400000400000003f800000]
    vmovss        dword ptr [rcx + 76], xmm0
    vextractps    dword ptr [rcx + 80], xmm0, 1
    vextractps    dword ptr [rcx + 84], xmm0, 2
    ret

    whereas when we add `in` we get an extra `vinsertps` instruction.

    Thus for:

    Code (CSharp):
    private static float3 DoSomething(in SomeStruct a, in SomeStruct b){...}
    we get:

    Code (CSharp):
    vmovsd        xmm0, qword ptr [rcx]
    vinsertps     xmm1, xmm0, dword ptr [rcx + 8], 32
    vbroadcastss  xmm2, xmm0
    vmulps        xmm2, xmm2, xmmword ptr [rcx + 12]
    vpermilps     xmm0, xmm0, 213
    vmulps        xmm0, xmm0, xmmword ptr [rcx + 28]
    vaddps        xmm0, xmm2, xmm0
    vpermilps     xmm1, xmm1, 234
    vmulps        xmm1, xmm1, xmmword ptr [rcx + 44]
    vaddps        xmm0, xmm1, xmm0
    vaddps        xmm0, xmm0, xmmword ptr [rip + __xmm@0000000040400000400000003f800000]
    vmovss        dword ptr [rcx + 76], xmm0
    vextractps    dword ptr [rcx + 80], xmm0, 1
    vextractps    dword ptr [rcx + 84], xmm0, 2
    ret

    Second example, for which I cannot determine which assembly is "better": `MultiplyRefJob` vs `MultiplyInRefJob` vs `MultiplyInRef2Job`:

    Code (CSharp):
    using System.Runtime.CompilerServices;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Jobs;
    using Unity.Mathematics;

    // Enclosing class omitted for brevity.

    // Both parameters by value.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float4 Multiply(float4 vec, float v)
    {
        return vec * v;
    }

    // Both parameters passed as `in`.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float4 MultiplyIn(in float4 vec, in float v)
    {
        return vec * v;
    }

    // Only the vector passed as `in`; the scalar stays by value.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float4 MultiplyIn2(in float4 vec, float v)
    {
        return vec * v;
    }

    [BurstCompile]
    public struct MultiplyRefJob : IJob
    {
        public NativeArray<float4> x;
        public NativeArray<float4> y;

        public unsafe void Execute()
        {
            UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
                Multiply(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
        }
    }

    [BurstCompile]
    public struct MultiplyInRefJob : IJob
    {
        public NativeArray<float4> x;
        public NativeArray<float4> y;

        public unsafe void Execute()
        {
            UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
                MultiplyIn(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
        }
    }

    [BurstCompile]
    public struct MultiplyInRef2Job : IJob
    {
        public NativeArray<float4> x;
        public NativeArray<float4> y;

        public unsafe void Execute()
        {
            UnsafeUtility.ArrayElementAsRef<float4>(y.GetUnsafePtr(), 0) =
                MultiplyIn2(UnsafeUtility.ArrayElementAsRef<float4>(x.GetUnsafePtr(), 0), 1000f);
        }
    }

    `MultiplyRefJob` uses a bigger stack (48 bytes) than `MultiplyInRefJob` (32 bytes), but the second argument of `Multiply`, 1000f, has been hardcoded into the function body:

    Code (CSharp):
    "BurstFoos.Multiply(Unity.Mathematics.float4 vec, float v)_49D95762617E63CF":
            vbroadcastss  xmm0, dword ptr [rip + __real@447a0000]
            vmulps        xmm0, xmm0, xmmword ptr [rcx]
            ret

    The winner is probably `MultiplyInRef2Job`, which has a 32-byte stack and 1000f hardcoded as the second argument. (Assuming we optimize for speed, not size, right?)

    Thus it seems that having `in float v` prevents 1000f from being hardcoded into the `MultiplyIn` function. I have no idea how this propagates through chains of function calls.

    Anyway, my conclusion: it is better not to use `in` parameters for utility functions. Maybe they are valuable for big functions, but I try to avoid those in jobs.

    PS. Burst Inspector converts `in` into `ref` in the call names it shows, which suggests that `in` == `ref readonly` is not supported.
    PPS. I used the newest Burst package version, 1.4.1.