
Can't get anything to vectorize (tried many online examples)

Discussion in 'Burst' started by Trindenberg, Jul 30, 2021.

  1. Trindenberg

    Joined: Dec 3, 2017 | Posts: 396

    Hi,

    I've been trying to get vectorization working, but the Burst Inspector reports that none of my loops vectorize. The most common message is 'loop control flow is not understood by the vectorizer'. Is there some setting I haven't checked? I've tried turning safety checks off. I added Burst to my 2021.1 project; do I need to add some other package?

    Here are some examples from the web that should vectorize, each labeled with the diagnostic the Inspector shows for it:

    *Call instruction can't be vectorized
    Code (CSharp):
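    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    struct MyJob2 : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float> Input;
        [WriteOnly] public NativeArray<float> Output;

        // Called once per index by the job system; there is no loop in this code.
        public void Execute(int index)
        {
            Output[index] = math.sqrt(Input[index]);
        }
    }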
    *Loop control flow is not understood by the vectorizer
    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    [BurstCompile(CompileSynchronously = true)]
    public struct MyJob : IJob
    {
        [ReadOnly]
        public NativeArray<float> Input;

        [WriteOnly]
        public NativeArray<float> Output;

        public void Execute()
        {
            // Sequential float sum: each addition depends on the previous one.
            float result = 0.0f;
            for (int i = 0; i < Input.Length; i++)
            {
                result += Input[i];
            }
            Output[0] = result;
        }
    }
    *Value that couldn't be identified as reduction is used outside the loop
    Code (CSharp):
    using Unity.Burst;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public struct ReallyToughJob : IJob
    {
        public void Execute()
        {
            float value = 0f;
            for (int i = 0; i < 50000; i++)
            {
                // Each iteration needs the previous iteration's value.
                value = math.exp10(math.sqrt(value));
            }
        }
    }
     
  2. HellGate94

    Joined: Sep 21, 2017 | Posts: 132

    It's just information that it can't optimize the loop into SIMD instructions. The main problem is that it doesn't know how big your array is going to be (maybe hints could help here, I don't know), but otherwise you could do something like this:

    Code (CSharp):
    // Assumes Input.Length is a multiple of 4.
    float result = 0.0f;
    for (int i = 0; i < Input.Length; i += 4)
    {
        result += Input[i];
        result += Input[i + 1];
        result += Input[i + 2];
        result += Input[i + 3];
    }
    Output[0] = result;

    This should work, but only if the array length is a multiple of 4; otherwise the last iteration reads past the end of the array.
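    A sketch, not from the original post, of the same idea with two changes: four separate accumulators instead of one (with a single accumulator the additions still happen in strict sequence, so nothing is actually regrouped), and a tail loop, so lengths that are not a multiple of 4 no longer read past the end. It uses a plain float[] so it runs outside Unity; in a Burst job the loop would read from the NativeArray<float> instead. UnrolledSum and Sum are hypothetical names, and whether Burst emits SIMD code for this exact shape is an assumption worth checking in the Burst Inspector. Because the additions are regrouped, the result can differ slightly from a strictly sequential sum for some inputs.

    Code (CSharp):
```csharp
using System;

public static class UnrolledSum
{
    // Sums a float array with four independent accumulators (hypothetical helper).
    // The partial sums are combined at the end, and a tail loop handles
    // lengths that are not a multiple of 4.
    public static float Sum(float[] input)
    {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;

        // Main loop: process 4 elements per iteration while a full group remains.
        for (; i <= input.Length - 4; i += 4)
        {
            s0 += input[i];
            s1 += input[i + 1];
            s2 += input[i + 2];
            s3 += input[i + 3];
        }

        // Combine the four partial sums pairwise.
        float result = (s0 + s1) + (s2 + s3);

        // Tail loop: add the 0 to 3 leftover elements one at a time.
        for (; i < input.Length; i++)
        {
            result += input[i];
        }

        return result;
    }
}
```

    For example, Sum(new float[] { 1, 2, 3, 4, 5, 6, 7 }) processes one full group of four plus a three-element tail and returns 28.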
     
  3. Trindenberg

    Joined: Dec 3, 2017 | Posts: 396

    I don't think that's the issue: there are many working examples out there, and I remember this working years ago when I experimented with ECS, although I'm not sure there was a Burst Inspector back then! I get the feeling it's something very simple. Maybe I need to create a new project and see whether it works there, unless someone knows.
     
  4. Zuntatos

    Joined: Nov 18, 2012 | Posts: 612

    Setup: Unity 2020.3.8f1 with Burst 1.6, default BurstCompile settings unless noted otherwise, Burst Inspector showing AVX2 assembly.

    The first example job (the sqrt loop) vectorizes for me if you make it an IJob with an explicit loop instead of an IJobParallelFor. Burst unrolls and vectorizes it into 4 vsqrtps instructions per loop iteration, i.e. 32 square roots (4 x 8 floats in 256-bit AVX registers).

    Adding an exp10 to get a loop of out = exp10(sqrt(in)) also vectorizes, but processes 8 elements at a time (with a lot of fmadd instructions, presumably implementing exp10).

    With default settings, the sum loop neither vectorizes nor unrolls for me, and it produces no warning in the diagnostics either.
    If I add
    [BurstCompile(FloatMode = FloatMode.Fast, OptimizeFor = OptimizeFor.Performance)]
    to the sum loop job, it does vectorize, into what seems to be 32 floats per loop iteration plus a scalar fallback loop. Setting only FloatMode or only OptimizeFor isn't enough.
    The problem with the float sum loop is that floating-point addition is not associative, so these two expressions do not give the same result for all inputs:
    out1 = ((((a + b) + c) + d) + e) + f
    out2 = (a + b) + (c + d) + (e + f)
    The simple loop has the semantics of out1, while the SIMD version needs something like out2 (several partial sums combined at the end). FloatMode.Fast allows Burst to make this change.
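    A minimal illustration of that non-associativity, using three values instead of six, in plain C# with hypothetical class and method names. The explicit (float) casts make C# round each intermediate result to single precision, as a float32 loop does.

    Code (CSharp):
```csharp
using System;

public static class FloatAssociativity
{
    // Left-to-right grouping ((a + b) + c), like the scalar sum loop.
    // The casts force rounding to single precision at each step.
    public static float LeftToRight(float a, float b, float c)
    {
        return (float)((float)(a + b) + c);
    }

    // Regrouped (a + (b + c)), the kind of reordering a SIMD reduction performs.
    public static float Regrouped(float a, float b, float c)
    {
        return (float)(a + (float)(b + c));
    }
}
```

    With a = 1e8f, b = -1e8f and c = 1f, LeftToRight returns 1 while Regrouped returns 0: floats near 1e8 are spaced 8 apart, so -1e8f + 1f rounds back to -1e8f and the 1 is lost.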
     
  5. Trindenberg

    Joined: Dec 3, 2017 | Posts: 396

    Thank you, I'm getting closer to understanding SIMD and the logic involved. I think I read somewhere that IJobParallelFor will become obsolete at some point in favor of IJobFor, which can be scheduled in parallel anyway. Using structs of 4 or 8 values of a type (like float4) on top of NativeArray is also something I want to try, since that passes already-vectorized data in.