Question: Why doesn't this loop get vectorized?

Discussion in 'Entity Component System' started by mikaelK, Jun 25, 2022.

  1. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    I started experimenting with Burst's Loop.ExpectVectorized() intrinsic and found that a very simple case doesn't work as expected.

    The code:

    Code (CSharp):
    public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
    {
        NativeArray<TargetInternalOptimized2>.ReadOnly targets =
            batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();

        for (int index = batchInChunk.Count - 1; index >= 0; index--)
        {
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
            var target = targets[index];
        }
    }
    Component that breaks vectorization.

    Code (CSharp):
    public struct TargetInternalOptimized2 : IComponentData
    {
        // xyz position, w entityQueryIndex
        public float4 positionAndQueryIndex;
    }
    It's odd. It's just a component with a float4 inside it.
     

    Last edited: Jun 25, 2022
  2. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    5,989
    This doesn't perform any work. I'd expect the compiler to optimize this away. Nothing to vectorize.
    Try assigning to another array of the same type and length.
     
  3. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    Thanks for the reply, I changed the code to this and it still claims it can't be vectorized.


    Code (CSharp):
    [BurstCompile]
    public struct InitChunksFromTargets2 : IJobChunk
    {
        [ReadOnly] public ComponentTypeHandle<TargetInternalOptimized2> tHandle;
        [WriteOnly] public ComponentTypeHandle<TargetInternalOptimized2> tHandle2;
        internal int4 key;
        internal float4 positionAndHalfChunkSize;
        private TargetChunk targetChunkToAdd;

        public void Execute(ArchetypeChunk batchInChunk, int batchIndex, int indexOfFirstEntityInQuery)
        {
            NativeArray<TargetInternalOptimized2>.ReadOnly targets = batchInChunk.GetNativeArray(this.tHandle).AsReadOnly();
            NativeArray<TargetInternalOptimized2> targets2 = batchInChunk.GetNativeArray(this.tHandle2);

            for (int index = 0; index < batchInChunk.Count; index++)
            {
                Unity.Burst.CompilerServices.Loop.ExpectVectorized();
                targets2[index] = targets[index];
            }
        }
    }
    upload_2022-6-25_13-31-13.png
     
  4. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    966
    Someone else will have to go into more detail about why that happens.
    I could not get float4 to vectorize. Float, on the other hand, works. Maybe the 512 register doesn't have auto-vectorization support?

    On the other hand, getting the pointers and doing a memcpy would be much faster.
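A minimal sketch of the pointer-plus-memcpy idea, for illustration only. It assumes plain NativeArray<float4> source and destination arrays of equal length and an assembly with unsafe code enabled; the ChunkCopy class and CopyAll method are hypothetical names, not from the thread.

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Mathematics;

// Hypothetical helper, not part of the original job.
public static class ChunkCopy
{
    // Copies all elements of source into destination with a single memcpy.
    // Assumption: the caller guarantees both arrays have the same length.
    public static unsafe void CopyAll(NativeArray<float4> source, NativeArray<float4> destination)
    {
        // Total size in bytes: element count times the size of one float4.
        long byteCount = (long)source.Length * UnsafeUtility.SizeOf<float4>();

        UnsafeUtility.MemCpy(
            destination.GetUnsafePtr(),       // write pointer
            source.GetUnsafeReadOnlyPtr(),    // read-only pointer
            byteCount);
    }
}
```

If unsafe code is not an option, destination.CopyFrom(source) on the NativeArray should achieve the same straight copy while keeping the safety checks.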
     
    mikaelK likes this.
  5. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    And the math package says that we should use float4 just to be safe.
     
    Last edited: Jun 25, 2022
  6. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    Yeah, it's faster, but does it help if you need to change things in between?
     
  7. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    966
    Not really, it's only a solution when we are talking about straight up copying.
     
  8. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    Yea, thanks for replying.
    I tried to strip all the code down to the minimum to find the issue, but no matter what I do it won't get vectorized.
     
  9. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    966
    Some further tips, not necessarily for this case, from what I've learnt: Burst doesn't know what to do with structs, so casting to simple-type pointers like float4*, or calling Reinterpret<T> on an array, helps.
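As a rough illustration of the Reinterpret<T> tip, the sketch below views an array of the single-field component from this thread as an array of float4. The ReinterpretExample class and OffsetAll method are hypothetical, and whether this changes what Burst generates is not guaranteed; it only shows the mechanics.

```csharp
using Unity.Collections;
using Unity.Entities;
using Unity.Mathematics;

// Same layout as the component in the question: a single float4 (16 bytes).
public struct TargetInternalOptimized2 : IComponentData
{
    public float4 positionAndQueryIndex;
}

// Hypothetical helper for illustration.
public static class ReinterpretExample
{
    public static void OffsetAll(NativeArray<TargetInternalOptimized2> targets, float4 offset)
    {
        // View the same memory as float4 elements. This form of Reinterpret
        // requires both element types to have the same size.
        NativeArray<float4> raw = targets.Reinterpret<float4>();

        for (int i = 0; i < raw.Length; i++)
        {
            // Writes go straight through to the original component array.
            raw[i] += offset;
        }
    }
}
```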
     
    mikaelK likes this.
  10. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,270
    Burst has two modes of vectorization. The first mode is loop-vectorization. The second is instruction vectorization. The ExpectVectorized() intrinsic only checks the first, but using the math types like float4 causes Burst to switch to the second mode.
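To make the distinction concrete, here is a hedged sketch of the kind of scalar loop the first mode targets: each iteration handles one float, so the loop vectorizer can combine several iterations into one SIMD operation. AddFloatsJob and its field names are made up for this example. Whether the check passes can depend on the Burst version and on whether safety checks are enabled.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job, not from the thread.
[BurstCompile]
public struct AddFloatsJob : IJob
{
    [ReadOnly] public NativeArray<float> a;
    [ReadOnly] public NativeArray<float> b;
    [WriteOnly] public NativeArray<float> result;

    public void Execute()
    {
        // Assumption: a, b and result all have the same length.
        for (int i = 0; i < result.Length; i++)
        {
#if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
            // Asks Burst to report a compile error if this loop is not loop-vectorized.
            Unity.Burst.CompilerServices.Loop.ExpectVectorized();
#endif
            result[i] = a[i] + b[i];
        }
    }
}
```

With float4 elements, each iteration already maps onto SIMD instructions, which matches the second mode described above and is not what ExpectVectorized() checks.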
     
    randomdragon and mikaelK like this.
  11. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    5,989
    Also the missing StructLayout attribute on the struct could make a difference.
     
  12. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,761
    I actually was experimenting with this yesterday

    Code (CSharp):
    Check.Assume(schema.BaseValue.Length % 4 == 0);
    Check.Assume(schema.BaseValue.Length == modifiers.Length);
    Check.Assume(schema.BaseValue.Length == result.Length);

    // View each scalar array as packs of four elements.
    var min = schema.Min.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var max = schema.Max.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var baseValue = schema.BaseValue.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());

    var added = modifiers.Added.Reinterpret<int4>(UnsafeUtility.SizeOf<int>());
    var increased = modifiers.Increased.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var reduced = modifiers.Reduced.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var more = modifiers.More.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());
    var less = modifiers.Less.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());

    var stats = result.Reinterpret<float4>(UnsafeUtility.SizeOf<float>());

    for (var index = 0; index < baseValue.Length; index++)
    {
    // #if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
    //     Unity.Burst.CompilerServices.Loop.ExpectVectorized();
    // #endif

        var addedResult = baseValue[index] + added[index];
        var additiveResult = 1 + increased[index] - reduced[index];
        var multiplicativeResult = more[index] * less[index];

        stats[index] = math.clamp(addedResult * additiveResult * multiplicativeResult, min[index], max[index]);
    }
    The code generated looks near perfect yet still fails the check.

    upload_2022-6-26_7-35-52.png

    So yeah, I think it is as Dreaming says. However, I did manage to make Burst generate the exact same code without the Reinterpret (after adding some attributes) and it still failed the check.
     
  13. WAYNGames

    WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    Hi, I'm still lost with the vectorization stuff.
    Could you share the code for schema, modifiers and result with us?
    It would help me understand the struct layouts and their reinterpretation.
     
  14. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,761
    Exactly what you see. They're just NativeArrays of int/float, which I'm reinterpreting as int4/float4.
     
  15. WAYNGames

    WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    OK, so schema.Min is a native array and schema.Max is another native array.
    Do we have to split them, or is there a way to have a native array of a range struct with a min float and a max float, and somehow reinterpret the min floats as float4 for vectorization?
     
  16. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    After a while spent figuring out the Burst Inspector, my generated code looks similar, with no Reinterpret in use.
    Also, in your screenshot the loop part is not vectorized, same as mine. If Burst only checks the loop part, reports that it's not vectorized, and doesn't care whether the rest is, then it makes sense.

    I wonder if there is any way the loop can be vectorized.
     
  17. mikaelK

    mikaelK

    Joined:
    Oct 2, 2013
    Posts:
    284
    Ok interesting. Just for laughs I converted the loop to use int4.
    upload_2022-6-30_19-32-5.png
     
    Last edited: Jun 30, 2022