Why Jobs are slower in this case

Discussion in 'Entity Component System' started by Micz84, Aug 21, 2018.

  1. Micz84 (Joined: Jul 21, 2012; Posts: 448)

    I was wondering whether it is better to have two jobs that each do a small task one after another (one of them with an if statement), or to combine them into one. In my test, two jobs turned out to be slightly faster even though I go through all the elements twice. But just for comparison, I recreated the same logic in a standard for loop, and the outcome surprised me: the single-threaded code was 5 times faster. Here is my code:
    Code (CSharp):
    using Unity.Collections;
    using Unity.Jobs;
    using UnityEngine;

    public class TEST : MonoBehaviour
    {
        public int Count = 5000000;
        public bool one = true;
        public bool normal = false;
        private NativeArray<float> _floats;
        private float[] _array;

        private void Start()
        {
            _floats = new NativeArray<float>(Count, Allocator.Persistent);

            var job = new InitJob()
            {
                counters = _floats,
                init = 3
            }.Schedule(Count, 64);
            job.Complete();
            _array = _floats.ToArray();
        }

        private struct InitJob : IJobParallelFor
        {
            public NativeArray<float> counters;
            public float init;

            public void Execute(int index)
            {
                counters[index] = init;
            }
        }

        private struct DecreaseJob : IJobParallelFor
        {
            public NativeArray<float> counters;
            public float deltaTime;

            public void Execute(int index)
            {
                counters[index] -= deltaTime;
            }
        }

        private struct DecreaseAndReset : IJobParallelFor
        {
            public NativeArray<float> counters;
            public float deltaTime;
            public float init;

            public void Execute(int index)
            {
                counters[index] -= deltaTime;
                if (counters[index] < 0)
                    counters[index] = init;
            }
        }

        private struct ResetJob : IJobParallelFor
        {
            public NativeArray<float> counters;
            public float init;

            public void Execute(int index)
            {
                if (counters[index] < 0)
                    counters[index] = init;
            }
        }

        private void Update()
        {
            if (normal)
            {
                var deltaTime = Time.deltaTime;
                for (int i = _array.Length - 1; i >= 0; i--)
                {
                    _array[i] -= deltaTime;
                    if (_array[i] < 0)
                        _array[i] = 3;
                }
            }
            else if (one)
                One();
            else
                Two();
        }

        public void Two()
        {
            var jobhandle = new DecreaseJob()
            {
                counters = _floats,
                deltaTime = Time.deltaTime
            }.Schedule(Count, 64);

            var second = new ResetJob()
            {
                counters = _floats,
                init = 3
            }.Schedule(Count, 64, jobhandle);

            second.Complete();
        }

        public void One()
        {
            var jobhandle = new DecreaseAndReset()
            {
                counters = _floats,
                deltaTime = Time.deltaTime,
                init = 3
            }.Schedule(Count, 64);

            jobhandle.Complete();
        }

        private void OnDestroy()
        {
            _floats.Dispose();
        }
    }
    Am I doing something wrong in my jobs?

    I am using Unity 2018.2.3f1.

    Edit:
    I have added the Burst compiler and now things have changed.
    The normal loop is now 5 times slower than the jobs, and two jobs vs. one are comparable.
    But still, why is it that much slower without Burst? It uses 8 cores, so that alone should give a nice boost.
     
    Last edited: Aug 21, 2018
  2. 5argon (Joined: Jun 10, 2013; Posts: 1,555)

    Last edited: Aug 21, 2018
  3. Micz84

    I tested it in the editor, but without Burst, so I could not disable the safety checks. It is not ECS; it is a MonoBehaviour with jobs. I will disable Burst, profile in a build, and see what the result is.
     
  4. snacktime (Joined: Apr 15, 2013; Posts: 3,356)

    You are completing the jobs immediately after scheduling, which just forces them to run on the main thread.

    You should have a JobHandle field, and in Update you Complete() it and then schedule the job again.

    You still get Burst because Burst is tied to a job context, not a thread.
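
A minimal sketch of the pattern suggested here, assuming the DecreaseAndReset job from the first post: keep the JobHandle in a field, complete last frame's job at the start of Update, then schedule the next one and return without waiting. The class name DeferredCompleteExample and the field names are hypothetical, and the job body reads each element once as an illustrative choice.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical example class; only the scheduling pattern matters here.
public class DeferredCompleteExample : MonoBehaviour
{
    public int Count = 5000000;

    private NativeArray<float> _floats;
    private JobHandle _handle; // a default JobHandle is already complete

    private struct DecreaseAndReset : IJobParallelFor
    {
        public NativeArray<float> counters;
        public float deltaTime;
        public float init;

        public void Execute(int index)
        {
            // Read the element once, write it once.
            var current = counters[index] - deltaTime;
            counters[index] = current < 0 ? init : current;
        }
    }

    private void Start()
    {
        _floats = new NativeArray<float>(Count, Allocator.Persistent);
    }

    private void Update()
    {
        // Wait for the job that was scheduled during the previous frame.
        _handle.Complete();

        // Schedule this frame's work and return without blocking on it.
        _handle = new DecreaseAndReset
        {
            counters = _floats,
            deltaTime = Time.deltaTime,
            init = 3
        }.Schedule(_floats.Length, 64);

        // Ask the job system to start the queued work on worker threads now.
        JobHandle.ScheduleBatchedJobs();
    }

    private void OnDestroy()
    {
        // The job must finish before the array it uses is disposed.
        _handle.Complete();
        if (_floats.IsCreated)
            _floats.Dispose();
    }
}
```

With this layout the results of a frame's job are only guaranteed to be visible after the Complete() call in the following frame, so any main-thread code that reads _floats has to run after that call.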
     
  5. julian-moschuering (Joined: Apr 15, 2014; Posts: 529)

    You should use IJobParallelForBatch for this. But this doesn't change a lot (~5-10%), which makes me think @5argon is right.

    Replacing your Execute code in DecreaseAndReset with this:
    Code (CSharp):
    var current = counters[index];
    current -= deltaTime;
    if (current < 0)
        counters[index] = init;
    else
        counters[index] = current;
    and thereby removing one index access makes it ~30% faster. The overhead seems to be almost entirely the safety checks on NativeArray access inside jobs.

    Some tests:

    Configuration                   Normal    One        Two
    Editor                          25 ms     ~100 ms    ~100 ms
    Editor + Burst                  25 ms     2.7 ms     4.0 ms
    Debug Build Mono                25 ms     19 ms      22 ms
    Release Build Mono              25 ms     19 ms      22 ms
    Release Build Mono + Burst      25 ms     1.4 ms     2.4 ms
    Release Build IL2CPP            9.5 ms    1.44 ms    2.5 ms
    Release Build IL2CPP + Burst    9.5 ms    1.4 ms     2.3 ms

    Conclusion: Mono Release is still slower than expected, probably because of function call overhead. IL2CPP gets pretty much what you would expect (4 cores + HT in this case), probably due to inlining and much better optimized native code. Burst doesn't change much for IL2CPP in this particular case, but it fixes the bad code generated by Mono.
     
    Last edited: Aug 21, 2018
  6. julian-moschuering

    He uses IJobParallelFor. The main thread will wait until everything is done but it still uses all CPU cores.
     
  7. Micz84

    Exactly, I expected that this alone would give a good boost.
    But as always, "never jump to conclusions when profiling in the editor" is good advice :)

    I did not use IJobParallelForBatch because I do not have access to it. Is it in the ECS package?
    I was just testing job performance, and the main point of these tests was to check whether two smaller jobs (one of them without an if statement) would be better than one in this case.

    Thx for your help :)
     
  8. snacktime

    Never actually tested that but it makes sense.
     
  9. Micz84

    It is distributed across all cores; this can be seen in the profiler. I call Complete in Update to make the comparison fairer.
     
  10. 5argon

    I think the safety check is in the `NativeArray` struct itself, so it does not matter whether you use an ECS system or a MonoBehaviour.

    Also, as @julian-moschuering said, when building to a device with Android + Mono (for faster builds) you still have something to gain by switching to IL2CPP, because it enables the "fast path" ( https://forum.unity.com/threads/nat...itude-slower-than-arrays.535019/#post-3526924 )

    The "Jobs > Leak Detection and/or Enable Burst Safety Check" options seem to be designed to disable this and let you profile truthfully in the editor, but I remember toggling them on and off and seeing no difference. I am not sure whether they work in the current version.
     
  11. Joachim_Ante (Unity Technologies; Joined: Mar 16, 2005; Posts: 5,203)

    It only affects the bursted code. So if you have a lot of loops on the main thread accessing NativeArray there is no way to make that fast in the editor.

    Of course in reality why would you use NativeArray and then write main thread code?
     
  12. mike_acton (Unity Technologies; Joined: Nov 21, 2017; Posts: 110)

    There is no answer to this question as a blanket rule. It totally depends on the data you're working with. As a rule of thumb, the most straightforward approach is to start with simpler split jobs and merge them as you discover that the data access can be shared.

    If you want to micro-benchmark a specific case like this you'll want to make sure the code is Burst-compiled unless you're specifically looking at the performance of mono (or il2cpp).

    You can also create a PerformanceTest, which is very much like a UnitTest which you can run from the editor.

    e.g. A similar test to yours:
    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Unity.PerformanceTesting;

    namespace Unity.Entities.PerformanceTests
    {
        public class NativeArrayIterationPerformanceTests
        {
            // Adds Delta and resets to 0 in a single pass over the array.
            [BurstCompile(CompileSynchronously = true)]
            struct AddDeltaAndReset : IJobParallelFor
            {
                public NativeArray<int> Source;
                public int Delta;
                public int ResetThreshold;

                public void Execute(int index)
                {
                    var projectedValue = Source[index] + Delta;
                    Source[index] = math.@select(0, projectedValue, projectedValue < ResetThreshold);
                }
            }

            // Same work as AddDeltaAndReset, but through a raw pointer instead of NativeArray.
            [BurstCompile(CompileSynchronously = true)]
            unsafe struct AddDeltaAndResetPtr : IJobParallelFor
            {
                [NativeDisableUnsafePtrRestriction]
                public int* Source;
                public int Delta;
                public int ResetThreshold;

                public void Execute(int index)
                {
                    var projectedValue = Source[index] + Delta;
                    Source[index] = math.@select(0, projectedValue, projectedValue < ResetThreshold);
                }
            }

            // First half of the split version: only adds Delta.
            [BurstCompile(CompileSynchronously = true)]
            struct AddDelta : IJobParallelFor
            {
                public NativeArray<int> Source;
                public int Delta;

                public void Execute(int index)
                {
                    var projectedValue = Source[index] + Delta;
                    Source[index] = projectedValue;
                }
            }

            // Second half of the split version: only resets values at or above the threshold.
            [BurstCompile(CompileSynchronously = true)]
            struct Reset : IJobParallelFor
            {
                public NativeArray<int> Source;
                public int ResetThreshold;

                public void Execute(int index)
                {
                    var value = Source[index];
                    Source[index] = math.@select(0, value, value < ResetThreshold);
                }
            }

            void SingleIterationWork(NativeArray<int> source, int delta, int resetThreshold)
            {
                var addDeltaAndResetJob = new AddDeltaAndReset
                {
                    Source = source,
                    Delta = delta,
                    ResetThreshold = resetThreshold
                };
                var addDeltaAndResetJobHandle = addDeltaAndResetJob.Schedule(source.Length, 1024);
                addDeltaAndResetJobHandle.Complete();
            }

            unsafe void SingleIterationWorkPtr(NativeArray<int> source, int delta, int resetThreshold)
            {
                var sourcePtr = (int*)source.GetUnsafePtr();
                var addDeltaAndResetJob = new AddDeltaAndResetPtr
                {
                    Source = sourcePtr,
                    Delta = delta,
                    ResetThreshold = resetThreshold
                };
                var addDeltaAndResetJobHandle = addDeltaAndResetJob.Schedule(source.Length, 1024);
                addDeltaAndResetJobHandle.Complete();
            }

            void SplitIterationWork(NativeArray<int> source, int delta, int resetThreshold)
            {
                var addDeltaJob = new AddDelta
                {
                    Source = source,
                    Delta = delta
                };
                var addDeltaJobHandle = addDeltaJob.Schedule(source.Length, 1024);
                var resetJob = new Reset
                {
                    Source = source,
                    ResetThreshold = resetThreshold
                };
                // The reset pass depends on the add pass, so chain it on addDeltaJobHandle.
                var resetJobHandle = resetJob.Schedule(source.Length, 1024, addDeltaJobHandle);
                resetJobHandle.Complete();
            }

            [PerformanceTest]
            public void SingleVsSplitIterationJob()
            {
                var count = 10 * 1024 * 1024;
                var source = new NativeArray<int>(count, Allocator.TempJob);
                var delta = 1;
                var resetThreshold = 1;

                // Make sure Burst is compiled.
                SingleIterationWork(source, delta, resetThreshold);
                SingleIterationWorkPtr(source, delta, resetThreshold);
                SplitIterationWork(source, delta, resetThreshold);

                var sampleSingle = new SampleGroupDefinition("SingleIteration");
                var sampleSinglePtr = new SampleGroupDefinition("SingleIterationPtr");
                var sampleSplit = new SampleGroupDefinition("SplitIteration");

                using (Measure.Scope(sampleSingle))
                {
                    SingleIterationWork(source, delta, resetThreshold);
                }

                using (Measure.Scope(sampleSinglePtr))
                {
                    SingleIterationWorkPtr(source, delta, resetThreshold);
                }

                using (Measure.Scope(sampleSplit))
                {
                    SplitIterationWork(source, delta, resetThreshold);
                }

                source.Dispose();
            }
        }
    }
    The asmdef in this case includes the following references:
    Code (JavaScript):
    "references": [
        "Unity.PerformanceTesting",
        "Unity.Entities",
        "Unity.Mathematics",
        "Unity.Jobs",
        "Unity.Burst",
        "Unity.Collections"
    ],
    When you run the test, you'll see the output timing. On my particular machine, the above looks like:
    Code (CSharp):
    SingleIteration 3.07 Millisecond
    SingleIterationPtr 3.15 Millisecond
    SplitIteration 6.54 Millisecond
    And the win as expected in this *particular* case is with a single iteration. (Also tested here is a comparison of NativeArray versus raw pointer - which are, also as expected, basically the same.)
     
  13. Micz84

    I did not know that there is a performance test tool. Good to know for the future :).
    I have added Unity.PerformanceTesting to the asmdef file, but I get an error:
    Assembly has reference to non-existent assembly 'Unity.PerformanceTesting' (Assets/Tests/Tests.asmdef)

    Do I have to add some package? I have Burst 0.2.4-preview.25 and Entities 0.0.12-preview.8.
     
  14. Joachim_Ante

    You need the performance testing package. This is the one we use internally for ECS:
    "com.unity.test-framework.performance": "0.1.31-preview",
     
  15. gintautass (QA Minion, Unity Technologies; Joined: Oct 27, 2015; Posts: 46)

    Including it in the package manifest.json should fix the issue. Afterwards you will need to manually edit your assembly definition files to reference the Unity.PerformanceTesting assembly.

    You can find some documentation in the package README.
     
  16. Micz84


    I have added the package to the manifest and Unity.PerformanceTesting to the asmdef file, but the Unity.PerformanceTesting namespace is still not available.
     
  17. gintautass

    Add the perf package to testables:

    Code (JavaScript):
    "testables": [
        "com.unity.test-framework.performance"
    ],

    If it still gives errors, add these modules to the manifest.json dependencies:

    Code (JavaScript):
    "com.unity.modules.jsonserialize": "1.0.0",
    "com.unity.modules.unitywebrequestwww": "1.0.0",
    "com.unity.modules.unitywebrequest": "1.0.0",
    "com.unity.modules.vr": "1.0.0"
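
Putting the pieces from this thread together, the relevant parts of Packages/manifest.json might look like the fragment below. This is only a sketch: the package version is the one quoted earlier in the thread, the four extra modules are needed only if errors remain, and a real manifest would also keep its other existing dependencies (such as Burst and Entities), which are omitted here.

```json
{
  "dependencies": {
    "com.unity.test-framework.performance": "0.1.31-preview",
    "com.unity.modules.jsonserialize": "1.0.0",
    "com.unity.modules.unitywebrequestwww": "1.0.0",
    "com.unity.modules.unitywebrequest": "1.0.0",
    "com.unity.modules.vr": "1.0.0"
  },
  "testables": [
    "com.unity.test-framework.performance"
  ]
}
```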
     
  18. Micz84

    Ok I managed to make it work :) thx.
    I like the fact that I can just wrap all sample groups in a loop and get things like median, min, max :)
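
A rough sketch of what wrapping a sample group in a loop could look like, using only the calls that appear in the test above ([PerformanceTest], SampleGroupDefinition, Measure.Scope) and assuming the 0.1.31-preview package mentioned in this thread; later package versions may have renamed these. The class name, the iteration count, and the simple array-summing workload are hypothetical placeholders.

```csharp
using Unity.PerformanceTesting;

// Hypothetical example class; the workload is a stand-in for a real job.
public class RepeatedSampleExample
{
    [PerformanceTest]
    public void SumArrayRepeated()
    {
        const int iterations = 20;          // arbitrary number of samples
        var data = new int[1024 * 1024];    // placeholder workload input
        var sample = new SampleGroupDefinition("SumArray");

        long checksum = 0;
        for (var run = 0; run < iterations; run++)
        {
            // Each pass through the scope is expected to record one sample
            // in the "SumArray" group.
            using (Measure.Scope(sample))
            {
                for (var i = 0; i < data.Length; i++)
                    checksum += data[i];
            }
        }
    }
}
```

Because every iteration reports into the same sample group, the test output can summarize the group (for example median, min, and max, as described above) instead of showing a single measurement.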