Job is slower than standard code. Am I doing something wrong?

Discussion in 'Burst' started by DejaMooGames, Sep 21, 2022.

  1. DejaMooGames

    Joined: Apr 1, 2019. Posts: 108.

    I am digging into the jobs system and I am having some problems getting the performance gains that I am expecting. I don't know if I am approaching this wrong or if the task I am trying to jobify isn't appropriate. Here is what I have so far.

    This is the standard code.
    Code (CSharp):
    foreach (Cell cell in connectingCellWorkingList)
    {
        float cost = Vector3.Distance(position, cell.Coordinate);
        cost += Vector3.Distance(cell.Coordinate, connectedSection.EuclideanCenter);

        if (cost >= lowestCost)
            continue;

        bestCell = cell;
        lowestCost = cost;
    }
    Code (CSharp):
    public Cell GetBestConnectingCell(Vector3 startPoint, Vector3 endPoint, List<Cell> potentialCells)
    {
        //Setup data for the jobs to process
        float3 origin = startPoint.ConvertToFloat3();
        float3 terminus = endPoint.ConvertToFloat3();
        NativeArray<float3> input = new(potentialCells.Count, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
        NativeArray<int> output = new(1, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);

        for (var i = 0; i < potentialCells.Count; i++)
        {
            input[i] = potentialCells[i].Coordinate.ConvertToFloat3();
        }

        //Build the job
        var evaluateConnectionOriginJob = new EvaluateConnectionOriginsJob
        {
            StartPoint = origin,
            EndPoint = terminus,
            Input = input,
            Output = output
        };

        //Schedule the job
        JobHandle jobHandle = evaluateConnectionOriginJob.Schedule();
        jobHandle.Complete();

        //Get result
        int cellIndex = output[0];

        //Cleanup native arrays
        input.Dispose();
        output.Dispose();

        //Return result
        return potentialCells[cellIndex];
    }

    [BurstCompile(CompileSynchronously = true)]
    public struct EvaluateConnectionOriginsJob : IJob
    {
        [ReadOnly] public float3 StartPoint;
        [ReadOnly] public float3 EndPoint;
        [ReadOnly] public NativeArray<float3> Input;
        [WriteOnly] public NativeArray<int> Output;

        public void Execute()
        {
            var lowestCost = float.MaxValue;
            var lowestIndex = int.MaxValue;

            for (var i = 0; i < Input.Length; i++)
            {
                float cost = math.distance(StartPoint, Input[i]) + math.distance(Input[i], EndPoint);
                if (cost >= lowestCost)
                    continue;

                lowestCost = cost;
                lowestIndex = i;
            }

            Output[0] = lowestIndex;
        }
    }
    I tried using an IJobParallelFor, but I got worse results than with this single job.

    The standard code executes in 20-35 ticks.
    The jobified code executes in 300-500 ticks.
     
  2. DreamingImLatios

    Joined: Jun 3, 2017. Posts: 4,271.

    First, what is a "tick"?
    Second, are you sure that copying the data into native containers for the job is faster than just doing the work on the main thread? That's a fairly light O(n) workload.
    Third, it seems like you are including the array allocation in the job timing. Is this true?
    Fourth, are you scheduling or running the job? Jobs sacrifice latency to gain concurrency. If you complete the job immediately after scheduling it, use Run() instead.
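
A minimal sketch of the third and fourth points, assuming the Jobs, Burst, Collections and Mathematics packages are installed. The names NearestCellJob and NearestCellTiming are hypothetical; the job body mirrors the one in the question. The stopwatch wraps only the job call, so allocation and copying are not counted, and the job is executed with Run() on the calling thread rather than Schedule() followed by Complete().

```csharp
using System.Diagnostics;
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using Debug = UnityEngine.Debug;

// Hypothetical job name; same cost function as the job in the question.
[BurstCompile]
public struct NearestCellJob : IJob
{
    public float3 StartPoint;
    public float3 EndPoint;
    [ReadOnly] public NativeArray<float3> Input;
    [WriteOnly] public NativeArray<int> Output;

    public void Execute()
    {
        float lowestCost = float.MaxValue;
        int lowestIndex = -1; // stays -1 if Input is empty

        for (int i = 0; i < Input.Length; i++)
        {
            float cost = math.distance(StartPoint, Input[i]) + math.distance(Input[i], EndPoint);
            if (cost < lowestCost)
            {
                lowestCost = cost;
                lowestIndex = i;
            }
        }

        Output[0] = lowestIndex;
    }
}

public static class NearestCellTiming
{
    // 'cells' is allocated and filled by the caller, so setup cost is excluded.
    public static int FindAndTime(NativeArray<float3> cells, float3 start, float3 end)
    {
        var output = new NativeArray<int>(1, Allocator.TempJob);
        var job = new NearestCellJob { StartPoint = start, EndPoint = end, Input = cells, Output = output };

        var sw = Stopwatch.StartNew();
        job.Run(); // executes immediately on the calling thread, no scheduling round trip
        sw.Stop();
        Debug.Log($"Job only: {sw.Elapsed.TotalMilliseconds * 1000.0:F2} us");

        int result = output[0];
        output.Dispose();
        return result;
    }
}
```

Calling the method once before measuring is a reasonable precaution, so that any one-time compilation cost does not end up in the first sample.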
     
  3. CodeSmile

    Joined: Apr 10, 2014. Posts: 6,005.

    You're doing a lot of ConvertToFloat3() calls on the main thread. If potentialCells is a Vector3 array you could simply use ReinterpretCast.

    Check that Burst Compilation is enabled.
    Remove the if (cost >= lowestCost) from the for loop and only store each cost in a costs array. Then do a second loop afterwards that does the compare and update over that costs array. This is likely to be faster because Burst can vectorize the first loop once the if is gone.
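
The two-loop shape described above can be sketched as follows. This is plain C# using System.Numerics so it runs outside Unity; inside a Burst job the same structure would presumably use float3 and NativeArray instead, which is an assumption rather than something tested here. The class and method names are hypothetical, and the sketch only shows the structure, not a measured speed-up.

```csharp
using System;
using System.Numerics;

public static class TwoPassNearest
{
    // Returns the index of the cell with the lowest start->cell->end cost, or -1 if there are no cells.
    public static int FindLowestCostIndex(Vector3 start, Vector3 end, Vector3[] cells)
    {
        // Pass 1: branch-free loop that only computes and stores costs.
        var costs = new float[cells.Length];
        for (int i = 0; i < cells.Length; i++)
            costs[i] = Vector3.Distance(start, cells[i]) + Vector3.Distance(cells[i], end);

        // Pass 2: compare-and-update over the precomputed costs.
        int lowestIndex = -1;
        float lowestCost = float.MaxValue;
        for (int i = 0; i < costs.Length; i++)
        {
            if (costs[i] < lowestCost)
            {
                lowestCost = costs[i];
                lowestIndex = i;
            }
        }

        return lowestIndex;
    }
}
```

The trade-off is one extra array of floats in exchange for a first loop with no data-dependent branch.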
     
  4. DejaMooGames

    Joined: Apr 1, 2019. Posts: 108.

    1. A tick is 100 nanoseconds; see here for more info.
    2. About half the processing time is the setup, so just doing the work on the main thread is likely the best bet here.
    3. Yes, I am.
    4. I am completing the job immediately. I will try using Run() and compare the two.
    Thanks for responding.

    1. I have never used ReinterpretCast before; I will give that a shot.
    2. Burst compilation is enabled.
    3. I will try this and compare.

    Thanks for responding.
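
For reference, a small sketch of the unit conversion behind these numbers. It assumes the "tick" meant here is the .NET TimeSpan tick of 100 ns. Stopwatch.ElapsedTicks is counted in units of 1 / Stopwatch.Frequency seconds instead, which is not guaranteed to be 100 ns, so the two need different conversions. The helper names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TickUnits
{
    // Stopwatch ticks are 1 / frequency seconds each; pass Stopwatch.Frequency as 'frequency'.
    public static double StopwatchTicksToMicroseconds(long stopwatchTicks, long frequency)
        => stopwatchTicks * 1_000_000.0 / frequency;

    // TimeSpan ticks are always 100 ns, i.e. 10 ticks per microsecond.
    public static double TimeSpanTicksToMicroseconds(long timeSpanTicks)
        => timeSpanTicks / (TimeSpan.TicksPerMillisecond / 1000.0);
}
```

With 100 ns ticks, 20-35 ticks is 2-3.5 microseconds and 300-500 ticks is 30-50 microseconds. Reading sw.Elapsed.Ticks instead of sw.ElapsedTicks gives TimeSpan ticks directly.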
     
  5. Trindenberg

    Joined: Dec 3, 2017. Posts: 398.

    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using System.Diagnostics;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;
    using Debug = UnityEngine.Debug;

    public class BurstSmallestDistance : MonoBehaviour
    {
        public int
            cellSize = 1,
            iterations = 64,
            batchSize = 32;

        private void Start()
        {
            var test = new BurstDistance();
            BurstDistance.batchSize = batchSize;

            // Same random data in a List and in an array, to compare both containers
            List<BurstDistance.PotentialCell> somePotentialCells = new(cellSize);
            BurstDistance.PotentialCell[] somePotentialCellsArr = new BurstDistance.PotentialCell[cellSize];

            for (int i = 0; i < cellSize; i++)
            {
                var rand = UnityEngine.Random.insideUnitSphere * 100f;
                somePotentialCells.Add(new BurstDistance.PotentialCell(rand));
                somePotentialCellsArr[i] = new BurstDistance.PotentialCell(rand);
            }

            Debug.Log("\nCells: " + cellSize);

            BurstDistance.PotentialCell cell;
            double us;

            // Call each path once before timing
            test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
            test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
            test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);

            Stopwatch sw = new Stopwatch();
            sw.Restart();
            sw.Stop();

            // --- List input ---
            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
            sw.Stop();

            // Convert Stopwatch ticks to microseconds
            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("List\nCells ForEach [Avg us]: " + us / iterations);

            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
            sw.Stop();

            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("\nCells SThread [Avg us]: " + us / iterations);

            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCells, out cell);
            sw.Stop();

            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("\nCells MThread [Avg us]: " + us / iterations);

            // --- Array input ---
            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCell(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
            sw.Stop();

            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("Array\nCells ForEach [Avg us]: " + us / iterations);

            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCellSThread(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
            sw.Stop();

            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("\nCells SThread [Avg us]: " + us / iterations);

            sw.Restart();
            for (int i = 0; i < iterations; i++)
                test.GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), somePotentialCellsArr, out cell);
            sw.Stop();

            us = 1000.0 * 1000 * sw.ElapsedTicks / Stopwatch.Frequency;
            Debug.Log("\nCells MThread [Avg us]: " + us / iterations);
        }
    }

    public class BurstDistance
    {
        public static int batchSize = 8;

        public struct PotentialCell
        {
            public float3 coordinate;

            public PotentialCell(float3 coord)
            {
                coordinate = coord;
            }
        }

        // Note: [BurstCompile] has no effect on these instance methods with managed
        // parameters; only the job struct at the bottom is compiled by Burst.
        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCell(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
        {
            cell = potentialCells[0];
            var lowestCost = float.MaxValue;

            foreach (PotentialCell item in potentialCells)
            {
                float cost = math.distance(origin, item.coordinate) + math.distance(terminus, item.coordinate);

                if (cost < lowestCost)
                {
                    cell = item;
                    lowestCost = cost;
                }
            }
        }

        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCell(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
        {
            cell = potentialCells[0];
            var lowestCost = float.MaxValue;

            for (int i = 0; i < potentialCells.Length; i++)
            {
                float cost = math.distance(origin, potentialCells[i].coordinate) + math.distance(terminus, potentialCells[i].coordinate);

                if (cost < lowestCost)
                {
                    cell = potentialCells[i];
                    lowestCost = cost;
                }
            }
        }

        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCellSThread(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
        {
            //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'
            using var input = new NativeArray<PotentialCell>(potentialCells.ToArray(), Allocator.TempJob);
            using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
            using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

            //Build the job
            var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
            {
                StartPoint = origin,
                EndPoint = terminus,
                Input = input,
                lowestCost = lowestCost,
                lowestIndex = lowestIndex
            };

            //Run the job immediately on the calling thread (single threaded)
            evaluateConnectionOriginJob.Run(potentialCells.Count);

            //Return result
            cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
        }

        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCellSThread(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
        {
            //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'
            using var input = new NativeArray<PotentialCell>(potentialCells, Allocator.TempJob);
            using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
            using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

            //Build the job
            var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
            {
                StartPoint = origin,
                EndPoint = terminus,
                Input = input,
                lowestCost = lowestCost,
                lowestIndex = lowestIndex
            };

            //Run the job immediately on the calling thread (single threaded)
            evaluateConnectionOriginJob.Run(potentialCells.Length);

            //Return result
            cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
        }

        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCellMThread(float3 origin, float3 terminus, in List<PotentialCell> potentialCells, out PotentialCell cell)
        {
            //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'
            using var input = new NativeArray<PotentialCell>(potentialCells.ToArray(), Allocator.TempJob);
            using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
            using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

            //Build the job
            var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
            {
                StartPoint = origin,
                EndPoint = terminus,
                Input = input,
                lowestCost = lowestCost,
                lowestIndex = lowestIndex
            };

            // Schedule the job multi-threaded
            evaluateConnectionOriginJob.Schedule(potentialCells.Count, batchSize).Complete();

            //Return result
            cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
        }

        [BurstCompile(CompileSynchronously = true)]
        public void GetBestConnectingCellMThread(float3 origin, float3 terminus, in PotentialCell[] potentialCells, out PotentialCell cell)
        {
            //Setup temp data for the jobs to process. Will be disposed automatically due to 'using'
            using var input = new NativeArray<PotentialCell>(potentialCells, Allocator.TempJob);
            using var lowestCost = new NativeReference<float>(float.MaxValue, Allocator.TempJob);
            using var lowestIndex = new NativeReference<int>(int.MaxValue, Allocator.TempJob);

            //Build the job
            var evaluateConnectionOriginJob = new EvaluateConnectionOriginJob
            {
                StartPoint = origin,
                EndPoint = terminus,
                Input = input,
                lowestCost = lowestCost,
                lowestIndex = lowestIndex
            };

            // Schedule the job multi-threaded
            evaluateConnectionOriginJob.Schedule(potentialCells.Length, batchSize).Complete();

            //Return result
            cell = input[evaluateConnectionOriginJob.lowestIndex.Value];
        }

        [BurstCompile(CompileSynchronously = true)]
        public struct EvaluateConnectionOriginJob : IJobParallelFor
        {
            [ReadOnly] public float3 StartPoint;
            [ReadOnly] public float3 EndPoint;
            [ReadOnly] public NativeArray<PotentialCell> Input;

            // Caution: with the safety restriction disabled, worker threads read and write
            // these two shared values without synchronization. When scheduled in parallel,
            // the final index is therefore not guaranteed to be the true minimum.
            [NativeDisableParallelForRestriction]
            public NativeReference<float> lowestCost;

            [NativeDisableParallelForRestriction]
            public NativeReference<int> lowestIndex;

            public void Execute(int i)
            {
                var coord = Input[i].coordinate;

                float cost = math.distance(StartPoint, coord) + math.distance(EndPoint, coord);

                if (cost < lowestCost.Value)
                {
                    lowestCost.Value = cost;
                    lowestIndex.Value = i;
                }
            }
        }
    }
     
    Last edited: Oct 11, 2022
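
One caveat with the parallel job above: every worker thread reads and writes the same lowestCost and lowestIndex, so two threads can interleave and leave a non-minimal index behind. A possible race-free variant, sketched under the assumption that the same Unity packages are available, is to let the parallel job write one cost per index and then reduce in a dependent single-threaded job. CellCostJob, ArgMinJob and RaceFreeNearest are hypothetical names, and this sketch has not been benchmarked against the numbers in this thread.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Parallel pass: each index writes only its own slot, so no two threads share a write target.
[BurstCompile]
public struct CellCostJob : IJobParallelFor
{
    public float3 StartPoint;
    public float3 EndPoint;
    [ReadOnly] public NativeArray<float3> Input;
    [WriteOnly] public NativeArray<float> Costs;

    public void Execute(int i)
    {
        Costs[i] = math.distance(StartPoint, Input[i]) + math.distance(EndPoint, Input[i]);
    }
}

// Reduction pass: a single job scans the costs and records the index of the smallest one.
[BurstCompile]
public struct ArgMinJob : IJob
{
    [ReadOnly] public NativeArray<float> Costs;
    public NativeReference<int> LowestIndex;

    public void Execute()
    {
        float lowestCost = float.MaxValue;
        int lowestIndex = -1; // stays -1 if Costs is empty

        for (int i = 0; i < Costs.Length; i++)
        {
            if (Costs[i] < lowestCost)
            {
                lowestCost = Costs[i];
                lowestIndex = i;
            }
        }

        LowestIndex.Value = lowestIndex;
    }
}

public static class RaceFreeNearest
{
    public static int Find(NativeArray<float3> cells, float3 start, float3 end, int batchSize = 64)
    {
        using var costs = new NativeArray<float>(cells.Length, Allocator.TempJob, NativeArrayOptions.UninitializedMemory);
        using var lowestIndex = new NativeReference<int>(-1, Allocator.TempJob);

        var costJob = new CellCostJob { StartPoint = start, EndPoint = end, Input = cells, Costs = costs };
        var argMinJob = new ArgMinJob { Costs = costs, LowestIndex = lowestIndex };

        // The reduction depends on the parallel pass, so it only starts once all costs are written.
        JobHandle costHandle = costJob.Schedule(cells.Length, batchSize);
        argMinJob.Schedule(costHandle).Complete();

        return lowestIndex.Value;
    }
}
```

The reduction is O(n) on one thread, so this only pays off when the per-item cost computation dominates; for a cost as cheap as two distances, a single Burst-compiled IJob may well be just as fast.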
  6. Trindenberg

    Joined: Dec 3, 2017. Posts: 398.

    A job seems pointless for anything below 128 items, and multithreaded only seems better than single-threaded above 8192 items. I should delete the List overloads, which convert to an array inside the method, but they do show how expensive starting from a List is if you have to convert it to an array to put it in a job. At 65536 items (I like using powers of two), the job is about 25x faster.

    [4 attached images]
     
    Last edited: Oct 11, 2022
  7. Trindenberg

    Joined: Dec 3, 2017. Posts: 398.

    If, however, you were going to reference the main array multiple times in a frame (as you normally would), you could move the 'using' statement outside of the method and have the method take a NativeArray<PotentialCell> as input:

    using (var ptr = new NativeArray<PotentialCell>(somePotentialCellsArr, Allocator.TempJob))
        for (int i = 0; i < iterations; i++)
            GetBestConnectingCellMThread(float3.zero, math.float3(50, 50, 50), ptr, out cell);
    ...
    public static void GetBestConnectingCellMThread(float3 origin, float3 terminus, in NativeArray<PotentialCell> potentialCells, out PotentialCell cell)

    That gets even more speed. It all depends on how often you allocate memory and how many times you access it before it is disposed. ('using' automatically disposes after the enclosed statement, so there is no need to call Dispose.)
    [1 attached image]
     
  8. Trindenberg

    Joined: Dec 3, 2017. Posts: 398.

    To get the performance even higher, unwrap the distance calculation into a method:

    public static void GetDistanceADD(in float IN, in float3 A, in float3 B, out float OUT)
    {
        OUT = IN + math.sqrt((A.x - B.x) * (A.x - B.x) + (A.y - B.y) * (A.y - B.y) + (A.z - B.z) * (A.z - B.z));
    }

    and use it like this:

    float cost = 0;
    GetDistanceADD(in cost, A, Input[i].coordinate, out cost);
    GetDistanceADD(in cost, B, Input[i].coordinate, out cost);

    This makes the normal method 2x faster and the job 4x faster on a large loop, so the job ends up about 46x faster than running without a job. Anyway, I like messing with code to make it faster, and Burst seems extreme!

    [1 attached image]
     
  9. Trindenberg

    Joined: Dec 3, 2017. Posts: 398.

    Forget about the in/out part; I was just testing. There is no difference in performance compared with:

    public static float GetDistance(in float3 A, in float3 B)
    {
        return math.sqrt((A.x - B.x) * (A.x - B.x) + (A.y - B.y) * (A.y - B.y) + (A.z - B.z) * (A.z - B.z));
    }
    ...
    float cost = GetDistance(A, Input[i].coordinate) + GetDistance(B, Input[i].coordinate);