Search Unity

Heavily optimizing a* pathfinding algorithm inside a JobSystem.

Discussion in 'Entity Component System' started by MintTree117, Mar 3, 2020.

  1. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Yes thats what I was thinking about the MinHeap. And I am satisfied with the performance, I just wanted to see how far I could push it. I am moving on. Going to make this hierarchical and the ability to follow a graph in highly blocked areas.
     
  2. thebanjomatic

    thebanjomatic

    Joined:
    Nov 13, 2016
    Posts:
    36
    I think part of the reason my version was slower is that I'm implementing the ability to change the key and this happens a fair number of times. I went back to the drawing board and have adapted this to work on a NativeArray and with my existing interface: https://github.com/NikolajLeischner/csharp-binary-heap

    The notes in the readme mention a strategy for avoiding the decrease-key, and I have abstracted that to be part of the public DequeueMin() function using this ClosedList flag on the node to identify nodes that have already been visited.
    https://github.com/thebanjomatic/Te...ter/Assets/Scripts/FasterHeapOpenList.cs#L136

    This version actually does perform better than an array, so I'm happy leaving it at this for now.
    Timings are inconsistent with the past run as I'm on battery power at the moment, but relative differences are what's important anyway:

    Array: 6.48ms
    MinHeap1: 7.87ms
    MinHeap2: 4.93ms

    [Edit]
    Updated burst to 1.3.0-preview.4 and now
    Array: 7.2ms
    MinHeap1: 9.66ms
    MinHeap2: 1.7ms

    Pleasantly surprised by that last burst of performance.
     
    Last edited: Mar 7, 2020
  3. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Very nice! There is there is definitely a speed up in my algorithm when there are alot of obstacles, thanks. 100 units can search a 50x50 grid in no more that 5ms!
     
  4. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    What do you think of this? (Go To: PathfindingSystem)

    https://github.com/Tree37/Hierarchical-HPA-pathfinding-with-Unity-Jobs-and-BurstCompiler

    I converted everything to arrays and upgraded the system to be hierarchical.

    Right now its set up to be 900x900 nodes, with an abstract layer of clusters. Each cluster is 30x30 nodes, making it 30x30 clusters.

    Right now 400 units can traverse the entire graph in about 90ms. Do you think it can get any better? Did I do SIMD right?
     
    Last edited: Mar 21, 2020
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    You caught me a couple weeks too early! :eek:;)

    Well, your job doesn't have Burst, so I would start with that. Without Burst, you are not going to get any SIMD.
     
  6. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Hahaha I keep doing that. I keep taking off burst so I can do some logs to console. It has burst, I updated. I don't know how to check if my CPU had SIMD, because its an older one.

    I'm pretty satisfied with it now, but if you want to take a deeper look at it when you have time, feel free.
     
  7. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    SSE instructions have been around for quite a while. I'm pretty sure Unity requires SSE2 to even start the editor. If you go to the Burst Inspector with Enhanced Sidassembly checked and Auto for the dropdown and look for the darker blue instructions, if they have a 'v' at the start of them, that means you have AVX. Otherwise, you have SSE.

    You can tell if an instruction is simd because it will either start with 'p', start with 'vp' or end with 'ps' or 'pd'.
     
    MintTree117 likes this.
  8. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Oh wow, I'm surprised there are quite a few of those instructions. But I have no idea which parts of my code they refer to.

    I have alot of lists of mov instructions, preceeded by a single movups. I don't know how they relate.
     
  9. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Burst 1.3 will show your the C# source where different parts come from. It jumps around a bit because Burst likes to rearrange things. I personally use preview 3 because later versions crash on my machine, but use whatever version you get working.
     
  10. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Is this what you mean by using SoA?

    Code (CSharp):
    1.             int4 parentValue = new int4( -1 , -1 , -1 , -1 );
    2.             int4 gCost = new int4( int.MaxValue , int.MaxValue , int.MaxValue , int.MaxValue );
    3.             bool4 falseVector = new bool4( false , false , false , false );
    4.             int i = 0;
    5.             int vectorLoopLength = graphClusterSize * graphClusterSize - 4;
    6.             for ( ; i < vectorLoopLength; i += 4 )
    7.             {
    8.                 int4 loopIndex = new int4(
    9.                     i ,
    10.                     i + 1 ,
    11.                     i + 2 ,
    12.                     i + 3 );
    13.                 int4 localCol = new int4(
    14.                     loopIndex.x / graphClusterSize ,
    15.                     loopIndex.y / graphClusterSize ,
    16.                     loopIndex.z / graphClusterSize ,
    17.                     loopIndex.w / graphClusterSize );
    18.                 int4 localRow = new int4(
    19.                     loopIndex.x % graphClusterSize ,
    20.                     loopIndex.y % graphClusterSize ,
    21.                     loopIndex.z % graphClusterSize ,
    22.                     loopIndex.w % graphClusterSize );
    23.                 int4 localIndex = new int4(
    24.                     localCol.x + localRow.x * graphClusterSize ,
    25.                     localCol.y + localRow.y * graphClusterSize ,
    26.                     localCol.z + localRow.z * graphClusterSize ,
    27.                     localCol.w + localRow.w * graphClusterSize );
    28.                 int4 graphCol = new int4(
    29.                     localCol.x + graphClusterSize * clusterPosition.x ,
    30.                     localCol.y + graphClusterSize * clusterPosition.x ,
    31.                     localCol.z + graphClusterSize * clusterPosition.x ,
    32.                     localCol.w + graphClusterSize * clusterPosition.x );
    33.                 int4 graphRow = new int4(
    34.                     localRow.x + graphClusterSize * clusterPosition.y ,
    35.                     localRow.y + graphClusterSize * clusterPosition.y ,
    36.                     localRow.z + graphClusterSize * clusterPosition.y ,
    37.                     localRow.w + graphClusterSize * clusterPosition.y );
    38.                 int4 graphArrayIndex = new int4(
    39.                     graphCol.x + graphRow.x * graphCellLength ,
    40.                     graphCol.y + graphRow.y * graphCellLength ,
    41.                     graphCol.z + graphRow.z * graphCellLength ,
    42.                     graphCol.w + graphRow.w * graphCellLength );
    43.  
    44.                 localIndexArray[ localIndex.x ] = localIndex.x;
    45.                 localIndexArray[ localIndex.y ] = localIndex.y;
    46.                 localIndexArray[ localIndex.z ] = localIndex.z;
    47.                 localIndexArray[ localIndex.w ] = localIndex.w;
    48.  
    49.                 graphIndexArray[ localIndex.x ] = graphArrayIndex.x;
    50.                 graphIndexArray[ localIndex.y ] = graphArrayIndex.y;
    51.                 graphIndexArray[ localIndex.z ] = graphArrayIndex.z;
    52.                 graphIndexArray[ localIndex.w ] = graphArrayIndex.w;
    53.  
    54.                 parentArray[ localIndex.x ] = parentValue.x;
    55.                 parentArray[ localIndex.y ] = parentValue.y;
    56.                 parentArray[ localIndex.z ] = parentValue.z;
    57.                 parentArray[ localIndex.w ] = parentValue.w;
    58.  
    59.                 gCostArray[ localIndex.x ] = gCost.x;
    60.                 gCostArray[ localIndex.y ] = gCost.y;
    61.                 gCostArray[ localIndex.z ] = gCost.z;
    62.                 gCostArray[ localIndex.w ] = gCost.w;
    63.  
    64.                 openArray[ localIndex.x ] = falseVector.x;
    65.                 openArray[ localIndex.y ] = falseVector.y;
    66.                 openArray[ localIndex.z ] = falseVector.z;
    67.                 openArray[ localIndex.w ] = falseVector.w;
    68.  
    69.                 closedArray[ localIndex.x ] = falseVector.x;
    70.                 closedArray[ localIndex.y ] = falseVector.y;
    71.                 closedArray[ localIndex.z ] = falseVector.z;
    72.                 closedArray[ localIndex.w ] = falseVector.w;
    73.             }
    74.             for ( ; i < graphClusterSize * graphClusterSize; i++ )
    75.             {
    76.                 int localCol = i / graphClusterSize;
    77.                 int localRow = i % graphClusterSize;
    78.                 int localIndex = localCol + localRow * graphClusterSize;
    79.                 int graphCol = localCol + graphClusterSize * clusterPosition.x;
    80.                 int graphRow = localRow + graphClusterSize * clusterPosition.y;
    81.                 int graphArrayIndex = graphCol + graphRow * graphCellLength;
    82.  
    83.                 localIndexArray[ localIndex ] = localIndex;
    84.                 graphIndexArray[ localIndex ] = graphArrayIndex;
    85.                 parentArray[ localIndex ] = -1;
    86.                 gCostArray[ localIndex ] = int.MaxValue;
    87.                 openArray[ localIndex ] = false;
    88.                 closedArray[ localIndex ] = false;
    89.             }
     
  11. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    SoA is the way you define your data layout, your struct definition. Code that exists in methods cannot influence whether something is SoA or AoS. With that said, the code appears to be processing over an SoA.
     
  12. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Did I u roll that loop correctly? It's not generating and SIMD instructions.
     
  13. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    No you did not.

    When working with the int4, float4, and bool4 types, use the operator overloads directly instead of doing each math operation for each component.
     
  14. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    How can I assign multiple values to an array in one instruction? Should the arrays be vector types? You said this is non-trivial I'm still trying to learn it.
     
  15. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    NativeArray has ReinterpretLoad and ReinterpretStore methods. Don't forget to have your scalar loop to clean off the last 1, 2, or 3 values.
     
  16. nijnstein

    nijnstein

    Joined:
    Feb 6, 2021
    Posts:
    78
    the most important thing is to keep all data tidy and close to eachother when needed.

    - use indices into your data and not the data
    - store data used to determine if a node is passable close to its neighbours, make sure you can perform the checks fast
    - premap neighbours indices for all tiles, make them accessable by index * movement_direction_count (only if getting neighbours is kinda hard) do this once in your program
    - dont allocate, find out maximum path length for a map and restrict your frontier list to that, no further allocations should be needed
    - store working data in a tidy format, i use an int4 and put distance, previous/from, h and frontier state in that int4, i didtn profile but on average it shaved of lot of the processing time (i think the cache is used better), it s a lot faster then using seperate arrays
    - use a min heap, integrate it into your algorithm closely using integers only

    on 112000 hextiles in worst case with 2.8 ms on a single thread, average case is about a 0.1 ms (in editor, with dots/burst, single thread)

    the code is actually quite simple (mostly based on catlike codings hex pathfinding tutorial), it depends alot on how your map is organized, make sure all neighbouring nodes are close in memory

    Code (CSharp):
    1.  
    2. [BurstCompile]
    3. public unsafe struct calculate_path : IJob
    4. {
    5.     const int min_movement_cost = 2;
    6.     /// <summary>
    7.     /// int4 like:
    8.     /// x = distance
    9.     /// y = from
    10.     /// z = heuristic
    11.     /// w = state : 0 = not reached, 1 = in frontier, 2 = out of frontier
    12.     /// >>!! by using int4 instead of 4 arrays data is more likely to be on same cache line
    13.     ///
    14.     /// </summary>
    15.     public NativeArray<int4> data;
    16.     public NativeList<int> frontier;
    17.     [NativeDisableUnsafePtrRestriction]
    18.     public hextile* start;
    19.     [NativeDisableUnsafePtrRestriction]
    20.     public hextile* end;
    21.  
    22.     [MethodImpl(MethodImplOptions.AggressiveInlining)]
    23.     int h(int i)
    24.     {
    25.         // min_movement_cost acts as the heuristic in the search
    26.         // a larger value will result in less tiles searched but possibly a less optimal solution
    27.         // for an always optimal solution the heuristic should be <= min_movement_cost
    28.         return data[i].x + data[i].z * min_movement_cost;
    29.     }
    30.  
    31.     // minheap method
    32.     void Enqueue(int index)
    33.     {
    34.         frontier.Add(index);
    35.         var child = frontier.Length - 1;
    36.         var parent = (child - 1) / 2;
    37.         while (child > 0 && h(frontier[parent]) > h(frontier[child]))
    38.         {
    39.             var swap = frontier[parent];
    40.             frontier[parent] = frontier[child];
    41.             frontier[child] = swap;
    42.             child = parent;
    43.             parent = (child - 1) / 2;
    44.         }
    45.     }
    46.     int Dequeue()
    47.     {
    48.         int r = frontier[0];
    49.         // Remove the first item if neighbour will only be 0 or 1 items left after doing so.
    50.         if (frontier.Length <= 2)
    51.         {
    52.             frontier.RemoveAt(0);
    53.         }
    54.         else
    55.         {
    56.             // Remove the first item and move the last item to the front.
    57.             frontier.RemoveAtSwapBack(0);
    58.             // heapify
    59.             MinHeapifyDown(0);
    60.         }
    61.         return r;
    62.     }
    63.     void MinHeapifyDown(int current)
    64.     {
    65.         int l;
    66.         while ((l = 2 * current + 1) < frontier.Length)
    67.         {
    68.             // identify smallest of parent and both children
    69.             var min = h(frontier[l]) < h(frontier[current]) ? l : current;
    70.          
    71.             var r = l + 1;
    72.             if (r < frontier.Length && h(frontier[r]) < h(frontier[min])) min = r;
    73.             // if nothing to swap, ... then the tree is a heap
    74.             if (current == min) break;
    75.             // swap smallest value up
    76.             int swap = frontier[current];
    77.             frontier[current] = frontier[min];
    78.             frontier[min] = swap;
    79.             // follow swapped value down and repeat until the tree is a heap
    80.             current = min;
    81.         }
    82.     }
    83.     public void Execute()
    84.     {
    85.         hextile* current = null;
    86.         hexmapdata* map = start->map;
    87.         int searched_tile_count = 0, elim = 0;
    88.         // clear array with initial data
    89.         // distance  would not have to be cleared as we only check what we know we reached
    90.         // also could maintain counter on frontier state and avoid clearing that
    91.         // however.. we always need to clear from and h... so there will always be a big memory clear
    92.         int4 clearer = new int4(int.MaxValue, -1, 0, 0);
    93.         UnsafeUtility.MemCpyReplicate(data.GetUnsafePtr(), &clearer, sizeof(int4), map->tile_count);
    94.         // starting tile = distance 0, index, h = 0, w = 1
    95.         data[start->index] = new int4(0, start->index, 0, 1);
    96.         Enqueue(start->index);
    97.         // keep running while we have some frontline of search
    98.         while (frontier.Length > 0)
    99.         {
    100.             // take out of frontier
    101.             current = &map->tiles[Dequeue()];
    102.             if (current == end)
    103.             {
    104.                 // shortest path reached
    105.                 break;
    106.             }
    107.             // mark as taken out of frontier (w = 2)
    108.             int4 cd = data[current->index];
    109.             data[current->index] = new int4(cd.x, cd.y, cd.z, 2);
    110.             // handle each direction
    111.             for (hexdirection d = hexdirection.NE; d <= hexdirection.NW; d++)
    112.             {
    113.                 searched_tile_count++;
    114.                 hextile* neighbour = map->GetNeighbour(current->index, d);
    115.                 // determine if neighbour is reachable
    116.                 if (neighbour == null)
    117.                 {
    118.                     continue;
    119.                 }
    120.                 int4 neighbour_data = data[neighbour->index];
    121.                 if(neighbour_data.w == 2)
    122.                 {
    123.                     // already know shortest path to this neighbour\
    124.                     elim++;
    125.                     continue;
    126.                 }
    127.                 if (neighbour->IsUnderwater())
    128.                 {
    129.                     continue;
    130.                 }
    131.                 hextileedgetype edge = map->GetEdgeType(current, neighbour);
    132.                 if (edge == hextileedgetype.Cliff)
    133.                 {
    134.                     continue;
    135.                 }
    136.                 // cannot move through walls
    137.                 if (current->walls != neighbour->walls)
    138.                 {
    139.                     continue;
    140.                 }
    141.                 // apply any cost of moving to distance (base = 10)
    142.                 int current_distance = data[current->index].x;
    143.                 if (current->HasRoadThroughEdge(d))
    144.                 {
    145.                     current_distance += min_movement_cost;
    146.                 }
    147.                 else
    148.                 {
    149.                     current_distance += edge == hextileedgetype.Flat ? 5 : 10;
    150.                 }
    151.                 // if (neighbour_distance == int.MaxValue)
    152.                 if(neighbour_data.w == 0)
    153.                 {
    154.                     // tile not yet reached (never in frontier)
    155.                     data[neighbour->index] = new int4(
    156.                         current_distance,
    157.                         current->index,
    158.                         neighbour->coordinates.DistanceTo(end->coordinates),
    159.                         1);
    160.                     Enqueue(neighbour->index);
    161.                 }
    162.                 else if (current_distance < neighbour_data.x)
    163.                 {
    164.                     // update distance to shortest path
    165.                     data[neighbour->index] = new int4(
    166.                         current_distance,
    167.                         current->index,
    168.                         neighbour_data.z,
    169.                         neighbour_data.w);
    170.                     MinHeapifyDown(neighbour->index);
    171.                 }
    172.             }
    173.         }
    174.     }
    175. }
    176.  
    177.  

    you could then get the path if needed via:

    Code (CSharp):
    1.             int cost = data[end->index].x;
    2.             if (cost != int.MaxValue)
    3.             {
    4.                 // reuse frontier list to reconstruct the path
    5.                 frontier.Clear();
    6.                 frontier.Add(end->index);
    7.  
    8.                 int k = data[end->index].y;
    9.                 while (k != start->index)
    10.                 {
    11.                     frontier.Add(k);
    12.                     k = data[k].y;
    13.                 }
    14.  
    15.                 path.cost = cost;
    16.                 path.length = frontier.Length + 1;
    17.                 path.Allocate(allocator);
    18.                 path.path[0] = start->index;
    19.                 for (k = frontier.Length - 1; k >= 0;)
    20.                 {
    21.                     path.path[path.length - k - 1] = frontier[k];
    22.                     k--;
    23.                 }
    24.             }
    after which path.path[] is an array of integer indices into the map data
     
    MintTree117 likes this.
  17. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    sadly link doesn't work anymore
     
  18. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Yeah sorry this was quite old so all the syntax was out of date.
     
  19. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    Do you plan to share it again in the future?
     
  20. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Yea sure let me take a look I probably have it somewhere on my pc.
     
    Krajca likes this.
  21. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    https://github.com/Tree37/TempPathfinding

    I found some files here's a GitHub link. Its pretty convoluted and my current path-finding system is different but maybe you can find some useful stuff in there. I wrote this system alongside my learning of a* algorithm and path-finding in general + my naive understanding of DOTS at the time. Properly implemented the code should be cut in half lol.
     
  22. JustTiredOfEverything

    JustTiredOfEverything

    Joined:
    Aug 4, 2022
    Posts:
    80
    Would you happen to have this still? I wanted to try and apply all the optimizations from this thread here to my own 3d astar jobs pathfinder. Being able to take a look at the projects code would be a huge help.