Search Unity

Heavily optimizing a* pathfinding algorithm inside a JobSystem.

Discussion in 'Entity Component System' started by MintTree117, Mar 3, 2020.

  1. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Hello all.

    I have just finished implementing an algorithm using a Burst-Enabled parallel job. Its quite fast, but not fast enough. Currently 50 units can traverse an empty 100X100 grid in about 16-21 ms, in one frame. I do distribute these requests along multiple frames.

    However, there are some key parts of my algorithm which I know are bottlenecks.

    1) Inside my job, I currently do a deep copy of a 1-d NativeArray representing the map so I can work with a copy and write to it, with each index being a node with pos, costs, indexes of parent nodes, etc.

    I don't how how to represent the nodes without doing the deep copy. It seems to me I would have to either create a bunch of arrays for all the fields of each node, or some other expensive operation anyway.
    Basically, I am asking here how I can very cheaply get a representation of my grid/nodes to my jobs, which all run in parallel, so they cannot write to the same representation.

    2) I am still using NativeLists for my open/closed sets.

    I want to implement a priority queue, however I split my grid into 10x10 sections. I don't think the pq will fare better than my lists with such a small number of nodes. Please correct me if I am wrong.


    I am more than happy to post a link to the source code, but its rather long, so let me know.

    Also, any other tips/suggestions regarding anything I can do to increase the performance of my job, or in general.

    https://github.com/Tree37/A-Pathfinding-Jobs

    Thanks.
     
    Last edited: Mar 3, 2020
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    1) Only do a deep copy of the data you actually need to mutate.
    2) Hard to tell without seeing code.

    When trying to optimize a job like this, it can be useful to break up the job into a chain of small jobs and profile each piece. But more importantly, smaller jobs make the Burst inspector a fair bit more readable, so you can pick a job and really drill down performance. If you need help optimizing a particular job using the Burst inspector, post code on GitHub and I can dig into it.

    Some things I've learned:
    1) Iterating by columns is terrible.
    2) Branches are terrible. Avoid && and || and try to express such statements differently to reduce the number of branches Burst generates.
    3) Some type of shuffling instruction ever other instruction usually means something is wrong.
    4) If you are copying a struct to a new location which is not its final location, you can probably copy an index to the struct's original array instead.
     
    MintTree117 likes this.
  3. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,779
    How do you copy array?
    Do you really need copy whole array?
    What is the purpose of copying, to be able work on it.
    Multiple jobs can read from same array at the same time.
     
    MintTree117 likes this.
  4. palex-nx

    palex-nx

    Joined:
    Jul 23, 2018
    Posts:
    1,748
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    I cringe every time I see someone mention atomics as being non-locking. Atomics are not that much faster than mutexes, primarily because mutexes are usually just a pair of atomic instructions. Atomics do absolutely horrible things to cache lines and instruction execution pipes and should be used sparsely if not avoided altogether.
     
  6. palex-nx

    palex-nx

    Joined:
    Jul 23, 2018
    Posts:
    1,748
    Well, what would you suggest then?
     
  7. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
  8. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    I copy my array like this:
    Code (CSharp):
    1.             NativeArray<PathNode> pathNodeArrayCopy = new NativeArray<PathNode>( pathNodeArray , Allocator.Temp );
    I don't need to copy the whole array, but if I didn't I would need to make a bunch of others arrays to copy data anyway.

    The purpose of copying it is so I have data of the grid positions of nodes, their h,g,f costs, and data wether a node is walkable or not.
     
  9. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    1) What do you mean by columns?
    2) I do have branches but they are part of the a* algorithm.
    3) What do you mean by shuffling?
    4) How can I do this?
     
  10. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    Rework your code so that two threads don't have to write to the same data address.

    So you have 34 bytes per node, and only 16 of those bytes you actually have to modify. So that's a solid 50% of your memcpy performance right there, possibly even higher. In addition, I don't think you even need to copy the data you write to as much as you could just allocate new empty arrays for them and populate them with data. SoA could help you out a lot here.

    These were general tips with Burst not specific to any particular algorithm. Though I will go through them anyway:
    1) If you have a 2D array represented as a 1D array with the index accessed as col * rowWidth + row, and you wanted to iterate using two for loops, you want to have the outer for loop increment columns and the inner loop increment rows. I didn't see any obvious issue with this in your code, but I might have missed a spot.
    2) LineOfSight could definitely use some improvement in this regard.
    3) Shufflings are simd instructions that reorder the elements in a simd register. I don't think any of your code is using simd at the moment but I would have to dump it into Burst.
    4) You are already kinda doing this with your open and closed lists.

    Another thing you may want to try is instead of using two lists for your open and closed list, just use a bool array specifying whether something is open or closed. The reason for this is you are doing lots of Contains() calls that you wouldn't need to do if it was a bool check at the index instead. There is a tradeoff of you having to sift out the closed list nodes in GetLowestCostFNodeIndex, but I suspect that tradeoff will be a win.

    Try those first and then if you still need more performance we can dive into the asm, although I will need to know your target platform by that point.
     
    Enzi and MintTree117 like this.
  11. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Thank you very much for your help. I am implementing now.
     
  12. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Hi Latios. So I implemented your changes, and updated the Git

    No more closed list. Only data which I really need to mutate is copied. I moved the NativeLists that were in the update loop to OnStartRunning(), and I just Clear them every update.

    Right now, without the path smoothing:

    On a 10x10: 1 unit = 0.6ms , 20 units = 2-4 ms
    On a 50x50: 1 unit = ~1ms , 20 units = 3-5ms

    I know that with a modern CPU these time would be cut way more than half, which is pretty good. But, I really want to try to get it even lower, lol.

    I am going to watch the Unite talk about SIMD, and try to use it here if possible at all. I know what it is, but I have zero experience with it.

    If possible I am trying to get it to 0.1ms time.

    I am targeting Windows platform right now. These timings are all in editor. AMD A10 8700, 8 gb DDR3. Job debugger on.
     
    Last edited: Mar 4, 2020
  13. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    I have eliminated most of my copies and it helped, thank you.
     
    Antypodish likes this.
  14. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    These numbers mean nothing if you don't know how fast the original was with the same grid size and unit count.

    Besides measuring the existing improvements so we can tell how bandwidth-bound vs CPU bound we are, the next thing worth doing is commenting out code from the end of the job and running the job. The job results become meaningless, but the profiling time differences with or without the code commented out can give us an idea which blocks are the most expensive. Pick an expensive block and I can walk you through the X86 assembly for that part.
     
    Enzi likes this.
  15. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Ok. Here are the timings starting with the first region, then adding each of the other 3.
    Note the gridNodeArray is always 100x100, its just for the tests I clicked within 10x10 and 50x50

    DataSetup:
    - 1 unit 10x10 is 0.3-0.7ms
    - 1 unit 50x50 is same
    - 20 units 10x10 is ~2.6ms
    - 20 units 50x50 is same

    DataSetup + FindPath
    - 0.3 - 0.7ms
    - 0.3 - 0.7ms
    - 2 - 5ms
    - 18 - 22ms

    DataSetup + FindPath + TracePath
    - 0.4 - 0.7ms
    - 3 - 3.4 ms
    - 20-30ms
    - 20-30ms

    Also, is SoA service-oriented architecture?
     
  16. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,685
    Structure Of Arrays
     
    MintTree117 likes this.
  17. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    Ok. It seems there are a couple of things you are struggling with. Its going to be a really slow process for me to help you if I can't run the benchmarks myself. If you can set things up such that your single system in OnUpdate instantiates the entities and grid, runs the benchmark for a couple different combinations, and then tears things down again, I can drop that into my test suite and really dig in.

    After all, optimization is really a game of educated guess and check. If you aren't doing the checking, you aren't optimizing.
     
    andrew-lukasik, MintTree117 and Enzi like this.
  18. andrew-lukasik

    andrew-lukasik

    Joined:
    Jan 31, 2013
    Posts:
    249
    Hi. Small tip from me here - optimization is not all about IJobs and Burst. In my limited experience it's important to visualize your input&output data and simply measure how many nodes your given implementation tests while solving for path.
    For example, compare results of these two implementations:
    A https://i.imgur.com/np3guuA.gif
    B https://i.imgur.com/vW5bVeQ.gif
    I was stuck there writing memory optimizations (/diminishing returns) but time investment put in visualizing all this this helped me spot what must be really wrong in A. Thanks to that B is 10 times faster - just by fixing heuristic function, tweaking it's arguments and also changing input data (under reported factor imo) a bit (less slopes). All this thanks to humble visualization window (src).
     
    Last edited: Mar 4, 2020
  19. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Ok, I've uploaded a bench-marking class. Just import the regiment component and grid class as well (they are small).

    You can play with the Constants at the top of the Benchmarking class to change how many units to spawn, how many should find a path, and the grid size. As well, you can change the pathfinding desination. To run the benchmark in playmode, just press "Space".

    https://github.com/Tree37/A-Pathfinding-Jobs

    To test parts of the job, you can just comment out the different #regions in the job. The only thing you have to do to make it work is always return a NativeList, even an empty one. So just return a NativeList at the end.
     
    Last edited: Mar 4, 2020
  20. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    What changes did you make to your heuristic? What did you change about your input data?
     
  21. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Ok, wow!

    I simply removed a Calculate_HCost function from inside the neighbour check that I missed. It wasn't supposed to be there. Now 20 units only take 0.5ms to traverse the whole grid!

    I also edited my heuristic thanks to
    andrew-lukasik

    This is a massive boost. I don't know if I should even go through the trouble of writing a native container for a priority qeue anymore, but if you think it will make it run even faster I'll try.

    I was thinking about pre-computing the heuristic for every cell to every other cell and storing it, so that I don't have to do this every time I find a path. This will remove Calculate_HCost n-times where n is the number of nodes.

    However, I don't know if its a good idea to store a NativeArray inside a struct? When I pass in the pathnode array to the job, am I passing a pointer or a copy?
     
    Last edited: Mar 5, 2020
  22. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    Not doing work that you don't need to do is always a good thing! I'm not a pathfinding expert, so I am really glad @andrew-lukasik was able to pitch in. I wouldn't bother with the priority queue for now. There's probably a bunch of other lower-hanging fruits to tackle first.

    This is absolutely an optimization worth doing! It is perfectly fine to pack several arrays into a struct. I do it all the time for my acceleration structures. That's what leads to SoA layouts, which I still think is going to help you a lot.

    Am I correlating these numbers correctly? It might just be me being optimistic, but I actually think this is attainable. Anyways, if you decide you want to keep pushing this algorithm further, keep pushing to that repo and I'll get a test harness going that will tabulate all the optimizations and help track future ones.
     
  23. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    56
    Your code suffers from very serious algorithmic scaling issues. https://github.com/Tree37/A-Pathfin...6f045dea464/ParrallelPathfinding.cs#L164-L178 and https://github.com/Tree37/A-Pathfin...6f045dea464/ParrallelPathfinding.cs#L405-L417 both perform linear searches which forces your algorithm to run in quadratic time. See https://en.wikipedia.org/wiki/Dijkstra's_algorithm#Running_time

    You need to use a priority queue to speed up the minimum F node extraction and you shouldn't search a list to determine if a node is open or closed. Use flags and toggle flags instead.
     
    Last edited: Mar 5, 2020
  24. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Yes, but I was being overly optimistic a little. Even if they can do that in 1 ms thats great, because then I can have potentially hundreds doing this every second.

    I have noticed earlier I did a benchmark just after I started my laptop, and the timings we an order of magnitude lower than my previous tests, but my cpu runs hot very quick and the performance goes down so I can't really get realistic results.

    However, the same thing seems to happen when I only run the benchmarker, without any rendering or any other systems. This confuses me.

    When I try to put a NativeArray into my GridPathNode struct, I get this error:

    GridPathNode used in NativeArray<GridPathNode> must be unmanaged (contain no managed types) and cannot itself be a native container type.

    Also, I am not sure where to start with SoA, as I'm unfamiliar.

    In addition, I have removed the membership-test loops for the open set by have each PathNode contain both a bool field for isOpen and an int field for its index in the openset. This yielded some small performance gains, but again, its not really accurate on my laptop.

    I still have to optimize the path smoothing.

    I will push to the repo within 20 or so min.
     
    Last edited: Mar 5, 2020
  25. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    56
    I have a private sample from a while ago that I worked on which can pathfind ~50 units on a ~100x100 grid in about 0.5 ms (total CPU time) on my 2018 MacBook Pro with 6 cores @ 2.9 ghz.

    upload_2020-3-4_19-27-18.png

    Scaling it up to 500x500 with around 50 units is about 15 ms total!

    upload_2020-3-4_19-28-45.png

    There are much faster methods to pathfinding that doesn't require searching a graph which is *even faster* by huge factors (a coworker of mine implemented it but I don't think it's public).
     
  26. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Very cool! Thanks for sharing.
     
  27. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Oh I just saw this, sorry. I actually just removed the membership tests with the flags as you said. The only thing I didn't do yet it the priority queue. The git is updated.

    The benchmarker gets around the same results as you posted. But when I add in the rendering and other systems the time goes up 1 order of magnitude, and I don't understand why.

    Could it be because my entities in my normal program have alot more data attached to them vs in the benchmarker? The find path jobs are exactly the same though.
     
    Last edited: Mar 5, 2020
  28. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    Best way to find out is to look at the Profiler Timeline view and compare in game vs benchmark test. Share screenshots if you feel inclined.
     
  29. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Real:

    Screenshot (5).png

    Benchmark: Screenshot (8).png Screenshot (8).png

    I have no idea why this is happening. The only difference between the two are that in the normal version, there is a large cube for a ground, and each unity has 16 sub-units each that are being rendered. The pathfinding does not run on the sub-units.
     
  30. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    I always forget that you can actually send a profile capture to someone and they can open it even without the project. You may want to try that. I need to see the worker threads.
     
  31. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    The files are too large to post here, I pushed them to the repo.
     
  32. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    Hey, if you're interested I've been working on a priority queue for the past few days.
    https://github.com/sarkahn/nativecontainers/tree/master/Assets/NativeContainers

    It's based off this one I found but I modified it to be job-safe, generic, and to use an UnsafeList internally so it will automatically increase capacity as you enqueue nodes. If you're not familiar with writing job-safe native containers it can be annoying so feel free to use it or build off it if you want.

    Thanks for sharing your pathfinding journey!
     
    YakShaver_dc, Krajca and MintTree117 like this.
  33. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Hey thanks alot! I am trying to learn them, and I will use yours to learn as well. If I wanted to use a fixed array instead of a list, is the type void* or int*? I am not familiar with c#/unity pointers yet.

    It looks like I knew less than I thought I did, lol.
     
    Last edited: Mar 5, 2020
  34. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    Either should work I think, you can use UnsafeUtility to perform all your array operations on your pointer.

    Jackson Dunstan has a really nice repo and a detailed set of articles about custom nativecontainers, highly recommended.

    https://jacksondunstan.com/articles/4734
     
    MintTree117 likes this.
  35. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    If I use a priority queue, extraction is faster, but insertion is slower.
     
  36. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    451
    Some time ago I was implementing a flow field in DOTS and big optimisation I got from precalculating all valid neighbours. I have done it like this:
    I have created a MultiHashmap that has int as a key and int as values. A key represents all possible combinations of walkable neighbours as shown in the image below (black means unwalkable node):
    dirsencoding.png
    I values I store an index to the direction vector.
    Direction vector array has 8 int2 vectors [(-1,-1),(0,-1),(1,-1)...]
    So this is how the hash map looks like [key]<values>:
    [0]<0,1,2,3,4,5,6,7>
    [1]<1,2,3,4,5,6,7>
    ...
    [73]<1,2,4,5,7>
    ...
    Each map node has its code calculated during the generation phase. This way I do not have to check for walkability and borders of the map. Maybe this will give you some performance too.
    In a situation like this:
    corner.png
    Node gets code 14 instead of 10. 10 is not needed in my hashmap because there will be no node with this code.
     
  37. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    This is a nice optimization, didn't think of this. Thanks for sharing.

    I want to implement a flow field as well. Do you have any references I could learn from?
     
  38. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    451
    Thanks :). When I have started to look for optimisations I wanted to get rid of branching. Checking for walls and out of bounds area were most common.

    I have watched a talk about porting to jobs. I can't find it now maybe someone else will have a link to it.
    During the talk, the factory example from this repository was taken as base. It was jobified. I used this jobified version as my base and move it to DOTS. Additionally, I have tried to optimise it as much I could. Probably something more can be done. For map 200x200 and about 30% are walls it takes 2-3 ms to generate flowfield. Unfortunately, the cost field can't be done in parallel, and it is the most expensive part.
    Here is my cost field job:
    Code (CSharp):
    1. private struct GenerateCostField : IJob
    2.         {
    3.             [ReadOnly] public MapData map;
    4.             public NativeArray<float> stepField;
    5.             public NativeList<int2> openSet;
    6.             public NativeList<int2> nextSet;
    7.             [ReadOnly] public NativeMultiHashMap<int, int> directions;
    8.             [ReadOnly] public NativeArray<int2> moveDirs;
    9.             [ReadOnly] public NativeArray<float> moveDirsLength;
    10.             [ReadOnly] public int mapWidth;
    11.             public void Execute()
    12.             {
    13.                 while (openSet.Length > 0)
    14.                 {
    15.                     // repeat until out of tiles
    16.                     for (var j = 0; j < openSet.Length; j++)
    17.                     {
    18.                         var tile = openSet[j];
    19.                         var x = tile.x;
    20.                         var y = tile.y;
    21.                         var index = x + y * mapWidth;
    22.                         var existingCost = stepField[index];
    23.                         var dirs = directions.GetValuesForKey(map.tiles[index]
    24.                             .availableDirectionsCode);
    25.                         while (dirs.MoveNext())
    26.                         {
    27.                             var dirIndex = dirs.Current;
    28.                             var moveDir = moveDirs[dirIndex];
    29.                             var newTile = new int2(x + moveDir.x, y + moveDir.y);
    30.                             var mapTile = map[newTile];
    31.                             var moveCost = mapTile.moveCost + mapTile.agents * 0.1f;
    32.                             var newIndex = newTile.x + newTile.y * mapWidth;
    33.                             var cost = moveCost * moveDirsLength[dirIndex] + existingCost;
    34.                             if (!(cost < stepField[newIndex])) continue;
    35.                             stepField[newIndex] = cost;
    36.                             nextSet.Add(newTile);
    37.                         }
    38.                     }
    39.  
    40.                     var temp = openSet;
    41.                     openSet = nextSet;
    42.                     nextSet = temp;
    43.                     nextSet.Clear();
    44.                 }
    45.             }
    46.         }
    Probably I could use index to tiles array insted of int2 coordinate, and then in MultiHashMap instead of index to int2[] array of direction store an offset of an index. It is known during generation of this has map. Maybe I will try this approach. In my implementaion number of agents effects movement cost for a tile.
     
  39. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    I'm not seeing massive spikes like what you showed. Any spikes I do see, the jobs are running without Burst enabled. Remember that without SynchronousCompilation, the first few times the job runs it will run without Burst while the Burst Compiler compiles it.
     
  40. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Well... The slower one wasn't BurstCompiled.

    I couldn't stop laughing. Everything is less than 1 ms now, I think we have achieved the goal. Thank you again for your help! I'm still going to implement a priority queue and do a few other small optimizations and I'll post the final results whenever I finish if anyone is interested.
     
  41. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    56
    Insertion into a priority queue will always be slower compared to appending to the back of list, but insertion is not the critical operation in A*, extracting the minimum is. I would expect that using flags for the closed set and a priority queue for open set to speed up your search dramatically. If it isn't, it's likely an implementation specific issue where you're copying buffers excessively or your heuristic is too conservative and doesn't head toward the destination correctly (which causes the search frontier, aka the open set, to be much larger than it should be).
     
  42. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Yea I was mistaken. I am adding in the priority queue and already have flags.
     
  43. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    How do I pass data into the queue? For example, if I just wanted the queue to store a cost and an index, and return me the index when I dequeue?
     
  44. thebanjomatic

    thebanjomatic

    Joined:
    Nov 13, 2016
    Posts:
    36
    I just tried dropping this into my A* test project. I was curious because in my own test cases, the linear array search still outperforms my priority-queue based solution. I was hoping yours might have been faster than mine, but the results I'm seeing aren't adding up to my expectations though.

    1000x1000 grid, no obstacles. 50 path searches over 8 cores
    Search goes from 0,0 to 999,999 and the maximum size of the open list over the course of the search hits 395 items.

    Linear array search: 1.5ms
    My MinHeap with index map to accelerate updating priorites: 2.35ms
    NativeContainers.NativePriorityQueue: 45ms

    I'm still not sure why either approach should be slower than the linear case, but who knows.
     
  45. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    It's a generic container so just create a struct with whatever you want to store and use that or just a plain unmanaged value as the generic parameter. The node you get back when you dequeue stores your data in Node.value.

    I'm not sure tbh, like I said it's based off a C# version I found so Im not too familiar with it and I haven't had a chance to test it in an algorithm and profile it yet. Only thing I could suggest is to make sure you set the initial capacity high enough so it doesnt need to resize while working. And are you sure you ran it in burst? That seems unusually slow
     
  46. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    Try a min-heap that instead of using a b-tree uses a larger number of leaves per node.
     
  47. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    56
    There is no way to know for sure without your code. Can you share it?
     
  48. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
  49. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    It is not a very fast priority queue. In addition, there's always some threshold where below which a linear algorithm is going to beat a fancier data structure. Slower implementations of the fancier data structure will have a much higher threshold. You are well-below that threshold right now.

    Unless you are not satisfied with the performance yet, it may be better worth your time to take a break and focus on the rest of your game. It'll be a good problem to revisit with a little more DOTS practice and more aggressive requirements to meet. I've been crunching pretty hard to meet a personal deadline this month. But after that, I would love to take a real shot at this algorithm, though someone may need to remind me! :D
     
    Krajca and MintTree117 like this.
  50. thebanjomatic

    thebanjomatic

    Joined:
    Nov 13, 2016
    Posts:
    36
    @Dale_Kim
    I had to refactor things to upload when I got home from work. This project should demonstrate the difference:
    https://github.com/thebanjomatic/TestPathfinding

    I changed things around so that I now do a search from (0,0) to (999, 0) with a single wall in the middle from (500, 0) to (500, 998).

    ArrayOpenList 4.52ms
    MinHeapOpenList: 5.68ms

    I haven't done an exhaustive analysis of the minheap implementation, but it appears to be working correctly.