Getting better performance out of AStar using burst

Sarkahn · Nov 12, 2020

So I wrote an astar pathfinding package. I wanted something generic and simple that would work with jobs and burst. It's nothing fancy, just a straightforward implementation of the redblob version of AStar using native containers.

It works well on small and medium maps, but performance gets *really* bad when the maps get very large. I have performance tests set up for profiling, and on a 1000x1000 map it takes 50+ ms to path from one end of the map to the other, even with burst.

Any advice on how I can improve it?

tertle · Nov 12, 2020

Don't except that you can path from 1 end of your map to the other (1k x 1k isn't even large).

There are tons of ways around this problem.

A couple of examples
- You can pre-compute partial routes that go across your world
- Use navmesh instead of a grid. This can greatly reduce the number of nodes you need to visit.
- You can limit number of nodes visited (1000 for example) and if the algorithm doesn't find it in that time return the partial path and start walking along that then compute the rest of the path from the partial path.

And you can do all of these things at once.

Sarkahn · Nov 12, 2020

tertle said: ↑

Don't except that you can path from 1 end of your map to the other (1k x 1k isn't even large).

There are tons of ways around this problem.

A couple of examples
- You can pre-compute partial routes that go across your world
- Use navmesh instead of a grid. This can greatly reduce the number of nodes you need to visit.
- You can limit number of nodes visited (1000 for example) and if the algorithm doesn't find it in that time return the partial path and start walking along that then compute the rest of the path from the partial path.

And you can do all of these things at once.
Click to expand...

I want to keep it as generic as possible so there aren't any restrictions on the kinds of maps you can use with it. And right now I plan on using it for a grid-based game so a navmesh wouldn't work unfortunately.

Adding an option to limit the the number of nodes you can visit per frame is a good idea. It really just hides the actual performance issue but if that's what I have to do, so be it. I was kind of hoping I was maybe just doing something really dumb with my implementation and there's some simple way I could tweak it to make it much faster.

Antypodish · Nov 12, 2020

Try divide your map into chunks. Let's say 10x10 or 20x20.
Each chunk contains "gates" to neighbour chunks.
Initially you lookup for path through chunks, for long distances.
Then when you actually move, you lookup only for a path of nearest chunks, for which you know the path is valid.

DreamingImLatios · Nov 12, 2020

If you want something faster at the low-level, you may want to look into creating specializations. Unfortunately the sneaky optimizations I am aware of don't work in your case because you have interface method invocations and random container access in your inner loop. So vectorization, branchless operations, and bitmasking are all off the table. The only real place to improve would be the priority queue. For that, you could use a wider tree than a binary tree and use simd operations to compare against several children at once when cascading up or down. This also reduces depth significantly so you don't pay for the traversals nearly as much.

Of course the high-level optimizations everyone else is suggesting are valid too.

Sarkahn · Nov 12, 2020

Antypodish said: ↑

Try divide your map into chunks. Let's say 10x10 or 20x20.
Each chunk contains "gates" to neighbour chunks.
Initially you lookup for path through chunks, for long distances.
Then when you actually move, you lookup only for a path of nearest chunks, for which you know the path is valid.
Click to expand...

I was considering a hierarchical solution like that, I think most of that work would be on the map side though. It does make sense it could be a massive improvement for huge maps though!

DreamingImLatios said: ↑

If you want something faster at the low-level, you may want to look into creating specializations. Unfortunately the sneaky optimizations I am aware of don't work in your case because you have interface method invocations and random container access in your inner loop. So vectorization, branchless operations, and bitmasking are all off the table. The only real place to improve would be the priority queue. For that, you could use a wider tree than a binary tree and use simd operations to compare against several children at once when cascading up or down. This also reduces depth significantly so you don't pay for the traversals nearly as much.

Of course the high-level optimizations everyone else is suggesting are valid too.
Click to expand...

So it sounds like the only obvious way to improve the algorithm is to specialize and create restrictions on the inputs then?

RE the Priority Queue - I did try a few other containers, including some I've seen on these forums, and profiling was showing the algorithm was running fastest with my NPQ. Of course that doesn't mean it can't be improved - I will do some reading and see if I can figure out how to implement your suggestions.

Thanks for all the suggestions so far!

DreamingImLatios · Nov 12, 2020

Sarkahn said: ↑

So it sounds like the only obvious way to improve the algorithm is to specialize and create restrictions on the inputs then?
Click to expand...

Yes. That isn't to say that having a generalized solution isn't useful. Many use cases don't need maximum performance.

Sarkahn said: ↑

RE the Priority Queue - I did try a few other containers, including some I've seen on these forums, and profiling was showing the algorithm was running fastest with my NPQ. Of course that doesn't mean it can't be improved - I will do some reading and see if I can figure out how to implement your suggestions.
Click to expand...

I haven't come across a proper simd-optimized priority queue in C#. And the only one I have vetted (C++) is way too platform-specific to share. Outside of AStar, I haven't found a use case for one yet to motivate me to write one.

eizenhorn · Nov 12, 2020

Sarkahn said: ↑

So I wrote an astar pathfinding package. I wanted something generic and simple that would work with jobs and burst. It's nothing fancy, just a straightforward implementation of the redblob version of AStar using native containers.

It works well on small and medium maps, but performance gets *really* bad when the maps get very large. I have performance tests set up for profiling, and on a 1000x1000 map it takes 50+ ms to path from one end of the map to the other, even with burst.

Any advice on how I can improve it?
Click to expand...

Implement hierarchical A* (for reducing dead ends cost and wide spread cost), check your heuristic function (which really matters for search accuracy depends on map type) you can read a paper from this my post, just exclude potential fields part https://forum.unity.com/threads/unity-dots-case-study-in-production.561229/page-2#post-4412335. And one more important optimization - make it calculate not instantly but off the main thread, it's the main way for A* implementation for huge scale map (in addition to all described above in the thread) in most cases you can spread calculation across a couple of frames. For future optimizations use LOS check for agents for trace top-level node gates and traverse through this top-level A* nodes (regions) and in the case when N+1 node not in LOS (Brenham or grid traversal) go straight to N node and just check from time to time if N+1 visible and go straight to this and repeat until you arrive end region or to the region where you can't see next N+1 node and build low-level A* path for this small granular region. In addition, for early outs of path request, you can build one more top-level thing - Area - block of connected regions and every region belongs to some AreaID and in the case when you request path and point you requested in a different area, then path not reachable and you don't need to even try to calculate the path, the similar region-area approach was used in Rimworld and in our Diplomacy is Not an Option.

Search Unity

Unity ID

Useful Searches

Getting better performance out of AStar using burst