
Question Performance of Overlap queries in both 3D and 2D drops with higher collider counts. Suggestions?

Discussion in 'Physics' started by xVergilx, May 27, 2020.

  1. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    I've noticed that in both the PhysX (3D) and Box2D (2D) cases, the performance of OverlapXNonAlloc queries decreases drastically as soon as the total physics object count exceeds ~1k.

    I'm working on both mobile projects (as a day job) and a PC project (as a hobby), and I've encountered this behaviour in both.

    Using layer masks doesn't mitigate the issue and has a negligible performance impact.
    As the collider count rises, Overlap checks become more and more costly.
    Note that this is also the case with the "cheapest" collider shapes, such as spheres and circles.

    Tricks I've used so far involve storing collider objects / behaviours and checking against squared distance (for the OverlapSphere / OverlapCircle cases).

    In most cases this is orders of magnitude faster when the number of objects actually being looked for is much smaller than the total collider count (e.g. finding nearby players this way instead of via OverlapSphere with a mask).

    Boxes are much trickier though. I have no idea how to approximate boxes (unless I drop precision and use the same method as above).
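The tricks described above (squared-distance checks instead of OverlapSphere / OverlapCircle, and a reduced-precision axis-aligned stand-in for boxes) can be sketched in plain C#. The `Vec2` struct and all names here are hypothetical stand-ins for Unity's `Vector2` so the snippet is self-contained:

```csharp
using System;

// Hypothetical stand-in for Unity's Vector2, so this compiles outside the engine.
public struct Vec2
{
    public float X, Y;
    public Vec2(float x, float y) { X = x; Y = y; }
}

public static class ManualOverlap
{
    // Circle/sphere check: compare squared distance against squared radius,
    // avoiding the square root entirely.
    public static bool InCircle(Vec2 center, float radius, Vec2 point)
    {
        float dx = point.X - center.X;
        float dy = point.Y - center.Y;
        return dx * dx + dy * dy <= radius * radius;
    }

    // Axis-aligned box check; a rough, reduced-precision stand-in for
    // OverlapBox when the box is not rotated.
    public static bool InBox(Vec2 min, Vec2 max, Vec2 point)
    {
        return point.X >= min.X && point.X <= max.X
            && point.Y >= min.Y && point.Y <= max.Y;
    }
}
```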

    I've also tried splitting Overlap queries across multiple frames. That works wonders where it's possible, but even then the query cost rises like crazy at high counts (>1k colliders).

    So, my question is: how do I bypass / address these limitations?
    Any suggestions on high-collider-count optimizations and OverlapX query usage?
     
  2. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    11,333
    If layers don't affect it then maybe the overhead is the results container type being used via the script bindings, i.e. an array?

    I can only speak for 2D physics, but ALL the non-alloc stuff is tentatively deprecated. You'll find the equivalent overload by removing the "NonAlloc" suffix. All 2D physics queries (basically everything that returns multiple results, including contacts etc.) have an overload that accepts a fixed array or a List<T>. We had performance issues coming from the script bindings at one point, which were fixed.

    I'd be curious to see what happens if you (re)use the List<T> overload.

    In the end, I'm guessing here. Using a LayerMask reduces the candidates discovered during the broadphase and obviously completely removes them from being tested during the narrow phase, which is more expensive (depending on the collider types). When the results are gathered there's potential overhead in marshalling them, and this is where the above comes in.
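A minimal sketch of the List<T> overload mentioned above, with both the list and the ContactFilter2D re-used across queries. This is Unity-only code so it won't compile outside the engine; the class name and serialized fields are made up for illustration:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class ProximityCheck : MonoBehaviour
{
    [SerializeField] private LayerMask _mask;
    [SerializeField] private float _radius = 3f;

    // Re-used containers: neither the list nor the filter is recreated per query.
    private readonly List<Collider2D> _results = new List<Collider2D>(64);
    private ContactFilter2D _filter;

    private void Awake()
    {
        _filter = new ContactFilter2D();
        _filter.SetLayerMask(_mask);
        _filter.useTriggers = false;
    }

    private void FixedUpdate()
    {
        // List<T> overload: hits are written into the re-used list and the
        // returned int is the number of results.
        int count = Physics2D.OverlapCircle(transform.position, _radius, _filter, _results);
        for (int i = 0; i < count; i++)
        {
            // ... process _results[i] ...
        }
    }
}
```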
     
  3. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    I'm using the List overloads in most of the 2D cases, and even some of the direct Physics2D API, and there's barely any difference.
    There's also overhead from creating ContactFilters; I've tried caching them, which gave a minor performance boost.

    Plus, the 3D API doesn't seem to be evolving the same way the 2D one does.
    (Unless something has changed recently; I haven't checked during the 2019.x cycle.)

    I'm not sure what the actual problem is, whether it's marshalling or not.
    But it certainly seems to come from the engine's internal C++ sources.

    Well, even shifting points from the entity (via direction + normalization) and then checking overlap based on squared distance (sqrMagnitude) is at least twice as fast as testing via OverlapCircle on current-gen mobile devices, even when the full set of entities has to be fetched. Which is nuts.


    Also, there's a rising cost of collider insertion / removal, so I guess it's caused by the internal data storage.
     
    Last edited: May 28, 2020
  4. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    11,333
    Oh so you're not using the NonAlloc stuff then.

    There's an overhead on creating this struct? Wow. I mean, it's a small struct with a few properties, basically a POD.

    Now I'm confused. You said "Entity". If you mean ECS/DOTS, what has this got to do with the above? Or are you using that term in the abstract to mean "thing"?

    You should not have any costs like that for queries as they are a read-only operation. Or are you talking about another subject here?

    I don't follow whether you mean you're looking at overlaps with results returning 1K+ items or you're talking about doing 1K+ queries which is obviously going to add up.

    Maybe this is an "inevitable" cost with handling so many things? It's hard to say really.
     
  5. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    Constructors do still have an overhead. With larger numbers it becomes more apparent.

    I use both: the direct Physics2D API (via physics.defaultScene) in one project, and the 3D NonAlloc API in another.

    Just an abstract thing.

    That is a different case. There's a cost to simply enabling / disabling a collider component, and it rises as the number of colliders grows. At some point it's better to just work around it.

    However, this might be tied to how the data is organised internally.


    Okay, so here's an example:

    I've got 40 dynamic "players" that want to check if something is around them. That's 40 OverlapCircle queries, but the total 2D collider count is roughly 1k.

    Here's another example from a different game:
    I've got 4 players that do OverlapBox to gather plants, and 6-8k 3D colliders as "plants", represented as spheres.

    In both these cases the Overlap queries have a growing performance impact.
    Moreover, increasing the total number of colliders makes things worse in other cases too, like toggling collider components.

    The above examples are from mobile games; on a PC this might not be as noticeable.

    It may be that the physics engines are hitting their limits.
    However, I doubt anything under 10k should stall a physics engine like that.
    It feels like something else is involved.

    So at this point the only way I've found to bypass this limitation is to not use physics at all when range checks are involved. For other cases, though, I have no idea.

    Edit:
    One more thing I forgot to mention: raycasts don't seem to suffer from this regression.
    Which suggests it's more or less related to how the overlap queries are implemented?

    Although that might be subjective; I haven't run many comparison tests, since there's no real replacement for them.
     
    Last edited: May 28, 2020
  6. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    11,333
    This is a C# value type. Maybe you're thinking of C++ stuff. If creating a value type is an overhead then you'd better watch out for Vector2 etc. ;)

    The number of colliders used shouldn't be relevant. What would be relevant is whether their AABBs intersect the AABB of the circle you're querying in the broadphase. If that isn't the case it'll be as if they're not there, because Box2D won't even consider them as candidates. OverlapCircle, just like the other shape overlaps, takes the shape's AABB, asks the broadphase for candidates, then checks those candidates.

    When you "disable" a component, stuff like colliders gets destroyed. Enabling does the opposite. It's not really a disable but more of a "keep the property state and delete everything else". For 2D physics you can remove the Rigidbody2D, its colliders, joints etc. by using Rigidbody2D.simulated. This removes the overhead of destroying/creating the underlying physics data.
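A minimal sketch of the Rigidbody2D.simulated approach described above (Unity-only code; the class and method names are hypothetical):

```csharp
using UnityEngine;

public class EntityPooling : MonoBehaviour
{
    private Rigidbody2D _body;

    private void Awake()
    {
        _body = GetComponent<Rigidbody2D>();
    }

    // Instead of disabling collider components (which destroys and later
    // re-creates the underlying physics data), remove the whole body, its
    // colliders and joints from the simulation in one cheap toggle.
    public void SetActiveInPhysics(bool active)
    {
        _body.simulated = active;
    }
}
```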

    That's certainly an interesting observation but I don't have any idea why. What makes you say this is a regression?

    These numbers you keep referring to, though, are pretty meaningless. 10k what? 10k queries? That's a lot. 10k colliders in the scene? That has no cost apart from set-up. Performing a query among 10k colliders whose AABBs don't overlap the query's, returning only 1 result? That should be fast. It should NOT cost significantly more (performance-wise) in that set-up as the number of colliders grows. That's the whole point of the Box2D and PhysX broadphases. If that were not the case then there'd be little point, and we'd just check every collider in the scene against the query, which would be madness.

    I might expect overhead from returning more and more collider results, but not from anything else. If you set up a scene with 1k circles but make sure they're fairly far apart, a query that only overlaps one circle should be cheap, and this shouldn't change much if you increase it to 10k circles.
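One way to sanity-check that claim is a scaling test along those lines. A rough Unity-only sketch (all names hypothetical), spawning well-separated circles and timing a single-hit query as the count grows:

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using UnityEngine;

public class OverlapScalingTest : MonoBehaviour
{
    [SerializeField] private int _count = 1000;    // try 1k, then 10k
    [SerializeField] private float _spacing = 10f; // far apart, so AABBs don't overlap

    private readonly List<Collider2D> _results = new List<Collider2D>(16);
    private ContactFilter2D _filter;

    private void Start()
    {
        _filter = new ContactFilter2D();
        _filter.NoFilter(); // accept everything; we rely on the broadphase only

        // Lay the circles out on a sparse grid.
        int side = Mathf.CeilToInt(Mathf.Sqrt(_count));
        for (int i = 0; i < _count; i++)
        {
            var go = new GameObject("Circle" + i);
            go.transform.position = new Vector2(i % side, i / side) * _spacing;
            go.AddComponent<CircleCollider2D>().radius = 0.5f;
        }
    }

    private void Update()
    {
        // The query overlaps at most one circle; if the broadphase behaves
        // as described, this timing should stay flat as _count grows.
        var sw = Stopwatch.StartNew();
        int hits = Physics2D.OverlapCircle(Vector2.zero, 1f, _filter, _results);
        sw.Stop();
        UnityEngine.Debug.Log($"{hits} hit(s) in {sw.Elapsed.TotalMilliseconds:F3} ms");
    }
}
```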

    I don't really think there's anything here that can be actioned unfortunately. :(
     
    Last edited: May 28, 2020
  7. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    I know. I'm talking about struct constructor overhead; yes, it applies to vectors too.

    In tight loops over large numbers, creating one before the loop and re-using it can yield some gains.
    The same applies to the ContactFilter: creating one per behaviour task seems more appropriate than recreating it each time, since it doesn't change anyway.

    That sucks. It means I need to decouple the physics colliders from the actual entity game objects that get enabled / disabled. Good to know though.

    Comparing the two, raycasts are way faster. It's not a "bug" kind of regression; my bad.

    Colliders. 10k colliders that were procedurally placed and don't move after that point.
    Yet it isn't cheap. Performance decreases with the number of colliders added to the scene.
    That's the exact reason I created this thread.

    Edit: English isn't my first language, sorry if something sounds awkward.
     
    Last edited: May 28, 2020
  8. MelvMay

    MelvMay

    Unity Technologies

    Joined:
    May 24, 2013
    Posts:
    11,333
    Well, if it's the same for 2D and 3D physics, which have completely different implementations, it sure sounds like a case of "this isn't fast enough". It's hard to action that.
     
  9. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    Or it could be an integration design flaw present in both cases. Hard to judge without the sources.

    Perhaps writing a custom "physics" engine on top of the existing one with Jobs isn't that bad an idea for stuff like proximity overlaps or range checks.
    Too bad that will take ages.

    Edit:
    Come to think of it, these cases / entities don't even require actual physics applied to them.
    Just some position / bounds checks.
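For pure position / bounds checks with no physics engine involved, a uniform-grid spatial hash is one common option. A minimal plain-C# sketch (all names hypothetical); candidate cells are found from the query circle's AABB, then confirmed with a squared-distance test:

```csharp
using System;
using System.Collections.Generic;

// A minimal uniform-grid spatial hash for 2D range queries.
public sealed class SpatialHash
{
    private readonly float _cellSize;
    private readonly Dictionary<(int, int), List<int>> _cells = new();
    private readonly Dictionary<int, (float X, float Y)> _positions = new();

    public SpatialHash(float cellSize) => _cellSize = cellSize;

    private (int, int) CellOf(float x, float y) =>
        ((int)Math.Floor(x / _cellSize), (int)Math.Floor(y / _cellSize));

    public void Insert(int id, float x, float y)
    {
        _positions[id] = (x, y);
        var key = CellOf(x, y);
        if (!_cells.TryGetValue(key, out var list))
            _cells[key] = list = new List<int>();
        list.Add(id);
    }

    // Gather ids within `radius` of (x, y): visit only the cells the query
    // circle's AABB touches, then confirm each candidate by squared distance.
    public List<int> Query(float x, float y, float radius)
    {
        var results = new List<int>();
        var (minX, minY) = CellOf(x - radius, y - radius);
        var (maxX, maxY) = CellOf(x + radius, y + radius);
        float r2 = radius * radius;
        for (int cx = minX; cx <= maxX; cx++)
        for (int cy = minY; cy <= maxY; cy++)
        {
            if (!_cells.TryGetValue((cx, cy), out var list)) continue;
            foreach (int id in list)
            {
                var p = _positions[id];
                float dx = p.X - x, dy = p.Y - y;
                if (dx * dx + dy * dy <= r2) results.Add(id);
            }
        }
        return results;
    }
}
```

Static colliders are inserted once; only moving entities would need re-insertion, which keeps the per-query cost independent of total collider count.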
     
    Last edited: May 28, 2020