
Help Wanted Is it still worth optimizing square roots in 2021+?

Discussion in 'Scripting' started by iSinner, May 17, 2021.

  1. iSinner

    iSinner

    Joined:
    Dec 5, 2013
    Posts:
    173
    This question is about modern hardware. I am not up to date on the circuitry that modern CPUs and GPUs have, hence the thread.

    Consider Intel and AMD desktop CPUs made after 2018.
    For GPUs, consider desktop GPUs released after 2018.

    Do they have dedicated 'units' for computing square roots, or not?

    1. Is it still worth optimizing square roots for work done on modern CPUs?
    2. Is it still worth optimizing square roots for work done on modern GPUs?
     
    Last edited: May 18, 2021
  2. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    20,154
    It depends on the target architecture and runtime library of course, as all computer code runs on the actual CPU where your game is running, not some "theoretical magical global CPU of today" construct.

    But in 2021 a floating point unit (FPU) is pretty standard on most targets.

    All that said, don't do needless work, especially on mobile, as that just burns down battery life.

    If you suspect you have a performance bottleneck, DO NOT START MAKING OPTIMIZATION CHANGES!!

    Instead, start with the Profiler (Window -> Analysis -> Profiler) to confirm you actually DO have a performance issue and to confirm where it actually IS, otherwise you will tear up your codebase and it won't get any faster.
     
    Joe-Censored and Bunny83 like this.
  3. iSinner

    iSinner

    Joined:
    Dec 5, 2013
    Posts:
    173
    Take Intel and AMD desktop CPUs made after 2018.
    An FPU, yes, but do they have something dedicated to square roots? I've read somewhere that this optimization is mostly pointless on today's CPUs, because square roots are effectively free thanks to special units dedicated to them. I'm trying to find out if this is true.

    As for GPUs, maybe someone can chime in and clarify it.
    Maybe @bgolus can share his wisdom on this subject?
     
    Last edited: May 18, 2021
  4. GroZZleR

    GroZZleR

    Joined:
    Feb 1, 2015
    Posts:
    2,858
    How many square roots are we talking? The famous Quake III example was used in every single lighting calculation, so it makes sense in their use-case. If you're calculating a lot of distances, I'd say your first step for optimizing would be to simply do those calculations less frequently -- AI doesn't really need to tick every frame, for example -- but Kurt's suggestion of using the profiler first and foremost is where you want to start.
     
    Joe-Censored likes this.
  5. iSinner

    iSinner

    Joined:
    Dec 5, 2013
    Posts:
    173
    Thanks for the tip, but I don't have a specific problem I'm trying to solve; my focus is on the availability of specialized circuits.
     
  6. exiguous

    exiguous

    Joined:
    Nov 21, 2010
    Posts:
    1,618
    Shouldn't you ask the hardware vendors this question, then? It seems pretty unrelated to Unity, which is what this forum is about.
     
  7. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    If you can use sqrMagnitude instead of distance, use it. But it's not always possible.

    For example, if I want the time it takes for a sound to travel from the sound source to the player, I can't do

    (playerPos - audioPos).sqrMagnitude / (speedOfSound * speedOfSound)

    because that gives the time squared, not the time.
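    A minimal sketch of why the square root can't be avoided in this case. It assumes System.Numerics.Vector3 instead of Unity's Vector3 so it runs outside the engine (Length() and LengthSquared() play the role of magnitude and sqrMagnitude). The class name, method names and the 343 m/s constant are illustrative assumptions, not something from the post.

```csharp
using System;
using System.Numerics;

public static class SoundDelay
{
    // Assumed speed of sound in air, in metres per second (illustrative value).
    public const float SpeedOfSound = 343f;

    // Travel time in seconds: distance / speed. Length() performs the square root.
    public static float TravelTime(Vector3 playerPos, Vector3 audioPos)
    {
        return (playerPos - audioPos).Length() / SpeedOfSound;
    }

    // Squared distance over squared speed yields the time *squared*, not the time,
    // so it is only useful for comparisons against another squared time.
    public static float TravelTimeSquared(Vector3 playerPos, Vector3 audioPos)
    {
        return (playerPos - audioPos).LengthSquared() / (SpeedOfSound * SpeedOfSound);
    }
}
```

    The squared form is still fine for a question like "does the sound arrive within t seconds?", by comparing against t * t. It just can't give the delay itself.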
     
  8. iSinner

    iSinner

    Joined:
    Dec 5, 2013
    Posts:
    173
    I'm not sure how to ask vendors.

    It is strongly related to programming, so it wouldn't be surprising if a lot of people knew this. I don't, unfortunately, and I'm not even sure where to look for it or what to look for.
     
  9. lordofduct

    lordofduct

    Joined:
    Oct 3, 2011
    Posts:
    7,820
    Is there specialized hardware? Yes.

    It still has noticeable latency, because computing a square root is difficult whether it's done in hardware or in software.

    Here you can find various optimization documents. Document 4 contains the latency tables for various math ops, including the different hardware sqrt methods from various CPU vendors (oh, and yes, a CPU can actually have more than one sqrt instruction, one being faster but less accurate than the other):
    https://agner.org/optimize/

    Consider that the CPU matters here, especially with cellphones, since they rely on RISC architectures rather than CISC architectures. RISC stands for "reduced instruction set computer"; the "reduced" refers to having fewer hardware instructions available and instead relying on software to call the appropriate sequence of instructions to perform more complex operations.

    From a theoretical perspective does it exist? Yep! And it has for a while.

    Is it magically fast? Nope! Square roots are difficult. No matter what.

    Should you develop like they are magically fast? That depends. How many are you doing? How accurate do you need them to be?

    As was already stated you generally should just run the profiler and determine where your hot spots are. Square roots are likely not your bottleneck unless it's something that you're running very frequently... very very frequently.

    Does this mean you shouldn't optimize it? Nah. If you could do a sqrMagnitude distance test just as quickly and in the same amount of readable code... do it! I'd argue both of these take just as long to write and are just as readable:

    Code (csharp):
    if ((a - b).magnitude < dist) { ...
    Code (csharp):
    if ((a - b).sqrMagnitude < dist * dist) { ...
    [addendum]
    Here's how I think about it... and it's how I think about any optimization, from math, to garbage, to any of this crap.

    It's frequency and convenience.

    And what I mean is this.

    Say whenever the user presses the A button I pull a list of all "interactable" objects and filter them down to get the closest one. Thing is, this doesn't happen frequently... and yes, someone MIGHT repeatedly press the A button, but at the end of the day it can only be pressed ONCE per frame, and the game can process 60+ frames per second. This is negligible. So are there optimizations I could make here, like an octree? Using sqrMagnitude? Avoiding LINQ? blah blah blah. SURE! But why bother? I know that in my game, in any given scene, there's like 10 or 20 "interactables"... and the most naive approach is to just loop over all 10 or 20 of them and do a distance (sqrt) check. Yeah... that's all I'm going to do. I write it in 5 seconds and I'm good to go.

    Don't get me wrong, if I notice when playing that every time I press A a frame drops, well, I might go "ohhh, I did write that really naive interactable code, let's go check that out with the profiler".
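    A minimal sketch of the naive closest-interactable loop described above. It assumes the interactables are reduced to a list of System.Numerics.Vector3 positions so that it runs outside Unity; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class Interactables
{
    // Returns the index of the position closest to origin, or -1 if the list is empty.
    public static int ClosestIndex(Vector3 origin, IReadOnlyList<Vector3> positions)
    {
        int best = -1;
        float bestDistance = float.PositiveInfinity;

        for (int i = 0; i < positions.Count; i++)
        {
            // One square root per candidate; negligible for 10 or 20 items.
            float d = Vector3.Distance(origin, positions[i]);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = i;
            }
        }

        return best;
    }
}
```

    Swapping Vector3.Distance for Vector3.DistanceSquared would pick the same index, since squaring preserves the ordering of non-negative distances. For a handful of objects on a button press, either version is fine.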

    ...

    Now, let's say I'm writing something to deal with a swarm, talking thousands of enemies. And every frame they're doing raycasts and distance checks and so on and so forth (lots of sqrts and other costly instructions). Going into this I know this is something happening every single frame, thousands of times over. Doesn't matter what instruction I do... it's going to cost a LOT. Even if statements and additions start to stack up at this point (note: if statements, known as branches, are actually kind of expensive in a weird, complicated way, due to the intertwined nature of branch prediction and the processor's pipeline).

    I likely won't jump into this with naive code off the bat. I might immediately think "hrmm, well, I should probably approach this with DOTS and reduce my instruction costs from the start by avoiding costly instructions (sqrt)". I'd write something like this, then profile it, then adjust, then profile again, adjust, toss it on a cellphone or other lower-end hardware, profile again, adjust.
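    For the hot-loop case, here is a rough sketch of a sqrt-free radius check. It is plain C# over System.Numerics.Vector3 (not DOTS), it assumes a non-negative radius, and its names are made up for illustration. Whether it actually helps is something only the profiler can say.

```csharp
using System;
using System.Numerics;

public static class Swarm
{
    // Counts how many positions lie within radius of center.
    // Assumes radius >= 0, so comparing squared values is equivalent
    // to comparing the distances themselves.
    public static int CountWithinRadius(Vector3 center, Vector3[] positions, float radius)
    {
        // Square the radius once, outside the loop.
        float radiusSquared = radius * radius;
        int count = 0;

        for (int i = 0; i < positions.Length; i++)
        {
            // DistanceSquared needs only subtractions, multiplications and additions.
            if (Vector3.DistanceSquared(center, positions[i]) <= radiusSquared)
            {
                count++;
            }
        }

        return count;
    }
}
```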

    ...

    Then there's like a level generator that runs at the beginning of the game (think like Minecraft generating a map). Here, yeah, a lot is going on that could be optimized. But I don't care... put up a loading screen and just push through. Oh no, you have to wait 3 seconds for the thing to do a thing. Unless that 3 seconds turns into 30 seconds, I probably won't optimize it as I'm more concerned about how good the results look!

    ...

    Point is... once you've been doing this enough, and running that profiler enough, you're going to get experienced with anticipating what causes bottlenecks and what doesn't.

    Like what Justice Potter Stewart said of hard-core pornography:
     
    Last edited: May 18, 2021
    JoNax97, iSinner, MDADigital and 2 others like this.
  10. MDADigital

    MDADigital

    Joined:
    Apr 18, 2020
    Posts:
    2,198
    I think the same. User actions that can only happen once a frame I tend not to care that much about; I can even use LINQ sometimes for these if it cleans up the code. It's the hot code you need to be careful with.
     
    Kurt-Dekker likes this.
  11. iSinner

    iSinner

    Joined:
    Dec 5, 2013
    Posts:
    173
    Thank you!

    The link to the document with latency is especially helpful.
     