Floating point determinism in Unity: Intel vs AMD

Discussion in 'Scripting' started by Iron-Warrior, Oct 21, 2020.

  1. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    838
I am working on a game that has an input-based replay system, which requires determinism to function correctly. I'm aware of the challenges of getting floating point arithmetic to be consistent in different settings, so this was initially just built as an internal tool to help capture video footage. It turned out that the replays are entirely deterministic for same build/different machines. Unfortunately, I get desyncs when a replay captured on one CPU vendor is run on a different one (i.e., AMD ←→ Intel). I've tried both Mono and IL2CPP with no success.

There is of course a large tech stack between Unity and the machine code, but I am wondering if anyone has any insight on this? I made a small console app to test arithmetic determinism

    Code (CSharp):
    float value = 0.2924150f;

    value = (float)Math.Sin(value);
    value = (float)Math.Cos(value);
    value = (float)Math.Tan(value);

    value = (float)Math.Pow(value, 2.32932f);

    // numbers.txt contains 200 randomly generated numbers from 0-1.
    using (StreamReader file = new StreamReader("numbers.txt"))
    {
        string line;

        int op = 0;

        while ((line = file.ReadLine()) != null)
        {
            float readValue = Convert.ToSingle(line);

            if (op == 0)
                value += readValue;
            else if (op == 1)
                value -= readValue;
            else if (op == 2)
                value *= readValue;
            else
                value /= readValue;

            op = (op + 1) % 4;
        }
    }

    Console.WriteLine(value.ToString("F10"));

    byte[] bytes = BitConverter.GetBytes(value);

    for (int i = 0; i < bytes.Length; i++)
    {
        Console.WriteLine(bytes[i]);
    }

    Console.Read();
    and got consistent results on both Intel and AMD, so it could be a configuration issue? From what I've read x86-64 should produce consistent results on modern Intel and AMD, but it's hard to find a straight answer.

    Thanks for any help.
     
  2. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    I worked on a similar project a couple years ago. We wrote a test like yours to verify all math operations we could think of on AMD and Intel when we started to hunt for desync issues. We even tested several combinations of transform hierarchies. The culprit turned out to be the animation system, which was causing the bone transforms to be calculated with incredibly small differences between Intel and AMD, but enough to cause the colliders parented to some of the bones to trigger at different frames on each CPU and throw the whole simulation off.

    Since this happens inside Unity's black box, we were forced to devise a workaround. We serialized the object-space translations and rotations of the bones used for collision detection from all key frames of all animations into scriptable objects and coded a simplified animation system to read the data matching the current animation key frame and update the colliders.
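
    Roughly the shape of it, heavily simplified (the class and field names below are made up for illustration, not the actual code):

    Code (CSharp):
    using UnityEngine;

    // Baked object-space bone poses for one animation clip, stored as an asset.
    [CreateAssetMenu]
    public class BoneTrackAsset : ScriptableObject
    {
        [System.Serializable]
        public struct BoneKey
        {
            public Vector3 localPosition;
            public Quaternion localRotation;
        }

        // Flattened layout: keys[frame * boneCount + boneIndex]
        public int boneCount;
        public BoneKey[] keys;

        public BoneKey Sample(int frame, int boneIndex)
        {
            return keys[frame * boneCount + boneIndex];
        }
    }

    // Reads the baked pose for the current simulation frame and moves the collider
    // bones directly, bypassing the Animator for anything the simulation depends on.
    public class DeterministicBoneColliders : MonoBehaviour
    {
        public BoneTrackAsset track;
        public Transform[] colliderBones;
        public int currentFrame;

        public void StepSimulation()
        {
            for (int i = 0; i < colliderBones.Length; i++)
            {
                BoneTrackAsset.BoneKey key = track.Sample(currentFrame, i);
                colliderBones[i].localPosition = key.localPosition;
                colliderBones[i].localRotation = key.localRotation;
            }
            currentFrame++;
        }
    }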

    Now, for reasons we didn't investigate any further, the desync only affected some animations. We never found out what made the others deterministic (we tested sampling key frames from all animations on Intel and AMD to confirm this).

    We didn't use physics either, so that's something else you should verify.
     
    Last edited: Oct 21, 2020
    Joe-Censored likes this.
  3. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    Is it possible for you to explain how exactly you're capturing input? If your system is incremental in nature, that's a poor design, bound to fail. I don't think you will find the culprit in base x86-64; more likely in extension instructions tied to multimedia and extended floating point architecture, which is impossible to test naively, to my knowledge. I think this is well out of Unity's scope and has to do with virtualization, drivers, and god knows what else that's close to actual hardware. It could just as well be some "hardware accelerator" that comes with the AMD (for example) and installs as a factory default, introducing "optimization calls" so certain games run better.

    When capturing animation or input, you need to make sure to have fixed timestamps to which you can tie the reproduction in a device-independent manner. Sure, you can have as many arbitrary events as you like inside such interval-frames, but just like video compression formats work, you need regular reality checks, or checkpoints, where any state accumulation due to numeric imprecision can be flushed.
     
  4. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    838
    That's interesting, I wonder what was causing the difference. Sounds like cross-vendor support may be a rabbit hole I don't want to go down.

    Input is placed in device agnostic structs before being passed into the game logic. So something like:

    Code (CSharp):
    public struct Input
    {
        public bool firePressed;
        public bool jumpPressed;
        // etc...
    }
    ...and all game logic is simulated in a fixed timestep. Animation uses the regular Unity animators, but set to update in the fixed loop. So that's all covered!
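
    For anyone following along, the capture side is roughly in this spirit (a simplified sketch, not my actual code; the button names are placeholders):

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    // One Input struct is recorded per fixed tick; playback feeds the exact same
    // sequence back into the fixed-timestep game logic in the same order.
    public class ReplayRecorder : MonoBehaviour
    {
        public List<Input> recordedInputs = new List<Input>();

        void FixedUpdate()
        {
            Input frameInput = new Input
            {
                firePressed = UnityEngine.Input.GetButton("Fire1"),
                jumpPressed = UnityEngine.Input.GetButton("Jump")
            };

            recordedInputs.Add(frameInput);

            // ...pass frameInput into the fixed-timestep game logic here...
        }
    }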

    Looking into extension instructions and virtualization, and I see what you mean—this could be way out of Unity's ballpark. Thanks for the input.
     
  5. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    That's great, but do you actually measure time, or record the passage of time, or are you simply relying on "ticks" to correlate? Because they don't have to; that's what I meant. There can be an ebb and flow, call it jitter: if you measured the overall passage of time it would appear the same, but inside small intervals, such as typical frame times, the discrepancies can be large enough that your events do not actually line up, causing desyncs that are possibly locally deterministic to an extent and maybe even replayable.

    What you want is a mechanism of absolute, regular synchronization to constrain this effect. If by fixed timestep you mean FixedUpdate, that's not reliable at all. You literally want accumulators that soak in information from your events, and then a regular but infrequent mass state update to correct the accumulated state, similar to how lamp posts pass by a moving car, after a certain locally measured amount of time has passed. You don't care about the precision of the reproduction, you care only about the faithful determinism of the simulation, relative to the local environment. So this is what you capture in snapshots.

    Yes, I get that it's perhaps too much work compared to what you have, but this is what virtualization and cross-platform mean. You need a robust, device-independent way of tracking in-game state as it unfolds. You really want to save a frame-by-frame movie, but in an optimal way. Considering you can already use the game to act as a faithful player of such a movie, your input recording is philosophically a way to massively compress data and reduce redundant information to a minimum. However, you've reduced it too much, and have no reliable syncing mechanism between two devices.
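
    A bare-bones illustration of what I mean by checkpoints (just a sketch; the interval and types are arbitrary, and for physics you'd also want to capture velocities):

    Code (CSharp):
    using System.Collections.Generic;
    using UnityEngine;

    // Every N ticks, store the authoritative state in the replay; on playback,
    // hard-set the state from the snapshot so small numeric drift cannot accumulate.
    public class ReplayCheckpoints
    {
        public const int SnapshotInterval = 60; // ticks between checkpoints

        [System.Serializable]
        public struct Snapshot
        {
            public int tick;
            public Vector3[] positions;
            public Quaternion[] rotations;
        }

        public List<Snapshot> snapshots = new List<Snapshot>();

        public void Capture(int tick, Transform[] tracked)
        {
            if (tick % SnapshotInterval != 0) return;

            var snap = new Snapshot
            {
                tick = tick,
                positions = new Vector3[tracked.Length],
                rotations = new Quaternion[tracked.Length]
            };
            for (int i = 0; i < tracked.Length; i++)
            {
                snap.positions[i] = tracked[i].position;
                snap.rotations[i] = tracked[i].rotation;
            }
            snapshots.Add(snap);
        }

        public void Restore(Snapshot snap, Transform[] tracked)
        {
            for (int i = 0; i < tracked.Length; i++)
                tracked[i].SetPositionAndRotation(snap.positions[i], snap.rotations[i]);
        }
    }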

    And using physics is a headache in itself. Physics is de facto non-deterministic (between architectures) in complex scenarios. It relies on computing collisions with 32-bit floating point values, and pretty much regularly hits the lowest part of the IEEE-754 hardware implementation. Each vendor will handle low-level specifics differently, only agreeing to certain specifications and handling the numerical imprecisions within tolerances, but never guaranteeing exact outcomes, which might as well lie at the limit of quantum effects. It is easy to imagine that any physics engine in complex situations would prioritize things deterministically on the same machine, but slightly differently on another chipset, even though it would still be deterministic locally. Thus the net result is not deterministic across the board.

    These issues are pretty much one of the reasons why TCP/IP was invented for cross-machine communication btw.

    Currently your system appears to be an incremental one, and yes, while the theoretical discrepancies shouldn't be there in the first place, here you are. The result you're seeing is similar to what is known as a Moiré effect. It's a far-fetched comparison, I know, but just try to imagine two interspersed time grids, which is what you get when you combine two different clock-based machines with standardized and similar but also different and proprietary instruction sets: you shave a few cycles here, but you gain a few cycles there. Enough to make one step slightly longer, only to be compensated in the next, shorter one.

    And then imagine a compound effect of all these cumulative errors. I think this is what you're seeing.
     
  6. kaarloew

    kaarloew

    Joined:
    Nov 1, 2018
    Posts:
    360
  7. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    That's nice, but how would he integrate physics with that?
     
  8. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    838
    This is pretty interesting stuff, but I admit some of it is above my head. I am doing frequent error checking (every N frames) at the byte level to see if desynchronization has occurred, so when desyncs do happen it can be narrowed down to which object is out of place and roughly when the divergence occurred. (I can also do per-frame checksumming, which does not require saving out the transforms of all objects to the replay metadata, but that won't tell you which object is out of place.) So it shouldn't necessarily be errors that are accumulating over a long period of time. Note that the cross-vendor desyncs occur nearly immediately on game start.
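
    For anyone curious, a per-frame checksum doesn't need to be anything fancy; hashing the exact float bit patterns of every tracked transform is enough (a simplified sketch, not my exact implementation):

    Code (CSharp):
    using UnityEngine;

    public static class SyncChecksum
    {
        // Combines the exact float bit patterns of every tracked transform into one hash.
        // Two runs are considered in sync for a frame only if the hashes match.
        public static uint Compute(Transform[] tracked)
        {
            uint hash = 2166136261; // FNV-1a offset basis

            for (int i = 0; i < tracked.Length; i++)
            {
                Vector3 p = tracked[i].position;
                Quaternion r = tracked[i].rotation;

                hash = Mix(hash, p.x); hash = Mix(hash, p.y); hash = Mix(hash, p.z);
                hash = Mix(hash, r.x); hash = Mix(hash, r.y);
                hash = Mix(hash, r.z); hash = Mix(hash, r.w);
            }
            return hash;
        }

        static uint Mix(uint hash, float value)
        {
            // Hash the exact bit pattern, not a rounded value, so even 1-ulp drift shows up.
            uint bits = System.BitConverter.ToUInt32(System.BitConverter.GetBytes(value), 0);
            hash ^= bits;
            return hash * 16777619; // FNV-1a prime
        }
    }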

    I'm not sure what you mean about FixedUpdate not being reliable. I'm sure it doesn't run at exactly the rate it advertises, but it doesn't need to for reproducibility; it simply needs to represent a discrete time slice of the logic loop with a fixed delta time. As far as I know, barring floating point consistency, this is a pretty tried and true method for building replay systems and lockstep deterministic networked games.

    And yeah, I can see that making floats deterministic across the entire stack, all the way down to the hardware level, would be challenging even within the same architecture on different vendors.

    Recompile PhysX with it? :D

    If anyone is interested in fixed-point physics, Photon Quantum uses a fixed-point 3D physics engine for deterministic physics.
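
    For anyone unfamiliar, fixed point just means storing numbers as scaled integers, so everything is plain integer math and therefore bit-identical on any CPU. A toy sketch of the idea (Quantum's actual type is far more complete, with trig tables, overflow handling, etc.):

    Code (CSharp):
    // Minimal Q48.16 fixed-point value: integer math only, so results are
    // bit-identical on any CPU. Real libraries add sin/cos/sqrt, overflow
    // handling, and so on.
    public struct Fix
    {
        public long raw; // the real value times 65536

        public static Fix FromInt(int v) => new Fix { raw = (long)v << 16 };
        public static Fix operator +(Fix a, Fix b) => new Fix { raw = a.raw + b.raw };
        public static Fix operator -(Fix a, Fix b) => new Fix { raw = a.raw - b.raw };
        public static Fix operator *(Fix a, Fix b) => new Fix { raw = (a.raw * b.raw) >> 16 };
        public static Fix operator /(Fix a, Fix b) => new Fix { raw = (a.raw << 16) / b.raw };

        public float ToFloat() => raw / 65536f;
    }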
     
    orionsyndrome likes this.
  9. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Standard math instructions are safe. I'm pretty sure SSE2 also behaves the same on both Intel and AMD. You don't need to go all the way into fixed point for that, unless you plan to support cross play with mobile devices.

    The problem is the parts of Unity that might use different SIMD extensions on different CPUs. You could even get different results between two Intel CPUs if one is using SSE2 and the other uses AVX. So the only way out of this is to make sure your simulation doesn't depend on any of those parts of Unity that could cause problems. Input-based deterministic simulation is hard and does require sacrifices.

    We had to write a custom serialization system specifically to pinpoint desync sources during development, which used code generation to create serializer methods for the tagged types, fields and properties we wanted to monitor, plus methods we could call anywhere in the simulation to write values to be verified for that frame. Then we had code that could compare these sync logs and detect which value went bad on which frame, at binary level.
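
    Stripped of the code generation, the core of it looked something like this (a simplified sketch with made-up names, just to show the shape of it):

    Code (CSharp):
    using System.Collections.Generic;
    using System.IO;

    // Append raw values during the simulation, then diff two logs byte-for-byte
    // to find the first frame/value that diverged.
    public class SyncLog
    {
        readonly List<byte[]> frames = new List<byte[]>();
        MemoryStream current = new MemoryStream();
        BinaryWriter writer;

        public SyncLog() { writer = new BinaryWriter(current); }

        public void WriteValue(string label, float value)
        {
            writer.Write(label);
            writer.Write(value); // exact 4-byte bit pattern
        }

        public void EndFrame()
        {
            writer.Flush();
            frames.Add(current.ToArray());
            current = new MemoryStream();
            writer = new BinaryWriter(current);
        }

        // Returns the index of the first frame whose bytes differ, or -1 if identical.
        public static int FirstDivergentFrame(SyncLog a, SyncLog b)
        {
            int count = System.Math.Min(a.frames.Count, b.frames.Count);
            for (int i = 0; i < count; i++)
            {
                byte[] fa = a.frames[i], fb = b.frames[i];
                if (fa.Length != fb.Length) return i;
                for (int j = 0; j < fa.Length; j++)
                    if (fa[j] != fb[j]) return i;
            }
            return -1;
        }
    }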
     
  10. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    Okay, so you do have some more insight on the issue.
    This is great news actually, it's not as bad as I'm portraying.
    You can fix this still.

    I can't tell, though, how exactly FixedUpdate works, but I'm pretty sure it's based not on individual physics-update ticks but on timing intervals instead, which is pretty sloppy. All they say is that it's based on a reliable timer, but that doesn't mean much. It would be nice if it consistently triggered after N physics ticks. Maybe it does, but then I can't actually explain the desyncs, unless there is a slight discrepancy in the initial state leading to divergent behavior.

    That's a considerable effort. I mean kudos!
    Sorry if I appear to be assuming you have a modest understanding of the underlying causes, that's just PTSD from my overall experience on the forum. I never know whom I'm talking with.

    edit:
    Ah no you're a different person sorry :D
    I need coffee

    @OP
    So is it primarily initialization? I think we can safely exclude deeper issues with virtualization (unless Unity falls back to certain instructions differently, like KokkuHub explained), though floating point still lingers as a potential culprit.
     
    Last edited: Oct 21, 2020
    Iron-Warrior likes this.
  11. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    This. Great post btw.
     
  12. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    838
    So I made a replay that does full error checking every frame (comparing positions and rotations). First frame:
    Only a couple of errors. (Unfortunately I don't have the system set up to check on frame 0, i.e., before any logic is run.) A few frames later, when the rocks on the far right make contact:


    Note that they do not desync. So it looks like a lot of PhysX is indeed cross-vendor deterministic... but some parts aren't. Many of the trees have birds (rigidbodies) standing on them (fixed joints) and they do not desync either. Some of the birds do desync, however (probably the ones standing on moving rocks?).

    (Longer replay. Note that the player character does not desync. They have a couple invisible rigidbodies following them at all times).


    Unfortunately I don't have an AMD PC on hand to test (I need to keep enlisting people on my game's Discord haha), otherwise I could build something a bit more comprehensive. Hopefully I can get my hands on one sometime soon to dig in a bit deeper.

    If anyone else is interested in this and has access to both CPUs, it wouldn't be too hard to set up a scene that bounces some physics objects around and records their positions/orientations (to full precision, of course).
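
    The recorder half of that would only be a few lines, something like this (a sketch; the scene setup, output path, and how you diff the files are up to you):

    Code (CSharp):
    using System.IO;
    using UnityEngine;

    // Drop into a scene with a few bouncing rigidbodies: logs the exact values of
    // every rigidbody's position/rotation each physics step, so the output file
    // can be diffed between an Intel and an AMD machine.
    public class PhysicsDeterminismProbe : MonoBehaviour
    {
        StreamWriter log;
        Rigidbody[] bodies;
        int tick;

        void Start()
        {
            bodies = FindObjectsOfType<Rigidbody>();
            // Sort by name so both machines write the bodies in the same order.
            System.Array.Sort(bodies, (a, b) => string.CompareOrdinal(a.name, b.name));
            log = new StreamWriter("physics_log.txt");
        }

        void FixedUpdate()
        {
            foreach (Rigidbody body in bodies)
            {
                Vector3 p = body.position;
                Quaternion r = body.rotation;
                // The "R" round-trip format preserves the exact float value in text.
                log.WriteLine($"{tick} {body.name} {p.x:R} {p.y:R} {p.z:R} {r.x:R} {r.y:R} {r.z:R} {r.w:R}");
            }
            tick++;
        }

        void OnDestroy() { log?.Dispose(); }
    }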
     
  13. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Wait, are you getting a replay to diverge on the same machine that produced it?

    I'm not sure how much determinism PhysX guarantees. There is a project setting for "enhanced" determinism, of course, but the docs are vague on why it's called "enhanced" and not "full" and what that entails, exactly.

    If your game is relying on actual rigid body simulations driven by Unity's own physics you will probably need to rethink either how your game does physics or how you're doing replays. Not sure if it's a thing in Unity, but in Unreal and other C++ engines people usually do their physics using Bullet when they need deterministic simulations.
     
  14. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
    I have a feeling that anything that doesn't depend on collisions from other rigid bodies is less likely to desync, pointing exactly toward the floating point errors.

    I think he has someone to reproduce it for him, but cannot do it as often as he would like. That's how I understood it anyways.

    I've managed to find an explanation

    No mention of cross-platform; this is why there is no "full".
    The inclusion of this safeguard is nice either way, but it highlights the fact that getting numerical errors to behave consistently across the board is very hard, if not impossible.
     
  15. Neto_Kokku

    Neto_Kokku

    Joined:
    Feb 15, 2018
    Posts:
    1,751
    Also, there is no telling whether Unity registers rigid bodies and colliders with PhysX in a deterministic order.
     
  16. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,113
  17. Iron-Warrior

    Iron-Warrior

    Joined:
    Nov 3, 2009
    Posts:
    838
    No, the replays do not diverge across different machines with the same build and the same CPU vendor (you can see a full playthrough in the link in the OP).

    Since the replays are deterministic under those conditions, Unity at least does register things in a deterministic order (though of course it's possible the CPU vendor modifies that somehow).