DOTS for floating origin

Discussion in 'Data Oriented Technology Stack' started by awesomedata, Jan 10, 2020.

  1. gnostici


    Jul 27, 2013
    Question: Supposing all of your entities' transforms are set relative to the player('s camera), and that everything in the scene is a child of the GameObject called "Root", what's wrong with the following? This works for me just fine, and without much performance hit because it only fires when it needs to. The only gotcha in some circumstances is that the origin is moved back to 0,0,0 on the frame after the root object is translated. But I anticipate no problems, since this can be done long before floating point errors, and all relative transforms should remain intact.

    Note: You see theta and phi because the rectangular x and y are treated as longitude and latitude elsewhere.

    Note 2: I could see this being problematic in an already-well-established project where the math is written, and the requisite relative positioning isn't in place. But in a newer project, where this can be kept in mind, I can't see any issue with it.

    Note 3: Unity really does an amazing job of updating child transforms in a performant way behind the scenes.

    Final Note: This code is a little repetitive, and can be cleaned up a bit. Pardon that please; I only just now banged on the keyboard a bit to post it here.

    Code (CSharp):
    using UnityEngine;
    using Unity.Entities;
    using Unity.Mathematics;

    [UpdateInGroup(typeof(InitializationSystemGroup))]
    [UpdateBefore(typeof(PlanetaryCoordinatesSystem))]
    public class FloatingOriginSystem : SystemBase
    {
        Transform rootTransform;
        Transform cameraTransform;

        bool shiftX;
        bool shiftY;

        Vector3 shiftVector;

        protected override void OnStartRunning()
        {
            // Cache the root once, so the reset branch in OnUpdate can never
            // see a null reference before the first shift has happened.
            rootTransform = GameObject.Find("Root").transform;

            PlanetaryCoordinates.shiftThreshold.Data = 10000;
            PlanetaryCoordinates.thetaShifts.Data = 0;
            PlanetaryCoordinates.phiShifts.Data = 0;

            shiftVector = new Vector3(0, 0, 0);
        }

        protected override void OnUpdate()
        {
            cameraTransform = Camera.main.transform;
            PlanetaryCoordinates.cameraTheta.Data = cameraTransform.position.x;
            PlanetaryCoordinates.cameraPhi.Data = cameraTransform.position.z;
            PlanetaryCoordinates.cameraR.Data = cameraTransform.position.y;

            shiftX = math.abs(PlanetaryCoordinates.cameraTheta.Data) >= PlanetaryCoordinates.shiftThreshold.Data;
            shiftY = math.abs(PlanetaryCoordinates.cameraPhi.Data) >= PlanetaryCoordinates.shiftThreshold.Data;

            if (shiftX || shiftY)
            {
                if (shiftX)
                {
                    PlanetaryCoordinates.thetaShifts.Data += (int) math.floor(PlanetaryCoordinates.cameraTheta.Data / PlanetaryCoordinates.shiftThreshold.Data);

                    shiftVector.x
                        = PlanetaryCoordinates.shiftThreshold.Data
                        * math.floor(PlanetaryCoordinates.cameraTheta.Data / PlanetaryCoordinates.shiftThreshold.Data);

                    shiftVector.z = 0;

                    shiftVector = rootTransform.TransformVector(shiftVector);
                    rootTransform.position -= shiftVector;

                    PlanetaryCoordinates.cameraTheta.Data = cameraTransform.position.x;
                }

                if (shiftY)
                {
                    PlanetaryCoordinates.phiShifts.Data += (int) math.floor(PlanetaryCoordinates.cameraPhi.Data / PlanetaryCoordinates.shiftThreshold.Data);

                    shiftVector.x = 0;

                    shiftVector.z
                        = PlanetaryCoordinates.shiftThreshold.Data
                        * math.floor(PlanetaryCoordinates.cameraPhi.Data / PlanetaryCoordinates.shiftThreshold.Data);

                    shiftVector = rootTransform.TransformVector(shiftVector);
                    rootTransform.position -= shiftVector;

                    // Note: this line read cameraTheta in the original post,
                    // which was a copy-paste bug; the Y branch should refresh phi.
                    PlanetaryCoordinates.cameraPhi.Data = cameraTransform.position.z;
                }
            }
            else if (rootTransform.position.x != 0 || rootTransform.position.z != 0)
            {
                // Slide the root back to the origin and compensate the children,
                // so every relative transform is left unchanged.
                shiftVector = rootTransform.position;

                foreach (Transform child in rootTransform)
                {
                    child.position += shiftVector;
                }

                rootTransform.position = Vector3.zero;
            }

            PlanetaryCoordinates.cameraTheta.Data += PlanetaryCoordinates.thetaShifts.Data * PlanetaryCoordinates.shiftThreshold.Data;
            PlanetaryCoordinates.cameraPhi.Data += PlanetaryCoordinates.phiShifts.Data * PlanetaryCoordinates.shiftThreshold.Data;
        }
    }
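
    The shift bookkeeping in the system above is easy to sanity-check outside Unity. A minimal sketch in Python (the helper name `origin_shift` is mine, not from the post; it mirrors the `math.floor(coord / threshold)` arithmetic for a single axis):

```python
import math

def origin_shift(coord, threshold):
    """Return (whole shifts applied, leftover local coordinate) for one axis.

    After subtracting shifts * threshold from the coordinate, the camera is
    back inside the [0, threshold) band, and the shift count records exactly
    what was removed, so world coordinates can always be reconstructed.
    """
    shifts = math.floor(coord / threshold)
    return shifts, coord - shifts * threshold

# Camera drifted to x = 12345 with a 10000 m threshold:
print(origin_shift(12345.0, 10000.0))    # (1, 2345.0)
# math.floor keeps the bookkeeping consistent for negative drift too:
print(origin_shift(-12345.0, 10000.0))   # (-2, 7655.0)
```

    The reconstruction invariant `shifts * threshold + local == coord` is what the two lines at the bottom of `OnUpdate` rely on.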
    Last edited: Jun 6, 2020
  2. egable


    May 21, 2014
    Arowx likes this.
  3. snacktime


    Apr 15, 2013
    I applaud the effort but I think there are a number of practical reasons why this is not a good approach.

    Transforming is just not that expensive. And in context, moving the origin is not something you generally need to do every frame. So you pay the extra cost of doubles, in both time and space, all the time, for transformations that happen infrequently?

    At runtime you are likely to hit something that forces you to transform anyway, generally rendering concerns. So if you have to transform anyway, doubles have just lost most of their value.

    At design time, almost everything will be floats. For some of it you just have no choice; for some you do, e.g. you could bake some data into doubles, but space is not a minor concern here.

    Multiplayer will force you to transform even if only for the network. Quantization here is based on using only as many bytes as are actually needed, plus other tricks for more space-efficient encodings of precision. All of which are thwarted by large numbers.
    Lukas_Kastern likes this.
  4. DreamingImLatios


    Jun 3, 2017
    You necroed this thread to promote a paid product?

    Others may be more picky about this than me, but I don't mind seeing DOTS-related stuff in this subforum, free or paid. But you could have at least started a new showcase thread with a flashy gif or video of your tech in action.

    Sorry if I sound harsh. It is just suspiciously similar to an incident we had here a few months ago.

    Edit: Just realized you do have such a thread in the Physics subforum.
    Last edited: Feb 3, 2021
  5. WAYN_Games


    Mar 16, 2019
    Debate aside, is there any news or documentation on improvements to Unity's support for this since @Joachim_Ante's post at the beginning of 2020?
    awesomedata, schaefsky and bb8_1 like this.
  6. Arowx


    Nov 12, 2009
    This depends on the speeds in the game as 32 bit floats only give you about 10 km* (from origin) before the error affects cameras, movement and physics.
    • F1 cars about 100 m/s so every 100/200 seconds.
    • Supersonic jets 343 m/s so every 30/60 seconds.
    • International space station 7666.66 m/s so every 1.3/2.3 seconds.
    • 26 km a second speed to get to Mars so every 0.38/0.76 seconds.
    • 299,792,458 m/s for the speed of light (laser beams) every frame.
    10km/20km assuming new origin aligned with flight path.

    That's just for one entity at the origin. It gets more complex when you consider a multiplayer or multi-unit simulation where everything moving could be transitioning between 32-bit 10 km-radius regions.

    Or you have origins that move with the entities, which then need to be combined when they intersect; with the combined approach, relative speeds would be doubled.

    The worst-case scenario would be large modern battles with projectiles that can travel 20 km, plus cruise missiles and jets, where every projectile could be traversing 10 km ranges within seconds and there could be hundreds of them at once.
    And then there are space battles with laser beams at world or solar-system scales.

    Unity at 32 bit floats is a great 10 km sand box radius but 64 bit allows Unity to expand to the size of the solar system (about 200,000,000,000,000 km).

    *It's about 8 km for camera shake to start and probably earlier for high precision calculations to gain too much error for accurate physics.
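
    The thresholds in the list above can be checked directly: the spacing between adjacent 32-bit floats grows with magnitude. A quick sketch in Python (the helper `float32_ulp` is mine; it simulates float32 via `struct` round-trips, since Python's own floats are doubles):

```python
import struct

def float32_ulp(x):
    """Spacing between x (rounded to 32-bit) and the next larger 32-bit float."""
    lo = struct.unpack('<f', struct.pack('<f', x))[0]
    bits = struct.unpack('<I', struct.pack('<f', lo))[0]
    hi = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return hi - lo

# Positional resolution of a float32 coordinate, in metres:
for dist in (100.0, 10_000.0, 1_000_000.0):
    print(f"{dist:>12,.0f} m from origin -> steps of {float32_ulp(dist):.7f} m")
```

    At 10 km from the origin, positions already snap to roughly millimetre steps; at 1,000 km the step is 6.25 cm, which is well into visible-jitter territory for a camera.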
  7. egable


    May 21, 2014
    The game I'm building will require entities to be simulated across an environment larger than a single solar system. A single-precision float can't even encompass the required scale and precision necessary for a single planet. There is no space trade-off by converting to double precision. I'd either need multiple single-precision coordinate systems overlaid on each other (for every single entity) to cover a single solar-system simulation, or I can just use a single double-precision coordinate system for the whole solar system. A single double-precision system is less complex and actually results in less overall storage space, and possibly even less math work on the CPU to simulate everything. The simplification of code also makes everything easier to read and understand, and the whole project likely faster to build with fewer bugs.

    By converting the entire framework to double precision, I can use just two coordinate systems: one for within a solar system and one for within the universe. The solar system coordinates could allow simulation down to millimeter or even micrometer resolution. The universe coordinate system would work on light years and the two combined would allow me to position objects with millimeter or micrometer precision across the entire extents of the observable universe.

    You are also assuming that transformations are infrequent. When you have hundreds or thousands of entities in motion spanning ranges greater than can be represented by a single precision float, every single object must be translated from double-based world coordinates to local single-precision coordinates, have physics applied, then translated back to double-based world coordinates. And this has to happen on every single frame because they are in motion. Also, every translation you make is imprecise and adds additional error. Floating point numbers (single or double precision) can't store every possible value between any two representable values. So just converting to single precision and back again causes motion to occur in your objects which would not otherwise be expected to occur based on its physics properties. Every single mathematical operation you perform at low precision adds more error. There are a host of papers written on the computational errors caused by IEEE 754 floats.
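
    The conversion-error point is easy to demonstrate. A sketch in Python, with `to_float32` (my helper) simulating a double-to-float-and-back round trip; the 1 AU example distance is mine, not from the post:

```python
import struct

def to_float32(x):
    """Round a Python double to the nearest 32-bit float and back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

AU = 1.495978707e11   # metres, roughly Earth's orbital radius
error = abs(to_float32(AU) - AU)
print(f"storing 1 AU as a float32 misplaces it by ~{error:.0f} m")
# float32 spacing at this magnitude is 2^14 = 16384 m,
# so a single conversion costs kilometre-scale error
```

    Every round trip through single precision at that magnitude re-quantizes the position onto a ~16 km grid, which is the compounding error being described.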

    The reality is that when you account for the multiple coordinate systems necessary for faking it and all the extra math operations and compounding errors resulting from single precision calculations, it's just not worth it. The cost is actually greater in a lot of ways by using such an unsuitable numerical representation.

    My goal is to push the higher precision numerical systems as close to the GPU as possible. I would say that I want to push double precision numbers straight to the GPU rendering systems, but the truth is that it may be unnecessary to push it further than the shaders. If the shaders have a double-precision position of the rendering camera and double-precision position information on the elements which need rendering, it can probably do a camera-origin double to single precision conversion prior to rendering such that it only sends floats out for actual rendering. But that at least eliminates all of the complexity everywhere else in the game engine, as well as significantly reduces all the compounding computational errors.
  8. snacktime


    Apr 15, 2013
    I think you are not correctly accounting for the cost of having to fetch 8 vs 4 bytes everywhere. That's potentially a lot more idle cpu time waiting for data. Less efficient vectorization. Fetching more from slower caches more often. This all of course is very context specific but the potential hit here is rather large.

    Physics you could probably pay the cost there and be ok. Doubles on the gpu I don't see how that ever makes sense.

    Partitioning does not mean you lose precision that matters. Multiple partitions solve that well enough. And in some cases you might only need to partition/scale down, not up. There are no compounding precision errors if you do this right.

    You are downplaying the cost of your approach and making partitioning out to be worse than it is.

    Games like Eve Online have been using multiple partitioning for years at this scale. Star Citizen chose to use doubles for physics only, from what I've read.

    I'm going to stand by my assessment that in practice, partitioning and transforming is the better practical solution for most cases. Space, for what it's worth, is just one use case, and there I can see doubles for physics making sense. But trying to extend doubles everywhere? No.
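
    The space half of that trade-off is at least easy to quantify. A sketch using Python's `array` module (the 64-byte cache line is a typical x86 figure, assumed here, not something from the thread):

```python
from array import array

singles = array('f', [0.0])   # IEEE 754 single precision
doubles = array('d', [0.0])   # IEEE 754 double precision
print(singles.itemsize, doubles.itemsize)   # 4 vs 8 bytes per value

CACHE_LINE = 64  # bytes, typical on x86
# Values delivered per cache-line fetch: halved when moving to doubles,
# which is the bandwidth cost being argued about above.
print(CACHE_LINE // singles.itemsize)   # 16 floats per line
print(CACHE_LINE // doubles.itemsize)   # 8 doubles per line
```

    Whether that halved per-fetch density actually bottlenecks a given workload is context-specific, which is exactly the disagreement in this thread.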
  9. egable


    May 21, 2014
    You are overestimating the cost of loading the doubles onto the CPU, as well as the cost of performing the actual mathematical operations. Memory isn't loaded onto the CPU one byte at a time. It comes from RAM into CPU cache in lines of whatever your cache line size is (architecture dependent). Modern CPUs have many MB of cache, so even if using doubles, it's unlikely that any given operation I'm doing will completely fill that CPU cache and require multiple cache fills (in which case the cost of loading doubles would definitively be slower than floats). In addition, the CPU will often fill the cache while it is performing other operations, so whether I use doubles or floats, the CPU is not going to sit idle while the cache fills.

    Reading doubles out of cache and into the CPU registers takes only 1 cycle, just like reading a float. They are 64-bit registers, after all. In terms of which is faster for performing a mathematical operation on a 64-bit wide float vs a 32-bit wide float, it depends heavily on the native architecture and whether the 32 bit or 64 bit modes are emulated by the other. On modern hardware, it's more often the case that 64-bit operations are native and 32-bit is emulated making 64-bit operations slightly faster on an operation-per-operation comparison. In some specific cases where you optimize things such that you can run parallel SSE operations, you can still get nearly twice the 32-bit operational throughput vs 64-bit, however (again architecture and optimization dependent).

    The long and short of this is that I do expect to see a large difference in performance when talking specifically about burst compiled code. This is a specific use case where I expect to see properly optimized SSE / SIMD / vectorized instructions being fed to the CPU for 32-bit operations which could legitimately yield nearly double the performance vs 64-bit. However, short of this specific case, I expect the 64-bit operations to be on par with 32-bit operations in terms of performance. A casual benchmark of double vs float operations in C backs this time and again. There are benchmarks all over the Internet clearly establishing that double precision math on modern CPUs is slightly faster than single precision math. The only case where it doesn't hold is well optimized and vectorized / parallel operations. These benchmarks are done predominantly using C / C++ compilers, and Unity now compiles code to C++ before making the binary, so I expect these benchmarks to be relatively accurate. The physics code in Unity DOTS is also all executed on the CPU. If all the physics math was being run in parallel GPU-based compute operations, things would be different.
  10. snacktime


    Apr 15, 2013
    The absolute cost of fetching memory doesn't matter. It's about where the CPU is spending most of its time, and what it is bottlenecked on.
  11. DreamingImLatios


    Jun 3, 2017
    I'm actually with @egable here. While floating origin makes a ton of sense for games where large distant structures are statically placed, it doesn't handle the situation well when those large structures are moving, which I understand to be the motivation for moving everything to doubles for the project in question. And I can back up the performance of doubles. While it is at worst a 2x hit on the game, there's significantly less code complexity and checks required. And often times the hit is far less because a lot of code is scalar and the vectorized code that does exist is SSE which can easily be swapped with the double-precision AVX counterparts. I don't believe floats are "emulated" so much as they are more easily subject to conversions and alignment bugs in the compiler, but I haven't looked at the transistors of modern CPUs to say with certainty.
    Occuros likes this.
  12. runner78


    Mar 14, 2015
    I'm not quite sure, but I think GPUs only represent geometry with single floats. In such a case, HDRP would be the only option for the pipeline. (Camera center rendering)
  13. DreamingImLatios


    Jun 3, 2017
    HDRP camera-relative rendering is shader-based which only works on floats. When using doubles, you need CPU-based camera-relative rendering. Fortunately that is pretty easy to do in DOTS and works with any SRP. You just calculate the camera-relative LocalToWorld matrices from your double-precision transform representations.
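
    The order of operations there matters. A quick Python sketch of why subtracting the camera in double precision before dropping to float preserves local detail (`to_float32` is my helper simulating the float conversion, and the 1 AU scene scale is my example, not from the post):

```python
import struct

def to_float32(x):
    """Round a Python double to the nearest 32-bit float and back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

cam = 1.495978707e11   # camera roughly 1 AU from the world origin, in metres
obj = cam + 123.456    # object 123.456 m in front of the camera

# Convert to float first, subtract second: both positions quantize onto
# the same coarse float32 grid and the small offset is destroyed.
naive = to_float32(obj) - to_float32(cam)

# Subtract in doubles first, then convert: the small relative offset
# survives with full float32 precision.
relative = to_float32(obj - cam)

print(naive, relative)
```

    Computing the camera-relative LocalToWorld in doubles on the CPU is the second pattern; handing world-space floats to a shader and subtracting there is the first.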
  14. Thygrrr


    Sep 23, 2013
    How would I prevent EndFrameLocalToParentSystem and EndFrameTRSToLocalToWorldSystem from firing?
    That's the problem with these inbuilt systems. You either work with them because you depend on them, or you intentionally trash them somehow to replace them with basically a rewrite.

    I do find the idea very appealing, though. LocalToParentSystem is basically what I already use for Camera Centric rendering.

    My floating origin solution would also benefit from a simplification through double precision physics, but ONLY if the performance hit is acceptable. The code simplification would be significant, though. About a third of my game code is some brain melting gymnastics dealing with the different scales of the different physics bubbles.
    awesomedata likes this.
  15. DreamingImLatios


    Jun 3, 2017
    I'm assuming you would do the latter. The catch is that the hybrid renderer needs the float precision LocalToWorld. So each entity would end up with both a LocalToWorldDouble and a LocalToWorld component. The rest of the components would only need to be double precision if you have modified your simulation to work with doubles.
    Thygrrr likes this.
  16. jasons-novaleaf


    Sep 13, 2012
    a bit off topic, but any links, for those who want to learn how to architect floating origin / procedural / open world storage/streaming?

    my ideas revolve around this and I would like to not make it up as I go....
    WAYN_Games likes this.
  17. egable


    May 21, 2014
    This is exactly why I decided to do this. I'm thinking through all the physics I have to implement in my game and have run simultaneously at vast distances and all the multiplayer (and multi-server) interactions and I've back-of-the-napkin tallied up all the time and cost of implementing those crazy gymnastics vs rewriting the DOTS framework to just handle it natively, and the DOTS framework rewrite wins hands-down. And once it's done, everyone can benefit from that work. And, since I am doing this as a patch on top of a specific version of the DOTS code, I can just rebase on top of the next release and manually account for the delta between the two releases. It should be pretty easy to maintain going forward.
  18. tertle


    Jan 25, 2011
    Pretty much every component in the Unity Transform has [WriteGroup] on it so that you can override the behaviour of the systems.

    This lets you specifically filter which entities these systems will work on.
    Some of the writegroups you can override for example

    Code (CSharp):
    [WriteGroup(typeof(LocalToWorld))]
    public struct Parent : IComponentData

    [WriteGroup(typeof(ParentScaleInverse))]
    public struct Child : ISystemStateBufferElementData

    [WriteGroup(typeof(WorldToLocal))]
    public struct LocalToWorld : IComponentData

    [WriteGroup(typeof(LocalToWorld))]
    [WriteGroup(typeof(LocalToParent))]
    public struct Translation : IComponentData

    [WriteGroup(typeof(LocalToWorld))]
    [WriteGroup(typeof(LocalToParent))]
    [WriteGroup(typeof(CompositeRotation))]
    public struct Rotation : IComponentData
    WriteGroups have been in entities since 0.0 but I feel they are the least understood feature.
    Last edited: Feb 8, 2021
    awesomedata, Sarkahn and xVergilx like this.
  19. xVergilx


    Dec 22, 2014
    I bet that's because of the dumb naming. It should be ExcludeGroup or something like that.
    awesomedata likes this.
  20. Thygrrr


    Sep 23, 2013
    Not only the naming, the docs are also suuuuper vague.

    Like, how is it determined which system causes which system to exclude entities? The one that runs first?

    And what is the example query at the bottom of that page even doing, SEMANTICALLY?

    (fortunately, the opening examples are kind of concrete, but nothing is clear about why the filter group works)
    Last edited: Mar 29, 2021
    awesomedata likes this.