Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Dismiss Notice

Feedback There be dragons with New Transform System

Discussion in 'Entity Component System' started by DreamingImLatios, Sep 27, 2022.

  1. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    This is a hot take shortly after the new DOTS 1.0 drop and a few minutes of me beelining towards the code to see what is up with this new system.

    The Good
    So first, this new system is much simpler. And the smaller components lead to better chunk occupancy and faster hierarchy updates due to the reduced bandwidth.

    Issue 1: An Old Bug Re-Manifests
    However, the new system has many of the same bugs as the old system. In short, there's a race condition on the change filter of LocalToWorldTransform. And there is also unnecessary prefetching of child world transforms for temporarily static hierarchies.

    The good news is that the fixes and optimizations I have shared in the past to address these issues will also work here. I'll share those better versions when I have time to write them.

    Issue 2: World Position Might Be a Lie
    When you set the world position of an entity with a parent using TransformAspect, the local position is set based on the ancestors' previous frame's transforms. And consequently, if an ancestor moves, the world position will be different after the transform system updates.

    Personally, I think having a SynchronousTransformAspect and an AsynchronousTransformAspect would be beneficial, where the synchronous version only allows transforms to be touched on a single thread but provides the necessary mechanisms to keep the hierarchy fully in-sync.

    Issue 3: Non-Uniform Scale Was a Massive Innovation Now Gone
    Or more specifically, ParentScaleInverse with NonUniformScale is now gone. This combo made some things possible with squash-and-stretch character animation that has been mainstream in professional animation tools for quite some time now but has been noticeably absent in game engines.

    I propose that scale be handled as follows:
    • A single float uniform inherited scale.
    • An additional 3-float non-uniform non-inherited scale which only affects child positions.

    This would bring the number of floats up to 11. However, you can bring the total size up to an even 48 bytes by adding an integer change version that keeps track of when a ParentToWorld was last updated in a synchronous context.

    This would bring the best of both uniform and non-uniform scale, with the exception of not supporting intentional shear. In particular, non-hierarchical entities could be non-uniformly scaled without issue. Hierarchical entities would still be able to be globally uniformly scaled. And bone entities could be non-uniformly scaled locally to achieve squash-and-stretch and character customization effects.
     
    bb8_1, Occuros, Antypodish and 3 others like this.
  2. joepl

    joepl

    Unity Technologies

    Joined:
    Jul 6, 2017
    Posts:
    85
    Thanks for the feedback on the new transform system! We designed it with simplicity in mind and we wanted to ensure that it had a minimal data representation and was also readable and replaceable with custom implementations if need be.

    We also wanted to get the transform system in for the experimental release - even though we knew that it would undergo changes between now and the 1.0 release. Addressing feedback and bug fixing are our focus in the coming months.

    As far as your specific feedback:
    - Issue 1: An Old Bug Re-Manifests
    We definitely want to address this. Can you link to those fixes and optimizations you had earlier? I'll make sure this is tracked and we look into it.
    - World Position Might Be a Lie
    This has been a big internal question. DOTS is intended to give the best possible performance and recalculating world transforms from arbitrarily changed hierarchies will always be slow. Sometimes, this is indeed what you need though and we are considering options here.
    - Issue 3: Non-Uniform Scale Was a Massive Innovation Now Gone
    This has also been an internal question. NonUniformScale does not work with the physics system and has a host of issues. We are however looking for ways we can support this for leaf nodes and animation though and will take your suggestion under consideration.

    Please do keep the feedback and bug-reporting coming on the new transform system. There is a lot of work being done on it (do not consider the API locked in stone for the experimental release!). Many thanks!
     
    Elapotp, bb8_1, Anthiese and 5 others like this.
  3. thelebaron

    thelebaron

    Joined:
    Jun 2, 2013
    Posts:
    825
    I don’t suppose you could elaborate on why(other than physics) NonUniformScale has a host of other issues as it seems from the outset to be quite an innocuous little component? Just curious about the engineering challenges surrounding it in the greater context of things.
     
    JesOb likes this.
  4. scottjdaley

    scottjdaley

    Joined:
    Aug 1, 2013
    Posts:
    152
    I really like this idea. Or at least something in the API like SetParentAndRecalculateWorldPositionImmediately.

    I have a few uses cases in my game where I need to change the hierarchy of a selected object. In this case, its only a handful of entities so speed isn't that important. It would be nice to have the option for a synchronous and slow version of the transform API that doesn't have any gotchas about when the data is safe to use.
     
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Thread is here.

    https://forum.unity.com/threads/sou...-localtoparentsystem-updatehierarchy.1175111/

    The links at the bottom contain mostly up-to-date versions for 0.50 and 0.51. I have a slightly newer version where I converted the "extreme" breadth-first system to an ISystem which helped a lot. But for the most part, you likely only care about the "Improved" variants which have the minimum changes necessary.

    I also did discover a race condition with the order of the Child buffer, so you will also see a reworked ParentSystem that sorts the values of the multimap associated with each key before operating on the buffers. While that is a little slower, I also made the system run in Burst, so it was a net win.

    I think making the existing behavior more obvious by exposing two different approaches and giving the user a choice makes the most sense. But right now it is unintuitive and will likely catch someone off guard.

    Yeah. Make Physics ignore the non-inherited non-uniform scale. But keep in mind my proposal for this non-inherited scale is not limited to leaf nodes in hierarchies. You want to be able to stretch arms and legs, not fingers and toes.
     
    Occuros and JesOb like this.
  6. vectorized-runner

    vectorized-runner

    Joined:
    Jan 22, 2018
    Posts:
    383
    I also find it weird that they combined all position, rotation, scale into one component, if a system needs only one of it why bring all of them to the cache? Can someone explain this decision to me?
     
  7. iamarugin

    iamarugin

    Joined:
    Dec 17, 2014
    Posts:
    863
    I have the same problem. I almost never use rotation and scale in my project, and I will need to fetch 5 more useless floats on every iteration.
    I hope there will be a way to integrate the old Transform system into 1.0 pipeline.
     
  8. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    760
    As far as I know, the whole chunk is loaded into the cache, not just the components you're fetching, so it shouldn't make any difference.
     
  9. iamarugin

    iamarugin

    Joined:
    Dec 17, 2014
    Posts:
    863
    It would, because less data will fit into chunk.
     
  10. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,653
    Well, technically - no, because before you had more components associated with simple Transform (Translation, Rotation, Scale and LocalToWorld float4x4 which is total 24 floats) and now you have only 8 floats (which already give you additional space in comparison with previous Entities version, very roughly speaking x3 more entities). Moreover chunks already capped to 128 entities per chunk because of enablable components. I don't think it will have much negative difference of merging position, rotation and uniform scale in comparison with previous version, quite opposite.

    EDIT: Ah stop, I've missed that
    LocalToWorld
    is still there. Forgot everything from above (except 128 chunk cap):D

    But still, yeah I believe it wouldn't make much difference in comparison with previous version of Entities, as they still will consume absolutely the same amount of memory in chunk as it were before 1.0
     
    Last edited: Sep 29, 2022
  11. iamarugin

    iamarugin

    Joined:
    Dec 17, 2014
    Posts:
    863
    Well, if ltw is still here, than I will just bring all 0.5 transform systems and components and not use 1.0 ones. I am not using physics or netcode anyway, so should not be issues there I hope.
     
  12. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    This is incorrect. Cache does not work that way at all. This write-up (especially the 2D part) will explain how caching actually affects things in chunks: https://github.com/Dreaming381/Lati...tion Adventures/Part 4 - Command Buffers 1.md

    The reason for merging the components into one is a bit nuanced. But to oversimplify, the 2X bandwidth increase penalty for dragging in data you don't need when chunk iterating is a mild sacrifice compared to the amount of random accesses and random access bandwidth the new approach eliminates when updating hierarchies. It also doesn't seem too difficult to extend or replace this new system, but I'll let you know once I get that far.
     
    Elapotp, lukereeves, bb8_1 and 3 others like this.
  13. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    760
    That's as I understand it based on previous statements from Unity. It is most efficient to write a whole contiguous block of memory in to cache than just selected bits of data, since that would actually be sort of random memory access. As I understand it, unity internally iterates over all the entity data blocks in the chunks, and gives you the data you requested. Since chunks are used for iteration, I strongly assume that the whole chunk is in the cache.
     
  14. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    This is correct.
    This is not.

    The benefit of chunks is that they provide a struct-of-arrays layout which means for a given component, there's a whole bunch of instances lined up in a row in memory. The speed comes from iterating linearly through memory. But the cache is much more dynamic. Again, the write-up I linked goes through what is actually happening, complete with visuals.