Feature Request Hierarchy data as core feature

Discussion in 'Entity Component System' started by Enzi, Oct 19, 2022.

  1. Enzi

    Hello all,

    I want to open this topic to get more eyes on something that has been bugging me since my first days of working with Entities.

    Hierarchy data, that is, entity references, is stored in chunk memory. My feature idea is to move it out of there and give it its own dedicated space, unrelated to chunk memory.

    There are several problems with how it's currently working.
    a) entity references are quite big, 2 integers - 8 bytes
    b) the number of entity references needed is largely unknown to Unity - it could be 0, 1 or a large amount. It really depends on the feature being implemented.

    We have 2 offenders: LinkedEntityGroup and PhysicsColliderKeyEntityPair
    Both use DynamicBuffers. LEG has a default capacity of 16 (128 bytes of 8-byte entity references), which results in 144 reserved bytes of chunk memory, and the physics key pair has a default capacity of 16 (!!) and 208 reserved bytes of chunk memory.
    A large default capacity has been chosen, yet Unity doesn't know how much is actually needed for an entity. It could be the worst case, with no children and 1 collider, and you'd still pay 352 bytes per entity for it.
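
    The reserved sizes quoted above can be reproduced with a little arithmetic. This is a minimal sketch, assuming each dynamic buffer reserves a 16-byte header plus capacity times element size inside the chunk, and that a PhysicsColliderKeyEntityPair element is 12 bytes (a 4-byte collider key plus an 8-byte entity reference). The header size, the 12-byte element and the function name are assumptions chosen to match the quoted totals, not values taken from the package source. Under these assumptions, 144 bytes corresponds to 16 inline entity references.

```python
# Assumed layout: fixed header + inline storage for `capacity` elements.
BUFFER_HEADER_BYTES = 16  # assumption, inferred from the totals quoted in the thread
ENTITY_BYTES = 8          # two 4-byte integers, as stated in the post
KEY_PAIR_BYTES = 12       # assumption: 4-byte collider key + 8-byte entity


def reserved_bytes(capacity: int, element_bytes: int) -> int:
    """Bytes a buffer reserves per entity inside the chunk (toy model)."""
    return BUFFER_HEADER_BYTES + capacity * element_bytes


leg = reserved_bytes(16, ENTITY_BYTES)     # 16 + 16 * 8  = 144
pair = reserved_bytes(16, KEY_PAIR_BYTES)  # 16 + 16 * 12 = 208

print(leg, pair, leg + pair)  # 144 208 352
```

    With the same model, a capacity of 1 would reserve 24 bytes and a capacity of 0 would leave only the header.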

    Even though we have an entity chunk cap of 128 now, it's mindbogglingly hard to keep the chunk capacity over 32 with how much data is wasted. I don't think I need to explain to anyone how bad a low chunk capacity is. Going under 32 really eats into job overhead mainly from low iteration count and repeated component/buffer access. As fast and good as the job system is, it's still operating better when it can iterate on the largest amount possible.

    As saving hierarchy data somewhere is a very common problem, I think it should get the treatment of being a core feature, just like Enabled components. Hierarchy data or references are rarely used in hot paths. Access to them is limited and in most cases only matters for init/destroy. If references are used in hot paths, everyone is still free to keep them in chunk memory. I only point this out to make clear that the option would still be there.

    If for some reason this isn't deemed important, please at least consider NOT HARDCODING [InternalBufferCapacity] in your packages. Make it a config value that can be set in the editor and that code-generates a struct with constants, or set it to 0.
     
  2. TheOtherMonarch

    How would you store Hierarchy data? It is one area where ECS is weak. I think the new transform system at least makes it simpler from a scripting perspective for transforms.

    For my hierarchical pathfinder I don't use ECS and have custom data structures.
     
    Last edited: Oct 19, 2022
  3. Enzi

    That's largely for Unity to decide and there are quite a lot of options how to implement it.
    From a design standpoint I'd split them into static and dynamic hierarchies. With the former being the most performant of course.

    The reason for the split is mostly that the baking workflow can figure out how much memory is actually required for hierarchy data and can allocate the exact size, tightly packed. Similar to blobs. Maybe even blobs under the hood.

    Dynamic hierarchies can reside in a slower, multi hashmap like data structure or even be totally ignored and just left like they are implemented now, with DynamicBuffers.

    API can resemble what we are used to with GetComponentTypeHandle, etc... and strongly typed hierarchy types.

    Going a step further, this could be used as a very generic memory space for entity metadata that is static, like blobs, just without the random access, well, let's just say as close as possible together in a chunk or parallel to it.

    This could all be implemented by ourselves in our own packages, there's nothing stopping us. The problem and also the reason why I bring this up is that Entities and the Physics package (Netcode too?) all rely on this as well so some overarching design has to be put in place. Just winging it with some fixed number on a DB is not cutting it.
     
  4. DreamingImLatios

    I'm not sure the custom hierarchy thing is necessary either. I think there are other, more general solutions for the problems it most commonly causes.

    But please fix this default DynamicBuffer nonsense!

    LinkedEntityGroup is 144 bytes per instance. 144 * 128 = 18,432
    18432 > 16 kB.

    The mere presence of LinkedEntityGroup alone prevents maximum chunk capacity. The default capacity not being 0 is just dumb. Only when you know the typical length of a dynamic buffer should you ever set it to something else.
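
    The claim above can be checked directly. A minimal sketch, assuming the full 16 KiB (16384 bytes) of a chunk were available to this one buffer, which ignores the chunk header and every other component and so gives an optimistic upper bound:

```python
CHUNK_BYTES = 16 * 1024        # assumption: 16 kB read as 16 KiB = 16384 bytes
MAX_ENTITIES_PER_CHUNK = 128   # the entity cap mentioned in the thread
LEG_BYTES_PER_ENTITY = 144     # reserved bytes quoted for LinkedEntityGroup

# Bytes LinkedEntityGroup alone would need to fill a chunk with 128 entities.
needed_for_full_chunk = LEG_BYTES_PER_ENTITY * MAX_ENTITIES_PER_CHUNK

# Most entities that fit if the chunk held nothing but this buffer.
best_case_capacity = CHUNK_BYTES // LEG_BYTES_PER_ENTITY

print(needed_for_full_chunk)                # 18432
print(needed_for_full_chunk > CHUNK_BYTES)  # True
print(best_case_capacity)                   # 113
```

    So even an archetype containing nothing else would top out around 113 entities per chunk under this estimate, below the cap of 128.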

    @Enzi if you have the time, it may be worth it to modify package source and share before and after profiler captures in your project. Really, Unity should do that themselves and share their findings, but they set a stupidly aggressive deadline for themselves and things are on fire...
     
  5. chemicalcrux

    I was just thinking about this a day or two ago -- I was trying to scrape by with only child->parent relationships, since I was concerned about the cost of all of those dynamic buffers for parent->child relationships.

    As OP noted, I only needed the hierarchy in very limited circumstances. Fortunately, it turned out that the performance gains from making the link bi-directional vastly outweighed any costs :p

    However, I'm definitely miffed by the default LinkedEntityGroup size right now. All of my prefabs are either single entities, or only contain a few children, so that's a ton of wasted space!
     
  6. TheOtherMonarch

    I tend to have either heavyweight entities like vehicles and characters or lightweight entities such as projectiles, buildings and trees.

    The lightweight entities generally outnumber the heavyweight ones. In some cases the lightweight entities don't even need a hierarchy. When you do need a hierarchy, such as for destroyable buildings, a simple static hierarchy with only child-to-parent relationships, where only the bottom-most leaf child can be removed, would be fine.

    The downside would be added complexity. Maybe, and the idea is very ephemeral, different options like nonHierarchical, lightHierarchical, heavyHierarchical would be useful. Probably something for the future though beyond 1.0.
     
    Last edited: Oct 20, 2022
  7. Antypodish

    You know you can set the capacity of dynamic buffers.

    And when a buffer's length grows beyond that capacity, it is stored outside the chunk.
     
  8. Enzi

    I think everybody in this thread is very aware of that. The point is we can't do that for internal buffer elements that Unity is using and the answer should not end up in, well, make the packages local and change core code. There has to be a better way.
     
  9. Enzi

    I've written a small test to prove my point a bit more. First I tried with physics, but that skewed the test too much, so I removed it but still used the key pair buffer from the physics package.

    The setup is 1 million cubes each tagged light or heavy. The difference is that heavy has a PhysicsColliderKeyEntityPair buffer.
    The actual "mechanic" is just some addition and multiplication on a value that's written back.

    Default: LinkedEntityGroup has no explicit internal capacity, key pair uses 16
    Chunk capacity and reserved bytes per entity
    light: 48 - 332 B
    heavy: 29 - 540 B
    upload_2022-10-20_3-32-40.png

    LinkedEntityGroup with an internal cap of 0
    light: 80 - 204 B
    heavy: 39 - 412 B
    upload_2022-10-20_3-32-46.png

    LinkedEntityGroup with an internal cap of 0
    PhysicsColliderKeyEntityPair with an internal cap of 0
    light: 80 - 204 B
    heavy: 72 - 220 B
    upload_2022-10-20_3-32-51.png

    As you can see, just changing the internal buffer capacity, and with it increasing the chunk capacity, brings substantial increases in performance.
    Often these exact buffers are unused in practice, bringing performance down by default. (yes I made this pun, I'm sorry :D )
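
    To make the measurements above easier to compare, here is a small sketch that computes the bytes saved per entity and the relative chunk capacity between configurations. The numbers are copied from this post; the dictionary keys and the compare helper are illustrative names only.

```python
# (chunk capacity, reserved bytes per entity), copied from the post above
results = {
    "default":            {"light": (48, 332), "heavy": (29, 540)},
    "leg_cap_0":          {"light": (80, 204), "heavy": (39, 412)},
    "leg_and_pair_cap_0": {"light": (80, 204), "heavy": (72, 220)},
}


def compare(kind: str, before: str, after: str):
    """Return (bytes saved per entity, chunk capacity ratio after/before)."""
    cap_before, bytes_before = results[before][kind]
    cap_after, bytes_after = results[after][kind]
    return bytes_before - bytes_after, round(cap_after / cap_before, 2)


print(compare("light", "default", "leg_cap_0"))             # (128, 1.67)
print(compare("heavy", "default", "leg_cap_0"))             # (128, 1.34)
print(compare("heavy", "leg_cap_0", "leg_and_pair_cap_0"))  # (192, 1.85)
print(compare("heavy", "default", "leg_and_pair_cap_0"))    # (320, 2.48)
```

    The 128 and 192 byte savings match 16 inline elements of 8 and 12 bytes respectively, which is consistent with the reserved sizes discussed earlier in the thread.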

    As a frame of reference what the optimized "heavy" entity looks like:
    upload_2022-10-20_6-15-5.png
     
    Last edited: Oct 20, 2022
  10. DreamingImLatios

    Amazing! What are the chunk capacities with and without the default capacity change?
     
  11. Enzi

    merged into the previous post.
     
    Last edited: Oct 20, 2022
  12. daniel-holz

    Unity Technologies

    We are currently working on the PhysicsColliderKeyEntityPair buffer issue. Stay tuned!

    Thanks for bringing this issue to our attention and for your patience. :)
     
    Last edited: Jun 8, 2023
  13. cort_of_unity

    Unity Technologies

    Just for completeness, LinkedEntityGroup's default capacity was reduced to 1 as well several months ago.
     
  14. tertle

    (It just takes months to go public :p)

    But huzzah, I am so happy to hear this. I can maybe finally remove my override for making it 0 capacity.
     
  15. tertle

    If you ever need evidence why forcing LEG is bad, I was going to take a crack at optimizing this poorly optimized benchmark that has been going around https://github.com/maskrosen/combat-bees-benchmarks/

    And all I did was install my Core library (https://gitlab.com/tertle/com.bovinelabs.core) which sets up a post baking system to strip it from entities it doesn't belong on (prefabs with 1 element)

    Before


    After


    21.3% faster before I changed or even looked at a line of code.



    and a nice memory reduction as well
     
  16. JesOb

    Hi, I thought about dynamic buffers and came to the conclusion that we could, potentially, support several default capacities per dynamic buffer type, so package end users can choose the default capacity per archetype.

    How could it work?
    For instance, with LEG we need a few components like
    struct LinkedEntityGroup - existing one with default capacity 1
    struct LinkedEntityGroup_0 - with default capacity 0
    struct LinkedEntityGroup_2 - with default capacity 2
    struct LinkedEntityGroup_4 - ...
    struct LinkedEntityGroup_8 - ...

    - what we need is the ability to state that all those LEG_X structs derive from LEG, maybe with an attribute
    - then, when an archetype registers any LEG_X component, it registers both LEG_X and LEG but makes them point to the same array inside the archetype chunk. So any query that wants LEG will get a pointer to LEG_X and treat it as LEG, and this will work because of how any DynamicBuffer works internally.

    We could create a query for the exact LEG_2 component and get chunks with only LEG_2 if we want. In some cases that can be helpful.

    This way end users can optimize archetypes for each case separately, even at runtime if we want.

    Those who know better how archetypes work internally, can you say something about this idea? Can it work at all? Can it be helpful in real life?
     
    Last edited: Jun 11, 2023
  17. daniel-holz

    Unity Technologies

    The only difference between the various capacity values is that the buffer occupies a preallocated amount of memory whose size equals the specified buffer capacity (set by the InternalBufferCapacity(x) attribute) times the buffer element size (plus some small overhead for the buffer header, used for buffer access and management), which is then embedded within the corresponding archetype the entity belongs to.

    You can then add "capacity" many of these elements (in this case the LinkedEntityGroup element) to the buffer without causing the buffer to get relocated outside of the archetype and onto the heap.
    So, you can control up to how many of those elements keep their memory "close" to your entity, inlined with the rest of the entity's components.
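
    The inline-versus-heap behaviour described above can be pictured with a toy model. This is only an illustration of the idea, not Unity's implementation: the class name, the on_heap flag and the rule that storage never moves back into the chunk are all assumptions of the sketch.

```python
class ToyBuffer:
    """Toy model of an inline-capacity buffer; not Unity's implementation."""

    def __init__(self, internal_capacity: int):
        self.internal_capacity = internal_capacity
        self.elements = []
        self.on_heap = False  # starts inside the chunk's reserved space

    def add(self, value) -> None:
        self.elements.append(value)
        # Once the length exceeds the inline capacity, the storage moves out
        # of the chunk. Assumption of this toy: it never moves back.
        if len(self.elements) > self.internal_capacity:
            self.on_heap = True


inline = ToyBuffer(internal_capacity=2)
inline.add("child_a")
inline.add("child_b")
print(inline.on_heap)  # False: two elements still fit inline

inline.add("child_c")
print(inline.on_heap)  # True: the third element exceeds the inline capacity

zero = ToyBuffer(internal_capacity=0)
zero.add("child_a")
print(zero.on_heap)    # True: with capacity 0 the first element is already outside
```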

    This number totally depends on the use case.

    In some cases you might need many elements in your entities and for many entities, and you also need to access many of these elements for many entities often. So, assuming efficient, sequential access patterns, keeping these elements' memory inline will likely reduce cache misses upon access and consequently improve performance in this particular case.

    But sometimes you don't need any elements associated with your entities, or you need them only very rarely. In this case, always relocating the buffer to the heap (by setting the capacity to zero) is likely the best option. The few times you do need to access elements in the buffer cause equally few cache misses, yielding a likely insignificant performance hit. But on the more numerous occasions when you don't need to access these elements, no memory is wasted in your archetype, leading to tighter packing of the relevant data and consequently a higher number of entities in your chunks.
    This can then lead to a significant performance improvement in these "no buffer element access" situations, as you can just walk within the chunk in contiguous memory without having to jump over the gaps induced by these buffer elements, giving you better cache coherence.

    So, you are totally right that it would be useful to make this (inlined) buffer capacity configurable in some way, given the very use case dependent performance differences you can get with that parameter.

    In the interim, what would be useful as well is to only add the buffer in situations where it is needed, which, from what I understand, is not the case today.
     
    Last edited: Jun 10, 2023
  18. cort_of_unity

    Unity Technologies

    Thank you for the benchmark! We knew that reducing the LEG waste would help, but seeing the magnitude of such a simple change is encouraging nevertheless.

    Since you already have these two data points, I'd be interested in seeing a third one: what happens if you leave the LinkedEntityGroup in place, but add [InternalBufferCapacity(1)]? I assume it would be somewhere in between the two times you provided (since 0 bytes < 24 bytes < 144 bytes), but it would be nice to see exactly how much we're paying by having the empty (smaller) buffer there at all.

    I don't think anyone would dispute that having LEGs on prefab instances without a hierarchy is just a waste of space & time. The obstacle we'd have in removing it in those cases (at least in the short term) is that it would change the observable behavior of bakers and potentially break existing code, which is something we have to be mindful of in this exciting new out-of-preview, post-1.0 world. Benchmarks that quantify the benefits of doing so would help inform that decision.
     
  19. cort_of_unity

    Unity Technologies

    This is the big piece I see that's missing today; I think we support templated components (I've never tried personally), but definitely not a way to express any relationships between them or reason about their common root. So we can say "does this entity have LinkedEntityGroup_2?" but not "does this entity have LinkedEntityGroup<X> for any X?"

    I'm not saying we couldn't build it, or that it wouldn't be worthwhile! But that's the scale of feature work we'd be signing up for in order to support something like this.
     
  20. DreamingImLatios

    I appreciate this finally getting proper attention! I'll just add that if you do decide to remove LEG for instances where there is only one entity, then please set the InternalBufferCapacity to 0 instead of 1.
     
  21. Enzi

    I also appreciate that this is looked at more carefully now.
    The changes I want to see with LEG here:

    - reduce the InternalBufferCapacity to 0
    - do NOT add LEG when the entity has no children, this includes prefabs where LEG is always added

    I don't see the value of it being 1. It's not important runtime data or called upon each frame. The data can safely live outside of the chunk and we won't see any performance regressions because of it. Even worse, the first entity would always be itself, so another copy of the entity index/version lives in the chunk which is just redundant data.

    When we want to destroy a root entity we just destroy the entity itself. No path has to go to LEG. In the case the entity has a hierarchy, adding itself to LEG makes sense. Just iterate over LEG and destroy. Perfect.
     
  22. daniel-holz

    Unity Technologies

    The improvements related to the PhysicsColliderKeyEntityPair buffer are now available with the newly released 1.0.11 version of Unity Physics.
     