Archetype memory layout alternatives

Discussion in 'Entity Component System' started by Enzi, Dec 8, 2021.

  1. Enzi

    Joined:
    Jan 28, 2013
    Posts:
    962
    Hey!

    The current archetype memory layout is optimized for SIMD and vectorizable code.
    Not all code can take advantage of SIMD, for several reasons, so those cases are pretty much left out.
    Here's my reasoning and an example. Let's say you have an archetype of 20 components, and every component is read in the job. Currently, reading the first IComp (A) also pulls IComp (A) for the next several entities into the cache line, but that data isn't actually used until later iterations, leaving the cache lines full of data that is worthless right now.
    In that case it would make more sense for the memory layout to be reversed: instead of reading the next entities, it would read the next components of the same entity, so access to those is faster.
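
    To make the two layouts concrete, here is a toy sketch (not Unity's actual chunk code; the type and field names are illustrative) of today's per-component streams versus the per-entity grouping I mean:

    Code (CSharp):
    // Today: each component type is a contiguous stream inside the chunk.
    // Reading A for entity 0 also pulls A for entities 1..7 into cache,
    // which only pays off if the loop touches them on later iterations.
    struct ChunkSoA
    {
        public float[] ComponentA; // A for every entity in the chunk
        public float[] ComponentB; // B for every entity in the chunk
    }

    // Reversed: all components of one entity sit next to each other, so
    // reading entity 0 pulls its *other* components into cache instead.
    struct EntityAoS
    {
        public float ComponentA;
        public float ComponentB;
    }
    // chunk body: EntityAoS[] entities;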

    I was discussing this in Discord and someone pointed out that Bevy ECS offers both archetypes and sparse sets as options, which makes exactly this distinction.

    Are you also thinking about giving us this option?
     
    SolidAlloy likes this.
  2. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    Why are you iterating over 20 components at once in a memory-bound hot loop? If those components are never iterated over individually, then it might be worth combining them.

    Bevy's sparse sets are for faster structural changes. Unity is likely going with a different solution for that, specifically enabled bitmasks.
     
    bb8_1, Krajca and MaNaRz like this.
  3. Enzi

    Joined:
    Jan 28, 2013
    Posts:
    962
    I know too little about Bevy. I don't even know if they use 16 kB chunks; it was just an observation that there's an ECS that supports more than one memory layout.

    Combining is a possibility, but it also introduces a bunch of other problems. Other jobs that don't operate on the full archetype would read useless data, which is exactly why we don't use bloated struct components.
    It would be essential to use a reference type, because not every field in the struct changes and writing everything back is very costly.
    Flexibility, modularity and readability go down, but that's something I could ignore.
     
  4. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    You have the same issue when reordering components in memory to not be independent data streams.

    Also I just realized that most L1 caches can fit nearly two chunks, so while prefetching might get a little confused if you read lots of components, you are at least less likely to thrash cache lines.
     
    Enzi likes this.
  5. Enzi

    Joined:
    Jan 28, 2013
    Posts:
    962
    Ah, I wasn't thinking about the L1 cache size. Guess this is a non-issue then, or at least wouldn't change much about the memory layout. It doesn't really explain the quite costly reads I measured in VTune, though. Must be something else then (or just normal).
     
  6. Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    Our current thinking on the future of archetype layout is this: (This is the current plan of record for Entities 1.0)

    * Adding / removing components at runtime should probably be something you only do rarely. Specific cases where it makes sense: procedural generation / runtime construction of entities, or long-lasting changes that must be kept modular, eg. the main character equips a new weapon that comes from an inventory of 50 possible components. So the APIs will continue to exist as they do today, but the documentation will tell you not to use them at runtime and the samples will no longer do it.

    * What replaces the constant removing / adding is enable bits on IComponentData. So at authoring time you bake out an entity with all the components you want at runtime, some of them disabled to start with. And at runtime you just enable / disable them. The query understands these enable bits natively and thus treats a disabled component as if it doesn't exist from the perspective of the query. (HasComponent still returns true for disabled components, matching current GameObject behaviour)

    NOTE: This approach can be prototyped manually today by just putting a bool in your IComponentData and, based on that, early-outing in your ForEach or whatever; see the sketch below. (Naturally that's harder to maintain & likely slower than something native & batched.)
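
    A minimal sketch of that manual prototype (the component, field and system names are made up for illustration; this is the bool workaround, not the native enable-bits API):

    Code (CSharp):
    using Unity.Entities;

    public struct Health : IComponentData { public float Value; }

    public struct Regeneration : IComponentData
    {
        public bool Enabled;    // stand-in for a future native enable bit
        public float PerSecond;
    }

    public partial class RegenerationSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            float dt = Time.DeltaTime;
            Entities.ForEach((ref Health health, in Regeneration regen) =>
            {
                if (!regen.Enabled)
                    return; // early out: treat the component as if it doesn't exist
                health.Value += regen.PerSecond * dt;
            }).ScheduleParallel();
        }
    }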

    So the concept of additional sparse storage doesn't feel like a sensible solution to me, because it has two implications:
    * The hard guarantee that a run of batches has matching array data layout is broken. You need to have an if in all loops now. This increases generated code / reduces the ability to auto-vectorize & optimize codegen.
    * We don't see a reason why adding / removing components frequently at runtime is a useful pattern in the first place.
    * One incredible thing the enable-bits approach enables instead is the ability to "realign" enable bits in a time-sliced way, to reduce the number of skipped entities when iterating over batches (by moving entities to different chunks). We could imagine something where you say: every frame I want to spend time moving 100 entities so that their enable bits sit next to entities with the same enabled components.

    Happy to hear thoughts if someone wants to challenge this line of thinking.
     
    Last edited: Dec 10, 2021
  7. runner78

    Joined:
    Mar 14, 2015
    Posts:
    792
    As I understand it, the layout is less about SIMD/vectorization and more about cache friendliness.
    There is an ECS layout concept that groups data by component rather than by entity. But that only works well as long as one iterates over a single component; if you need more than one component, you have random memory access again.
    There are also ECS frameworks like EnTT for C++ that don't work with archetypes but with manually grouped components. However, this also has its limitations, e.g. overlapping groups do not work (subsets work a lot like archetypes), and I think you cannot change a group afterwards. This requires much more detailed planning of the layout beforehand.
     
  8. Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    Yeah, sure, there are plenty of ways to decide how to lay out the data.

    Making it flexible is of course possible, but it comes with real-world downsides in terms of codegen bloat; the compiler knowing statically what the layout actually is lets it generate better code & fewer instructions to get to the data.

    We made all our decisions by optimising for the thing used at scale, which should give speed-of-light performance: lots of entities with small components & systems accessing multiple of those components at the same time.
     
  9. Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,770
    @Joachim_Ante a little challenge here. Well, rather a question regarding tag components (without data).
    If I remember correctly, a while ago you mentioned that they are handled a bit differently than components with data.
    How differently are they actually handled?

    Do you consider it a bad thing to add / remove tag components on entities when the changes happen no more often than every few seconds / minutes?

    I don't mind spending a fraction of a millisecond to add / remove tag components if it saves other jobs from processing those entities in the long run.

    Other matter: so instead of using HasComponent, we can check whether a component is enabled / disabled. But that requires entities to have that component, right?

    Just for clarification, I gather that jobs do understand queries with component enable / disable state and filter them correctly?
     
  10. Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    Yeah, I think those are tradeoffs. And if we are talking about seconds / minutes, then it can definitely make sense.
    It does however lead to possible chunk fragmentation. So there is a bunch of tradeoffs to consider, hence using it by default is a bad idea. (Made worse by the fact that in 0.17 there is no built-in alternative, of course...)
     
    Antypodish likes this.
  11. Enzi

    Joined:
    Jan 28, 2013
    Posts:
    962
    My question wasn't so much about adding/removing, but I see the solid arguments for simply having a better data design. Thinking about big structs vs. small one-field structs that can get auto-vectorized makes my head spin when trying to find a good balance. A different chunk memory layout would not solve this, so I don't think pushing my initial question will lead to any improvements. :) And I fully agree that SIMD-able code and data design should be the leading concept of Entities. I believe a sparse set would make reads of smaller aligned IComps faster, but there are too many contradictions, too little to gain, and I see the problem of supporting these two models.

    On topic of adding/removing via enable/disable @Joachim_Ante:
    Will enabling/disabling work in bursted parallel jobs without an entity command buffer?
     
  12. nopacc

    Joined:
    Jul 13, 2017
    Posts:
    5
    @Joachim_Ante Will Entities 0.5 contain the enable/disable functionality?
     
    desertGhost_, Tony_Max and bb8_1 like this.
  13. eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,684
    :)
     
    Antypodish likes this.
  14. Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    No. That's for 1.0. Most of the code is in there but ifdef'ed out, since there were still corner cases to fix when we branched off 0.5.
     
    Tony_Max and Antypodish like this.
  15. desertGhost_

    Joined:
    Apr 12, 2018
    Posts:
    260
    Will it land in an intermediate release between 0.5 and 1.0 (like 0.6), or will this feature not show up until 1.0-preview.01? Will there be releases like 0.6, or will patches just be to 0.5-preview.01, 0.5-preview.02, etc.?
     
  16. Guedez

    Joined:
    Jun 1, 2012
    Posts:
    827
    Give us custom-sized chunks and I will be more than happy. Many of my entities are spatially grouped using ISharedComponentData, so I have tons of mostly empty chunks.
     
  17. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    @Joachim_Ante I think last year you mentioned a tiny chunk and mega chunk feature. Will it be available in the DOTS 0.5 release?
     
  18. sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,154
    Custom-sized chunks were discussed 3 years ago and Unity stated that they might look into it in the future… I guess that's still true.
     
  19. JooleanLogic

    Joined:
    Mar 1, 2018
    Posts:
    447
    How is no one talking about this significant change in direction?
    Dynamically adding/removing components has been the very basis of Unity's ECS architecture since the start. It's the only way to process only what's required vs. polling.
    You're now strongly touting enabled bits as the de facto technique, something that will change how we design and architect our code and entities, but which isn't even available to us anytime soon.

    The advantages of this feature are enticing, though ethereal at this point. Given the amount of discussion that's occurred around the add/remove methodology over the years, how are you already so confident that enabled bits are the way forward when they've had no community testing?
    This is surely going to be a significant and disruptive feature drop that will kick off a new wave of design and performance discussion and issues. None of which is bad. But you're saying it's going to drop untested (by the community) in v1.0?
    I would have thought this would be an experimental feature awaiting lots of user feedback before we got to a version 1.
    Isn't the proposed "realign" pass a solution to a fragmentation problem introduced by enabled bits in the first place? And isn't moving entities between chunks the exact cost problem you were trying to solve?

    On the architecture side, how would you actually achieve this? Which component streams do you defrag by?
    Code (CSharp):
    EEEEEEEE
    AA-A---A
    B--B-BB-
    CCCC---C
    You can't defrag on component A without frag'ing the others, can you? Perhaps I'm missing something here.
     
  20. xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,296
    I think the idea is to keep all components in the chunk, but have them "disabled".
    Structurally, entities with disabled components stay in the same chunk, whereas now, if you remove or add a component, the entity is moved to a chunk of the new archetype (allocating one if needed). Multiple operations in a row would allocate, move, allocate, move, etc.

    This causes issues if you have small structs and add / remove components often, which leads to chunk fragmentation (a small number of entities, or even 1, per chunk). Since chunks are 16 KB each, that's a lot of memory and ops wasted.

    Functionally, I think the enabled / disabled state bitmask is stored somewhere else (in chunk metadata? idk), and querying will be done based on that metadata.
    So technically it's the same thing, except without the performance impact of structural changes.

    While it looks daunting to refactor, it shouldn't be that big of a deal to swap out calls to AddComponent / RemoveComponent with something like SetComponentState<T>(state); see the sketch below. Refactoring-wise it's trivial.
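
    A hedged sketch of such a shim (SetComponentState<T> is hypothetical, not an existing Entities API; today it can only forward to structural changes, and later it could forward to the native enable-bits API):

    Code (CSharp):
    using Unity.Entities;

    public static class EntityManagerStateExtensions
    {
        // Hypothetical helper: one call site that can later be re-pointed
        // at native enable bits without touching the rest of the codebase.
        public static void SetComponentState<T>(this EntityManager em, Entity e, bool enabled)
            where T : struct, IComponentData
        {
            // Fallback for today: emulate enable/disable with structural changes.
            if (enabled && !em.HasComponent<T>(e))
                em.AddComponent<T>(e);
            else if (!enabled && em.HasComponent<T>(e))
                em.RemoveComponent<T>(e);
        }
    }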

    However, AddComponent and RemoveComponent are great for runtime entity generation, e.g. procedural entities.
    So I'd like to have them around, but with some way to ensure the proper API is used (via the manual or attributes?).
     
    Last edited: Dec 14, 2021
  21. tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,761
    I don't think this is a huge surprise to many forum regulars. Anyone with decent Entities experience has long realized this was a bad approach, and packages like Netcode already require pretty static archetypes to work properly.

    My libraries already do very few archetype changes, and where they do, they are all tagged with "TODO replace with enable bit". Been prepared for this change for over a year!
     
    Last edited: Dec 14, 2021
    SolidAlloy, KarimTA, Micz84 and 3 others like this.
  22. Enzi

    Joined:
    Jan 28, 2013
    Posts:
    962
    I can't speak for @jooleanlogic, but if he's thinking like me it's more of: I worked around having no add/remove, and now I have to restructure fundamental architecture because with 1.0 it becomes viable and a better solution.

    What I really wonder is what that means for SIMD. How does it work when there are disabled components in between? I would imagine it being incompatible unless some technical wizardry is going on.
     
    JooleanLogic likes this.
  23. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    I'm curious if you have any suggestions for alternatives to reactive systems, with or without SystemState components? (SystemState is only required when there is teardown logic.) I don't use reactive systems for events, but I do use them for post-spawn initialization and wiring entities to external data structures.

    It is incompatible in most cases, but so is nearly all other chunk iteration code. Unless you are specifically writing code to autovectorize, you won't notice a difference. And in the case of real autovectorized code, I suspect you will have to manually vectorize it to use conditional movs.
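
    For illustration, a branch-free sketch of the conditional-mov idea (assuming a per-entity enabled flag; the names are made up): compute the result for every lane and select the old value back for disabled ones.

    Code (CSharp):
    using Unity.Mathematics;

    static class Damage
    {
        // Disabled lanes keep their original health; there are no branches
        // in the body, so Burst can keep this as vector selects/blends.
        public static float4 Apply(float4 health, float4 damage, bool4 enabled)
        {
            return math.select(health, health - damage, enabled);
        }
    }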
     
    JooleanLogic likes this.
  24. tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,761
    Not really, I've just coincidentally been using them a lot less these days. Not sure why; might just be the type of work I'm doing.

    Funnily enough, I was actually just thinking about this topic of SystemState components and how Unity intends for them to work given the desire for bit fields and static archetypes. Is there an alternative Unity is looking at that could work more nicely with bit fields?

    (I actually have so many questions about this whole bit thingy.)
     
  25. Kmsxkuse

    Joined:
    Feb 15, 2019
    Posts:
    306
    Yea. Manual vectorization using MaskStore (for AVX2) could be an option. We already have the mask in the form of the bitfield (just extend it to a v256 using SLLV).

    I've run some tests, and SLLV to generate a vector mask plus MaskStore is actually very slightly faster than operating in a scalar for loop over the same 8 ints (or whatever combination): a few microseconds faster on average, using Unity.Mathematics Random.NextInt(). However, MaskStore is nowhere near as efficient as simply storing unconditionally. Depending on the operation, it's roughly half a millisecond to several milliseconds slower depending on how many elements it is iterating through, which is significant.
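
    A minimal sketch of that SLLV + MaskStore trick with Burst's AVX2 intrinsics (the helper name and scalar fallback are mine; treat it as an illustration, not measured code):

    Code (CSharp):
    using Unity.Burst.Intrinsics;
    using static Unity.Burst.Intrinsics.X86;

    static unsafe class MaskedWrite
    {
        // Stores the lanes of 'values' whose bit is set in the low 8 bits
        // of 'mask'. SLLV moves bit i of the mask into the sign bit of
        // lane i, which is what mm256_maskstore_epi32 tests.
        public static void Store8(int* dst, v256 values, int mask)
        {
            if (Avx2.IsAvx2Supported)
            {
                v256 broadcast = Avx.mm256_set1_epi32(mask);
                v256 shifts = Avx.mm256_setr_epi32(31, 30, 29, 28, 27, 26, 25, 24);
                v256 laneMask = Avx2.mm256_sllv_epi32(broadcast, shifts);
                Avx2.mm256_maskstore_epi32(dst, laneMask, values);
            }
            else
            {
                // Scalar fallback for hardware without AVX2.
                int* v = (int*)&values;
                for (int i = 0; i < 8; i++)
                    if ((mask & (1 << i)) != 0)
                        dst[i] = v[i];
            }
        }
    }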

    Alternatively, does it matter if we actually iterate over disabled components? Disabling does not have to mean "pausing" or "freezing" the component's values. The values can still be operated on, just signaled via the disabled bit to not actually be used. Re-enabling will of course require resetting the value somehow on the code side, which might be difficult.
     
  26. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    I think I understand the general algorithm and structure behind them (mask, tz, process, shift; and toggle by write handle). But there are a lot of unknowns regarding system state, special components like Prefab, and chunk split batches.

    Sometimes it would, and sometimes it wouldn't. In the cases where it wouldn't, autovectorization would still work, but we can actually do better. In AVX2, you can use VPCMPGTB and VPMOVMSKB to get a mask of batches of 8 entities, and then for each batch of 8 entities use SIMD instructions on them. By having the batches as a bitmask, you can use the mask -> tz -> process -> shift algorithm to skip empty batches (see the sketch below), which could be helpful when valid entities are sometimes sparse.
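
    A scalar sketch of that mask -> tz -> process -> shift loop (ProcessBatch is a hypothetical stand-in for the per-batch SIMD work; I clear the lowest set bit instead of shifting, which avoids C#'s shift-count wraparound at 64):

    Code (CSharp):
    using Unity.Mathematics;

    static class BatchSkipping
    {
        // Visits only the set bits of the 64-bit batch mask,
        // skipping empty batches entirely.
        public static void Run(ulong mask)
        {
            while (mask != 0)
            {
                int batchIndex = math.tzcnt(mask); // tz: index of next set bit
                ProcessBatch(batchIndex);          // process that batch
                mask &= mask - 1;                  // clear lowest set bit
            }
        }

        static void ProcessBatch(int batchIndex) { /* SIMD work on 8 entities */ }
    }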
     
  27. JooleanLogic

    Joined:
    Mar 1, 2018
    Posts:
    447
    Yes, this is correct and highlights the fragmentation dilemma with ECS, for which there is no solution, only trade-offs.
    When you add new components, those entities move to a new chunk, making the data contiguous for the new component but fragmenting all the others across chunks.
    With disabled bits, you fragment within the chunk.

    Fragmentation is inevitable because you have components competing for contiguity that can't all be satisfied. It's not so much a problem as something that just is, and you have to decide where to make your trade-offs.

    I guess Unity have tested and come down on the side of inter-chunk fragmentation being more performant, which I have no problem with at all. I can see other benefits to enabled bits vs. add/remove as well.
    Yes, I think I was the first to ever raise a thread on it here. I also tested separate enabled-byte streams early on, but found mixing that approach with add/remove got too complicated.
    You're right though, I went the add/remove route, which had been the default recommendation from the start, and then got on with my game. I haven't been following a lot of the other packages, where I guess many have progressed to static archetypes already.

    I have no issue with enabled bits; I was purely surprised at how strongly Joachim advocated them as essentially a replacement for add/remove, given that no one even has access to them yet and won't for some time. Also that no one else responded much to Joachim's information-rich post above.
    Also that he said "Happy to hear thoughts if someone wants to challenge this line of thinking.", which makes me think they're still trying to work it out.
    Ditto, hence why I'm surprised there isn't more questioning.
    There's still fragmentation, just of a different kind now. What impact is that going to have? How badly does it affect prefetching, if at all?
    What issues are going to arise from packing out your archetypes with all possible components? Is that even feasible? Entities can go through a lot of spatial/temporal behaviour changes over their lifetime, some of which may be very temporary.
    With densely packed archetypes + bit streams, any moving of entities is going to be even more costly.
    What about the confusion of having two different methodologies now? What about future ECS asset packs, if some developers use enabled bits and others add components?
    Also state components and reactive systems (which I use), as DreamingImLatios raised.

    None of this is going to affect my game much as it's too small, but I'm very interested in the code design benefits of enabled bits, as I find managing adding/removing components and ECBs fairly cumbersome. I think static archetypes are going to be a relief to switch to.

    *Edit: Fixed link.
     
    Last edited: Dec 16, 2021
    Enzi likes this.