Benchmarking tag component v2

Discussion in 'Entity Component System' started by 5argon, Dec 19, 2019.

  1. 5argon

    5argon

    Joined:
    Jun 10, 2013
    Posts:
    1,555
    I did it again because back then I didn't have the performance testing package.



    Suppose we have many entities but want to work on only some of them. There are two approaches:
    • "Mark": Each entity carries some data that you check, and you do the work only when the check is true. The advantage is that we don't pay the chunk movement cost of tagging entities one by one. The disadvantage is that we cannot skip work: we must always iterate through every entity and check which ones to work on.
    • "Tag": Use the fact that Unity ECS queries operate on whole chunks. Adding a tag component changes the entity's archetype, so it can no longer stay in the same chunk as the untagged entities. An EntityQuery can then bring in only the chunks that contain the tag, and we can work on those entities without any if in the code, because we already paid for the if when deciding whether to tag. The disadvantage is that we pay the chunk movement cost on top of that if. The advantage is that we can keep using the tagged chunks for an extended period, whereas the mark approach must iterate through all entities every time. In this test, tagging is queued into an EntityCommandBuffer inside a Bursted main-thread Entities.ForEach that contains the conditional check for which entities to tag. The buffer is played back after that ForEach, and then another Bursted Entities.ForEach works on the tagged entities.
    • "Affected entities": The entities that must be tagged and then worked on, or, in the marking case, the entities whose conditional check returns true and that will be worked on.
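
    To make the two approaches concrete, here is a minimal sketch in plain C#. It is a toy model, not the benchmark code from the gist: arrays stand in for chunks, no Unity API is used, and the names MarkVsTagSketch, WorkMarked, MoveTagged and WorkTagged are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: plain arrays stand in for ECS chunks. No Unity API is used.
public static class MarkVsTagSketch
{
    // "Mark": visit every entity and evaluate the flag each time.
    // Returns how many entities were visited (always all of them).
    public static int WorkMarked(int[] values, bool[] marked)
    {
        for (int i = 0; i < values.Length; i++)
        {
            // Inline if: both sides produce a value.
            values[i] += marked[i] ? 100 : 0;
        }
        return values.Length;
    }

    // "Tag": pay once to copy the affected entities into their own group.
    // This copy stands in for the chunk movement cost.
    public static (int[] tagged, int[] untagged) MoveTagged(int[] values, int[] ids, int affected)
    {
        var tagged = new List<int>();
        var untagged = new List<int>();
        for (int i = 0; i < values.Length; i++)
        {
            if (ids[i] < affected) tagged.Add(values[i]);
            else untagged.Add(values[i]);
        }
        return (tagged.ToArray(), untagged.ToArray());
    }

    // Later passes touch only the tagged group, with no check at all.
    // Returns how many entities were visited.
    public static int WorkTagged(int[] tagged)
    {
        for (int i = 0; i < tagged.Length; i++) tagged[i] += 100;
        return tagged.Length;
    }
}
```

    With 1,000 entities of which 10 are affected, WorkMarked visits 1,000 entities on every pass, while WorkTagged visits 10 once MoveTagged has been paid for.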


    Analysis
    • Not counting the tagging cost, iterating through only the tagged entities, with no checks to worry about, costs about half as much as the marked equivalent. (The work in this test is adding 100 to an int.)
    • If the work target changes so often that it results in this much tagging every frame, you are better off with just a conditional on some data. (Then again, can you design around this and tag only in the first few frames, or less often?)
    • If you can tag once, in a "now loading" screen or similar, and then keep using the tagged chunks, tagging is almost always worth it.
    • You may consider a mixed approach: tag some, add an SCD (shared component data) to some for EntityQuery filtering, and still keep an inner if for detailed checks. Going all the way to remove every if may be too much, especially for those that could be an inline if (the conditional operator). If the if looks more like control flow that could be solved with tagging, tagging will likely benefit.
    Notes
    • Using Entities 0.4.0 preview.10, Burst 1.2.0 preview.11. In editor, MacBook Pro Early 2015, Burst on, safety checks off, leak detection off.
    • I gave an edge to the mark approach by using an inline if where both sides return a value. This results in better assembly that does not require branching. In a real program, you may have a real if that costs more than this per entity.
    • Tagging may also cost more if the other components carrying data on the entity being moved are larger. In this test it is just one int.
    Test code
    https://gist.github.com/5argon/849144dd3e766f415c08526dbfa0dcf6
     
  2. eterlan

    eterlan

    Joined:
    Sep 29, 2018
    Posts:
    177
    I remember it's 23ms -> 0.56ms per tag right? Great work!
     
  3. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    827
    Then, adding tags/components that last for less than a frame is literally the worst thing you could possibly do?
    I've been doing plenty of that.
     
  4. eterlan

    I think you might have missed something. We all know that a structural change moves the entity to another chunk. But what if the entity has many components? The cost of the copy might grow linearly with the number of components.
     
  5. 5argon

    It sounds like you would rather use the ECB to perform the work directly after tagging, or use an event system solution like the NativeStream-based one that tertle made.

    The takeaway for me is that while it sounds like everything in ECS is fast and we are trying to utilize the "C", I sometimes forget that plainly iterating over things also gets a boost from Burst, linear arrays, etc., so it can be just as fast.

    The bool for the marking case here has been separated into its own component containing just a bool, instead of being a bool field on the work component. This makes a linear array of just bools, so the outer if is as fast as possible while scanning them. (Each cache line read fetches more bools instead of contents unrelated to the check, so the false case performs better.)
     
  6. 5argon

    I have been thinking about that too. When tagging a single entity out, this happens:

    1. If the destination chunk doesn't exist, reserve/allocate/reuse a chunk slot. This should be a one-off unless you remove all tags on every run, in which case I think you shouldn't tag. Chunk capacity depends on the total size of the components, so if the entity has many, we may pay for this more often due to low capacity.
    2. Just that entity is copied to the destination chunk along with all its components. The copy cost obviously depends on size as well.
    3. The hole that was left is filled by the last element of the chunk. This is another copy.
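
    The steps above can be sketched with a toy chunk in plain C#. This is only a model for counting copies, not how Unity implements chunks (a real chunk stores each component type in its own array). ToyChunk and MoveTo are hypothetical names.

```csharp
using System;

// Toy chunk: a fixed-capacity array of entity payloads. Not Unity's implementation.
public sealed class ToyChunk
{
    private readonly int[] _data;

    public int Count { get; private set; }

    // Step 1 in the list above: the caller allocates the destination chunk once.
    public ToyChunk(int capacity)
    {
        _data = new int[capacity];
    }

    public int this[int index] => _data[index];

    public void Add(int payload)
    {
        _data[Count++] = payload;
    }

    // Moves the element at index into destination and returns the number of copies made.
    public int MoveTo(int index, ToyChunk destination)
    {
        int copies = 0;

        // Step 2: copy the entity's data into the destination chunk.
        destination.Add(_data[index]);
        copies++;

        // Step 3: fill the hole with the last element, unless the hole is the last slot.
        int last = Count - 1;
        if (index != last)
        {
            _data[index] = _data[last];
            copies++;
        }

        Count = last;
        return copies;
    }
}
```

    In this model, moving any entity except the last one costs two copies, each proportional to the entity's size, which is why larger entities make tagging more expensive.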

    Rerunning the test should be easy, I can just add tons of fields. In a moment...
     
  7. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,779
    Thanks for the test and analysis.

    I have always considered tagging only when the tags need to change occasionally. Or maybe during an instantiation procedure that requires multiple systems to cooperate. Or when a few systems need to process data temporarily, only on tagged entities, avoiding iteration over unnecessary entities.

    Otherwise, for frequent changes, a bool or enum is more convenient for me.

    I think one thing is always overlooked in similar past discussions: they always consider just one system doing the work.

    But with multiple systems that rely on bool marking/tagging for thousands of entities, tagging can reduce the amount of required processing by quite a significant portion.

    However, I am curious about an analysis of, let's say, tagging entities with multiple components and buffers. I suppose buffers would be easier to test, as they just require populating a number of elements, as long as it fits in the chunk.

    Do you think you could run such a test for us?
    Maybe also add some dummy systems that use the tagging/marking filtering approach. But I could probably just take the results of the last two columns from the initial test and multiply by the number of systems?
     
    Last edited: Dec 19, 2019
  8. 5argon

    Ok, done. The previous case is just 2 ints per entity, plus one bool on another component. To simulate a real use case, I am going to attach the component below while not using it in any way in the test.

    Code (CSharp):
    using Unity.Entities;
    using Unity.Mathematics;

    // Unused padding component: attached only to inflate the entity's size.
    struct Junks : IComponentData //<-- add this component
    {
        public Junk junk1;
        public Junk junk2;
        public Junk junk3;
        public Junk junk4;
        public Junk junk5;
        public Junk junk6;
        public Junk junk7;
        public Junk junk8;
    }

    // 8 x int4 = 8 x 16 bytes = 128 bytes.
    struct Junk
    {
        public int4 i1;
        public int4 i2;
        public int4 i3;
        public int4 i4;
        public int4 i5;
        public int4 i6;
        public int4 i7;
        public int4 i8;
    }

    This adds 4*4*8*8 = 1024 bytes to each entity. Because a chunk is 16 kB, the chunk capacity should now be only around 15 entities. Maybe this is a bit excessive, but let's see. Data-wise, this is why both the mark and tag cases take a performance hit from this unrelated component:
    • Mark: When we iterate to check the bool, remember that the elements come from separate chunks. When chunk capacity is low, we cross chunk borders more often. The query that brings you the chunks in the first place also has to work more.
    • Tag: When tagging moves an entity out of its chunk to a new one, we pay two copies: one to move it along with all its components, and one to swap the last element of the old chunk back to fill the hole. Also, when chunk capacity is low, we may have to reserve a new chunk more often. This comes on top of the chunk crossing during iteration, as in the marking method.
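
    The capacity estimate above can be checked with a few lines of plain C#. This is a rough sketch: ChunkCapacitySketch, EntitiesPerChunk and the headerBytes parameter are hypothetical, and the real chunk reserves some bytes for its header and per-archetype bookkeeping that are not modelled here.

```csharp
using System;

// Back-of-the-envelope chunk capacity. Not Unity's actual layout computation.
public static class ChunkCapacitySketch
{
    public const int ChunkBytes = 16 * 1024;

    // int is 4 bytes, int4 is 4 ints, Junk has 8 int4 fields, Junks has 8 Junk fields.
    public static int JunksBytes() => 4 * 4 * 8 * 8;

    // headerBytes is a hypothetical allowance for chunk overhead; 0 ignores it.
    public static int EntitiesPerChunk(int bytesPerEntity, int headerBytes = 0)
        => (ChunkBytes - headerBytes) / bytesPerEntity;
}
```

    Assuming 8 bytes for the Entity id, 8 bytes for the two ints and 1 byte for the bool, an entity is 17 bytes without Junks and 1041 bytes with it. That gives 963 entities per chunk before and 15 after, in line with the estimate of around 15.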
    I will also try this: instead of adding a Junks component, I add it as a large wasted field in the same component we work on.

    Code (CSharp):
    struct Work : IComponentData
    {
        public int id; //<-- use to determine whether to mark or tag.
        public int value; //<-- work on this.
        public Junks junks;
    }
    This is bad because as we iterate to work on the value field, the rest of the data we get for free in one cache line read is junk instead of the next component. This is why small components benefit performance.
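
    As a rough illustration of the cache line argument, the sketch below counts how many distinct cache lines a linear pass touches when reading one field per element. It assumes a 64-byte cache line and a densely packed component array that starts on a line boundary; CacheLineSketch and CacheLinesTouched are hypothetical names.

```csharp
using System;

// Counts cache lines touched by a linear pass over one field of a packed array.
public static class CacheLineSketch
{
    public static int CacheLinesTouched(int count, int strideBytes, int fieldOffset, int lineBytes = 64)
    {
        int lines = 0;
        long previousLine = -1;
        for (int i = 0; i < count; i++)
        {
            // Byte position of the field in element i, then the line it falls in.
            long line = ((long)i * strideBytes + fieldOffset) / lineBytes;
            if (line != previousLine)
            {
                lines++;
                previousLine = line;
            }
        }
        return lines;
    }
}
```

    Under these assumptions, reading value (offset 4) from 1,000 Work components of 8 bytes touches 125 lines, while the same pass over the 1032-byte version with the junks field touches 1,000 lines. A separate array of 1,000 one-byte bools touches only 16.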


    Analysis
    • It looks like the marking approach takes a big performance hit at some threshold. It is probably related to the device's cache size: while iterating, it has to go to RAM more often.
    • I think both cases took a similar performance hit. Neither tagging nor marking scales better than the other on larger entities.
    All three side by side: see the charts attached to the original post.


  9. eterlan

    Cool! It seems like tag scales better than mark at large entity counts, which is surprising! Before this test, my guess was that tag would be much slower when there are 100 components on an entity.
    I'm thinking about using an extra description entity/component to handle frequently changing state, like a table. Each state would be a HashSet<Entity>. What about that? But as you said, we don't have a proper tool to iterate over an entity array or to set an extra filter with an entity array. What we have is ComponentDataFromEntity.
     
  10. Guedez

    Large entities are just slow regardless of how you use them, it seems.
    Making more entities instead of making them larger seems to be the way to go.

    What happens if an entity is bigger than the chunk?
     
  11. Antypodish

    @Guedez you do make me wonder.
    In the case of large buffers, they will simply be stored on the heap, not in the entity. So there is no direct impact on moving between chunks.
    But what about having many (too many) components? That makes me think now.
     
  12. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,685
    It can't. It will throw an ArgumentException from the following check:

    Code (CSharp):
    [Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")]
    public void AssertCanAddComponent(Archetype* archetype, ComponentType componentType)
    {
        if (componentType == m_EntityComponentType)
            throw new ArgumentException("Cannot add Entity as a component.");

        if (componentType.IsSharedComponent && (archetype->NumSharedComponents == kMaxSharedComponentCount))
            throw new InvalidOperationException($"Cannot add more than {kMaxSharedComponentCount} SharedComponent to a single Archetype");

        var componentTypeInfo = GetTypeInfo(componentType.TypeIndex);
        var componentInstanceSize = GetComponentArraySize(componentTypeInfo.SizeInChunk, 1);
        var archetypeInstanceSize = archetype->InstanceSizeWithOverhead + componentInstanceSize;
        var chunkDataSize = Chunk.GetChunkBufferSize();
        if (archetypeInstanceSize > chunkDataSize)
            throw new ArgumentException($"Entity archetype component data is too large after adding {componentType.ToString()}. Previous archetype size per instance {archetype->InstanceSizeWithOverhead}  bytes. Attempting to add component size {componentInstanceSize} bytes. Maximum chunk size {chunkDataSize}.");
    }
     
  13. eterlan

    I also want to point out that we can skip work by simply adding a changed filter to each system.
    Time complexity then changes from per system, per entity, per frame to per system, per chunk, per frame.
    BTW, having many tags as state transitions is bug-prone.
    It's a hard choice. It seems like tag wins on looks but loses on performance.
     
    Last edited: Dec 20, 2019
  14. 5argon

    Right, sometimes we can get a "free tagging" effect, lasting one round of updates, from writing data.

    But in this test code it is not possible to get free performance from a changed filter. To determine whether to mark or not, the ForEach that produces the chunks already puts them in write mode, so they are all dirty, no matter whether the early if returns true and actually results in a write. (With `ref` on the lambda, all chunks are considered written and their version is bumped as changed.)
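
    A toy model of that behaviour in plain C#, assuming one change version per chunk and a simple greater-than comparison. This is not Unity's implementation, and VersionedChunk, RunWithWriteAccess and PassesChangeFilter are hypothetical names.

```csharp
using System;

// Toy model of per-chunk change versions. Not Unity's implementation.
public sealed class VersionedChunk
{
    public uint ChangeVersion;
    public readonly int[] Values;

    public VersionedChunk(int[] values, uint changeVersion)
    {
        Values = values;
        ChangeVersion = changeVersion;
    }
}

public static class ChangeFilterSketch
{
    // Models a lambda that takes the component by ref: the chunk is stamped as
    // changed up front, before we know whether any element will be written.
    // Returns the number of elements actually written.
    public static int RunWithWriteAccess(VersionedChunk chunk, uint systemVersion, Func<int, bool> shouldWrite)
    {
        chunk.ChangeVersion = systemVersion;
        int written = 0;
        for (int i = 0; i < chunk.Values.Length; i++)
        {
            if (!shouldWrite(chunk.Values[i])) continue;
            chunk.Values[i] += 100;
            written++;
        }
        return written;
    }

    // Models a changed filter: a system processes the chunk only if the chunk
    // was stamped after the system last ran.
    public static bool PassesChangeFilter(VersionedChunk chunk, uint lastSystemVersion)
        => chunk.ChangeVersion > lastSystemVersion;
}
```

    In this model a chunk passes the filter even when shouldWrite returned false for every element, so a downstream system with a changed filter skips nothing.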
     
  15. eterlan

    Emmm... May I shamelessly ask for a test that adds an SCD as the tag? I haven't tried this testing extension yet. I'm just curious whether it's as fast as a normal tag. I remember someone saying tags are treated differently. :p
     
  16. 5argon

    An SCD should be the same as a tag component, as it is technically a zero-sized component that only needs to be recorded in the chunk header. The only added routine is hashing the SCD value to decide whether to create a new SCD index or reuse an existing one. The downside is that SCD tagging cannot be done in a job, even with a command buffer. But you can instead use one SCD type as multiple tags with a query filter.
     
  17. eterlan

    I added a simple test and was surprised to find that SCD is 20% faster than tag. I'm not sure about this result, though: for some reason I cannot disable leak detection.
    Code (CSharp):
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    // Queues a shared component value on every affected entity, then plays it back.
    class JustSCDTag : JobComponentSystem
    {
        public int AffectedEntities;

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            int affected = this.AffectedEntities;
            var ecb = new EntityCommandBuffer(Allocator.Temp);
            Entities.ForEach((Entity e, in Work w) =>
            {
                if (w.ID < affected)
                {
                    ecb.SetSharedComponent(e, new SCDTagMark
                    {
                        Tag = -1
                    });
                }
            }).Run();

            ecb.Playback(EntityManager);
            // Note. Dispose ecb if you need to manually playback it.
            ecb.Dispose();

            return default;
        }
    }

    // Works only on the entities whose shared component matches the filter value.
    [DisableAutoCreation]
    class WorkOnSCDTag : JobComponentSystem
    {
        public int AffectedEntities;

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            Entities.WithSharedComponentFilter(new SCDTagMark
                {
                    Tag = -1
                }).ForEach((ref Work w) => { w.Value += 100; })
                .Run();

            return default;
        }
    }
     
  18. exiguous

    exiguous

    Joined:
    Nov 21, 2010
    Posts:
    1,749
    Sorry for necroing this thread. But reading about the two possibilities (tagging, marking), I was wondering whether there is a third approach, which I would like to discuss. Note that I'm an ECS noob and maybe the idea is utter nonsense. If so, please tell me.

    Tagging seems to be advantageous when the tags live on the entity for a certain time. But in cases where the tag is "consumed" and removed in the same frame, or changes frequently, it seems costly. I'm wondering whether it would be easier to leave the entity in its chunk and "mark" it via a second, "intermediate" entity that is created and just contains the entity it refers to. The system that is interested in the "tag" then just gets the newly created "tag" entity, looks up the entity it is really interested in, and does its operation on it. Then the tag entity is removed again. So basically entities are used as tags, not components. The question is whether this could be faster, since there is a random memory access via ComponentDataFromEntity. But the structural change is reduced, since I guess creating and removing a small entity is "cheaper" than moving a large entity back and forth between chunks. I guess this is only really useful when a few entities of each archetype are affected, where both iterating over all of them (marking) and moving them between chunks (tagging) are costly.

    Opinions?
     
  19. PublicEnumE

    PublicEnumE

    Joined:
    Feb 3, 2019
    Posts:
    729
    In an upcoming version of Entities, there will be a third option, which will (possibly) be the best approach: component enabled states.

    From what’s been described: every IComponentData and IBufferElementData will have a built-in ‘enabled’ state that can be set to ‘disabled’. Disabled components still live in chunks, but they’ll be excluded from EntityQueries.

    But most importantly: you’ll be able to set the enabled state of any individual component from Bursted jobs, and it won’t require an archetype change.

    So, as long as you don’t mind having the component always present on your entity, you could just enable/disable it whenever you want that entity to show up in queries. No more main-thread bottlenecks or compromises.

    (I assume all the normal restrictions about not concurrently writing to the same component type from parallel jobs will still apply for setting the enabled state.)

    (Also: I have no idea whether disabled components will be excluded from EntityQueries at the individual or the chunk level. I would assume chunk for now, just to be safe.)
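
    Since the feature is not out yet, the following is only a guess at the idea, written in plain C#: one enabled bit per entity kept next to the chunk's data, so toggling never moves the entity. It is not the upcoming API, EnableBitsChunk and its members are hypothetical names, and filtering per entity (rather than per chunk) is an assumption.

```csharp
using System;

// Toy chunk with one enabled bit per entity. Not Unity's implementation.
public sealed class EnableBitsChunk
{
    private readonly int[] _values;
    private ulong _enabledBits; // bit i set means entity i is enabled

    public EnableBitsChunk(int[] values)
    {
        if (values.Length > 64)
            throw new ArgumentException("This toy chunk holds at most 64 entities.");
        _values = values;
    }

    public int this[int index] => _values[index];

    // Toggling flips a bit in place: no copy, no archetype change.
    public void SetEnabled(int index, bool enabled)
    {
        ulong bit = 1UL << index;
        if (enabled) _enabledBits |= bit;
        else _enabledBits &= ~bit;
    }

    public bool IsEnabled(int index) => (_enabledBits & (1UL << index)) != 0;

    // Does the work only on enabled entities and returns how many were processed.
    public int AddToEnabled(int amount)
    {
        int processed = 0;
        for (int i = 0; i < _values.Length; i++)
        {
            if (!IsEnabled(i)) continue;
            _values[i] += amount;
            processed++;
        }
        return processed;
    }
}
```

    In this model the cost profile is close to the "mark" approach with a very compact flag array: toggling is cheap, but each pass still checks a bit for every entity in the chunk.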
     
  20. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Do you know when this is coming out?
     
  21. Discipol

    Discipol

    Joined:
    May 6, 2015
    Posts:
    83
    Bump, it's been a bit, any news on this?
     
  22. Antypodish

    It should come with Entities 1.0, possibly. But first Entities 0.50 needs to be rolled out. So it may take another year or more before it happens.

     
    Last edited: Jan 15, 2022