Benchmarking tag component v2

5argon · Dec 19, 2019

I did it again because back then I didn't have the performance testing package.

Suppose we have many entities but we would like to work on only some of them. There are 2 approaches:

"Mark": You have some kind of check, and only if it is true do you work on that entity. The advantage is that we don't pay the chunk movement cost of tagging entities one by one. The disadvantage is that we cannot skip work: we must always iterate through all entities and check which ones to work on.

"Tag": Utilize the chunk as the unit of query in Unity ECS. When a tag component is added, the entity changes its archetype and cannot stay in the same chunk as the untagged ones. An EntityQuery can then bring in only the chunks that contain the tag, and we can work on these entities without any if in the code, because we already paid for the if when we decided whether to tag. The disadvantage is that we must pay the chunk movement cost in addition to that if. The advantage is that we can keep using these tagged chunks for an extended period of time, whereas the mark approach must always iterate through all entities.

In this test, tagging is queued into an EntityCommandBuffer inside a bursted main-thread Entities.ForEach that contains the conditional check for which entities should be tagged. The buffer is played back after that ForEach, then another bursted Entities.ForEach works on the tagged entities.

"Affected entities": Entities that must be tagged and then worked on, or, in the case of marking, entities that return true on the conditional check and will be worked on.

Analysis

Not counting the tagging cost, iterating through only the tagged entities without worrying about checks costs about half as much as the marked equivalent. (The work in this test is adding 100 to an int.)

If the work target changes so often that it results in this much tagging consistently each frame, you are better off with just a conditional on some data. (Anyway, can you design to avoid this and tag only in the first few frames, or less often, instead?)

If you can tag once during a loading screen or similar and then keep using the tagged chunks, tagging is almost always worth it.

You may consider a mixed approach: tag some, add an SCD (shared component data) to some for an EntityQuery filter, then still have an inner if for detailed checks. Going all the way to remove every if may be too much, especially for those that could be an inline-if (?:). If the if looks more like control flow that could be solved with tagging, it is likely that tagging will benefit.

Notes

Using Entities 0.4.0 preview.10, Burst 1.2.0 preview.11. In editor, MacBook Pro Early 2015, Burst on, safety checks off, leak detection off.

I gave an edge to the mark approach by using an inline-if where both sides return a value. This results in better assembly that does not require branching. In a real program, you may have a real if that costs more than this per entity.
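For illustration, here is a minimal sketch of the two ways the marking check can be written, in plain C# with no Unity dependency. The `Work` and `Mark` structs and the `MarkLoops` class are hypothetical stand-ins for the components in the test, not the code from the gist, and whether the first form really ends up branch-free depends on the compiler.

```csharp
using System;

// Hypothetical stand-ins for the components used in the benchmark.
public struct Work { public int id; public int value; }
public struct Mark { public bool marked; }

public static class MarkLoops
{
    // Inline-if: both sides produce a value, so the add always happens.
    public static void InlineIf(Work[] works, Mark[] marks)
    {
        for (int i = 0; i < works.Length; i++)
            works[i].value += marks[i].marked ? 100 : 0;
    }

    // "Real" if: the write only happens on the true side.
    public static void RealIf(Work[] works, Mark[] marks)
    {
        for (int i = 0; i < works.Length; i++)
        {
            if (marks[i].marked)
                works[i].value += 100;
        }
    }
}
```

Both loops give the same result; the difference is only in how the condition is expressed.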

Tagging may also cost more if the other components on the entity being moved hold more data. In this test it is just one int.

Test code
https://gist.github.com/5argon/849144dd3e766f415c08526dbfa0dcf6

eterlan · Dec 19, 2019

I remember it's 23ms -> 0.56ms per tag right? Great work!

Guedez · Dec 19, 2019

Then, adding tags/components that last for less than a frame is literally the worst thing you could possibly do?
I've been doing plenty of that

eterlan · Dec 19, 2019

I think you might have missed something. We all know that when there is a structural change, the entity is moved to another chunk. But what if the entity has many components? The cost of the copy operation might scale linearly with the component count.

5argon · Dec 19, 2019

Guedez said: ↑

Then, adding tags/components that last for less than a frame is literally the worst thing you could possibly do?
I've been doing plenty of that

It sounds like you would rather use an ECB to perform the work directly after tagging, or use an event system solution like the NativeStream one that tertle made.

The takeaway I got is that while it sounds like everything in ECS is fast and we try to utilize the C (components), I sometimes forget that plainly iterating over things also gets boosted by Burst, linear arrays, etc., so it can be just as fast.

The bool for the marking case here has been separated into its own component of just a bool, instead of being a bool field on the work component. This makes a linear array of just bools, which helps the outer if be as fast as possible while searching them. (Each cache line read gets extra bools instead of content unrelated to the check, so the false case gets better performance.)

5argon · Dec 19, 2019

eterlan said: ↑

I think you might have missed something. We all know that when there is a structural change, the entity is moved to another chunk. But what if the entity has many components? The cost of the copy operation might scale linearly with the component count.

I have been thinking about that too. When tagging a single entity out, this happens:

1. If the destination chunk doesn't exist, reserve/alloc/reuse a chunk slot. This should be a one-off cost unless you completely remove all tags on each run, in which case I think you shouldn't tag. Chunk capacity depends on the total component size, so if the entity has many components we may have to pay this more often due to low capacity.
2. Just that entity is copied to the destination chunk along with all its components. The copy cost obviously depends on size too.
3. The hole that was left gets filled by the final element of the chunk. This is another copy.
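The three steps can be sketched with a toy model in plain C#. This is only an illustration of the idea, not how Unity implements chunks: `ToyChunk`, `ToyMove` and the single `int` payload per entity are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A toy "chunk": a fixed-capacity, densely packed array of entity payloads.
public sealed class ToyChunk
{
    public readonly int[] Values;
    public int Count;
    public ToyChunk(int capacity) { Values = new int[capacity]; }
    public bool IsFull => Count == Values.Length;
}

public static class ToyMove
{
    // Moves the element at `index` out of `source` into the last chunk of
    // `destination`, allocating a new chunk there when none has room.
    // `index` must be smaller than `source.Count`.
    public static void Move(ToyChunk source, int index, List<ToyChunk> destination, int capacity)
    {
        // Step 1: find or allocate a destination chunk with a free slot.
        if (destination.Count == 0 || destination[destination.Count - 1].IsFull)
            destination.Add(new ToyChunk(capacity));
        ToyChunk target = destination[destination.Count - 1];

        // Step 2: copy the entity's data to the destination (first copy).
        target.Values[target.Count++] = source.Values[index];

        // Step 3: fill the hole with the source chunk's last element (second copy).
        int last = source.Count - 1;
        source.Values[index] = source.Values[last];
        source.Count = last;
    }
}
```

In this model every move pays two copies, and a smaller `capacity` means step 1 allocates more often, which matches the reasoning above.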

Rerunning the test should be easy, I can just add tons of fields. In a moment..

Antypodish · Dec 19, 2019

Thx for the test and analysis.

I always considered using tagging only when the tags need to change occasionally. Or maybe during an instantiation procedure that requires multiple systems to cooperate. Or when a few systems need to process data temporarily, only on tagged entities, avoiding iteration over unnecessary entities.

Otherwise, for frequent changes, a bool or enum is more convenient for me.

I think one thing is always overlooked in similar past discussions: they always consider only one system doing the work.

But with multiple systems that rely on bool marking/tagging for thousands of entities, tagging can reduce the amount of required processing by quite a significant portion.

However, I am curious about an analysis of, let's say, tagging entities with multiple components and buffers. I suppose buffers would be easier to test, as they just require populating a number of elements, as long as they fit in the chunk.

Do you think you could run such a test for us?
Maybe also add some dummy systems that use the tagging/marking filtering approach. But I could probably just take the results of the last two columns from the initial test and multiply by the number of systems?

5argon · Dec 19, 2019

eterlan said: ↑

But what if the entity has many components? The cost of the copy operation might scale linearly with the component count.

Antypodish said: ↑

However, I am curious about an analysis of, let's say, tagging entities with multiple components

Ok done. The previous case was just 2 ints per entity, plus one bool on another component. To simulate a real use case, I am going to attach this component without using it in any way in the test.

Code (CSharp):

struct Junks : IComponentData //<-- add this component
{
    public Junk junk1;
    public Junk junk2;
    public Junk junk3;
    public Junk junk4;
    public Junk junk5;
    public Junk junk6;
    public Junk junk7;
    public Junk junk8;
}

struct Junk
{
    public int4 i1;
    public int4 i2;
    public int4 i3;
    public int4 i4;
    public int4 i5;
    public int4 i6;
    public int4 i7;
    public int4 i8;
}

That adds 4*4*8*8 = 1024 bytes to each entity. Because a chunk is 16kB, the chunk capacity should now be only around 15 entities. Maybe this is a bit excessive, but let's see. Data-wise, this is why both the mark and tag cases take a performance hit from this unrelated component:
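The arithmetic behind that estimate can be written out as a small plain C# sketch. `ChunkMath` is a hypothetical helper, and it assumes a flat 16 kB chunk with no header or alignment overhead, so the real capacity reported by Unity may differ.

```csharp
using System;

public static class ChunkMath
{
    // 16 kB per chunk; header and alignment overhead are ignored (assumption).
    public const int ChunkBytes = 16 * 1024;

    // Bytes added by the Junks component: 8 Junk fields, each holding 8 int4 values.
    public static int JunksBytes()
    {
        const int intBytes = 4;
        const int int4Bytes = 4 * intBytes;   // 16
        const int junkBytes = 8 * int4Bytes;  // 128
        return 8 * junkBytes;                 // 1024
    }

    // Whole entities that fit in one chunk for a given per-entity size.
    public static int Capacity(int bytesPerEntity) => ChunkBytes / bytesPerEntity;
}
```

Assuming 8 bytes for the entity ID, 8 bytes for the two ints and 1 byte for the bool, `ChunkMath.Capacity(ChunkMath.JunksBytes() + 8 + 8 + 1)` comes out at 15 under these simplifications.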

Mark: When we iterate to check the bool, remember that the elements come from separate chunks. When the chunk capacity is low, we cross chunk borders more often. The query that brings you the chunks in the first place also has to work more.

Tag: When tagging moves an entity out of its chunk to a new one, we pay 2 copy costs: one for moving it along with all its components, and one for swapping the last element of the old chunk back to fill the hole. Also, when chunk capacity is low, we may have to reserve a new chunk more often. This comes on top of the chunk crossing on iteration that the marking method also pays.

I will also do this: instead of adding a Junks component, I add it as a wasted large field in the same component we work on.

Code (CSharp):

struct Work : IComponentData
{
    public int id; //<-- use to determine whether to mark or tag.
    public int value; //<-- work on this.
    public Junks junks;
}

This is bad because as we iterate to work on the value field, the remaining data we get for free in one cache line read is junk instead of the next component. Using small components benefits performance because of this.

Analysis

It looks like the marking approach takes a big performance hit at some threshold. It is probably related to the device's cache size, so that while iterating it has to go to RAM more often.

I think both cases took a similar performance hit. It is not that either tagging or marking scales better on larger entities.

All three side by side :

eterlan · Dec 19, 2019

Cool! It seems like tag scales better than mark at large amounts, which is surprising! My guess before this test was that tag would be much slower when there are 100 components on an entity.
I'm thinking about using an extra description entity-component to handle frequently changing state, like a table. Each state would be a HashSet<Entity>, what about that? But as you said, we don't have a proper tool to iterate over an entity array or set an extra filter with an entity array; what we have is ComponentDataFromEntity.

Guedez · Dec 19, 2019

Large entities are just slow regardless of how you use them, it seems.
Making more entities instead of making them larger seems to be the way to go.

What happens if an entity is bigger than the chunk?

Antypodish · Dec 19, 2019

@Guedez that does make me wonder.
Large buffers are simply stored on the heap, not with the entity, so they have no direct impact on moving between chunks.
But what about having many (too many) components? That makes me think.

eizenhorn · Dec 19, 2019

Guedez said: ↑

Large entities are just slow regardless of how you use them, it seems.
Making more entities instead of making them larger seems to be the way to go.

What happens if an entity is bigger than the chunk?

It can't. It will throw an error:

Code (CSharp):

[Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")]
public void AssertCanAddComponent(Archetype* archetype, ComponentType componentType)
{
    if (componentType == m_EntityComponentType)
        throw new ArgumentException("Cannot add Entity as a component.");

    if (componentType.IsSharedComponent && (archetype->NumSharedComponents == kMaxSharedComponentCount))
        throw new InvalidOperationException($"Cannot add more than {kMaxSharedComponentCount} SharedComponent to a single Archetype");

    var componentTypeInfo = GetTypeInfo(componentType.TypeIndex);
    var componentInstanceSize = GetComponentArraySize(componentTypeInfo.SizeInChunk, 1);
    var archetypeInstanceSize = archetype->InstanceSizeWithOverhead + componentInstanceSize;
    var chunkDataSize = Chunk.GetChunkBufferSize();

    if (archetypeInstanceSize > chunkDataSize)
        throw new ArgumentException($"Entity archetype component data is too large after adding {componentType.ToString()}. Previous archetype size per instance {archetype->InstanceSizeWithOverhead} bytes. Attempting to add component size {componentInstanceSize} bytes. Maximum chunk size {chunkDataSize}.");
}

eterlan · Dec 20, 2019

5argon said: ↑

"Mark": You have some kind of check, and only if it is true do you work on that entity. The advantage is that we don't pay the chunk movement cost of tagging entities one by one. The disadvantage is that we cannot skip work: we must always iterate through all entities and check which ones to work on.

Also I want to point out that we can skip work by simply adding a ChangedFilter to each system.
The cost then changes from per system, per entity, per frame to per system, per chunk, per frame.
BTW, when there are many tags used as state transitions, it is bug-prone.
It's a harder choice to make.. It seems like Tag wins on looks but loses on performance.

5argon · Dec 20, 2019

eterlan said: ↑

Also I want to point out that we can skip work by simply adding a ChangedFilter to each system.
The cost then changes from per system, per entity, per frame to per system, per chunk, per frame.

Right, sometimes we can get a "free tagging" effect that lasts for one round of update after writing data.

But in this test code it is not possible to get free performance from a changed filter. To determine whether to mark or not, the ForEach that processes the chunks has already put them in write mode, so they are all dirty whether or not the early if returns true and actually results in a write. (With `ref` on the lambda parameter, all chunks are considered written and their version is bumped as changed.)
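A toy model in plain C# may make this clearer. It is not Unity's implementation: `VersionedChunk` and its members are hypothetical, and the version comparison is simplified. The point it shows is that asking for write access alone marks the chunk as changed.

```csharp
using System;

// Toy model of a per-chunk change version. NOT Unity's implementation.
public sealed class VersionedChunk
{
    private readonly int[] values;

    // Version of the last update that asked for write access to this chunk.
    public uint ChangeVersion { get; private set; }

    public VersionedChunk(int capacity) { values = new int[capacity]; }

    // Read-only access leaves the version untouched.
    public int Read(int index) => values[index];

    // Asking for write access bumps the version, even if the caller
    // never actually writes anything.
    public int[] GetWritable(uint globalVersion)
    {
        ChangeVersion = globalVersion;
        return values;
    }

    // A changed filter would skip this chunk when this returns false.
    public bool ChangedSince(uint lastSystemVersion) => ChangeVersion > lastSystemVersion;
}
```

Calling `GetWritable` without touching the returned array still makes `ChangedSince` report a change for any older version, which is the situation described above for a `ref` parameter.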

eterlan · Dec 20, 2019

emmm.. May I shamelessly ask for a test of adding an SCD as a tag? I haven't tried this testing extension.. Just curious to know whether it's as fast as a normal tag.. I remember someone said tags are treated differently.

5argon · Dec 20, 2019

eterlan said: ↑

emmm.. May I shamelessly ask for a test of adding an SCD as a tag? I haven't tried this testing extension.. Just curious to know whether it's as fast as a normal tag.. I remember someone said tags are treated differently.

An SCD should cost the same as a tag component, as it is technically a zero-sized component that only needs to be stored on the chunk header. The only added routine is hashing the SCD value to determine whether to produce a new SCD index or reuse an existing one. The downside is that SCD tagging is not possible in a job, even with a command buffer. But you can instead use one SCD type as multiple tags with a query filter.

eterlan · Dec 21, 2019

5argon said: ↑

An SCD should cost the same as a tag component, as it is technically a zero-sized component that only needs to be stored on the chunk header. The only added routine is hashing the SCD value to determine whether to produce a new SCD index or reuse an existing one. The downside is that SCD tagging is not possible in a job, even with a command buffer. But you can instead use one SCD type as multiple tags with a query filter.

I added a simple test and, surprisingly, found that SCD is 20% faster than tag. I'm not sure about it though; for some reason I cannot disable leak detection.

Code (CSharp):

class JustSCDTag : JobComponentSystem
{
    public int AffectedEntities;

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        int affected = this.AffectedEntities;
        var ecb = new EntityCommandBuffer(Allocator.Temp);
        Entities.ForEach((Entity e, in Work w) =>
        {
            if (w.ID < affected)
            {
                ecb.SetSharedComponent(e, new SCDTagMark
                {
                    Tag = -1
                });
            }
        }).Run();
        ecb.Playback(EntityManager);
        // Note. Dispose ecb if you need to manually playback it.
        ecb.Dispose();
        return default;
    }
}

[DisableAutoCreation]
class WorkOnSCDTag : JobComponentSystem
{
    public int AffectedEntities;

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        Entities.WithSharedComponentFilter(new SCDTagMark
        {
            Tag = -1
        }).ForEach((ref Work w) => { w.Value += 100; })
            .Run();
        return default;
    }
}

exiguous · Sep 16, 2020

Sorry for necroing this thread. But reading about the 2 possibilities (tagging, marking), I was wondering whether there is a 3rd approach, which I would like to discuss. Note that I'm an ECS noob and maybe this idea is utter nonsense. If so, please tell me.

Tagging seems to be advantageous when the tags live on the entity for a certain time. But in cases where the tag is "consumed" and removed in the same frame, or changes frequently, it seems costly. I'm wondering if it would be easier to leave the entity in its chunk and "mark" it via a second, "intermediate" entity which is created and just contains the entity it refers to. So the system that is interested in the "tag" just gets the newly created "tag" entity, looks up the entity it is really interested in, and does its operation on it. Then the tag entity is removed again. So basically entities are used as tags instead of components. The question is whether this could be faster, since there is a random memory access via ComponentDataFromEntity. But the structural change is reduced, since I guess creating and removing a small entity is "cheaper" than moving a large entity back and forth between chunks. I guess this is only really useful when it affects a few entities of each archetype, where iterating over all (marking) and moving them between chunks (tagging) are both costly.

Opinions?

PublicEnumE · Sep 16, 2020

In an upcoming version of Entities, there will be an ideal 3rd option, which will (possibly) offer the best approach: Component enable states.

From what’s been described: every IComponentData and IBufferElementData will have a built-in ‘enabled’ state that can be set to ‘disabled’. Disabled components still live in chunks, but they’ll be excluded from EntityQueries.

But most importantly: you’ll be able to set the enabled state of any individual Component from Bursted Jobs, and it won’t require an Archetype change.

So - as long as you don’t mind having the Component always present on your Entity, you could just enable/disable it whenever you want that entity to show up in Queries. No more main-thread bottlenecks or compromises.

(I assume all the normal restrictions about not concurrently writing to the same Component type from parallel Jobs will still apply for setting the enable state).

(Also: no idea if disabled Components will be excluded from EntityQueries at the individual, or chunk level. I would assume chunk for now, just to be safe).

MintTree117 · Apr 17, 2021

PublicEnumE said: ↑

In an upcoming version of Entities, there will be an ideal 3rd option, which will (possibly) offer the best approach: Component enable states.

Do you know when this is coming out?

Discipol · Jan 15, 2022

PublicEnumE said: ↑

In an upcoming version of Entities, there will be an ideal 3rd option, which will (possibly) offer the best approach: Component enable states.

Bump, it's been a bit, any news on this?

Antypodish · Jan 15, 2022

Discipol said: ↑

Bump, it's been a bit, any news on this?

It should come with Entities 1.0, possibly. But first Entities 0.50 needs to be rolled out. So it may take another year or more before it happens.

s_schoener said: ↑

IIRC it is not part of 0.50.

Attached Files:

tagbenchmark.numbers.zip