Question Best way to cache and re-access entity queries in the same frame?

Discussion in 'Entity Component System' started by CPlusSharp22, Oct 5, 2020.

  1. CPlusSharp22

    CPlusSharp22

    Joined:
    Dec 1, 2012
    Posts:
    111
    I have a system using SystemBase from Entities 0.11.

    I manually call `Update` on this system multiple times in a frame (it's part of a simulation, and a simulation may require multiple updates per frame).

    The `Entities.ForEach` is too costly, especially when the system may only have 1 entity to work on per update. Even with an early return and no work being done, the cost of calling ForEach adds up.

    I'm looking for recommendations on how to cache the ForEach query every frame, and reuse it.
    It would work like this:
    1. PreSimulation -> system calls and caches an array of EntityQuery.CreateArchetypeChunkArray
    2. SimulationTick -> system Update is triggered, it may manually iterate the array and access components using GetArchetypeChunkComponentType or maybe even create a (non-parallel) job
    3. Repeat SimulationTick x times.
    4. PostSimulation -> system disposes of the nativearrays and anything else.
    Note:
    • I do not expect new entities to be introduced during this cycle.
    • I do not expect components to be removed or added during this cycle.
    • I do expect to modify component data during this cycle.
    • I want to keep the option to use Burst where possible, but I'd probably leave it to the user.
    Does this sound reasonable? Any suggestions for alternatives besides CreateArchetypeChunkArray?
    And while I'm here, may I ask how you modify component data while using chunks? Can you not get a reference to that component somehow to avoid calling SetComponent?
     
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,270
    The NativeArray returned by ArchetypeChunk.GetNativeArray() is the direct memory of the components in the chunk, not a copy. That means if you write to the NativeArray, you are writing to the entities' data.

    I'm surprised the "Entities.ForEach" is so costly for you. I would love to see a profiler timeline snapshot. How many times are you updating the system per frame? But if this is truly the bottleneck, you found the right API. You can pass that array and the type handles into an IJobFor with [BurstCompile] if you need parallel bursted jobs.
     
  3. CPlusSharp22

    Oh cool, so if I access the index of the chunk with the accessor, it's a reference? That's awesome.

    I can update it anywhere from 1 to 32 times a frame (worst case). At 5 times on a Pixel 2 Android, it can take 0.8ms with no processing too. Multiply that by 25+ systems and my game is struggling to be smooth, unfortunately (hitting around 30ms just for the simulation systems). Most of these systems just check a component and early out. It's rough!
     
    Last edited: Oct 6, 2020
  4. Lieene-Guo

    Lieene-Guo

    Joined:
    Aug 20, 2013
    Posts:
    547
    FYI, you can use Entities.WithStoreEntityQueryInField(ref EntityQuery).ForEach().
    That way you can use the query in OnUpdate() or OnCreate(), even before this Entities.ForEach().
    The generation of the query is then codegened into a function that runs before OnCreate().
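    A minimal sketch of the field-stored query, assuming an Entities 0.x package with SystemBase. The `Counter` component, the `CounterSystem` class and the `LastEntityCount` property are hypothetical names used only for illustration. The system reads the entity count from the stored query before the ForEach is reached:

```csharp
using Unity.Entities;

// Hypothetical component, not taken from the thread.
public struct Counter : IComponentData
{
    public int Value;
}

[DisableAutoCreation]
public class CounterSystem : SystemBase
{
    // Assigned by the Entities.ForEach code generation, so it is usable
    // before the ForEach call below is reached.
    private EntityQuery _counterQuery;

    // Exposed only so the sketch can be inspected from outside.
    public int LastEntityCount { get; private set; }

    protected override void OnUpdate()
    {
        // The query is already valid here, ahead of the ForEach.
        LastEntityCount = _counterQuery.CalculateEntityCount();

        Entities
            .WithStoreEntityQueryInField(ref _counterQuery)
            .ForEach((ref Counter counter) =>
            {
                counter.Value += 1;
            })
            .Run();
    }
}
```

    The same field could also feed a manual early-out or a chunk array, instead of declaring a second query by hand.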
     
    Last edited: Oct 6, 2020
    Enzi likes this.
  5. DreamingImLatios

    Ah. Mobile. That explains a bit.
    Other things to watch out for: Unity does some fancy checking before running the system. You might be able to reduce that cost using [AlwaysUpdateSystem], which early-outs that process. Also, job scheduling can be expensive, so using Run() will bypass that overhead while still using Burst. Lastly, if you can compute whether a system needs to run before it runs, and you don't rely on OnStartRunning or OnStopRunning, you can manually update systems and simply not call Update when you don't need them to run.
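    A rough sketch of those three suggestions together, assuming an Entities 0.x package with SystemBase. `TickCounter`, `TickSystem`, `SimulationDriver` and the `shouldRun` flag are hypothetical names, and how `shouldRun` is computed is left to the game:

```csharp
using Unity.Entities;

// Hypothetical component for illustration.
public struct TickCounter : IComponentData
{
    public int Ticks;
}

// [AlwaysUpdateSystem] skips the "does any query match?" check before OnUpdate.
// [DisableAutoCreation] keeps the system out of automatic world bootstrap,
// so it only runs when Update() is called manually.
[AlwaysUpdateSystem]
[DisableAutoCreation]
public class TickSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Run() executes immediately on the main thread, so no job is scheduled.
        Entities.ForEach((ref TickCounter counter) =>
        {
            counter.Ticks += 1;
        }).Run();
    }
}

public static class SimulationDriver
{
    // Calls Update() tickCount times, or not at all when the caller already
    // knows the system has nothing to do this frame.
    public static void Tick(World world, int tickCount, bool shouldRun)
    {
        if (!shouldRun)
            return;

        var system = world.GetOrCreateSystem<TickSystem>();
        for (int i = 0; i < tickCount; i++)
            system.Update();
    }
}
```

    Skipping the Update call entirely is the cheapest path, since neither the pre-update checks nor the ForEach dispatch are paid for.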
     
    Enzi likes this.
  6. CPlusSharp22

    Just to follow up, I got a working version using the chunks, but boy is the code long and annoying. Rewriting 45+ systems like this, plus future systems, was too much to do. I plan to consider a "controller" or something that will manually update the systems it thinks need to run instead.
     
  7. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    Entities.ForEach().Run(); using SystemBase is very fast / very low overhead, in particular for very small entity counts.
    Much faster than asking a query to allocate an archetype chunk array and processing it manually.

    So please continue to write simple code, in this case it is also the fastest.
     
    MNNoxMortem and florianhanke like this.
  8. CPlusSharp22

    In my use case it's faster to cache the chunk array for multiple updates rather than using .ForEach.Run multiple times. I've also had Unity devs say outright that ForEach has overhead issues.
     
    Last edited: Oct 23, 2020
  9. Joachim_Ante

    That is outdated information. SystemBase.Entities.ForEach.Run() is definitely faster than allocating queries.

    If you feel that this doesn't match what you are seeing, please write a simple loop in your game, measure it and show the comparison results here.
     
  10. CPlusSharp22

    It is not faster than doing a single query vs multiple queries in a single frame. If you call ForEach repeatedly vs caching the chunks and repeatedly doing work, the latter is faster.

    Create 20-50 systems in a world, each with different queries in a ForEach.

    Try to call update on each system 5-10 times in a frame.

    Then convert one or more systems away from ForEach to using chunks. So no more ForEach.

    You will see the non ForEach systems speed up by 10% or more. Assuming that any burst foreach workloads are reconverted to burst jobs too of course.
     
  11. Joachim_Ante

    Please post the sample code you used for both versions and the performance numbers you measured based on it.
     
  12. eizenhorn

    eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,685
    Well, as @OmegaNemesis28 doesn't want to provide measurements, I did this for him to prove that @Joachim_Ante's words are correct.

    Two systems for the tests. One of them uses a BurstCompiled IJob to iterate cached chunks in OnUpdate and increment one component value. We cache the chunks only once per measurement iteration (let's treat this as a frame), then update the system 100 times. The second one just uses Entities.ForEach, without any explicit optimisations of our own, on every OnUpdate call; it also just increments one component value, and we also update this system 100 times per measurement iteration.
    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    namespace Tests
    {
        public struct ComponentForCache : IComponentData
        {
            public float Value;
        }

        public struct ComponentForForEach : IComponentData
        {
            public float Value;
        }

        // Caches the query's chunks once per "frame" and iterates them on every Update.
        [DisableAutoCreation]
        public class CachedChunksSystem : SystemBase
        {
            private NativeArray<ArchetypeChunk> _cachedChunks;
            private EntityQuery                 _queryToCache;

            // Call once before the batch of Update calls.
            public void CacheChunks()
            {
                _queryToCache = GetEntityQuery(typeof(ComponentForCache));
                _cachedChunks = _queryToCache.CreateArchetypeChunkArray(Allocator.TempJob);
            }

            // Call once after the batch of Update calls.
            public void ClearCache()
            {
                if (_cachedChunks.IsCreated)
                    _cachedChunks.Dispose();
            }

            [BurstCompile]
            private struct IterateCachedChunksJob : IJob
            {
                public NativeArray<ArchetypeChunk>            CachedChunks;
                public ComponentTypeHandle<ComponentForCache> ComponentForCacheType;

                public void Execute()
                {
                    for (int i = 0; i < CachedChunks.Length; i++)
                    {
                        // Direct view of the chunk's component memory, not a copy.
                        var componentArray = CachedChunks[i].GetNativeArray(ComponentForCacheType);

                        for (int j = 0; j < componentArray.Length; j++)
                        {
                            var updatedValue = componentArray[j];
                            updatedValue.Value += 1.5f;
                            componentArray[j] = updatedValue;
                        }
                    }
                }
            }

            protected override void OnUpdate()
            {
                // Run() executes the job immediately on the main thread.
                new IterateCachedChunksJob()
                {
                    CachedChunks          = _cachedChunks,
                    ComponentForCacheType = GetComponentTypeHandle<ComponentForCache>()
                }.Run();
            }
        }

        // Baseline: plain Entities.ForEach with no manual caching.
        public class ForEachSystem : SystemBase
        {
            protected override void OnUpdate()
            {
                Entities.ForEach((ref ComponentForForEach componentData) =>
                {
                    componentData.Value += 1.5f;
                }).Run();
            }
        }
    }
    Performance test with warmups for clear numbers. Synchronous compilation for Burst enabled, all safety checks, leak detection, jobs debugger disabled. 1000 measurements, 100 iterations per measurement, each call system update 100 times, 10000 entities
    Code (CSharp):
    using NUnit.Framework;
    using Unity.Entities;
    using Unity.PerformanceTesting;

    namespace Tests
    {
        public class PerformanceTestGathering
        {
            [Test, Performance]
            public void CachedChunksPerformance()
            {
                InitializeTestWorld<CachedChunksSystem, ComponentForCache>(10000);

                // Warm up once so Burst compilation is not part of the measurement.
                var systemWarmup = _testWorld.GetExistingSystem<CachedChunksSystem>();
                systemWarmup.CacheChunks();
                systemWarmup.Update();
                systemWarmup.ClearCache();

                Measure.Method(() =>
                {
                    // One simulated frame: cache once, update 100 times, dispose.
                    var system = _testWorld.GetExistingSystem<CachedChunksSystem>();
                    system.CacheChunks();
                    for (int i = 0; i < 100; i++)
                    {
                        system.Update();
                    }
                    system.ClearCache();
                })
                .MeasurementCount(1000)
                .IterationsPerMeasurement(100)
                .SampleGroup("CachedChunksPerformance")
                .Run();

                DisposeTestWorld();
            }

            [Test, Performance]
            public void ForEachPerformance()
            {
                InitializeTestWorld<ForEachSystem, ComponentForForEach>(10000);

                // Warm up once so Burst compilation is not part of the measurement.
                var systemWarmup = _testWorld.GetExistingSystem<ForEachSystem>();
                systemWarmup.Update();

                Measure.Method(() =>
                {
                    // One simulated frame: update 100 times.
                    var system = _testWorld.GetExistingSystem<ForEachSystem>();
                    for (int i = 0; i < 100; i++)
                    {
                        system.Update();
                    }
                })
                .MeasurementCount(1000)
                .IterationsPerMeasurement(100)
                .SampleGroup("ForEachPerformance")
                .Run();

                DisposeTestWorld();
            }

            private World _testWorld;

            // Creates a fresh world containing the system under test and
            // entitiesCount entities that each have a single TComponent.
            private void InitializeTestWorld<TSystem, TComponent>(int entitiesCount)
                where TSystem : SystemBase, new() where TComponent : IComponentData
            {
                _testWorld = new World("Performance Test World");
                var simulationGroup = _testWorld.GetOrCreateSystem<SimulationSystemGroup>();
                var system          = _testWorld.GetOrCreateSystem<TSystem>();
                simulationGroup.AddSystemToUpdateList(system);
                simulationGroup.SortSystems();

                var entityArchetype = _testWorld.EntityManager.CreateArchetype(typeof(TComponent));
                for (var i = 0; i < entitiesCount; i++)
                {
                    _testWorld.EntityManager.CreateEntity(entityArchetype);
                }
            }

            private void DisposeTestWorld()
            {
                _testWorld.Dispose();
            }
        }
    }
    And the results, where you can see that ForEach is faster than manually caching and iterating chunks (0.17 median against 0.26 median):
    ForEach:
    upload_2020-10-24_1-45-36.png

    Caching and iterating chunks:
    upload_2020-10-24_1-45-44.png

    Not to mention that the cached version requires much more code.
     
  13. Joachim_Ante

    Thanks Eizenhorn.

    On top of this, caching the ArchetypeChunk array like this is not safe, and you have to have code that invalidates the cache when structural changes occur. You can do this using EntityManager.Version. In practice, however, caching an archetype chunk array over multiple frames is quite an unrealistic expectation. Most games instantiate / destroy at least a couple of entities every frame, meaning that such caching is essentially completely pointless. And that is when performance goes from being equal in the case of Entities.ForEach to being significantly better when using Entities.ForEach.
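    A minimal sketch of that invalidation, assuming an Entities version that uses the ComponentTypeHandle naming from the benchmark code earlier in the thread, and assuming EntityManager.Version is an int that changes on structural changes. The `Score` component and `VersionedChunkCacheSystem` class are hypothetical names:

```csharp
using Unity.Collections;
using Unity.Entities;

// Hypothetical component for illustration.
public struct Score : IComponentData
{
    public float Value;
}

[DisableAutoCreation]
public class VersionedChunkCacheSystem : SystemBase
{
    private EntityQuery _query;
    private NativeArray<ArchetypeChunk> _chunks;
    private int _cachedVersion;

    protected override void OnCreate()
    {
        _query = GetEntityQuery(ComponentType.ReadWrite<Score>());
    }

    protected override void OnUpdate()
    {
        // Rebuild the cache if it does not exist yet, or if the version
        // recorded when it was built no longer matches.
        int version = EntityManager.Version;
        if (!_chunks.IsCreated || version != _cachedVersion)
        {
            DisposeCache();
            _chunks = _query.CreateArchetypeChunkArray(Allocator.Persistent);
            _cachedVersion = version;
        }

        // Make sure no scheduled job is still touching Score data.
        CompleteDependency();

        var scoreHandle = GetComponentTypeHandle<Score>();
        for (int i = 0; i < _chunks.Length; i++)
        {
            // Direct view of chunk memory: writes land in the entities' data.
            NativeArray<Score> scores = _chunks[i].GetNativeArray(scoreHandle);
            for (int j = 0; j < scores.Length; j++)
            {
                Score score = scores[j];
                score.Value += 1f;
                scores[j] = score;
            }
        }
    }

    protected override void OnDestroy()
    {
        DisposeCache();
    }

    private void DisposeCache()
    {
        if (_chunks.IsCreated)
            _chunks.Dispose();
    }
}
```

    If entities are created or destroyed every frame, the cache is rebuilt every frame, which is the "pointless" case described above.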
     
    MNNoxMortem and eizenhorn like this.
  14. CPlusSharp22

    Watch your words. Never said I don't want to provide measurements. You can't just say "ask for a repro" and expect a sample project overnight. I have better things to do. You swoop into the thread less than a few hours later after the last post and act like I'm refusing to share information or something? That's just rude.

    These numbers do not match mine.
    I haven't drilled into your code, but

    1. just looking at it briefly shows you only have 1 system and a SUPER simple use case with 1 component, no filters or anything complex.
    2. I also only see 1 job, where every system would be creating different jobs with various workloads too. Some may be burst, some may not be.
    3. You're updating the system in loops sequentially. When in reality, they'd be updated along all the other systems in a master loop. This can easily invalidate your numbers.
    4. You have 10000 entities I think? What do you get with 1 entity, which is more in line with my use case (or up to 10)?
    This is not a proper comparison; it's very bare-bones and very naive.

    Yes I stated this before which is why I do not like it. But necessary measures may mean I have to do this now since my performance can't suffer this much.

    No one is caching archetype chunk arrays over multiple frames. Unless his code is and I haven't read it thoroughly enough, which means it's even more of an invalid comparison. I thought I stated this before. Same with the "Meaning that such caching is essentially completely pointless." statement. No one is maintaining a cache across frames with new entities. I specified my use case earlier.
     
    Last edited: Oct 25, 2020
  15. Lieene-Guo

    According to @Micz84's test, caching read-only native container data in a temp container (even stackalloc) only makes it slower, as Burst is doing an excellent job in that case.
    Caching ReadWrite/WriteOnly data in a stackalloc memory block and writing it back in a batch with MemCpy will be faster than setting data one by one directly in container memory.
    https://forum.unity.com/threads/please-help-me-understand-why-second-job-is-slower.990731/
    And a chunk is also a NativeContainer.
    Burst can somehow keep data in the cache as much as possible.
    That one extra MemCpy to cache data will only make it slower.
    But if cached data is used sparsely across several systems that access large chunks of memory at different locations, Burst probably would not be able to help.
    In that case, caching chunk data is unsafe, as the data in the chunk could have been updated.
     
    Last edited: Oct 25, 2020
  16. CPlusSharp22

    Because of all the impatience, I did something *super* quick and it still doesn't cover my use case in the worst possible scenario. This is a super simplified workload and I still see that ForEach is slow. Like I said, caching the chunks ends up being just as fast or faster on my devices. The only exception to this is if I flag the system with [AlwaysUpdateSystem]; this speeds it up considerably, but I need to investigate whether I can use that in my actual code.

    Info/Conditions:
    1. com.unity.entities@0.11.1-preview.4
    2. 10 entities (all match the use case, this is too optimal/naive for real performance numbers, it would actually be worse with more entities in a real use scenario)
    3. 10 frames of updates
    4. 64 ticks of Update per frame
    5. I only have 6 ECS systems, all doing largely the same thing and poking at the same memory/entities. (Also too optimal/naive; it would be nice if I could create a bunch of systems of the same type doing dummy work, but annoyingly ECS worlds are type-keyed I think, which means it's 1:1 and I would have to create dummy classes.)
    6. Burst compiling = on
      Job safety checks = off
    7. Notice the code of the "jobs" all have early out conditions and out of box none of the entities actually end up doing anything (it sees value == 0.0f return/continue in the loop)
    Method:
    Attach profiler, record, press button to begin test, wait for cube to disappear, stop profiling. Open profile analyzer, pull data, highlight the test (it will be one big block of frame time, something like 10-20 frames to highlight), use name filter "test." with the period at the end.

    Unity 2019.4.7f1 In-Editor Windows
    • UpdatePretendEntities (no ECS, just MonoBeh) = 0.02ms (this is the ideal performance)
    • ForEachSystem.OnUpdate = 0.23ms
    • AlwaysUpdateForEachSystem.OnUpdate = 0.17ms
    • ForEachSystemNoBurst.OnUpdate = 0.30ms
    • ChunksSystem.OnUpdate = 0.07ms + 0.11ms (for PreLoop to cache) = 0.18ms
    • ChunksJobSystem.OnUpdate = 0.13ms + 0.11ms (for PreLoop to cache) = 0.24ms
    • ChunkJobSystemNoBurst.OnUpdate = 0.30ms + 0.11ms (for PreLoop to cache) = 0.41ms
    Android Pixel 2
    • UpdatePretendEntities (no ECS, just MonoBeh) = 0.16ms (this is the ideal performance)
    • ForEachSystem.OnUpdate = 0.88ms
    • AlwaysUpdateForEachSystem.OnUpdate = 0.74ms
    • ForEachSystemNoBurst.OnUpdate = 1.03ms
    • ChunksSystem.OnUpdate = 0.37ms + 0.47ms (for PreLoop to cache) + 0.01 (for PostLoop to dispose) = 0.85ms
    • ChunksJobSystem.OnUpdate = 0.70ms + 0.47ms (for PreLoop to cache) + 0.01 (for PostLoop to dispose) = 1.02ms
    • ChunkJobSystemNoBurst.OnUpdate = 1.03ms + 0.47ms (for PreLoop to cache) + 0.01 (for PostLoop to dispose) = 1.51ms
    The quickest takeaway is that ECS here kills performance whether you're using ForEach or chunks; MonoBehaviours win. Of course this is just with 10 entities, rather than a million, but like I said, that's how my game operates right now. There's usually only 1 entity these systems look at. My game does not and will not have many entities; it's not a battle royale or anything.

    For 1 system to take 0.88ms is crazy to me. Yes, it's unusual to call update on the system 64 times. But even at 1/4 of that, it shouldn't be breaching 0.20ms especially when the systems are not actually doing work (just conditional check). I have 70+ systems now that have to do this every frame because it's a simulation, that's 61.6ms at minimum (assuming 64hz) :(
    Non ECS would be 11.2ms for comparison
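
    The estimate above is a straight multiplication of the Pixel 2 per-system numbers by the system count, on the assumption that every system costs the same as the measured one. As a small sketch (`FrameCostEstimate` is a hypothetical helper, not part of the test project):

```csharp
public static class FrameCostEstimate
{
    // Total per-frame cost in milliseconds if every system costs the same
    // measured amount. decimal avoids binary floating-point rounding noise.
    public static decimal TotalMs(int systemCount, decimal perSystemMs)
    {
        return systemCount * perSystemMs;
    }
}
```

    `FrameCostEstimate.TotalMs(70, 0.88m)` gives 61.6 for the ForEach systems and `FrameCostEstimate.TotalMs(70, 0.16m)` gives 11.2 for the MonoBehaviour version, matching the figures quoted.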

    Disclaimer: besides the test favoring ForEach for several noted reasons, it is worth mentioning I could have mistakes here. I rushed this since I didn't appreciate the rudeness I perceived. Lots of ways to make the test "closer" to my use case such as adding lots more systems, mix/match burst use, add more entity archetype variation for the chunks, don't use the entities sequentially, introduce mixed branching of logic.


    Profile files:
    https://www.dropbox.com/s/qz6d8piw512d2fq/profiles.zip?dl=0

    Project/Code here:
    https://www.dropbox.com/s/3ii8llny18pcxwr/test.zip?dl=0

    How outdated by the way? Less than a month? In the unity slack channel for DOTS it was recent, beginning of October I think. @Joachim_Ante
     
    Last edited: Oct 25, 2020
  17. CPlusSharp22

    The perf of that makes sense, kind of, to me. Do note that not everything in this can be bursted though. It's up to the user, but a lot of these systems can't be bursted due to poor design decisions outside of my control, long story. But I've observed this even with Burst. The Burst jobs could literally be looking at 1 entity, checking 1 float, and returning, and I've seen them end up taking 0.10ms, which is a killer.
     
  18. Lieene-Guo

    Looks like there are two major reasons.
    1. SystemBase pre-update checks (Query count check, required singleton check blah blah...)
    2. Job schedule overhead.

    For reason 1, I'm waiting for the unmanaged system. A bursted unmanaged system will be much faster.
    For reason 2, a manual entity count/chunk count check and a Run/Schedule/ScheduleParallel switch could make it better.
    Generally, as the game designer, you should be aware of which types of entity are rare and can be updated with Run, and which should be batched with ScheduleParallel.

    With the [AlwaysUpdateSystem] attribute added, it is up to you to decide whether the job should Run/Schedule/ScheduleParallel or be skipped entirely.
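
    A rough sketch of such a switch, assuming an Entities 0.x package with SystemBase. The `Speed` component, the `AdaptiveSpeedSystem` class and the `ParallelThreshold` value are hypothetical; the right cut-over point would have to come from profiling:

```csharp
using Unity.Entities;

// Hypothetical component for illustration.
public struct Speed : IComponentData
{
    public float Value;
}

[AlwaysUpdateSystem]
[DisableAutoCreation]
public class AdaptiveSpeedSystem : SystemBase
{
    // Hypothetical cut-over point between main-thread and parallel execution.
    private const int ParallelThreshold = 1024;

    private EntityQuery _query;

    protected override void OnCreate()
    {
        _query = GetEntityQuery(ComponentType.ReadWrite<Speed>());
    }

    protected override void OnUpdate()
    {
        // Manual check replacing the automatic one that [AlwaysUpdateSystem] skips.
        int count = _query.CalculateEntityCount();
        if (count == 0)
            return;

        if (count < ParallelThreshold)
        {
            // Few entities: run immediately, no scheduling overhead.
            Entities.ForEach((ref Speed speed) =>
            {
                speed.Value *= 0.5f;
            }).Run();
        }
        else
        {
            // Many entities: spread the work across worker threads.
            Entities.ForEach((ref Speed speed) =>
            {
                speed.Value *= 0.5f;
            }).ScheduleParallel();
        }
    }
}
```

    The lambda body is duplicated because each ForEach is generated separately; moving the shared logic into a static method is one way to keep the two branches in sync.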
     
  19. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    10,778
    I am not sure ECS is a good approach if you only have a handful of entities to deal with. ECS shines with a high volume of data. There is some small overhead to running systems, but for a large number of entities it is negligible. Also, you may not need to run every system in every frame.

    But maybe stick with jobs and Burst instead?
     
    CPlusSharp22 likes this.
  20. Joachim_Ante

    As Lieene said, the solution here is:

    Unmanaged ISystemBase, which allows fully bursted update calls. We have done a lot of refactoring in Entities to allow for this in the last couple of releases. (EntityManager / SystemState / EntityQuery etc. are all structs & burstable now...)
    What is missing is code-gen for Entities.ForEach, but we are also almost there with that.

    This will significantly reduce the overhead of System.OnUpdate, including the cost of invoking Entities.ForEach.Run.

    Our intention here is very much that a single entity + single system OnUpdate + ForEach.Run should be the same or better than the cost of MonoBehaviour.Update. Obviously, where the benefits of DOTS kick in is in having more than 1 thing, but we fully realise that there are plenty of cases in games where there is just one of a thing & the minimum bar for that is that it is no worse than MonoBehaviour.Update. (Let's note, however, that your MonoBehaviour example has a direct function call to a Test method in an inner loop. That is definitely not how MonoBehaviour.Update works and is not a fair comparison.)

    It's possible that if you cache the archetype chunk array and then reuse it 64 times you can get some speedups.
    I wouldn't recommend refactoring a bunch of code to such a pattern unless you are shipping very soon & you absolutely need exactly those speedups right now.

    In this case, you probably just want to trust me that this particular codepath will become much better optimised in the coming months.
     
    Last edited: Oct 25, 2020
  21. CPlusSharp22

    Yes, I keep hearing it's getting optimized soon, but I've been waiting quite a bit. Plus upgrading to Unity 2020 is mandatory now; that's a whole other thing. But I'm willing to get over the hurdle when it comes, at least. I really wish these optimizations were transparent and on a schedule that we could rely on.

    In regards to the MonoBehaviour example, that's how I personally manage all my MonoBehaviours. It's how it would have to work for the fixed simulation anyway in my use case. I much prefer explicit updates vs the Update monobehaviour methods (personally). So the code is using a single MonoBehaviour update hook from the "test" and it's responsible for the code flow to call methods on other monobehaviours. Maybe I'm just particular, but I wouldn't do it any other way for this simulation thing. Deterministic ordering would be a nightmare otherwise. Perhaps another convo :p

    But thank you for the more detailed response and shining some hope my way.
     
  22. CPlusSharp22

    Yes in hindsight, I really wish I didn't use ECS in its current state. Coupled with the Unity 2020 upgrades, it's been an unfortunate ride in hair pulling. All my systems thankfully are manually updated so I don't have to run them every frame if I don't need to.
     
    Last edited: Oct 26, 2020
  23. CPlusSharp22

    Thanks, I wasn't understanding the unmanaged system stuff you were mentioning until @joachim just explained that it's coming. I see now. The systems will be unmanaged types, interesting. I hope it will indeed be faster.

    Several problems have us avoiding the scheduling for the time being. Designers don't know anything about ECS so it's just the programmers doing the ECS logic right now and we're all new to it. The overhead of .Schedule calls means everyone has been avoiding it rather than trying to map out what could be scheduled. But we'll get there hopefully. In the simulation loop it's tough because most of it has to be done in order and by the end of the frame.

    Typically, if the simulation is running, all these systems should be able to update. So every system should be able to have [AlwaysUpdateSystem], I think. It could be a worthy speedup based on this test; I'm surprised I was able to measure such a big delta on the Pixel.
     
  24. CPlusSharp22

    Just to bump this, I intend at some point to update my test project to the newest entities and try the burst systems mentioned. I really hope to not see 0.88ms on a single system anymore!

    What I can't find in the change list for 0.17 is whether job schedule overhead was improved as well. I'd like to hope so but once I find time to update we can see.

    I've since abandoned ECS due to this issue but I'm still following along because I am interested in the tech. I would like to revisit it in the future with a new game.
     
  25. CPlusSharp22

    Using the same benchmark code I previously shared, these are new results on my personal windows 10 machine with a Ryzen 5600x (since upgraded from what I last posted)

    Reminder: 64 systems each, Jobs Leak Detection off and JobsDebugger off, 10 entities

    Unity 2020.1.17f1
    0.11.1-preview.4
    Monobehaviour (Pretend Entities) = 0.02ms
    ForEachSystem = 1.97ms (1.58ms foreach)
    ForEachAlwaysUpdate = 0.28ms (0.13ms foreach)
    ForEachNoBurst = 1.93ms (1.80ms foreach)

    0.17.0-preview.41
    Monobehaviour (Pretend Entities) = 0.02ms
    ForEachSystem = 1.20ms (1.04ms foreach)
    ForEachAlwaysUpdateSystem = 0.29ms (0.14ms foreach)
    ForEachNoBurstSystem = 2.03ms (1.89ms foreach)

    Improvements are improvements! I won't lie, I hoped for more out of the box, but it's a step. The AlwaysUpdateSystem is unfortunately still the best way to go.

    As for SystemBase... well... it took me *hours* to set it up the same way I use my traditional systems. Mainly because all of the extension methods needed for manual unmanaged world creation are flagged as internal! And there's no exposed API in SystemBase I could find besides UnmanagedUpdate, which is also flagged as internal, as well as ResolveSystemState as used in ComponentSystemGroup. So you can't even manually call Update() directly on these yet.
    https://forum.unity.com/threads/world-extensions-for-unmanaged-systems-are-not-public.1043650/
    and docs like this are empty
    https://docs.unity3d.com/Packages/com.unity.entities@0.17/api/Unity.Entities.ISystemBase.html

    Because I'm insane and reluctantly decided to spend more time on this, I forked it locally and modified the Entities package and made a few tweaks to expose all the methods I needed to grab the handles to the new systems and I manually resolve the state on my side, then call into the burst code, at the same point I do the previous tests. Results:

    Jobs Leak Detection off and JobsDebugger off
    Also, I ignore the first warmup frames for this; JIT creates A LOT of GC Alloc in these the first time it hits.
    ForEachSystem(NewIBase) = 0.19ms (0.18 ms job, entire systemstateresolve 0.21ms)
    ForEachNoBurstSystem(NewIBase) = 0.29ms (0.27 ms job, entire systemstateresolve 0.31ms)

    This is a real win if you're willing to go the distance to fork 0.17 and expose everything, and to take on the headache of figuring out the entry point to call the code like the actual SystemGroup would call it (i.e. it calls Shouldrun to check entity queries).

    Still, however, Monobehaviour wins this by a considerable amount. And I'm sure these numbers spike on device, but I do not have the Pixel 2 anymore to compare.

    It's a great direction though. I will maybe revisit Entities in the future.
     
    Last edited: Apr 25, 2021