
Disappointing performance of EcsQuery

Discussion in 'Data Oriented Technology Stack' started by NBender, Aug 21, 2019.

  1. NBender


    Aug 12, 2019
    Hi. I've decided to write a very simple performance test to estimate just how fast common entity operations are. And frankly I was very disappointed. Here is what I have:

    1) A custom world with single system that I manually update. This system basically just adds two vectors together and saves the result.

    Code (CSharp):
    public class SystemA : ComponentSystem
    {
        private EntityQuery _query;

        protected override void OnCreate()
        {
            _query = GetEntityQuery(ComponentType.ReadWrite<Position>(),
                                    ComponentType.ReadOnly<Speed>());
        }

        protected override void OnUpdate()
        {
            Profiler.BeginSample("SystemA_Query");
            var positions = _query.ToComponentDataArray<Position>(Allocator.TempJob);
            var speeds = _query.ToComponentDataArray<Speed>(Allocator.TempJob);
            Profiler.EndSample();

            Profiler.BeginSample("SystemA_Loop");
            for (int i = 0; i < positions.Length; i++)
            {
                var position = positions[i];
                position.Value += Vector3.up * speeds[i].Value;
                positions[i] = position;
            }
            Profiler.EndSample();

            Profiler.BeginSample("SystemA_Copy");
            _query.CopyFromComponentDataArray(positions);
            positions.Dispose();
            speeds.Dispose();
            Profiler.EndSample();
        }
    }

    struct Position : IComponentData
    {
        public Vector3 Value;
    }

    struct Speed : IComponentData
    {
        public float Value;
    }
    2) I then create 100 entities of matching archetype and start calling SystemA.Update() every frame.

    Code (CSharp):
    public class EcsTest : MonoBehaviour
    {
        private EntityManager _manager;
        private SystemA _system;

        // Start is called before the first frame update
        void Start()
        {
            _manager = new World("custom_world").EntityManager;

            for (int i = 0; i < 100; i++)
            {
                var entity = _manager.CreateEntity(typeof(Position), typeof(Speed));
                _manager.SetComponentData(entity, new Speed {Value = 0.1f});
            }

            _system = new SystemA();
            _manager.World.AddSystem(_system);
        }

        void Update()
        {
            _system.Update();
        }
    }
    3) And here is what I see in profiler:
    - getting the component arrays from the query takes a whopping 0.41 ms on its own
    - iterating over the arrays takes another 0.03 ms
    - and another 0.17 ms to copy the results back
    So it's 0.6 ms in total on an i7 laptop, and most of that time is eaten by the query methods!
    And that's just 100 entities of a single archetype, neatly placed in a single chunk...

    Why is it so slow, what can possibly take so long? Am I doing it wrong or measuring it wrong?

    Attached Files:

  2. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    NBender likes this.
  3. Enzi


    Jan 28, 2013
    I've tested your code and got this:


    Cranking it up to 1 million entities, it gets into single-digit frames. BUT with jobs and BurstCompile, 1 million looks like this:

    ComponentSystems are SLOW. Joachim always warns us about this. I also hope it gets faster but right now, performance tests only make sense with Burst and Jobs. Other parts are just as slow or slower because of overhead.

    Attached is also the test with jobs and burst.

    edit: Well, Joachim was faster than me :D
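
    Since the attachment isn't visible inline, here's a minimal sketch of what a jobified, Burst-compiled version of the OP's system could look like. This assumes the OP's Position/Speed structs and the 2019-era Entities API (IJobForEach, JobComponentSystem), which has since been superseded; the system name is made up:

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using UnityEngine;

    public class SystemAJobified : JobComponentSystem
    {
        [BurstCompile]
        struct MoveJob : IJobForEach<Position, Speed>
        {
            // Same math as the OP's loop, but Burst-compiled and run
            // across worker threads, with no array copies in or out.
            public void Execute(ref Position position, [ReadOnly] ref Speed speed)
            {
                position.Value += Vector3.up * speed.Value;
            }
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            // The job infers its query from the Execute signature.
            return new MoveJob().Schedule(this, inputDeps);
        }
    }
    ```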

    Attached Files:

    NBender likes this.
  4. Razmot


    Apr 27, 2013
    He's faster, cause it's his job ...

    Ok I leave the forum now ;)
    starikcetin, RBogdy and Lurking-Ninja like this.
  5. NBender


    Aug 12, 2019
    Thank you for confirming my suspicions. If there was a disclaimer saying that you basically have to "go Burst or go home", I definitely missed it. :) In my case I sadly can't move most of the work into jobs, because it relies on reference-type data, and changing that would take more time and effort than I can spare. I was sold on this whole "performance by default" thing, but apparently it comes with some very important caveats.

    /feedback on
    After toying with Unity ECS for about a week, I've got the impression that it is moving in a... questionable direction. It showcases and optimizes for those weird million-entity scenarios, while most projects will never reach that level of complexity. And projects that do not require that level of optimization, and just need to iterate over a couple of hundred entities, are paying the price. They have to either live with insane performance overheads for tasks that should be trivial, or embrace the obscure job API with all its additional restrictions and flood the code with job handles, native collections, etc.
    /rant off

    P.S. Turning off leak detection for native arrays somewhat improves performance (a x1.5-x2 speedup), but EntityQuery is still too slow for my purposes.
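
    For reference, leak detection can also be toggled from code rather than the editor's Jobs menu; a minimal sketch using the Unity.Collections knob (the wrapper class name here is made up):

    ```csharp
    using Unity.Collections;

    public static class LeakDetectionToggle
    {
        public static void Disable()
        {
            // Skips per-allocation tracking for NativeArray and friends.
            // This is an editor-only cost; it accounts for the x1.5-x2
            // difference measured above.
            NativeLeakDetection.Mode = NativeLeakDetectionMode.Disabled;
        }
    }
    ```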
  6. M_R


    Apr 15, 2015
    you can use
    instead of copying the query back and forth to native arrays. that will operate on the data in place
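
    A sketch of what in-place iteration could look like with `Entities.ForEach` on a ComponentSystem, assuming the OP's component types (this lambda still runs on the main thread, but it avoids the ToComponentDataArray/CopyFromComponentDataArray round trip entirely):

    ```csharp
    using Unity.Entities;
    using UnityEngine;

    public class SystemAInPlace : ComponentSystem
    {
        protected override void OnUpdate()
        {
            // Iterates the matching chunks directly and writes
            // Position in place; no temporary arrays, no copy-back.
            Entities.ForEach((ref Position position, ref Speed speed) =>
            {
                position.Value += Vector3.up * speed.Value;
            });
        }
    }
    ```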
  7. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    Agreed, for the simple non-scale cases EntityQuery currently has performance issues; we are aware of it and are working on big improvements. Our goal is to be able to have thousands of systems running, each processing very few entities, with minimal overhead, to easily enable 60 FPS games.

    Until now, most performance optimisation has been done on the number-of-entities axis. We are focused on the other perf axis now.
  8. eizenhorn


    Oct 17, 2016
    It's not his job; it's great respect to Joachim that he communicates with people directly, and not through community managers.
  9. Lurking-Ninja


    Jan 20, 2015
    Maybe I'm just too naive, but I did see the pun in that comment. Although admittedly it would have been better to state that it is his Job.

    Obviously great respect to Joachim, regardless.
    Razmot likes this.
  10. Enzi


    Jan 28, 2013
    It's his IJob, right? Paralleling for, all our posts in 64 different tabs, obviously with burst enabled and no safety restrictions.

    Fun aside, I also don't take this for granted. Without the active communication and highly skilled people here I would not even be here. It's really great to have such a direct way.

    What kind of reference types are you talking about? I can give you some tips on how to work around that. I had the same problems at first.

    Here's what I do regarding the jobs vs. ComponentSystem dilemma: I write and prototype systems as ComponentSystems and then transition them into jobs. Most can be turned into jobs, but for anything MonoBehaviour-related we still have to use ComponentSystems. These can be encapsulated from all the other systems, so when one runs slower because of it, it's not the end of the world. Sure, it would be great to have every job just be callable from the main thread and be done, but that's not something that can be achieved in an instant, I think, especially not at this current stage, where core features like animations are a pain in pure ECS.

    I've been using ECS since the first public release, and I have to say that systems, even ComponentSystems, are never the bottleneck. Mostly I run into middleware problems: rendering, pathfinding, bottlenecks in physics/collisions, etc. But mostly, all my games run at 120-200 fps.
    What's important to know is that systems don't scale linearly: they have huge initial costs with ComponentSystems, very few with JobComponentSystems, and as you add more systems the CPU time balances out instead of increasing linearly. Even my most complex games never reached main-thread limits of more than 16.6 ms, and I'm mostly doing crazy stuff with scale.

    I've used ECS in a small-scale FPS with around 80 enemies per level, a Factorio-like simulation with 1 million items in pure ECS, a city builder with enemy waves upwards of 10k, also pure ECS, an incremental game like Clicker Heroes 2 where you can kill 3k enemies in a single frame with mostly ComponentSystems, and a multiplayer shooter with hybrid ECS. It's not like all of these ran perfectly at first, but with the debugger you can quickly see which systems are problematic and then optimize that specific part.

    System programming doesn't turn as messy as OOP codebases, where everything is so entangled that at some point it's hard to optimize anything specific. I know this because the projects I've listed go back to Unity 3.x, were OOP, and ended up quite messy, or in other words un-maintainable for further features without destroying performance. :D
    GliderGuy and Kender like this.
  11. rsodre


    May 9, 2012
    I see some idle systems taking around 0.4-0.6 ms each, just doing nothing. A few of these will have a big impact on the fps.

    Take this system for example: it's a simple system with just one job, and it takes 0.45 ms every frame while doing absolutely nothing.

    Code (CSharp):
    [BurstCompile]
    struct SetTransformersHierarchyFromLinkedListJob : IJobForEachWithEntity<ElementData, TransformerReorderRequestTag>
    {
        [ReadOnly] public ComponentDataFromEntity<LinkedListNodeData> Nodes;
        [NativeDisableParallelForRestriction]
        public ComponentDataFromEntity<Parent> Parents;

        public void Execute(Entity entity, int index, [ReadOnly] ref ElementData elementData, [ReadOnly] ref TransformerReorderRequestTag tag)
        {
            // do the job...
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var job = new SetTransformersHierarchyFromLinkedListJob
        {
            Nodes = GetComponentDataFromEntity<LinkedListNodeData>(true),
            Parents = GetComponentDataFromEntity<Parent>(false),
        };
        inputDeps = job.Schedule(this, inputDeps);
        return inputDeps;
    }
    Guarding with the same query the job will use, to avoid declaring the job at all when there is nothing to process, got it down to 0.01 ms...

    Code (CSharp):
    private EntityQuery g_ElementsToReorder;

    protected override void OnCreate()
    {
        g_ElementsToReorder = GetEntityQuery(ComponentType.ReadOnly<ElementData>(), ComponentType.ReadOnly<TransformerReorderRequestTag>());
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        if (g_ElementsToReorder.CalculateEntityCount() > 0)
        {
            var job = new SetTransformersHierarchyFromLinkedListJob
            {
                Nodes = GetComponentDataFromEntity<LinkedListNodeData>(true),
                Parents = GetComponentDataFromEntity<Parent>(false),
            };
            inputDeps = job.Schedule(this, inputDeps);
        }

        return inputDeps;
    }
    My guess is that the overhead is generated by the two `GetComponentDataFromEntity()` calls. Even if the job's internal query returns nothing, just by declaring the job those two lookups are being filled. For nothing.

    I copy this way of declaring and running jobs from the many Unity examples. It is very simple to write, understand and maintain, but it has a hidden impact that is starting to bother me.

    Is that the recommended way to declare a job like this?
    Do I really need to add a query to separate the scope of every job that needs other entities?

    What if, instead of filling the `ComponentDataFromEntity` directly like this, the Schedule() method had a lambda that would be called only if the internal query has any results, and in that lambda I fill the `ComponentDataFromEntity`?
    Something like this... (syntax may be wrong, just conceptualizing my idea)

    Code (CSharp):
    var job = new SetTransformersHierarchyFromLinkedListJob();
    inputDeps = job.Schedule(this, inputDeps, (job) => {
        job.Nodes = GetComponentDataFromEntity<LinkedListNodeData>(true);
        job.Parents = GetComponentDataFromEntity<Parent>(false);
    });
    Last edited: Aug 24, 2019
    Razmot and Enzi like this.
  12. Joachim_Ante


    Unity Technologies

    Mar 16, 2005
    Are you profiling in the editor? Do you have Jobs Debugger & Leak detection disabled?
  13. digitaliliad


    Jul 1, 2018
    @rsodre, you shouldn't update a system just to check whether it's going to schedule a job by calculating a query's length: you can simply RequireForUpdate(query) in OnCreate(). Done this way, the system won't run at all while the query is empty, saving you that overhead.
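
    A sketch of that suggestion applied to a system like rsodre's, reusing the component type names from the earlier post (the system name and job body are placeholders):

    ```csharp
    using Unity.Entities;
    using Unity.Jobs;

    public class ReorderSystem : JobComponentSystem
    {
        private EntityQuery _elementsToReorder;

        protected override void OnCreate()
        {
            _elementsToReorder = GetEntityQuery(
                ComponentType.ReadOnly<ElementData>(),
                ComponentType.ReadOnly<TransformerReorderRequestTag>());

            // OnUpdate is skipped entirely while this query matches no
            // entities, so no job is declared and no lookups are filled.
            RequireForUpdate(_elementsToReorder);
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            // Only reached when the query is non-empty;
            // schedule the actual job here.
            return inputDeps;
        }
    }
    ```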
    Enzi likes this.
  14. rsodre


    May 9, 2012
    Yes to all.
    With the Jobs Debugger off, there's no change in the system's time.
    With Leak Detection off, it dropped from 0.45 to 0.06 ms.