So.. I was using ECS the wrong way... #Story

Discussion in 'Entity Component System' started by Ziboo, Apr 26, 2020.

  1. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
    Hello everyone,

    I wanted to share my experience with ECS with you.
I spent a lot of time reading documentation, forums, best practices, etc.
My main takeaways for being effective in ECS were:
• Multithreaded code with Jobs
• Avoid sync points when possible
I was in a mindset where, if my system was taking more than 0.01 ms, it was not optimized and I was doing something wrong.

For a frame of reference: my game is World-oriented, so I'm simulating hundreds of different Worlds.
So my first guess was to use Jobs a lot to parallelize everything and use command buffers to avoid sync points.

But I didn't have the performance I wanted, and I spent a week trying to optimize everything...

I modified all my systems to use .Run() instead of .ScheduleParallel() and I gained 60 FPS.
I was amazed at how fast .Run() already is when working with Entities. It's crazy fast!
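
To make it concrete, here's a minimal sketch of the kind of change I mean (not my actual code; MoveSystem and Velocity are made-up names, but the pattern is the same):

Code (CSharp):
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Made-up example component, not from my actual project.
public struct Velocity : IComponentData
{
    public float3 Value;
}

public partial class MoveSystem : SystemBase
{
    protected override void OnUpdate()
    {
        float dt = Time.DeltaTime;

        // The lambda is Burst-compiled either way; only the last call differs.
        Entities.ForEach((ref Translation t, in Velocity v) =>
        {
            t.Value += v.Value * dt;
        }).Run(); // executes immediately on the main thread, no scheduling cost
        // .ScheduleParallel() would instead spread chunks across worker
        // threads, paying the per-frame scheduling overhead.
    }
}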

Conclusion: don't use Jobs until you really need them!

That might be a very dumb conclusion, but I don't think it's said often enough when you read about ECS. You think you need to do all those crazy Jobs, chunk iterations, etc.

So for every beginner out there: don't try too hard.
Build your systems with .Run() until you have a big performance hit and really need to use Jobs.
I don't know if it's going to get better in the future, but scheduling a job takes a lot of time for simple systems.

Hope this helps some people.

    Cheers
     
    NotaNaN and jdtec like this.
  2. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
I'm gonna go out on a limb and theorize that you probably did something wrong in your tests. There's almost no way this could be true if everything is done properly.

    Could you share some code examples of a system + job that runs way faster with .Run than with .ScheduleParallel?
     
    deus0, MNNoxMortem, Orimay and 3 others like this.
  3. RoughSpaghetti3211

    RoughSpaghetti3211

    Joined:
    Aug 11, 2015
    Posts:
    1,697
I read it as: even without ScheduleParallel there was a 60 FPS gain, and more potential if you use ScheduleParallel. But now I'm not sure how to read it.
     
  4. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Oh.... I see what you mean now

I still think the whole "wait until you have performance problems before you optimize" mindset is very, very often a bad idea. An optimization that takes 1-2 hours when done early can cost you months if done later.

It always depends, of course. But when the optimization is as obvious as working with the Job System, I think it's definitely worth the effort to use it properly from the start.
     
    deus0, NotaNaN, Orimay and 2 others like this.
  5. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
Maybe it's really specific to my case, but scheduling the job was indeed taking more time than just using .Run().
I'm not saying that you NEED to do that, or that profiling and optimizing aren't important.
I'm just saying that .Run() can give you just what you need, and in some cases better performance.
    So don't throw away .Run() just yet ;)

    I was just in the mindset that if I didn't use Jobs it was wrong.
     
  6. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
I think you need to remember that multithreading comes with the cost of copying data and job scheduling. Optimization doesn't mean "now everything will be multithreaded".
     
  7. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
When you have small entity/chunk counts, the overhead of scheduling can be higher than just executing the code with .Run(). Do note that we are doing a lot of work to make that not be so...

Specifically we are:
* Adding support for completely Burst-compiled struct-based systems, so a system itself can be Burst-compiled.
* Doing a bunch of optimizations in IJobChunk & the JobScheduler to reduce overhead.

Essentially, you could say that what DOTS is truly amazing at right now is scale on the axis of large entity counts.
But what we are focused on optimizing now is speed on the axis of number of systems with small amounts of entities.
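
As a rough illustration of the first point (an early API sketch only; names and details aren't final), a struct-based system could look something like this:

Code (CSharp):
using Unity.Burst;
using Unity.Entities;

// Sketch: the system is a struct, so Burst can compile the whole
// OnUpdate, not just the job lambdas inside it.
[BurstCompile]
public partial struct CounterSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Even main-thread work here runs as Burst-compiled code,
        // which is what makes .Run()-style execution so cheap.
        foreach (var counter in SystemAPI.Query<RefRW<Counter>>())
            counter.ValueRW.Value += 1;
    }
}

public struct Counter : IComponentData
{
    public int Value;
}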
     
    deus0, bobbaluba, _met44 and 19 others like this.
  8. BackgroundMover

    BackgroundMover

    Joined:
    May 9, 2015
    Posts:
    215
Mike Geig mentioned that it's sometimes more efficient to run simple jobs on the main thread than to incur the job overhead.
     
    Egad_McDad and SamOld like this.
  9. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
I'm using Burst everywhere.
For the sync points, I'm trying to avoid them, but in the current state of the debugging tools it's hard to see where they happen; or maybe that's a lack of experience on my part.

That's good news.
It's pretty much the situation I have, I guess.
I have lots of Worlds with lots of independent systems, and not a lot of entities (~1000) per World.
     
    deus0 and SamOld like this.
  10. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
That could make sense. If you have, let's say, 1000 ECS Worlds with 50 systems/jobs each, that would mean 50,000 jobs to schedule, which could be where the overhead of ScheduleParallel comes from. But if you truly have a huge quantity of Worlds, maybe a single-ECS-world setup would perform waaaaay better, and a different strategy could be used to represent the concept of a "world".

Still, it'd be interesting to see code examples and project settings. I could imagine .Run() performing a bit better than .ScheduleParallel() at low entity counts, but the 60 FPS gain is a bit suspicious (I'm assuming you went from something like 30 to 90 FPS, and not 400 to 460 FPS, which would be a relatively small gain). Maybe there's an easy fix.

Some thoughts:
- Did you try running this in a build?
- Is Burst Compilation enabled in the top menu?
- Are safety checks and leak detection disabled?
- Is Burst compilation set to Synchronous? (If not, performance will be bad for quite a while after you press Play, but will eventually settle down.)
- Do you exclusively use the new math types/operations from Unity.Mathematics in jobs?
- Maybe your Jobs are used in unintended ways?
- Etc., etc.
     
    Last edited: Apr 26, 2020
    bobbaluba, Orimay and SamOld like this.
  11. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
The gain was in a build, from 40 FPS to 100 FPS. I know the editor has a lot of overhead.
I use Unity.Mathematics, yes.
"Maybe your Jobs are used in unintended ways?"
Maybe ^^ That's where it's hard to say. But like Joachim_Ante said, I have a lot of simple short jobs, so I guess I was paying more for the scheduling than for the job itself.

A note:
- I have a lot of systems that need to work on other entities (ComponentDataFromEntity, checking neighbours, etc.).
- I have an AMD Ryzen 9 3900X (12 cores / 24 threads). Does having a lot of cores also impact the time it takes to schedule Jobs?
     
    Last edited: Apr 26, 2020
  12. l33t_P4j33t

    l33t_P4j33t

    Joined:
    Jul 29, 2019
    Posts:
    232
It might have something to do with the multiple-worlds-and-system-groups bit, no? There's something funky going on there, IMO, on my end.
My performance gets destroyed when I run anything in the ghost prediction system group; each system takes up 1-2 ms at least, even if it's just changing one rotation component on one entity, as is the case with my body rotation system.

Are you using lots of Worlds to simulate different clients in Netcode?
     

  13. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
I'm not using multiplayer.
I just have multiple custom Worlds.
I do remove the Rendering and Transform systems from the Worlds that aren't currently shown to the player, though, to gain performance.
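
Something in the spirit of this sketch (in my case I actually just don't add those systems when creating the custom World in the first place; HiddenWorldUtility is a made-up name):

Code (CSharp):
using Unity.Entities;
using Unity.Transforms;

// Sketch only: one way to stop transform work in a World that
// isn't visible, by toggling the group instead of removing it.
public static class HiddenWorldUtility
{
    public static void SetTransformsEnabled(World world, bool enabled)
    {
        var group = world.GetExistingSystem<TransformSystemGroup>();
        if (group != null)
            group.Enabled = enabled;
    }
}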
     
  14. RoughSpaghetti3211

    RoughSpaghetti3211

    Joined:
    Aug 11, 2015
    Posts:
    1,697
    100% agree
     
  15. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
That's because the prediction group has to run multiple times per frame...
     
    l33t_P4j33t likes this.
  16. vildauget

    vildauget

    Joined:
    Mar 10, 2014
    Posts:
    120
    I can't help but notice nobody mentioned .Schedule() as the third alternative.

In a simple system of mine (finally got some working, yay), I get 0.06 ms with .Run(), 0.14 ms with .ScheduleParallel(), and only 0.03 ms with .Schedule() on a normal frame.

It might be because I have to account for possible structural changes, so .Run() creates a new sync point, I guess.

Just don't forget .Schedule() as an option; it'll be sad left alone in the dark. :)

    Code (CSharp):
protected override void OnUpdate()
{
    var worldSquareCreateDistance = _settings.worldSquareCreateDistance;
    var worldSquares = _worldSquares;
    var ecb = m_EndSimulationEcbSystem.CreateCommandBuffer().ToConcurrent();
    var archetype = _archetype;
    Entities
        .WithName("CreateNewWorldSquares")
        .WithAll<PlayerTagComponent>()
        //.WithStructuralChanges()
        .ForEach((int entityInQueryIndex, in WorldSquarePositionComponent worldSquare) =>
        {
            // Create new worldSquares as necessary.
            for (int x = worldSquare.Value.x - worldSquareCreateDistance; x <= (int)worldSquare.Value.x + worldSquareCreateDistance; x++)
            {
                for (int z = (int)worldSquare.Value.y - worldSquareCreateDistance; z <= (int)worldSquare.Value.y + worldSquareCreateDistance; z++)
                {
                    if (!worldSquares.ContainsKey('x' + x.ToString() + 'z' + z.ToString()))
                    {
                        var entity = ecb.CreateEntity(entityInQueryIndex, archetype);
                        ecb.SetComponent(entityInQueryIndex, entity, new WorldSquarePositionComponent { Value = new int2(x, z) });
                        //var entity = EntityManager.CreateEntity(archetype);
                        //EntityManager.SetComponentData(entity, new WorldSquarePositionComponent { Value = new int2(x, z) });
                        worldSquares.Add('x' + x.ToString() + 'z' + z.ToString(), true);
                    }
                }
            }
        }).Schedule();
}
     
  17. brunocoimbra

    brunocoimbra

    Joined:
    Sep 2, 2015
    Posts:
    679
It reminds me of that feature request: https://forum.unity.com/threads/request-scheduleauto-dep-chunkthreshold.870163/#post-5727157
     
    JonBFS and vildauget like this.
  18. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
Out of curiosity, what's a rough estimate of your number of Worlds, and of the number of your own jobs that run per World?

And what is the main reason for separating your project into many Worlds? Maybe you are using lots of Worlds when you don't really have to.

Let's say we call an ECS world a "World" and your in-game worlds "levels", for the sake of readability. You could have:
• a "visibleWorld" containing the entities of the level that's currently visible
• an "invisibleWorld" containing the entities of all the levels that are not visible, all in the same ECS World
• your levels represented by an Entity with a DynamicBuffer on it containing all the Entities that belong to that level. This way you know which Entities to transfer to the visibleWorld when a level switch happens
  • If necessary, you can also have a BelongsToLevel component (containing the Entity of the parent level) on your entities so you can retrieve the parent level
  • The level Entity can also contain any additional data that is specific to that level
This kind of setup (sketched in code below) would reduce the number of jobs to schedule by a lot, and would let you make good use of parallelization, because nearly all of your entities would be in the same World. Someone correct me if I'm wrong, but I think the main reason to put things into a different World is when there are differences in the types of systems that are run, and/or the frequencies at which they are run.
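
Component-wise, something like this (just a rough sketch; BelongsToLevel, LevelMember, and LevelData are made-up names):

Code (CSharp):
using Unity.Entities;

// Rough sketch of the "level Entity" idea above; all names are made up.
public struct BelongsToLevel : IComponentData
{
    public Entity LevelEntity; // the level this entity belongs to
}

// One element per entity that belongs to the level,
// stored in a DynamicBuffer on the level Entity.
public struct LevelMember : IBufferElementData
{
    public Entity Value;
}

// Any per-level data can live on the level Entity too.
public struct LevelData : IComponentData
{
    public int LevelId;
}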
     
    Last edited: Apr 27, 2020
    bobbaluba likes this.
  19. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
I don't have a specific count for the number of Worlds or Entities the game could have in the end.
I think you're right; I guess it's possible to not use different Worlds at all.
But I decided to use different Worlds for simplicity, I guess, and/or lack of experience; for instance, debugging entities with the Entities window using World filtering.

Here is a small example:

    Code (CSharp):
var cropsEntities = this.cropsStorageQuery.ToEntityArray(Allocator.TempJob); // Get all entities that are crops
storageBuffers = this.GetBufferFromEntity<StorageSlot>(true); // Get all storages (IBufferElementData)

this.Entities
    .WithNone<FlyDestination, TargetEntity>()
    .WithName("CropsGathererRobots_FindTarget")
    .WithReadOnly(storageBuffers)
    .WithDeallocateOnJobCompletion(cropsEntities)
    .ForEach((Entity entity, int entityInQueryIndex, in CropsGathererRobot robot, in StationReference stationReference) =>
    {
        Entity cropsTarget = Entity.Null;

        var maxDist = float.MaxValue;

        for (var i = 0; i < cropsEntities.Length; i++)
        {
            var cropsEntity = cropsEntities[i];

            // <--- HERE I would need to check if the cropsEntity is in the same "fake world" as my robot entity

            if (!storageBuffers.Exists(cropsEntity)) // Check if the crop has a storage
                continue;

            var cropsStoragesBuffer = storageBuffers[cropsEntity];

            // Do something with the storage
        }
    }).Run();

If I go with your solution, it means that in every ForEach lambda I write, I would need to filter all my components/entities per "fake world". That could be a lot of boilerplate code, where Worlds just do it for me.

I could use a SharedComponentData as shown in the docs:
    Code (CSharp):
public class ColorCycleJob : SystemBase
{
    protected override void OnUpdate()
    {
        List<Cohort> cohorts = new List<Cohort>();
        EntityManager.GetAllUniqueSharedComponentData<Cohort>(cohorts);
        foreach (Cohort cohort in cohorts)
        {
            DisplayColor newColor = ColorTable.GetNextColor(cohort.Value);
            Entities.WithSharedComponentFilter(cohort)
                .ForEach((ref DisplayColor color) => { color = newColor; })
                .ScheduleParallel();
        }
    }
}
But that's pretty much the same thing as each World scheduling the job, I think (minus the per-system overhead, for sure).
     
    Last edited: Apr 27, 2020
  20. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
Also, after thinking about it:

If I have only one "fake world", even a small change will affect the chunks, creating sync points and invalidating arrays, whereas with ECS Worlds, if a World is not really active (not a lot happening in it), at least it will not affect the other Worlds.
Also, in the future, if I want to tick some Worlds slower, that would be easier too.

I would need to try it to see whether paying the Worlds/systems overhead is smaller or bigger than putting everything in one World.
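
For the slower ticking, I'm thinking of something like driving a custom World's SimulationSystemGroup by hand (just a sketch of the idea; SlowWorldDriver and the every-4-frames rate are made up):

Code (CSharp):
using Unity.Entities;
using UnityEngine;

// Sketch: manually ticking a custom World at a lower rate than the
// main World. The driver and the rate are made up for illustration.
public class SlowWorldDriver : MonoBehaviour
{
    World slowWorld;
    SimulationSystemGroup simGroup;
    int frameCount;

    void Start()
    {
        slowWorld = new World("SlowWorld");
        simGroup = slowWorld.GetOrCreateSystem<SimulationSystemGroup>();
        // Add this World's systems to the group here, e.g.:
        // simGroup.AddSystemToUpdateList(slowWorld.GetOrCreateSystem<MySystem>());
        // simGroup.SortSystemUpdateList();
    }

    void Update()
    {
        // Tick this World's simulation only every 4th frame.
        if (++frameCount % 4 == 0)
            simGroup.Update();
    }

    void OnDestroy()
    {
        if (slowWorld != null && slowWorld.IsCreated)
            slowWorld.Dispose();
    }
}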
     
  21. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
I see now how multiple Worlds make this problem easier, and I agree it's not instantly obvious whether the structural changes would become too heavy if you went with the combined World. And yeah, it looks like in this case the SharedComponent would not reduce the number of jobs you need to schedule.

One thing you could try, with the "level Entity" thing from my previous post, is this (sketched in code at the end of this post):
• On the level Entity, you'd have a DynamicBuffer of CropData, an IBufferElementData containing the information your robots need to know about a crop (remember to initialize it with a large enough capacity for better performance). CropData is not the same struct as your actual Crop component, but it contains the crop Entity, the amount of storage left, and other data about the crop
• All your entities have a BelongsToLevel SharedComponent (SharedComponents have zero per-entity overhead) so they can retrieve their associated level Entity
• First, launch a job that clears all CropData buffers:
  • for each level, do cropDataBuffer.Clear()
• Then, launch a job whose purpose is to fill that DynamicBuffer of CropData on every level Entity for this frame:
  • for each crop, get the level Entity it belongs to, get the CropData DynamicBuffer on that entity, and Add() a new CropData for this crop
• Then, launch your robots job:
  • for each robot, get the level Entity it belongs to, get the CropData buffer on that entity, and iterate over the buffer to do whatever calculations you need
  • once you've found which crop you want to do something with, you can access the crop itself because CropData contains the crop Entity. You'll want to avoid accessing the actual crop entity on every iteration, so try to work only with CropData for as long as you can
Basically, the purpose of those first two jobs is to make the robot-to-crop iteration as fast as possible by removing the need to look up Crops or StorageBuffers by Entity. You could even go one step further with IJobChunk and get the CropData buffer only once per chunk instead of once per robot, because you'd know all robots in the same chunk belong to the same level (thanks to the BelongsToLevel SharedComponent).

This solution gives you the benefit of multithreading plus far fewer scheduled jobs, but it has the downside of being more complex (and we have yet to confirm whether it truly is more performant, though I'd be pretty confident about it). I'll admit I was wrong about my initial assumption that you were misusing DOTS, though. The best solution is definitely not obvious. Sorry! :D
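
Here's a rough code sketch of that pipeline. All the type names are invented, and I'm using a plain IComponentData for BelongsToLevel here to keep the Entities.ForEach lambdas simple; the SharedComponent + IJobChunk variant would be the further optimization:

Code (CSharp):
using Unity.Entities;
using Unity.Mathematics;

// All of these types are invented for the sketch.
public struct Crop : IComponentData { public float3 Position; public int StorageLeft; }
public struct Robot : IComponentData { public Entity TargetCrop; }
public struct BelongsToLevel : IComponentData { public Entity LevelEntity; }

// Per-frame snapshot of a crop, stored on its level Entity.
public struct CropData : IBufferElementData
{
    public Entity CropEntity; // for the final lookup once a crop is chosen
    public float3 Position;
    public int StorageLeft;
}

public partial class CropPipelineSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // 1. Clear every level's CropData buffer.
        Entities.ForEach((DynamicBuffer<CropData> cropData) =>
        {
            cropData.Clear();
        }).Schedule();

        // 2. Fill the buffers: one entry per crop, written to its level.
        //    Single-threaded, since many crops write to one level's buffer.
        var cropBuffers = GetBufferFromEntity<CropData>();
        Entities.ForEach((Entity entity, in Crop crop, in BelongsToLevel level) =>
        {
            cropBuffers[level.LevelEntity].Add(new CropData
            {
                CropEntity = entity,
                Position = crop.Position,
                StorageLeft = crop.StorageLeft,
            });
        }).Schedule();

        // 3. Robots iterate only their own level's buffer, in parallel.
        var readOnlyCropBuffers = GetBufferFromEntity<CropData>(true);
        Entities
            .WithReadOnly(readOnlyCropBuffers)
            .ForEach((ref Robot robot, in BelongsToLevel level) =>
            {
                var crops = readOnlyCropBuffers[level.LevelEntity];
                for (var i = 0; i < crops.Length; i++)
                {
                    // Pick the best crop using only CropData fields, then
                    // store crops[i].CropEntity in robot.TargetCrop.
                }
            }).ScheduleParallel();
    }
}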
     
    Last edited: Apr 27, 2020
    NotaNaN and SamOld like this.
  22. Ziboo

    Ziboo

    Joined:
    Aug 30, 2011
    Posts:
    356
No problem ;) It's always good to get other feedback.
I will keep your solution in mind for the future. It's interesting, but you can see it's a lot of work ^^
And that's just for the crops ^^
I have many more things going on.