What is the best way to iterate over entities of a query over time

Discussion in 'Entity Component System' started by Ashkan_gc, Jun 29, 2021.

  1. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    I want to iterate over, say, 100k entities in 20 frames, processing 5k per frame. What is the most performant way of doing it?

    If I put the query results in an array, getting components is slow, right? Since ComponentDataFromEntity access is not linear? Or am I wrong?

    If I do chunk processing and the entity order changes over frames, I'm screwed. What is the best way to do this in ECS at the moment?
    I'm fine with entities getting destroyed over these frames, since I can simply check whether they still exist.
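    A minimal sketch of the snapshot approach described above, written in Python rather than Unity C# so it runs standalone: take the entity list once per wave, then each frame handle the next window and skip entities that no longer exist. `make_slices`, `process_slice` and the `alive` set are hypothetical stand-ins; in Unity the per-entity reads would go through ComponentDataFromEntity, which is the random-access cost the question is about.

```python
def make_slices(entities, frames):
    """Split a snapshot of entity ids into `frames` contiguous windows."""
    per_frame = max(1, -(-len(entities) // frames))  # ceiling division
    return [entities[i:i + per_frame] for i in range(0, len(entities), per_frame)]


def process_slice(window, alive, process):
    """Process one frame's window, skipping entities destroyed since the snapshot."""
    processed = 0
    for entity in window:
        if entity in alive:  # stands in for checking that the entity still exists
            process(entity)
            processed += 1
    return processed


# 100k entities over 20 frames -> 5k per frame.
snapshot = list(range(100_000))
windows = make_slices(snapshot, 20)
alive = set(snapshot) - {3, 4}  # pretend two entities were destroyed mid-wave
handled = []
count = process_slice(windows[0], alive, handled.append)
```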
     
    Egad_McDad likes this.
  2. Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    Do you know that these entities move between chunks a lot? Can you make sure that those specific entities have no structural changes after they are fully constructed?

    Is it important that each entity is touched exactly once in 20 frames? What happens if it is touched more or less often?

    The most efficient approach is to use IJobEntityBatch, or use the query to get a fresh set of chunks every frame, and just keep an int counter of where you left off last frame. Unless all entities change archetype every frame, you will eventually get to process them, even if a couple do.
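    The counter idea can be sketched in a few lines of Python (a model of the scheduling, not Unity API): each frame the query returns a fresh chunk list, and a cursor remembers where the previous frame stopped. `ChunkCursor` and `chunks_per_frame` are hypothetical names; a chunk is modelled as a plain list of entity ids.

```python
class ChunkCursor:
    """Round-robin over the chunk list a query returns this frame."""

    def __init__(self, chunks_per_frame):
        self.chunks_per_frame = chunks_per_frame
        self.cursor = 0  # where we left off last frame

    def update(self, chunks, process_chunk):
        if not chunks:
            return
        count = len(chunks)
        start = self.cursor % count  # the chunk count may differ from last frame
        todo = min(self.chunks_per_frame, count)
        for i in range(todo):
            process_chunk(chunks[(start + i) % count])
        self.cursor = (start + todo) % count


chunks = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
cursor = ChunkCursor(chunks_per_frame=2)
seen = []
for _ in range(3):
    cursor.update(chunks, seen.append)
```

    Because the chunk list is re-fetched every frame, an entity that changes archetype may be visited twice or one pass late; that imprecision is the trade-off this approach accepts.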

    If it must be 100% precise and you do need structural changes, then ComponentDataFromEntity is the only way.
     
    davenirline likes this.
  3. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    @Joachim_Ante Well, we are writing a utility AI system for our game. It is almost done, apart from this feature.
    Naturally, we want to calculate scores every second or so rather than every frame, and if there are too many entities, it is better to process a bunch of them every frame. They would change chunks every now and then, say once every few seconds when they move to very different steps.

    Can I use some trick with chunk components to make sure all entities that have AIAgent eventually get processed? I would mark each processed chunk and increase my count, then next frame go through the rest, process them and skip the marked chunks, so that even if entities change structure, they still get processed and don't starve.
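    A minimal Python model of the marking trick, assuming each chunk carries a wave number in what would be a chunk component. `process_wave_step`, the `marked_wave` field and the dict-based chunks are hypothetical; the point is only the "skip chunks already marked for this wave" logic.

```python
def process_wave_step(chunks, wave, budget, process):
    """Process up to `budget` chunks that do not yet carry the current wave mark.

    `chunks` maps a chunk id to {"entities": [...], "marked_wave": int}; the
    "marked_wave" field plays the role of a chunk component. Returns True once
    every chunk has been marked for `wave`.
    """
    pending = [cid for cid, chunk in chunks.items() if chunk["marked_wave"] != wave]
    for cid in pending[:budget]:
        for entity in chunks[cid]["entities"]:
            process(entity)
        chunks[cid]["marked_wave"] = wave
    return len(pending) <= budget


chunks = {i: {"entities": [i * 10, i * 10 + 1], "marked_wave": 0} for i in range(5)}
seen = []
results = [process_wave_step(chunks, wave=1, budget=2, process=seen.append) for _ in range(3)]
```

    Nothing in this model notices an entity that moves into a chunk already marked for the current wave; that gap is the limitation raised later in the thread.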

    Is using ComponentDataFromEntity on an array from the query much slower than this? I have not looked at its implementation, or at query.ToNativeArray, yet.

    I don't mind touching entities multiple times in a processing wave. So if I want to process this second's wave over 10 frames, I am fine going over all chunks if that is faster than using an array received from the query and then using ComponentDataFromEntity.


    P.S. It has a debugger and a UI Elements-based UI. We are going to use it ourselves but will probably put it on the store as well. Here are a few screenshots. Our game's vertical slice should also be ready in a month or two.

     
  4. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    For anyone reading this: chunk components cannot be used here, because if we are on, say, iteration 3 of the AI and an entity sits in a chunk whose chunk component has the value 2, that entity can be skipped by changing archetype and moving to a chunk with the same chunk component type and a value of 3.
    The different approaches are:

    - If processing happens relatively rarely, move entities to other chunks after processing them, and after going through them for x frames, run a final clean-up job that processes any entity not processed yet.
    - Run the processing job more often, so that if an entity is skipped a few times, it is still processed eventually.
    - Give each entity a component called AIExectuionFrame, have an initial job set its value between 0 and your max frame count, then in each frame process all entities whose value matches the current frame number; at the end, every entity has been processed. This is not efficient, though, and is similar to using ComponentDataFromEntity.
    - Use ComponentDataFromEntity to process all entities one by one, dividing them across frames based on their count.
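    The third option (a per-entity execution-frame value) can be modelled in Python as below. The round-robin slot assignment and the names `assign_execution_frames` and `run_frame` are assumptions for illustration; the sketch also shows why it is called inefficient: every entity is still visited every frame, and only one slot's worth is processed.

```python
def assign_execution_frames(entities, max_frames):
    """Initial pass: give each entity a slot in [0, max_frames), round-robin."""
    return {entity: i % max_frames for i, entity in enumerate(entities)}


def run_frame(execution_frame, frame, max_frames, process):
    """Visit every entity, but only process those whose slot matches this frame."""
    slot = frame % max_frames
    visited = 0
    for entity, assigned in execution_frame.items():
        visited += 1
        if assigned == slot:
            process(entity)
    return visited


slots = assign_execution_frames(range(100), max_frames=20)
done = []
visits = [run_frame(slots, f, 20, done.append) for f in range(20)]
```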

    Keep in mind that executing for 2ms every frame is better than executing for 10ms once per second, unless you schedule many heavy once-per-second systems so that they run on different frames. Otherwise, it is better to have multiple systems that each do part of their work every frame.

    AI and other simulation-related behaviors are the only things I can think of that you might want to run less often and divide over frames. Maybe networking and serialization too.

    I'm curious how other people handle this, especially in AAA games. I will look into GDC talks and so on and update here. I remember Naughty Dog and a few others gave great talks on their job systems.

    All that said, in many scenarios, splitting your system into enough jobs may be good enough if your entity count or amount of processing is not too big.
     
    apkdev likes this.
  5. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    I don't understand your reasoning here. Is the issue that the entity that moves chunks is getting updated too often? Or not enough?

    If it's not enough, sort chunks first by order version and then by when they were last updated (using chunk components), then process the first X chunks in that sorted list.
     
    SolidAlloy likes this.
  6. WAYNGames

    WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    Could a shared component be used to fragment the chunks into X sets of entities of the same size?
    When creating an entity, you could track the number of entities per shared component value and assign the entity to the least populated value, or create a new value if the maximum number of entities to process per frame has been reached.
    Then you can iterate over the chunks with the shared component value you want to process this frame, cycling through all the values you have.
     
  7. WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    You can add a system state component with a lastComputedFrame field to the entity.
    When the system runs, pass in the frame number.
    If the entity has not been processed in the last x frames, process it.
    Otherwise, skip it.
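    A Python sketch of that guard clause, with a dict standing in for the system state component (`guarded_update` and `last_computed` are hypothetical names, not Unity API):

```python
def guarded_update(last_computed, frame, interval, process):
    """Process entities not updated during the last `interval` frames.

    `last_computed` maps entity -> frame it was last processed, standing in for
    a system state component with a lastComputedFrame field.
    """
    for entity, last in last_computed.items():
        if frame - last >= interval:
            process(entity)
            last_computed[entity] = frame  # only values change, so iteration is safe


last = {"a": 0, "b": 3}
out = []
guarded_update(last, frame=5, interval=5, process=out.append)  # only "a" is due
guarded_update(last, frame=8, interval=5, process=out.append)  # now "b" is due
```

    If many entities share the same lastComputedFrame, they all become due in the same frame, so nothing here caps the per-frame cost; that is the spike concern raised further down the thread.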
     
    Last edited: Jul 3, 2021
  8. threedots1

    Joined:
    Oct 9, 2014
    Posts:
    88
    With computationally heavy work like pathfinding and AI, I don't want multiple entities processed in a single job. It could produce a perfect storm where a large number of entities all end up on one worker thread.

    My solution has been to keep a timer component on entities and add them to a native queue or an unsafe ring queue when they need reprocessing. These queues are then drained into native lists, respecting a maximum number of entities allowed to be processed per frame (based on performance, etc.).

    These lists are then used in IJobParallelFor jobs, so I can restrict each worker thread to one entity at a time. The results are really good: very tightly packed thread usage.
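    The queue-and-budget part of that pipeline, modelled in Python: a `deque` stands in for the native or ring queue, and each list returned by `drain` would become the array an IJobParallelFor iterates over. `BudgetedQueue`, `tick`, `period` and the duplicate-suppression set are assumptions added for the sketch.

```python
from collections import deque


class BudgetedQueue:
    """Timer-driven update requests, drained under a per-frame cap."""

    def __init__(self, max_per_frame):
        self.max_per_frame = max_per_frame
        self.queue = deque()
        self.queued = set()  # avoid enqueuing an entity that is already waiting

    def tick(self, timers, dt, period):
        """Advance per-entity timers; expired entities request reprocessing."""
        for entity in timers:
            timers[entity] -= dt
            if timers[entity] <= 0:
                timers[entity] = period  # restart the timer
                if entity not in self.queued:
                    self.queue.append(entity)
                    self.queued.add(entity)

    def drain(self):
        """Pop at most max_per_frame entities into this frame's work list."""
        work = []
        while self.queue and len(work) < self.max_per_frame:
            entity = self.queue.popleft()
            self.queued.discard(entity)
            work.append(entity)
        return work


q = BudgetedQueue(max_per_frame=2)
timers = {e: 0.5 for e in range(5)}
q.tick(timers, dt=1.0, period=1.0)
frames = [q.drain() for _ in range(4)]
```

    Entities over the budget simply wait in the queue for a later frame. In Unity, scheduling the job with an innerloopBatchCount of 1 is presumably how each worker gets one entity at a time.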
     
  9. tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,761
    Unity has already added a solution for this: IJobEntityBatch with batchesPerChunk, which lets you split a single chunk across multiple worker threads.
     
  10. threedots1

    Joined:
    Oct 9, 2014
    Posts:
    88
    My hesitation with that method is that I would have to change entity archetypes whenever an entity needs to be reprocessed.

    If a tag component is added to the entity when an update request is made, I have no control over how many entities are updated per frame or in what order. If the tag component is added to signify an update to be processed this frame or next, I can still control the number of entities per frame, but that is a lot of chunk rearranging and fragmentation for little gain, in my mind. The entities could also be split into many extra archetypes based on combinations of pathfinding, influence, AI update requests, etc., which would slow everything else down.

    The cost of sticking with an IJobParallelFor is the random access needed to pull in entity data, but this would be dwarfed by the processing time of the actual logic.

    It is possible to use IJobEntityBatch with a NativeArray of entities as a filter, which is what I was using at one point, but there is no overload that accepts batchesInChunk, so it goes back to the problem of uneven performance.
     
  11. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    Well, yes, too rare is my concern. What you are describing is a potential solution: get the chunks, sort them and execute a bunch of them in some jobs every frame. However, there are two problems. One is that the version changes with every read-only system passing by, which you can fix by not running through chunks you don't need to process. The bigger problem is that if an entity is in a chunk with ProcessedFrame=2 and moves to a chunk with ProcessedFrame=4 before we process it, the move doesn't change the chunk component value of its new chunk, so the entity has to wait until the next time the chunk with ProcessedFrame=4 gets processed. If an entity loses just one update, that is fine when the execution times are relatively close; however, if they are not, or if bad luck puts an entity in this state twice in a row, it might show up in its behavior.

    To be clearer: entity A is in chunk C1, which was processed at frame 2, and we are in frame 4.
    We process chunk C2, and some other system moves entity A to C2 after this frame.
    C2 still has 4 as its processedFrame, so at frame 5 we process C1, but A is no longer in it, and we skip C2 because it was processed last frame; it will only be processed in the next iteration of the system, x seconds from now.
    A skipped one iteration of the system, and if x is big enough, or if this happens to an entity multiple times, it is a problem, especially if your entities change archetype often.
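    The C1/C2 trace above can be reproduced with a small Python model of the marking scheme. Chunk ids, the `mark` field and `run_wave` are hypothetical; frames 0, 1 and 2 of the model correspond loosely to the post's frames 4, 5 and the later revisit of C2.

```python
def run_wave(chunks, wave, visit_order, moves_after_frame):
    """Visit one chunk per frame, skipping chunks already marked with `wave`.

    `moves_after_frame[f]` is an (entity, src, dst) structural change applied
    after frame f, modelling another system changing the entity's archetype.
    """
    processed = set()
    for frame, cid in enumerate(visit_order):
        chunk = chunks[cid]
        if chunk["mark"] != wave:  # already-marked chunks are skipped
            processed.update(chunk["entities"])
            chunk["mark"] = wave
        if frame in moves_after_frame:
            entity, src, dst = moves_after_frame[frame]
            chunks[src]["entities"].remove(entity)
            chunks[dst]["entities"].append(entity)
    return processed


chunks = {
    "C1": {"entities": ["A", "B"], "mark": 0},
    "C2": {"entities": ["C"], "mark": 0},
}
# Frame 0: process C2, then A moves into C2.
# Frame 1: process C1, which no longer holds A.
# Frame 2: C2 is already marked for this wave, so it is skipped.
result = run_wave(chunks, wave=1, visit_order=["C2", "C1", "C2"],
                  moves_after_frame={0: ("A", "C1", "C2")})
```

    In this model, resetting the mark of any chunk that receives an entity mid-wave would close the hole, at the cost of revisiting that chunk's other entities.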

    Using shared components like this would mean too many copies.

    Your other suggestion, using system state components, is good, but I can still end up processing too much in a single frame. A random number could be used for x, but that is still not reliable and can create spikes.
     
    Last edited: Jul 4, 2021
  12. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    This might end up being what I do as well; I suggested it above too. It depends on how long processing each entity takes in my case, and whether I gain anything compared to sorting chunks and executing a set of them per frame.

    Our AI is utility AI, and the computations are essentially a large number of floating-point multiplications. After SIMDifying them, I'm not sure their cost is still much bigger than a cache miss. There are considerations like pathfinding queries and influence map lookups, which fall into the category of things you describe. Unity itself processes physics entities like this for more control over how they are executed. It is not divided between frames, but all entities are written to a stream, processed, and written back to ECS at the end.
     
  13. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    So what I am about to propose may not be granular enough if your entities jump between archetypes too much.

    Let's suppose your chunk component stores a frame number for the frame the chunk was last processed.

    1) Find the minimum chunk component value for all chunks and store it in a minFrame variable.
    2) For each chunk with a dirty order version number (a chunk that had an entity added or removed since last frame), set the chunk component value to minFrame.
    3) Sort all chunks by their chunk component values.
    4) Process the first X chunks using an IJobFor, updating their chunk component values.
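    The four steps above can be modelled in Python as one function per frame. Chunks are plain dicts: `last_frame` plays the chunk component, and `order_changed` stands in for checking the chunk's order version against the previous run. All names are hypothetical, and the real version would be split across jobs rather than run inline.

```python
def schedule_chunks(chunks, frame, x, process):
    """One frame of the four-step scheme over dicts with "last_frame" and
    "order_changed" keys. Returns the chunks processed this frame."""
    if not chunks:
        return []
    # 1) Minimum chunk-component value over all chunks.
    min_frame = min(c["last_frame"] for c in chunks)
    # 2) Chunks whose entity set changed are treated as the most overdue.
    for c in chunks:
        if c["order_changed"]:
            c["last_frame"] = min_frame
            c["order_changed"] = False
    # 3) Sort by chunk-component value, oldest first (sorted() is stable).
    ordered = sorted(chunks, key=lambda c: c["last_frame"])
    # 4) Process the first x chunks and stamp them with the current frame.
    picked = ordered[:x]
    for c in picked:
        process(c)
        c["last_frame"] = frame
    return picked


chunks = [
    {"name": "a", "last_frame": 5, "order_changed": False},
    {"name": "b", "last_frame": 3, "order_changed": False},
    {"name": "c", "last_frame": 9, "order_changed": True},
]
picked = schedule_chunks(chunks, frame=10, x=2, process=lambda c: None)
```

    Chunk "c" was processed recently, but because entities moved into it, it is pulled to the front alongside the genuinely oldest chunk "b".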
     
  14. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    As you say, this can become problematic if you process, say, 10% of chunks per frame and 2% of them change every frame, or even every 2-3 frames. For other cases it is a good way to do it. Even better, keep track of the chunks that changed in a list and work on them after all chunks have been iterated, so you never process a chunk more than twice.
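    The "catch-up list" variant, sketched in Python under the assumption that the set of chunks that gained entities is known after each frame (`changed_after_frame` is a hypothetical stand-in for checking order versions). A changed chunk that has not been visited yet needs no extra work, since its normal visit will see the new contents.

```python
def run_wave_with_catchup(snapshot, per_frame, changed_after_frame, process):
    """Visit the wave's chunk snapshot `per_frame` chunks at a time, then
    revisit, once, every chunk that changed after it was already covered."""
    snapshot_ids = set(snapshot)
    visited, catch_up = set(), []
    for frame, start in enumerate(range(0, len(snapshot), per_frame)):
        for cid in snapshot[start:start + per_frame]:
            process(cid)
            visited.add(cid)
        for cid in changed_after_frame.get(frame, []):
            # A snapshot chunk not visited yet will be seen later anyway.
            still_pending = cid in snapshot_ids and cid not in visited
            if not still_pending and cid not in catch_up:
                catch_up.append(cid)
    for cid in catch_up:  # final pass: at most one extra visit per chunk
        process(cid)
    return catch_up


calls = []
extra = run_wave_with_catchup(["c0", "c1", "c2", "c3"], 2,
                              {0: ["c0", "c3", "new"]}, calls.append)
```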
     
  15. WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    OK, so say you want to process your entities over 20 frames. Make a system with a system state component and a NativeMultiHashMap. When the system detects an entity without the system state component, it adds the entity to the map under the least populated frame index (and adds the state component). Then, on each update, the system uses the map to cycle through the list of entities to process and filters the others out.
    You could also set a maximum per frame, in which case, if all available frames are full, you can add a new frame to the map and cycle your entities over 21 frames, up to the maximum acceptable number of frames between entity updates.
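    A Python model of this frame-bucket scheme: a dict of lists stands in for the NativeMultiHashMap keyed by frame index, and a set stands in for the system state component. `FrameBuckets`, `max_per_frame` and `max_frames` are hypothetical names. What happens once `max_frames` is reached is not specified in the post; here the least populated slot simply goes over the cap.

```python
class FrameBuckets:
    """Entities spread over frame slots, keyed by frame index."""

    def __init__(self, frames, max_per_frame, max_frames):
        self.buckets = {i: [] for i in range(frames)}
        self.max_per_frame = max_per_frame
        self.max_frames = max_frames
        self.registered = set()  # stands in for the system state component

    def register(self, entity):
        """Add a newly detected entity to the least populated frame slot."""
        if entity in self.registered:
            return
        slot = min(self.buckets, key=lambda i: len(self.buckets[i]))
        if len(self.buckets[slot]) >= self.max_per_frame and len(self.buckets) < self.max_frames:
            slot = len(self.buckets)  # all slots full: cycle over one more frame
            self.buckets[slot] = []
        self.buckets[slot].append(entity)
        self.registered.add(entity)

    def due(self, frame):
        """Entities to process this frame; the job filters everything else out."""
        return self.buckets[frame % len(self.buckets)]


fb = FrameBuckets(frames=2, max_per_frame=2, max_frames=3)
for e in range(7):
    fb.register(e)
```

    A complete version would also remove destroyed entities from their bucket, which is the kind of clean-up a system state component is meant to detect.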
     
  16. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    Yes, I wrote about this approach in the initial post of the topic:

    > If i put the query results in an array, getting components is slow right? since ComponentDataFromEntity is not linear? Or I'm wrong.

    I wanted to avoid this to keep linear access. On further thought, however, since the entities still need to look at parameters of other entities to calculate their scores, the algorithm has enough jumps that it probably doesn't matter. I'll profile this.
     
  17. WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    I did not mean using ComponentDataFromEntity.
    I meant: use your normal job, but add a guard clause that skips the entity if it is not in the list of entities to process this frame. Or use one of the IJobEntityBatch extensions to provide the list of entities to process.
     
  18. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    I'm not sure about the implementation of ComponentDataFromEntity yet, but if it is not implemented very naively, it should probably not perform worse: assuming it is not just a hash table, when you read entity x from a chunk through it and then x+1 from the same chunk, it should not encounter a cache miss.
    I'll take a look at this and also at the extension methods of IJobEntityBatch.

    Thanks
     
  19. WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    My understanding is that Unity's ECS does not do anything to prefetch data; the aim is rather to lay out data compactly in memory so that it is loaded efficiently into the CPU cache.
    So, as I see it, the naive implementation you refer to is pretty much what happens.
    Your CPU loads the whole cache line containing the data you fetch.

    Let's say a cache line has size X, X fits 4 C components, and your game has 6 entities with component C.
    What ECS will do is put the data of those 6 C components one after another in memory, so if you look up the first C, the CPU loads the cache line containing it; assuming the array starts on a cache-line boundary, you will then have the first through fourth C in cache.
    If your job only works on the first 4 C components, you won't have any further cache misses, but if you want to look up the 5th or 6th C, you'll have to load another cache line.
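    The cache-line arithmetic in this example can be checked with a few lines of Python. The 64-byte line and 16-byte component are assumed sizes chosen so that four C components fit in one line; `cache_lines_touched` is a hypothetical helper that counts distinct lines covered by reads from one tightly packed array.

```python
def cache_lines_touched(indices, component_size, line_size=64, base=0):
    """Distinct cache lines covered by reading components at `indices`
    from one tightly packed array starting at byte offset `base`."""
    lines = set()
    for i in indices:
        start = base + i * component_size
        end = start + component_size - 1
        lines.update(range(start // line_size, end // line_size + 1))
    return len(lines)


aligned_first_four = cache_lines_touched(range(4), component_size=16)            # 1 line
aligned_all_six = cache_lines_touched(range(6), component_size=16)               # 2 lines
unaligned_first_four = cache_lines_touched(range(4), component_size=16, base=8)  # 2 lines
```

    The last case shows why the "first to fourth C in one line" picture depends on the array starting at a line boundary.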

    Note that this is just my understanding; I'm not an expert in this sort of thing, so I may be completely wrong.
    Either way, the best thing to do is to try it out and pick whatever solution fits your requirements (simplicity/flexibility/performance).
     
  20. Ylly-avvyland

    Joined:
    Jan 6, 2021
    Posts:
    7
    If I use Entities.ForEach().ScheduleParallel(), will it be less performant than IJobEntityBatch?
     
  21. eizenhorn

    Joined:
    Oct 17, 2016
    Posts:
    2,685
    For now, yes, a bit, as they are converted to IJobChunk by ILPP. But in upcoming releases, when codegen is changed to IJobEntityBatch, they'll be the same, and the ForEach version will even be a bit faster in some cases, as Unity's codegen will optimize them under the hood as much as possible.
     
    Ylly-avvyland and apkdev like this.
  22. MaNaRz

    MaNaRz

    Joined:
    Aug 24, 2017
    Posts:
    117
    I'm writing a utility AI myself and wondered about the same issue, but performance is good enough at the moment to do everything every frame, so I haven't bothered implementing it yet.
    Couldn't you use shared components with a fixed int value that splits your entities into chunks you can filter? Every frame, your system would increase the filter value by one and process all chunks whose shared component has that value. When you reach the highest value, you start again from the beginning, without ever changing the contents of the shared component and thus never changing the chunk layout. This way, even if entities change chunks because of other systems, you still guarantee that every entity is processed exactly the same number of times.
     
  23. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    The goal was to minimize entity copies, hence using the chunk component and versioning values to figure it out. Otherwise, yes, a shared component could be used, but it would mean all components in a chunk need to be copied after processing, unless Unity were smart enough that, when all entities in a chunk change their shared component to the same value, it doesn't copy them and just changes the index it maintains for chunks and shared components, if any.
     
  24. WAYNGames

    WAYNGames

    Joined:
    Mar 16, 2019
    Posts:
    992
    Changing the shared component index would only be needed if you had to rebalance the number of entities per shared component value.
     
    Ashkan_gc likes this.
  25. Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    1,124
    I did not mean that Unity prefetches data, but depending on the data layout and the implementation of ComponentDataFromEntity, it might cause a cache miss even if the entities you pass to it in sequence sit next to each other in the same array. I'll check its implementation on Saturday and report back.

    I can't wait for GDC, because hopefully Unity will release the animation package in preview (no longer experimental) along with the shooter demo, which means we can add the last piece of the puzzle to our game. I'm super excited to show it to you guys here. We are building the website, doing some models and so on to look for publishers, and using DOTS, despite the hardships of an early technology, was a great decision.
     
    WAYNGames likes this.