
Feedback My week with DOTS

Discussion in 'Data Oriented Technology Stack' started by Ferazel, May 25, 2019.

  1. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    This post is meant for the Unity devs looking for user feedback, but if anyone else has feedback or notes on the problems discussed feel free.

    I have about 9 years of programming experience with Unity now. I have shipped titles small and large. I would rate my programming skill as above average (not amazing, but definitely not the worst). One of the perks my company offers is a week-long game jam of sorts. This year I decided it was time to take a stab at the latest ECS APIs (0.0.12-preview30) to see if I could get a handle on the concept and become an advocate for it in the future.

    My game was a 2D isometric ant simulator (something I felt would be well suited to ECS), where each ant would be an entity and we would have basic pathfinding to find food and return to the ant hill. Luckily, one of my co-workers was willing to take the deep plunge with me and helped me with the project. We had no Unity resources other than what we were able to find online and the sample repository.

    First the good!
    1. When ECS works, it REALLY works. I can definitely see the future in this regard for game logic iterating over multiple entities. Creating systems that managed the death mechanics of all of the entities in the world so that we could do cool things like fade out or other behavior was really powerful.
    2. Splitting the data away from behavior. This was a wonderful change of pace from the OOP world we live in. I really appreciated the way that ECS helped really isolate behavior in a new way.
    3. The samples have a lot of good fundamental data in them to get up and running. They were often a big resource for us, especially as we were getting started.
    4. Having the C# code is really handy. Searching the Mathematics library or doing a deeper dive into the renderer system was a huge help for us. Please continue to keep sharing the backing code as much as possible!
    5. Burst + Mathematics + Jobs is a beast given form. When we could use the system fully, it tore through anything we could throw at it with ease.

    So here are some pain points that we suffered when we were trying to develop this project. I fully realize that these pain points likely have easy solutions, but we weren't able to figure them out with our week with the system. I just want to give you an idea of maybe where the current experience might be lacking. I know that a lot of this has already been mentioned on various forum posts and there have been a lot of "we're working on it" posted from Unity.
    1. The rendering is completely barebones. This was probably the biggest surprise for me, given the 2 years of API work, demos, and development that have gone into DOTS. We were trying to render transparent sprites with correct sorting. I knew we would likely need to write a sprite-switching system. However, when we found out that the basic renderer doesn't easily support MaterialPropertyBlock data, I almost threw in the towel right then and there. Luckily my partner was willing to tackle the challenge of going through the hybrid renderer and writing one that suited our needs better. If you're going to essentially enforce GPU-instanced meshes by default, you'd think there would be a better way to apply material property block data. There are only so many demos you can build out of instanced opaque geometry.
    2. There is still a lot of magic. Systems automatically being added to the simulation, read-only data dependencies, system component execute queries. I really appreciate the more recent pass on the API to clean it up a bit by adding ForEach/ForEachWithEntity and removing Inject. However, at the end of the week my co-worker ended up preferring the underlying chunk iteration (IJobChunk) because it didn't hide much behavior. My problem was that I found the chunk system very verbose, so I tried to use the IJobForEach systems. What I ran into were situations where I would write an EntityQuery (GetEntityQuery()) to feed into the job, but the system would think *that* was the query it needed in order to run, even though I had already defined the filtering criteria on the job template. This was just a minor confusion point for a bit while I tried to figure out why my system had stopped updating. My take is that there is still magic happening: it is fine to hide the IJobParallelFor and the chunk iteration, but it is not OK for a system to suddenly stop working because I added another entity query and it no longer knew which entities it needed to operate on. I realize this is a very fine line to walk, but the magic that happens when you throw component types around and "it just works" can be really confusing and problematic. I don't want the verbosity of chunk jobs, but I do want slightly more explicit behavior.
    3. Sharing multi-threaded data efficiently is hard. I realize this is a topic that could fill books, but sharing data with the job system created some fundamentally difficult problems with a lot of hidden costs. For example, we had relatively large pathfinding data buffers (about 256KB of data per layer, stored in NativeArray<float> collections). I struggled a lot to find a good way to send this data to jobs that was efficient and played nicely with Burst and the job system. I didn't want to create an entity per path node when all it really needed was a float, and that would also have made fast positional lookups into the array difficult. I didn't want to create copies of this data every frame for every job, and I wanted to wrap the data in some nice convenience methods. I ended up having a static class with static buffers/methods to manage, maintain, and query this data (not very DOTS of me, I realize). However, that made it global data that couldn't be Bursted, and I didn't have a great solution for that.
    4. Burstable command buffers. Nothing disappointed me quite as much as removing [BurstCompile] from a Burstable job because I needed to execute a command buffer request. If command buffers are meant to be the primary communication structure for executing data changes on entities not included in the system query, it seems weird to me that they throw the Burst work in the trash. I'm not sure what the answer is, but I thought I'd mention it.
    5. Syncing work can be very difficult. We had food entities in the world that would be destroyed when they ran out of food. However, we struggled for a long while to figure out how to enforce the hard limit on distributed food in a good way. It was easy to write a job that queried the food entities for the ants and, if they were within range, reduced the food by 1 via a command buffer. However, it seemed difficult to enforce that the data would not go below 0 food when other entities might also share that sync point data. I ended up letting the simulation go negative if it needed to, but I could see this being a problem in a more rigid environment. Is there a good way to enforce that data on other entities stays correct? I'm still not really sure.
    6. Maybe this has been fixed in a newer package, but we still needed to manually add the hybrid renderer to a link.xml in order to prevent IL2CPP from stripping it.
    7. Clearer separation between simulation and rendering worlds. I found out about this VERY late, but the idea of having separate worlds, one for rendering and one for the simulation, would have been really helpful for us, although we didn't have time to split the work. I'm still a little confused about how queries from one world are allowed in another. Maybe an opportunity for an advanced sample?
    At the end of the day, I'm really happy that I spent a week with the DOTS system. It certainly gave me an interesting experience. I know you're still working on it, and I appreciate you taking your time to get it right. I also fully admit that a lot (if not all) of my problems may already have solutions in the current API. I mainly wanted to give you a user story of two OOP programmers doing a deeper dive into the waters of DOTS independently for a week.

    Thanks for your time!
     
    Last edited: May 25, 2019
    florianhanke, pakfront, RaL and 17 others like this.
  2. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    232
    When you have a job that needs a command buffer, split it into two jobs: one that does the calculations and another that uses the command buffer. Code Monkey recently posted a good example of that on YouTube.
    Burst-compiled code is so fast because of the restrictions it puts on jobs. Allowing the usage of command buffers would make it slower.
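    The split described above might look roughly like this. This is a sketch using API names from the Entities/Jobs preview packages of this era (NativeQueue<T>.Concurrent, IJobForEachWithEntity, EntityCommandBuffer); the names have changed in later versions, and the "Lifetime" component and job names are hypothetical.

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    public struct Lifetime : IComponentData { public float Seconds; }

    // Job 1: Burst-compiled, does the math and records which entities expired.
    [BurstCompile]
    public struct TickLifetimeJob : IJobForEachWithEntity<Lifetime>
    {
        public float DeltaTime;
        public NativeQueue<Entity>.Concurrent Expired; // ParallelWriter in later versions

        public void Execute(Entity entity, int index, ref Lifetime life)
        {
            life.Seconds -= DeltaTime;
            if (life.Seconds <= 0f)
                Expired.Enqueue(entity);
        }
    }

    // Job 2: not Burst-compiled, drains the queue into a command buffer.
    public struct DestroyExpiredJob : IJob
    {
        public NativeQueue<Entity> Expired;
        public EntityCommandBuffer CommandBuffer;

        public void Execute()
        {
            while (Expired.TryDequeue(out Entity e))
                CommandBuffer.DestroyEntity(e);
        }
    }
    ```

    This way the expensive per-entity work stays Burst-compiled, and only the small drain job pays the non-Burst cost of command buffer recording.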
     
  3. Shinyclef

    Shinyclef

    Joined:
    Nov 20, 2013
    Posts:
    364
    #2. I agree on too much magic with the queries. I'd rather explicitly register my query as a pre-requisite for system execution.

    #4. They are planning to make burst work with command buffers and improve their performance in general.

    #7. Don't worry, this is intended default behaviour, but was not working in p30. Simulation is meant to run in FixedUpdate and Presentation in Update. See my thread on this where I was wondering the exact same thing: https://forum.unity.com/threads/render-sim-world-approach.673177/
     
    Jes28 likes this.
  4. Brendon_Smuts

    Brendon_Smuts

    Joined:
    Jun 12, 2017
    Posts:
    43
    Can you give a more detailed example of your use case for point #3? There should be protections/automatic dependency handling for jobs that work with the same data to prevent any sort of race conditions unless explicitly disabled.
     
  5. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    7,098
    Regarding #3: this sounds like a problem I hit using jobs and native containers. When dealing with 2D or 3D spatial information, higher-dimension arrays are ideal and seem unsupported. I think support for 2D/3D/higher arrays, with a block/region slicing system so agents could get a copy of the data around their location, would be ideal for this kind of problem and would allow the data to fit into the L1 cache.
     
  6. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    Thanks everyone for the replies.

    @Micz84 Yeah, we made two separate systems for this once or twice to test it out. However, I would say it is a very verbose solution, with systems in general taking a bit of time to set up and test, when I already had a system with all of the data I needed.

    @Shinyclef That is interesting that they are planning on running the simulation step in FixedUpdate. I think that would work for my case, but I'm not so sure it would work for situations like the delayed network packets mentioned in the Blizzard Overwatch talk.

    @Brendon_Smuts #3 was pretty interesting, and I fully admit I may have been doing something wrong. We were still able to put the logic into jobs, but we were not able to do any Burst compilation on them.

    With the pathfinding array being relatively large (256x256 nodes) and needing to be accessed and managed by multiple systems, my initial instinct was to make it a static NativeArray, which Burst immediately complained about. I didn't think making copies of the data for the sake of being able to Burst it made a lot of sense, with the data being so large.

    I did consider ditching the array and making an entity for each node, but I was worried that even a 16-byte overhead per node would be too much. Also, the locality of the data being useful for indexing was too much to give up. It does appear that this is a common problem and Unity is thinking about it.

    Best practices for big data sets. Arrays on entities?
    Struggling to understand how to implement voxel engine in ECS
     
    Last edited: May 25, 2019
  7. starikcetin

    starikcetin

    Joined:
    Dec 7, 2017
    Posts:
    230
    Excellent feedback, thanks for sharing.
     
  8. jdtec

    jdtec

    Joined:
    Oct 25, 2017
    Posts:
    86
    You could try using a DynamicBuffer, i.e. an array attached to an entity?
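    For reference, a minimal DynamicBuffer sketch, assuming the Entities preview API of this era; the "PathNodeWeight" element type is hypothetical:

    ```csharp
    using Unity.Entities;

    // Each element of the buffer. InternalBufferCapacity controls how many
    // elements live inside the chunk before the buffer spills to the heap;
    // 0 sends a large grid straight to the heap.
    [InternalBufferCapacity(0)]
    public struct PathNodeWeight : IBufferElementData
    {
        public float Value;
    }

    public static class GridSetup
    {
        public static Entity CreateGrid(EntityManager em, int width, int height)
        {
            Entity gridEntity = em.CreateEntity();
            DynamicBuffer<PathNodeWeight> buffer = em.AddBuffer<PathNodeWeight>(gridEntity);
            buffer.ResizeUninitialized(width * height);
            for (int i = 0; i < buffer.Length; i++)
                buffer[i] = new PathNodeWeight { Value = 0f };
            return gridEntity;
        }
    }
    ```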
     
  9. Micz84

    Micz84

    Joined:
    Jul 21, 2012
    Posts:
    232
  10. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    I did look into DynamicBuffer, but it didn't seem right for my pathfinding needs. First, it appears to be meant for situations where you are adding buffer data to the entity archetype. I feel like the better use case for a buffer would be if each pathfinding agent needed to remember the last 8 nodes it visited, for example? I could definitely be wrong in this assumption, though.

    I think the best ECS-style approach I left with was that shared component data might have been a better choice if I wanted to put the data onto the agent entity.

    @Micz84 Yeah, that seems a little more complicated than what I was trying to achieve, but thanks for the additional information.
     
  11. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    5,385
    Nice sharing @Ferazel. Thx.

    I would say a week may be quite short to grasp the many concepts of ECS, especially while it is evolving. The documentation may also feel insufficient at the moment. For myself, it takes weeks and months to learn new approaches. And that's not even mentioning the continuous preview changes, which add to the pile of stuff to learn.

    Optimal solutions are easy to miss. Just like when learning any programming language.
    I think I have seen some posts of yours in the past. But I suspect that if you decide to stick with the ECS/DOTS community for a bit longer and engage a bit more with questions, you will find this paradigm suitable for your ants :)
    I am sure, for example, the mentioned buffer would suit you well if you put a bit more thought into it.
     
  12. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    Oh yes, of course! I'm sure I missed many solutions. I'm going to be watching the changes and dabbling here and there in the future too. I haven't given up on ECS/DOTS by any stretch. Even with our inexperience, I'm sure we got better perf than if we had used GameObjects. My goal here was to give an anecdotal user case study of what the learning curve felt like when tackling DOTS for a game-jam-style project.
     
    Antypodish likes this.
  13. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    199
    Considering #5:
    Writing back and not getting the updated data in further iterations is actually the problem here, right?
    As a solution, you could build up a temporary hash set that tracks how much was consumed during the frame. I think any syncing problem can be solved with localized cache data inside systems or DynamicBuffers, because you can instantly write back the changes.

    It may seem like a pain, but it's a bigger one when you need to make a barrier and run the system again, or force a Complete() on the job, which would be the normal way to handle syncing. But I don't think you have "that kind" of syncing problem.

    #3
    As others have said, my first instinct would be a DynamicBuffer, but what does your data look like? What are the requirements, and is it a full grid with lots of individual data?
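    One way the per-frame consumption tracking could look, assuming food amounts live in a NativeArray indexed by a food id (all names here are illustrative): a single-threaded Burst job applies every "eat" request and clamps at zero, so the total can never go negative no matter how many ants asked to eat in the same frame.

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    public struct ApplyConsumptionJob : IJob
    {
        [ReadOnly] public NativeArray<int> Requests; // food index per eat request
        public NativeArray<float> FoodAmounts;       // shared food state

        public void Execute()
        {
            for (int i = 0; i < Requests.Length; i++)
            {
                int food = Requests[i];
                // Clamp here, at the single sync point, instead of in the
                // parallel gather jobs that produced the requests.
                FoodAmounts[food] = math.max(0f, FoodAmounts[food] - 1f);
            }
        }
    }
    ```

    The parallel jobs only record requests; the serialized apply step is the one place that mutates the shared count, which sidesteps the race entirely.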
     
  14. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,671
    First of all, thanks for sharing. We really appreciate feedback in this format in particular. It's very useful for helping us understand where things are at from a DOTS onboarding perspective.


    #1 Agree. There is now a dedicated team working on DOTS rendering integration. Also, the 2D team has begun work on building DOTS-based packages for 2D.

    #2 has been fixed in p32. When explicit queries are used, no magic is created for IJobForEach queries embedded in the system anymore.

    #4 Agree. Pain in the ass. We have a plan for fixing it in the coming months.

    About #3 and #5. I find that sometimes when users get started with DOTS they try to go all the way with optimization.

    Jobifying code and parallelism are two very different things. Writing parallel algorithms, especially when there is interaction between elements, is a very large and difficult topic. There is no magic to make that inherently simpler. Except... for just not doing parallelism...

    Just putting code into a job alone already has big value (Burst + running in parallel to other systems' jobs + not stalling the main thread), so sometimes it's better to get started just using IJob or IJobForEach.ScheduleSingle than to aim for the perfect parallel code.

    Also, ComponentDataFromEntity<> is sometimes avoided because it has random memory access patterns. This is obviously true, but some algorithms require random memory access. IJobForEach is a good starting point for the simple cases, and using ComponentDataFromEntity<> inside the same IJobForEach to access other entities' data is perfectly fine when writing normal game code.

    If you want something truly scalable then yes, you want a linear memory layout and a parallel for loop, but that's not necessary for all code in a game. The 10-20x speedup from Burst + math + no main-thread sync point alone is often enough...
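    A sketch of the "start single-threaded" advice, using API names from the 2019 Entities preview (JobComponentSystem, IJobForEach, ScheduleSingle, ComponentDataFromEntity), all of which have since been replaced; the "Follower" component and system are hypothetical:

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Unity.Transforms;

    public struct Follower : IComponentData
    {
        public Entity Target;
        public float3 TargetPosition; // cached copy of the target's position
    }

    public class FollowSystem : JobComponentSystem
    {
        [BurstCompile]
        struct FollowJob : IJobForEach<Follower>
        {
            // Random-access lookup into other entities' data.
            [ReadOnly] public ComponentDataFromEntity<Translation> Positions;

            public void Execute(ref Follower follower)
            {
                if (Positions.Exists(follower.Target))
                    follower.TargetPosition = Positions[follower.Target].Value;
            }
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            var job = new FollowJob
            {
                Positions = GetComponentDataFromEntity<Translation>(true)
            };
            // ScheduleSingle: Burst-compiled and off the main thread,
            // but not parallel, so the random lookups are unproblematic.
            return job.ScheduleSingle(this, inputDeps);
        }
    }
    ```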
     
    Last edited: May 28, 2019
    Tony_Max, florianhanke, Enzi and 9 others like this.
  15. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    10
    Is your pathfinding array just the distance cost between nodes for your pathfinding algorithm?

    Unless you have hundreds of entities pathfinding every single frame, a 256x256 grid should be manageable.
     
  16. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    We did indeed have hundreds of entities reading and writing multiple values from the pathfinding grid every simulation step. We were going to write a custom pathfinding algorithm, but ended up using a solution we found online and porting it to ECS as best I could. The algorithm uses two pathfinding layers, and the grid density of both was 256x256. The idea was that this data could easily be indexed based on the entity's transform position to find its current node and the neighboring nodes, so having it in an array was very helpful. The data in each node was the strength of the node's layer value (NEST or FOOD). You can see the full algorithm here if you're curious: https://github.com/bwiklund/ant-simulator

    The biggest question I struggled with was how best to communicate this memory to a job. Ideally, it would be Burstable so that we could do full decays on it in a SIMD manner later in the frame. The per-frame upkeep on this data was actually a big bottleneck for us, as it took a good chunk of time to linearly go through each of the nodes in each layer and perform a decay on it. We ended up using a managed thread for the decay and another for its visual representation (writing into a Texture2D for debug purposes). My OOP background dragged me into using a static class with static memory accessed through static helper methods. This ended up being what we worked with, but I'm not sure it was really the best way to send the data to a job. As others mentioned, we could also have attached a DynamicBuffer to each entity, or shared component data, and probably gotten similar results (eating the cache miss). Using those buffer components we might have been able to make the per-frame decay Burstable, but I'm not sure about that.
     
  17. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    10
    For such large, global data, I do not recommend using a DynamicBuffer.

    Can you use a persistent NativeArray<float> that is owned by a system, and just pass it around to other systems or jobs that need it? Unless I've missed something, there shouldn't be any reason why you're forced into using a static class for this case.
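    A minimal sketch of what a system-owned persistent array might look like (JobComponentSystem and the OnCreate/OnDestroy callbacks are from the 2019 preview API, and the system name is hypothetical). Other systems could reach the array via World.GetExistingSystem<PheromoneGridSystem>().

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    public class PheromoneGridSystem : JobComponentSystem
    {
        public const int Width = 256, Height = 256;
        public NativeArray<float> Grid; // lives for the whole run, no statics

        protected override void OnCreate()
        {
            Grid = new NativeArray<float>(Width * Height, Allocator.Persistent);
        }

        protected override void OnDestroy()
        {
            if (Grid.IsCreated) Grid.Dispose();
        }

        [BurstCompile]
        struct DecayJob : IJobParallelFor
        {
            public NativeArray<float> Grid;
            public float Decay;
            public void Execute(int i) { Grid[i] *= Decay; }
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            // Burstable per-frame decay over the whole grid.
            return new DecayJob { Grid = Grid, Decay = 0.99f }
                .Schedule(Grid.Length, 256, inputDeps);
        }
    }
    ```

    Because the array is a plain field rather than a static, Burst accepts it as job data, and the job system's dependency tracking handles the read/write ordering between systems.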
     
  18. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,356
    For the nest/food stuff I would just use spatial structures.

    NativeHashMap with an int2 key would be pretty simple.

    For structures you can fit entirely in memory, using a NativeArray is also an option, encoding 2D values like so:
    https://softwareengineering.stackex...12808/treating-a-1d-data-structure-as-2d-grid

    The NativeArray approach would likely be noticeably faster if you need to index into it in your pathfinding.

    The decay you could do over time, say picking N rows per frame to decay.
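    The 1D-array-as-2D-grid idea plus row-sliced decay could be sketched like this (plain Unity.Collections/Burst code; the names are illustrative):

    ```csharp
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    [BurstCompile]
    public struct DecayRowsJob : IJob
    {
        public NativeArray<float> Grid; // width * height floats
        public int Width;
        public int StartRow;   // decay only a slice of rows this frame
        public int RowCount;
        public float Decay;

        // 2D coordinate (x, y) maps to 1D index y * Width + x, so a run of
        // whole rows is one contiguous, cache-friendly range.
        public void Execute()
        {
            int start = StartRow * Width;
            int end = (StartRow + RowCount) * Width;
            for (int i = start; i < end; i++)
                Grid[i] *= Decay;
        }
    }
    ```

    Each frame you advance StartRow by RowCount (wrapping at the grid height), so the full grid decays once every height / RowCount frames instead of paying the whole cost in one frame.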
     
  19. jdtec

    jdtec

    Joined:
    Oct 25, 2017
    Posts:
    86
    This is interesting. Can you please explain why you don't recommend using a DynamicBuffer for large, global data?
     
  20. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    5,385
    I think @Dale_Kim is referencing the fact that NativeArray performs much better than DynamicBuffers.
    Not that DynamicBuffers are bad. But if you are looking for performance and you have the data in one place anyway, i.e. you don't need multiple copies of such data arrays/buffers per entity, then NativeArray will be the better option. It also removes the entity handling on its own.
     
  21. jdtec

    jdtec

    Joined:
    Oct 25, 2017
    Posts:
    86
    Why does NativeArray perform better than a DynamicBuffer? I thought a DynamicBuffer basically was a NativeArray: just a contiguous block of data, but one that is tied to an Entity.

    I'm speaking from the point of view of having a single DynamicBuffer on one entity vs a single NativeArray.
     
  22. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    915
    @Antypodish Where do you get such information? I have come across a few threads where you seem to have more insight than is available from the docs/forum.

    If you reinterpret a DynamicBuffer via AsNativeArray() (which is almost free) it should be exactly the same; if I recall correctly this was stated by Joachim a good while ago.

    I have had good experience with native buffers :)
     
  23. Ferazel

    Ferazel

    Joined:
    Apr 18, 2010
    Posts:
    346
    Oh, I just want to be clear: I don't think anything was forcing me to use a static class. However, it was the easiest way for me at the time to access this global data. In the static class the data was managed and maintained as a NativeArray<float>, with various static methods to more easily access/mark the layers. I realize this was very OO of me (I'm sorry, I was weak!).

    In regards to your system suggestion, I originally had entities for each layer, but finding those entities and passing in the LayerComponent seemed inefficient, since the memory was going to be a heap lookup regardless. That seems like what you are suggesting, since the system would need entities to work on; the system would then perform the decay on the layers, for example.

    Based on the information I've gathered so far, if I were to rearchitect it I would probably make a series of util functions in a static class to perform the modifications, keep the data more separate, and put it into either an entity or some other lookup mechanism.

    @snacktime I definitely used the 2D position -> 1D index lookup a lot, so I really wanted to use an array to store the data.
     
    Last edited: May 28, 2019
  24. jdtec

    jdtec

    Joined:
    Oct 25, 2017
    Posts:
    86
    So if we narrow it down, we have the cost of an entity query and lookup vs. whatever DOTS does behind the scenes when you access (inject?) a NativeArray that belongs to another system. I assume DOTS is doing those additional behind-the-scenes things, as otherwise how would it know about access rights (assuming the array is not read-only static data)?

    In the end I figured the difference would probably not be something to be concerned about. My default behaviour when working in DOTS so far has been to use entities and DynamicBuffer for these situations, as it seemed to me to be a more explicit form of data sharing.

    Happy to hear if some of the above assumptions are not accurate; please let me know.
     
  25. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    5,385
    @jdtec, @sngdan, well, saying I have more insight may be a bit far fetched. My knowledge may be a bit archaic now. :)
    But it stuck in my mind from a while ago (2018), and was repeated multiple times on the forum, that NativeArrays are always the better option than DynamicBuffers, when suitable to use. By how much they are better now, to be honest, I would need to test. Probably a negligible difference.

    But yes, you don't need to handle an entity with a NativeArray. That is the main benefit, I would say.
    However, I use both NativeArrays and DynamicBuffers, when applicable.

    Another difference is that small buffers sit in chunks until they exceed the chunk's capacity; then they are put on the heap, and the entity simply references that buffer via a buffer element. But I suppose that is not the OP's concern in this case?

    I suspect that for one entity with a buffer vs. a NativeArray there won't be a practical difference, as in the OP's case.

    Yet I am happy if someone corrects me, in case I have misunderstood some concepts. A self refresh / update would be welcome :)
     
  26. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,671
    In terms of the inner loop... DynamicBuffer.AsNativeArray() lets you cast to a NativeArray, at which point all lookups are 100% the same perf as a NativeArray. In terms of simplicity, if you have a game with a single navmesh, having one big NativeArray grid seems simpler.
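    For illustration, the AsNativeArray() cast might be used like this (the "GridCell" element type and helper are hypothetical):

    ```csharp
    using Unity.Collections;
    using Unity.Entities;

    public struct GridCell : IBufferElementData { public float Weight; }

    public static class GridAccess
    {
        public static float ReadCell(EntityManager em, Entity gridEntity, int index)
        {
            DynamicBuffer<GridCell> buffer = em.GetBuffer<GridCell>(gridEntity);
            // Almost-free reinterpretation: no copy, and indexing is now
            // identical in cost to indexing a plain NativeArray.
            NativeArray<GridCell> array = buffer.AsNativeArray();
            return array[index].Weight;
        }
    }
    ```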
     
    eizenhorn, GliderGuy and Antypodish like this.
  27. Antypodish

    Antypodish

    Joined:
    Apr 29, 2014
    Posts:
    5,385
    Appreciated for clarification.
     
  28. Dale_Kim

    Dale_Kim

    Unity Technologies

    Joined:
    May 8, 2019
    Posts:
    10
    Yes, what Joachim said. It's not really that DynamicBuffer is slower, but that the API required to access the data is more awkward, especially if what you intend to store is really just global data.

    There is an argument to be made about how much overhead there is in a DynamicBuffer, since there are internal details about how much space is allocated within an ECS chunk vs. allocated on the heap once you've run out of that internal chunk space.

    Mostly, I was referring to NativeArray from a simplicity point of view.
     
    Antypodish likes this.