Highly recommended read: dataorienteddesign.com/dodbook

Discussion in 'Entity Component System' started by 5argon, Dec 6, 2019.

  1. 5argon

    I have finished reading every page of this online book http://www.dataorienteddesign.com/dodbook/. Afterwards, I am very tempted to redo the data design I started last year when I knew nothing, since it now looks naive and stupid. I think it is required reading that Unity's Entities documentation alone cannot replace, because the documentation only teaches how to use the library. You want to use the library to design the things described in this book, or else you could end up doing OOP inside data-oriented code (e.g. never actually thinking about what the data looks like, so that Entity becomes a synonym for object).

    It may be quite intimidating to start reading, though, so I would like to give a rough overview of how each chapter will solve your problems coming from OOP. I actually typed these notes along the way to keep myself focused, so I'm just going to paste them now.

    (You can read this in a bigger font at: https://gametorrahod.com/dodbook/)

    Chapters
    • Data-Oriented Design: Convinces you that OOP is only good for the human programmer.
    • Relational Databases: Proves there is a surefire way to answer "how should I model my data?" if your mind is still stuck in OOP. By borrowing techniques from database people, each object can be untangled from its spaghetti of references and null fields into a beautiful linear array, which you can then easily model with Unity's Entities library. There are multiple levels of normalization to choose from. Once you move an array field out into its own linear table (or perhaps into a DynamicBuffer), you can properly call it "1NF".
    • Existential Processing
      • How to "de-if" your code through data design, so you can iterate over everything without if guards or runtime checks.
      • How to identify the ifs that really matter and are essential.
      • The eternally popular demand to check for a "subtype" (an OOP concept) while iterating, when IComponentData is a plain thing with no hierarchy. Generics don't help, since each generic instantiation becomes a completely new IComponentData. Baking in an enum, or using a by-value ISharedComponentData to characterize entities instead of subclassing, is a solution common sense may suggest; this section expands on it so you can feel it is not such a bad thing to do compared to OOP.
      • It is not weird to perform a check once and then work on a chunk "blindly" afterwards, instead of the usual OOP way of checking each object at the last possible moment. This is essentially ISharedComponentData filtering: you can more or less switch on any concrete value and get the relevant data to work on.
      • Instead of runtime polymorphism (casting types, with logic that changes behaviour depending on the current type), changing the entity's archetype can also solve this. You may fear the data movement cost will be too high, but the book assures you that this is in fact the norm in data-oriented design.
      • The event system: this is almost the bane of data-oriented design. At first it feels like there is definitely no elegant way out, because you took it for granted in OOP. There is a subsection on it, so you know you are not alone in finding this a problem. See how it relates to existential processing; I wasn't expecting this subsection to be here at first.
    • Component Based Objects
      • Finally we come to the C of ECS; the previous sections weren't even working with "components", just "data". So this section relates very directly to the Entities package.
      • It explicitly mentions Unity and its classic components (the attachable MonoBehaviour), pointing out that they are not yet components in the data-oriented sense.
      • The harm done by an object that imposes its definition over the data (facts) it contains.
      • How to deal with your Player uber-class, where each thing inside it sometimes talks to something else inside it, until in the end you cannot take any of them out.
    • Hierarchical Level of Detail
      • Or "HLOD". Unexpectedly, the book takes on rendering directly, a concept specific to games and similar media.
      • It extends HLOD beyond rendering, though: you can "LOD" just about anything, for example by using simpler processing when accuracy is not needed.
      • "Mementos" is an interesting term I first came across here. When LOD takes something down to low detail, it should preserve some state so the thing can be restored when it returns to high detail.
      • Mementos address an occasional fear in the Entities package. You destroy an Entity as part of your logic to make a system work the way you want (the entity is no longer found, so things automatically happen or stop happening, which eliminates the null reference problems found in OOP). Yet "destroy" feels a bit too much, there seems to be no better clean way, so you start hacking in a flag or a boolean, and then you have an if. Adding a tag component might help, but in some situations I think a memento is a very clean, data-oriented way. For example, adding a tag component requires modifying the system to account for it (a None query). With mementos you have a clearer strategy for getting the data back dynamically.
      • The author removes not only the rendering aspect from HLOD but also the distance. You will realize it is all about reducing work, a recurring theme in the Entities package and in data-oriented design. This includes reducing the number of instances, even though the Entities package advertises that it is fine with millions of things. This ties back to state complexity from the previous section: using LOD (an axis of value) may be a more data-oriented way to decide what things should look like right now than a multitude of flags and booleans. It can be as simple as referring to a stack of dishes as a single thing with an int for how many dishes are in it. That is already HLOD, because you reduced the detail. Mind blown!
    • Searching
      • This one also hits hard. In data-oriented design it feels like going against the flow to find something that is not already segmented (chunked) by components, where previously you would have written something like a LINQ query. Now you see inconveniences everywhere when you try to support searching: it is no longer parallelizable, it needs manual work and tedious state maintenance, it reduces performance, and the code is ugly.
      • You realize that a search is faster when the values you check are not interleaved with other data in the cache line; it is not just about the complexity of the search algorithm you use. How about a linear search that beats binary search because it walks straight through memory instead of jumping around, thanks to data-oriented design? The key is the (typically 64-byte) cache line size versus the size of each of your objects: how much useful data do you get for free in one read? Big O notation in shambles.
      • Data-oriented search requires keeping in mind what your current cache line holds and getting clever with it. For example, off the top of my head, splitting a field out into a new IComponentData so it can be iterated linearly sounds good for speeding up search.
      • Get to know algorithms where data-oriented design has an advantage, such as Bloom filters or B-trees.
      • Differentiate searching from sorting and hashing problems.
      • As an optimization, if you know when things are added or removed, you can just update the previous search result instead of searching again.
    • Sorting
      • Another common need that feels wrong with Entities. Everything seems to come back in a random order all the time because of Entity removals and insertions, an unsettling experience coming from the OOP world of hand-made List<T>.
      • Think through whether there is any way to avoid sorting at all.
      • As with searching, data-oriented sorting should be aware of cache lines and of which linear memory to work through in which order.
      • Get to know data-friendly algorithms such as radix sort, in-place bubble sort, or sorting networks. Several are given to you in the Entities package, along with an extension that works with NativeArray.
    • Optimisations
      • Makes clear which kind of premature optimization is bad.
      • Premature optimization is "the root of all evil" in OOP because its instance-based design abstracts so much that everything looks premature until the problems show up later, and by then you cannot optimize anyway because OOP is getting in the way. In data-oriented design you see the impact early, so optimizing no longer looks premature (and therefore is not evil), and you can do it right away. You can also fix it now or improve it later.
      • Don't give in to hopeful optimizations. Measure, profile, make reports. We have the Performance Testing package for this; it is great because it takes care of the messy warm-up step and report generation, exactly so that the optimization isn't hopeful.
      • Gives you several optimization examples for common problems.
      • Several OOP optimizations brought up here will hit home; you have done them before. How do they fare in data-oriented design? Most of them reveal their weakness in cache line utilization.
      • One part deals with data addition and deletion combined with multithreaded code. There you will appreciate that Unity's Entities manages that for you.
      • "Don't delete" is good advice, so that other concurrent systems do not have to sync; they can continue to process values that end up unused.
    • Helping the compiler: Needless to say, we are glad that most of the issues in this chapter, which we would otherwise face if coding in pure C++, are solved by the C# Job System's restrictions, the Entities API design, and Burst. It is still useful to know about them.
    • Maintenance and reuse
      • One of the bigger arguments against the data-oriented approach is that you can no longer subclass and build on the previous thing indefinitely. This chapter reassures you that there are other dimensions of extension besides inheritance.
      • Why data-oriented design causes fewer bugs has been covered many times already, but it comes up again here. The entity query is already a protection against OOP-style null references, since the work doesn't even start if there is nothing to work on.
      • Convinces you, by reminding you of your past OOP terrors, that data-oriented design can make debugging faster. When he mentioned a chain of ifs with multiple hidden returns in them, I could immediately relate to that one method I spent a day creating and debugging. It wouldn't have needed to exist if it were data-oriented.
      • A different view on the word "reuse": you can indeed carry things from a data-oriented project forward.
      • Unit testing is hard in OOP because of the setup step, where you have to wire up objects, and that makes us too lazy to do it. When it's just data, it is clear what to set up and what to look for: data. No more instantiating a GameObject and placing it in an imaginary world we can't even see because it is an Edit Mode test.
    • What's wrong
      • This is actually a good first chapter to read. It answers questions such as why you should drop the OOP mindset.
      • It reminds you that current trends in OOP are good only for you, but bad for the machine and for your players, and explains how data-oriented design solves this.
      • Explains why virtual calls, a capability that comes with OOP, are much more harmful (to the data in your cache) than you think. Most devs assume the work inside a call surely outweighs the "tiny" virtual call cost, and that's what's wrong. As he discusses how to avoid this problem, it ends with: how about not doing OOP at all?
      • Abstractions can help you solve problems, but in game development the problem is performance and that abstraction is of the wrong kind.
      • The chapter can make you crave inheritance less.
      • A more realistic view of the "reuse" that OOP provides: generic code is not as good as it sounds; it is only good on paper.
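    The "de-if" idea from the Existential Processing notes can be shown in a few lines. This is a minimal sketch in plain Python, not the Entities API, and the `health`/`burning` tables and helper functions are hypothetical. An entity counts as burning if and only if it has a row in the `burning` table, so the update loop never checks a flag.

```python
# Existential processing sketch: presence in a table replaces a bool + `if`.
# Plain Python with hypothetical names; this is not the Unity Entities API.

health = {1: 100.0, 2: 100.0, 3: 100.0}  # entity id -> health
burning = {2: 5.0}  # entity id -> damage per tick; having a row == "is burning"


def tick_burning(health, burning):
    # No `if is_burning` guard: only burning entities are ever visited.
    for eid, dmg in burning.items():
        health[eid] -= dmg


def ignite(burning, eid, dmg):
    # A state change moves a row into a table, roughly like adding a
    # component (a structural change), instead of flipping a flag.
    burning[eid] = dmg


def extinguish(burning, eid):
    burning.pop(eid, None)


tick_burning(health, burning)  # entity 2: 100 -> 95
ignite(burning, 3, 2.5)
tick_burning(health, burning)  # entity 2: 95 -> 90, entity 3: 100 -> 97.5
extinguish(burning, 2)
tick_burning(health, burning)  # entity 3: 97.5 -> 95
```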
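    The memento idea from the Hierarchical Level of Detail notes, including the "stack of dishes as an int" example, might look like this under simple assumptions. A hypothetical crowd of agents collapses into a count plus mementos when it goes to low detail, and it is rebuilt from them later. Which fields go into the memento is a design choice; here mood is kept and position is not.

```python
# HLOD + memento sketch (hypothetical names, plain Python).
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    mood: int   # detail that only matters at high LOD
    x: float    # position; deliberately NOT kept in the memento


@dataclass
class Group:
    count: int       # low LOD: "a stack of dishes" is just a number
    mementos: list   # minimal per-agent state needed to come back later


def collapse(agents):
    """High -> low detail: keep a count plus a compact memento per agent."""
    return Group(count=len(agents), mementos=[(a.name, a.mood) for a in agents])


def expand(group, spawn_x=0.0):
    """Low -> high detail: rebuild agents from mementos.

    Anything not stored in the memento (here: position) is regenerated.
    """
    return [Agent(name, mood, spawn_x) for name, mood in group.mementos]


crowd = [Agent("a", 3, 1.0), Agent("b", 7, 2.0)]
far = collapse(crowd)  # while far away, only `far` needs to exist
near = expand(far)     # moods survive the round trip, positions reset
```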
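    Radix sort, named in the Sorting notes, fits data-oriented code because each pass walks the data linearly. Below is a minimal least-significant-digit radix sort. It assumes non-negative integer keys below 2^32, and the `depth` column is made up. It returns sorted indices so the same order can be applied to other columns that share the index.

```python
# LSD radix sort sketch: stable, and every pass streams linearly over arrays.
# Assumes non-negative integer keys below 2**key_bits.


def radix_sort_indices(keys, bits=8, key_bits=32):
    """Return the indices that stably sort `keys`.

    Sorting indices (rather than records) lets the same order be applied
    to any other column that shares the index.
    """
    n = len(keys)
    order = list(range(n))
    radix = 1 << bits
    mask = radix - 1
    for shift in range(0, key_bits, bits):
        # Pass 1: histogram of the current digit.
        counts = [0] * radix
        for i in order:
            counts[(keys[i] >> shift) & mask] += 1
        # Exclusive prefix sum: counts[d] becomes bucket d's start offset.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Pass 2: stable scatter into the new order.
        out = [0] * n
        for i in order:
            d = (keys[i] >> shift) & mask
            out[counts[d]] = i
            counts[d] += 1
        order = out
    return order


depth = [30, 5, 70000, 5, 1]
idx = radix_sort_indices(depth)
sorted_depth = [depth[i] for i in idx]
```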
     
    Last edited: Dec 7, 2019
  2. Lurking-Ninja

    Oh man! I'm reading this on paper and I'm a third of the way through; it is not an easy read for sure. Thank you for taking the time to write this. When I'm done (a couple of weeks down the line, since I don't have much time nowadays), I'll try to chime in with my take.
    BTW, I really urge everyone who is interested in this and can afford it to buy the real book and support the author. He has done a great job, as far as I can tell after 80 pages (out of ~300).
     
  3. eizenhorn

  4. recursive

    This book was the light-bulb moment for me. I have recommended it to every programmer I know.
     
  5. Vacummus

    I bought the book from Amazon the moment it was released last year. It was a great read, with a lot of good stuff in it. My only problem with it is that it is not very beginner-friendly: you already need a good understanding of DOD and of how the hardware works to really digest it.
     
  6. Laicasaane

    I guess it's time for me to read the book properly. I've been putting it off for too long (for the exact excuse you mentioned in your blog).
     
    Last edited: Dec 8, 2019
  7. eterlan

    Thanks for your recommendation and blog! May I ask a question? How would you handle mutable state now? I saw your new blog post "Reminders about dod"; it seems like you prefer ISCD over tags now, right? So, what do you think of ISCD, which, as the documentation says, is
     
  8. 5argon

    Hmm, it's not that I prefer ISCD, but it is a dynamic tag of sorts: you get a new tag just by varying a value in it, which changes its hash. So if you treat it like that, the value indeed changes rarely or not at all. The weakness of ISCD is still that you need to add a filter on top of the query instead of just stating the tag outright. This weakness mirrors its dynamic nature, because the type system is rigid.

    If you really want to mutate an SCD that serves as a tag, it is still impossible to "follow it to the real SCD and change it so the change is reflected on all entities that share it". However, an equivalent result has been possible efficiently since 0.1.1: make an EQ (EntityQuery) with the SCD filter as the target, then overwrite it with the desired value. I believe this is a chunk operation that sets a new SCD index, so it does not touch the entities inside.
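    The idea of overwriting a shared value through a filtered query can be pictured with the toy model below. It is only an illustration in plain Python under stated assumptions. `ChunkedWorld` and `set_shared_for_all` are hypothetical and are not the Entities API, and real chunks are not merged the way this model merges groups. Because entities are already grouped by their shared value, changing the value for a whole group touches the group, not each entity.

```python
# Toy model of "retag a whole group at once" (hypothetical, plain Python).
# Entities are grouped by a shared value; this is NOT Unity's chunk layout.


class ChunkedWorld:
    def __init__(self):
        self.chunks = {}  # shared value -> list of entity ids

    def add(self, eid, shared):
        self.chunks.setdefault(shared, []).append(eid)

    def set_shared_for_all(self, old, new):
        """Give every entity that has `old` the value `new` instead.

        When no group with `new` exists yet, only the dictionary key changes:
        the entity list is moved as a whole and no entity is visited.
        """
        moved = self.chunks.pop(old, None)
        if moved is None:
            return
        if new in self.chunks:
            self.chunks[new].extend(moved)  # the toy model simply merges groups
        else:
            self.chunks[new] = moved


world = ChunkedWorld()
world.add(1, "Idle")
world.add(2, "Idle")
world.add(3, "Attack")
world.set_shared_for_all("Idle", "Flee")
```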

     
  9. Nyanpas

    You write a better summary than the needlessly wordy text itself. Thank you.

    Also a point about "premature optimisation":

    Sometimes you just know what works and what doesn't for performance. There is no need to triple your own work by having to test, analyse, and redo things later in the project, when you have no time and should be working on other things instead. The writer wrote about it as if you had infinite time on an infinitely long project with an infinite budget. It makes me so angry to see such delusion.

    Just read up on good practice for your situation. Most of this comes down to bandwidth and cycles anyhow, all of which are (at least supposed to be) very well understood by now.
     
  10. Vacummus

    I don't think the author was implying that you have an infinite amount of time at the end of the project to do your optimizations. Quite the opposite. I remember the author describing how you don't have much time towards the end of the project, and that the problem with OOP is that it doesn't let you optimize easily. So when you get to the end of the project and run into performance issues, you are often forced into large refactors to get any meaningful performance improvement. In contrast, DOD is highly optimizable, allowing you to optimize without incurring huge refactoring costs.

    And avoiding premature optimization does not mean pushing all optimizations off to the end of the project. It means optimizing only after you have profiled your code and there is data proving that optimization is needed. It just so happens that in some cases you won't have this "data" until closer to the end of the project's life cycle, when the whole project comes together and there is enough going on to show you where the performance issues are. Trying to predict the performance issues you will run into down the road is nearly impossible and can be extremely time-consuming (and time-wasting).

    The author also talks about the importance of performance budgets: throughout your project's development life cycle you constantly profile the performance of your code, and if it goes over a set budget, you optimize right then and there.
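    A performance budget like this can be as simple as a table of per-system limits that you check every time you profile. Here is a minimal sketch. It assumes hypothetical system names and millisecond budgets for a 60 fps target. In practice the measurements would come from Unity's Profiler or the Performance Testing package rather than hard-coded numbers.

```python
# Performance-budget check sketch. System names and numbers are hypothetical;
# in a real project the measurements would come from a profiler.

FRAME_BUDGET_MS = {"ai": 3.0, "physics": 4.0, "rendering": 8.0}  # sums to 15 ms, under ~16.7 ms


def over_budget(measured_ms, budget_ms=FRAME_BUDGET_MS):
    """Return the systems that exceed their budget, worst overrun first."""
    overrun = {
        name: ms - budget_ms[name]
        for name, ms in measured_ms.items()
        if name in budget_ms and ms > budget_ms[name]
    }
    return sorted(overrun, key=overrun.get, reverse=True)


report = over_budget({"ai": 4.5, "physics": 3.0, "rendering": 9.0})
```

    Running a check like this on every profiling pass is what turns "optimize when there is data" into a routine step rather than an end-of-project scramble.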
     
  11. Nyanpas

    I disagree. The more you add, the more complex it gets and the more resources of one kind or another you will need. Let's say you need 100 NPCs, but the hardware can only handle 10 at a time at their level of interactivity. If you decide to limit it to only 5 instead, you should have a general idea of how much performance is still available. If you keep track of what you have running and of the target hardware specifications for budgeting, you can predict the available resources, which points you in the right direction for what to optimise.

    On a few occasions I have stopped a project at an early stage when I saw it could not possibly run at the target framerate. This saves time and money compared with waiting until the end of the project to see whether you could spend another week optimising, using data from the by-then rather complex framework that has been assembled.

    Nothing is ideal, and I agree that the OOP (or any) approach can get difficult to rework (been there, done that on larger projects). I am in no way against the DOD architecture.
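    The NPC estimate in this post is essentially the calculation below. The same calculation also shows the counterpoint that a budget measured early can shrink once other systems claim more of the frame. All numbers are made up for illustration: a 60 fps frame, 0.45 ms per NPC, and 12 ms versus 14 ms taken by everything else.

```python
# Back-of-envelope capacity estimate; all numbers are made up for illustration.


def max_agents(frame_ms, reserved_ms, cost_per_agent_ms):
    """How many agents fit into the part of the frame not reserved for other work."""
    available = frame_ms - reserved_ms
    return max(0, int(available // cost_per_agent_ms))


FRAME_60FPS_MS = 1000 / 60  # about 16.67 ms

early = max_agents(FRAME_60FPS_MS, reserved_ms=12.0, cost_per_agent_ms=0.45)
late = max_agents(FRAME_60FPS_MS, reserved_ms=14.0, cost_per_agent_ms=0.45)
```

    With these numbers, the frame fits 10 NPCs early on but only 5 once the rest of the game takes 2 ms more.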
     
  12. Joachim_Ante

    Unity Technologies
    I think there are two things here:

    1) Optimisable code
    2) Optimised code

    It really comes down to how you define it. Unfortunately that varies wildly from team to team. In the DOTS team at Unity we define it like this:

    Optimisable code is code that has great data layout, we don't load data into cache without it being actually used in the critical path. The algorithm is chosen for the data that is being processed. The data transformation being applied is clear and you can't shave much off.

    Optimised code is when we take such code and we ensure that the hardware truly runs optimally on it. We force the code to SOA style SIMD. Specific SIMD instructions are used where necessary to get better performance out of it. We might take advantage of specific patterns in our specific game and hand tune code for it.


    By this definition no OO code is ever 1 or 2, because that is not even how it is approached.

    Most of the samples we provide are 1). I think if a game has mostly Optimisable code that is a great place to be. When you ship a AAA game, you usually then take some of the code and you optimise it. If you have some things in your game that happen at massive scale you do that.

    When Richard Fabian talks about premature optimisation, I am pretty sure he talks about 2). And yes prematurely optimising to that level reduces code readability etc.

    1) Is assumed to be the baseline of how you write code when following the DoD way of writing code.
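    "SOA style" here refers to a structure-of-arrays layout. The sketch below only shows the layout change for a hypothetical particle type. Python is used for readability, so it demonstrates the shape of the data, not the cache or SIMD speedup you would get from compiled code such as Burst jobs.

```python
# Array-of-structures vs structure-of-arrays layout (illustration only).
# Python will not show the cache/SIMD benefit; compiled code would.
from array import array
from dataclasses import dataclass


@dataclass
class Particle:  # AoS: one object per particle, all fields together
    x: float
    y: float
    vx: float
    vy: float


class ParticlesSoA:
    """SoA: one contiguous float64 array per field."""

    def __init__(self, particles):
        self.x = array("d", (p.x for p in particles))
        self.y = array("d", (p.y for p in particles))
        self.vx = array("d", (p.vx for p in particles))
        self.vy = array("d", (p.vy for p in particles))

    def integrate(self, dt):
        # Each loop reads exactly two columns, in order, with no other
        # fields in between: the shape vectorizing compilers handle well.
        for i in range(len(self.x)):
            self.x[i] += self.vx[i] * dt
        for i in range(len(self.y)):
            self.y[i] += self.vy[i] * dt


soa = ParticlesSoA([Particle(0.0, 0.0, 1.0, 2.0), Particle(1.0, 1.0, -1.0, 0.0)])
soa.integrate(0.5)
```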
     
  13. Nyanpas

    Exactly. Just put it on Azure/AWS and let the servers deal with it.

    I am curious to see though, how the development of more remote processing (rendering/object recognition/etc. are already on their way and mostly implemented in Azure) will change the way we think of code when we in theory could have infinite processing power.
     
  14. Vacummus

    Definitely a lot of value in understanding your hardware and what you can throw at it. The problem, though, is that if you validate at the beginning of the project that your game can handle a max of 10 NPCs, that may not hold true towards the end of the project. By then your game may only be able to handle a max of 4 NPCs, because other systems (not present at the beginning of the project) compete for the same hardware resources. That is why it becomes nearly impossible to predict early in the project which performance issues you will run into and to prematurely optimize for them. You don't know what you don't know, so by prematurely optimizing you are trying to solve a problem that doesn't exist.

    But it's definitely good practice to profile the performance of your game early and often and not just wait until the end, hence the point of a performance budget.
     
  15. Vacummus

    I think it's going to make writing performant code even more important, because performant code uses less energy and fewer resources, which means you will spend less $$$ having a cloud service host performant code than non-performant code.
     
  16. eterlan

    Thanks for the quote, I didn't know that before. But I'm still wondering: what if only some entities change, not all? After reading the DOD book I realized that a tag component is the table, and we use it to hide a bool, which is a good thing. One difference, though, is that an entity can have only one value of a given SCD type, while with tag components I might accidentally add several states at once. The "costly" change sounds exactly like any other structural change, so it might be OK, I guess?
    May I ask which data structure you use for your AI state? For example, the current action an NPC is taking.
     
  17. 5argon

    It sounds like you want to put AIs that are doing the same thing into the same chunk, so you can iterate over them without an `if` on a normal `IComponentData` field? By this logic it is tempting to tag everything extensively until you can perfectly slice and dice every query (no matter how many chunks you end up with).

    The amount of data and work involved defines the right API for the job. If this AI is for a boss character and 1-10 minions, or a town with 5 idle NPCs that may be doing something on their own, I would just iterate and `if` through them without tagging. Or I would try an inline-if where possible, when both sides can be values that make the difference between states. For example, entities in the Running state get +5 to their position while the others get +0, or entities that should render get alpha 1 and those that should not get alpha 0. In an RTS, where actions may be simple but the numbers are massive, I might use an enum in an SCD, given that it is enemy AI that cannot change rapidly. If it is your own team's AI, which can change as soon as you command it, I might not use SCD for tagging, since moving chunks would be too much. Data always defines the solution. In the end the answer is usually to profile whether chunk movement or the `if` is more expensive for you, and whether you can use another API solution such as change versions. Anyway, you should ask someone who really makes RTS or AI-intensive games; I have very little experience with that.
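    The "inline-if", where the state selects a value instead of a code path (+5 vs. +0 to position, alpha 1 vs. alpha 0), can be sketched like this. The data columns are hypothetical. In Python this only illustrates the shape of the code: every entity runs the same expression, and the state is folded in as a multiplier rather than a branch.

```python
# "Inline-if" sketch: the state selects a value instead of a code path.
# The columns and numbers are hypothetical, mirroring the examples above.


def step_positions(pos, running, speed=5.0):
    """pos: floats; running: 0/1 per entity.

    Every entity runs the same expression; non-running ones get +0.
    """
    return [p + speed * r for p, r in zip(pos, running)]


def alphas(visible):
    # Visible -> alpha 1.0, hidden -> alpha 0.0; again no branch per entity.
    return [1.0 * v for v in visible]


new_pos = step_positions([0.0, 10.0, 3.0], [1, 0, 1])
alpha = alphas([1, 0, 1])
```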
     
  18. eterlan

    Thank you so much for your answers, it's very insightful!
    I should be clearer about what I'm talking about next time, anyway. I'm trying to make a utility AI that supports maybe thousands of NPCs or more in towns, running all the time. Now I have a rough idea of how to make it efficient with the information from you and the book. It simply combines an enum and an SCD: one is write-friendly, the other is processing-friendly.
    1. When giving an NPC a new action, just call setNewState on a normal component.
    2. Maybe 0.5 s later, use a per-chunk check to see whether this normal component has changed, then gather entities with the same enum value into the same array, and set the SCD for each enum value.
    The idea is to defer and combine identical operations, converting from an easy-to-set form to an easy-to-process form. One problem is that the enum is used in two places, which is not that beautiful.
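    This two-step plan can be modelled as follows. It is a hypothetical sketch in plain Python. `NpcStates`, `flush`, and the `Action` values are invented, and `set_new_state` stands in for the `setNewState` call. Writes go into a cheap pending table, and a periodic flush regroups NPCs by action in batches, roughly the way setting an SCD per enum value would move whole groups.

```python
# Deferred, batched state changes (hypothetical model of the two steps).
from collections import defaultdict
from enum import Enum


class Action(Enum):
    IDLE = 0
    WORK = 1
    EAT = 2


class NpcStates:
    def __init__(self):
        self.pending = {}               # write-friendly: npc id -> new Action
        self.groups = defaultdict(set)  # processing-friendly: Action -> npc ids
        self.current = {}               # npc id -> Action it is grouped under

    def set_new_state(self, npc, action):
        # Step 1: a cheap write, no regrouping yet.
        self.pending[npc] = action

    def flush(self):
        # Step 2 (e.g. every 0.5 s): gather pending NPCs by action, then move
        # each batch in one go, like setting the shared value per enum value.
        batches = defaultdict(list)
        for npc, action in self.pending.items():
            batches[action].append(npc)
        for action, npcs in batches.items():
            for npc in npcs:
                old = self.current.get(npc)
                if old is not None:
                    self.groups[old].discard(npc)
                self.current[npc] = action
            self.groups[action].update(npcs)
        self.pending.clear()


states = NpcStates()
states.set_new_state(1, Action.WORK)
states.set_new_state(2, Action.WORK)
states.set_new_state(3, Action.EAT)
states.flush()
states.set_new_state(1, Action.EAT)
states.flush()
```

    The enum still lives in two places (`pending` and `groups`), which is the same trade-off noted in the post.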
     
  19. axxessdenied

    Thanks for sharing this! I definitely need to go through it. The whole DOTS+ECS approach seems really interesting, but it requires such a different perspective from what I am used to.
     
  20. Lurking-Ninja

    Going back to the book a bit after the off-topic posts.
    I somewhat agree that this book isn't beginner-friendly, although it does not promise to be, so that's okay.
    The book also has a very good bibliography; most of the articles/presentations are worth checking out.
    Anyone who has the book should check them out: some of them may help with deeper architectural questions and explain where this summary judgement comes from.
    (The online version unfortunately does not contain those URLs/references)
     
  21. 5argon

    Currently I find that replacing object interaction (e.g. an event system or reactive system) is the hardest part of all, but the book doesn't touch on it often enough for you to get used to thinking about it. For example, if previously you had a Human instance calling .Kick on a Ball once (as a discrete action; there is no ongoing state like a Human holding .Pedal on a Car), what would you do in data-oriented design? It is fitting that there was a talk about this at Unite. Interaction = relationship; you may be able to grasp that, but it still feels wrong after you have written everything out, as if the concept did not come from the human world. It is even harder to relate to why a system state component is sometimes needed. Most of the time I end up using it abstractly (for its zombie-entity property) instead of for its intended meaning, "systems can have state now, but it is tied to an entity, so that's OK".

    The resulting code is equally awkward to read: to replace an interaction I have to create some data for the other thing to consume, then delete it accordingly so that it represents a discrete action. This (unintentionally) allows multiple actions to be consumed in one run, since the query assumes nothing about the amount, and once you realize this the discreteness seems to disappear. Data relationships are a superset of interaction after all, but they are weird to program.

    I expect that ECS needs something as easy as GetSingleton<T> before it can gain mass adoption: something that works for multiple entities without error and is tuned and named for this specific problem (using IL magic to perform GetEQ in OnCreate for you, whatever). Otherwise, documentation on this data-as-method-call practice will have to be really good.
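    The data-as-method-call pattern described here turns a Kick into a piece of data that a consumer processes and then destroys. It can be modelled like this in plain Python, where `KickEvent`, `kick_system`, and the impulse math are all hypothetical. The consumer naturally handles zero, one, or many kicks per frame, which is the loss of "discreteness" the post points out.

```python
# "Data as method call" sketch: a Kick becomes data that a system consumes.
# All names and the impulse model are hypothetical.
from dataclasses import dataclass


@dataclass
class KickEvent:
    kicker: int      # entity id of the human
    ball: int        # entity id of the ball
    impulse: float


ball_velocity = {10: 0.0, 11: 0.0}
events = []  # stands in for the event entities created this frame


def kick(kicker, ball, impulse):
    # Instead of calling ball.Kick(), record that a kick happened.
    events.append(KickEvent(kicker, ball, impulse))


def kick_system():
    # Consume every pending kick, however many there are, then delete them.
    for e in events:
        ball_velocity[e.ball] += e.impulse
    events.clear()


kick(1, 10, 3.0)
kick(2, 10, 1.5)  # two kicks on the same ball in one frame are both applied
kick(1, 11, 2.0)
kick_system()
```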
     
    Last edited: Dec 16, 2019
  22. fholm

    fholm

    Joined:
    Aug 20, 2011
    Posts:
    2,052
    I completely agree, this is what is the hardest to get right with a data oriented design IMHO. Yes performance can be had easily, but modelling very fine grained and varied behavior between different entities in the game becomes very complex.
     
  23. Nyanpas

    I've spent some time with the job system and the way it works with handling "data" makes good sense to me. However, the rest of the code I do is still very much not looking like the ECS-examples (so far). I am sure the Unities will sort something out at some point.

    The best option would be to just continue as before, visually in the editor and to a degree in code, with the "gameobjects" as we know them, and have them automagically translated to ECS at runtime as far as possible. At least that is my dream. I would not like having to spend too much time coding solutions that a commercial game engine should have by default.
     
  24. eterlan

    Now I usually create a command component to process an operation between entities. It basically has two entity references, actor and receiver, plus some argument fields. Then I simply make a consumer system to "eat" this operation. If we treat Entity as an index into a table of whole objects, then this technique just uses an entity as a foreign key.
    The reason I don't like simply adding a new component is that in DOTS an entity is coupled to one more thing: its chunk. Moving an entity from one chunk to another, as everyone knows, means paying for a copy.
    It's quite suitable for asynchronous requirements: I can make the consumer run every 4 frames or less often, or just add a timer argument field.
    This way, I can use EntityManager.AddComponent(entities[], c), or an IJobParallelFor with GetComponentFromEntity and ECB.SetComponent, to apply the changes quickly.
    However, it's not suitable for a really large number of operations compared to chunk iteration, since it involves random memory access.
    I think this must be very helpful
    BTW, instead of deleting the entity where we create it, I just let the consumer delete it. Forget about the single responsibility rule; it made me suffer a lot from system sequencing problems.
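    The command-component approach can be sketched like this. It is a hypothetical plain-Python model: `DamageCommand`, `consumer_system`, and the due-frame timer are invented for illustration and are not DOTS APIs. Each command holds actor and receiver ids as foreign keys. A consumer runs every 4 frames, applies commands whose timer has expired, and deletes them itself, so the producer never has to.

```python
# Command "entities" with foreign keys, eaten by a periodic consumer.
# Hypothetical names; plain Python standing in for components and systems.
from dataclasses import dataclass


@dataclass
class DamageCommand:
    actor: int          # foreign key: who issued it (kept, unused here)
    receiver: int       # foreign key: whose health changes
    amount: float
    due_frame: int = 0  # optional timer argument


health = {1: 50.0, 2: 80.0}
commands = []


def consumer_system(frame, period=4):
    """Runs every `period` frames; applies due commands and deletes them.

    `health[cmd.receiver]` is a lookup by key, i.e. random access, which is
    the trade-off against linear chunk iteration mentioned in the post.
    """
    global commands
    if frame % period != 0:
        return
    remaining = []
    for cmd in commands:
        if frame < cmd.due_frame:
            remaining.append(cmd)  # not due yet, keep it for a later run
        elif cmd.receiver in health:  # the receiver may have been destroyed
            health[cmd.receiver] -= cmd.amount
        # applied commands (and ones whose receiver is gone) are not kept
    commands = remaining  # the consumer, not the producer, deletes commands


commands.append(DamageCommand(actor=1, receiver=2, amount=10.0))
commands.append(DamageCommand(actor=2, receiver=1, amount=5.0, due_frame=6))
for frame in range(1, 9):
    consumer_system(frame)
```

    The first command is applied on frame 4. The delayed one survives that run and is applied on frame 8, after which the command list is empty.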
     
    Last edited: Dec 18, 2019