
Could DOTS lead programmers to brute force problems that could have more optimal solutions?

Discussion in 'Entity Component System' started by Arowx, Nov 8, 2021.

  1. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Development Scenario GO

    Let's say you're working on a game using classic GOs, profiling as you go. When you hit a performance issue you optimise it and repeat, until you need to step back, analyse your game's 'solution' design, and rework it a different, more efficient way to get the best out of it.

    E.g. you add in batch-based LODing and occlusion culling to ensure off-screen/distant NPCs use fewer ticks and less processing power than those within view of the player.

    Development Scenario DOTS

    Let's say you're working on a game using DOTS, and your game is super fast, but things start to bog down in late-stage development. You profile and fine-tune and reach just-playable levels of performance with occasional hiccups.

    So do you find yourself brute-forcing problems with DOTS when there could be more efficient designs and solutions, because DOTS lets you be lazy as a game designer?
     
  2. redwren

    redwren

    Joined:
    Aug 2, 2019
    Posts:
    69
    Could OOP lead programmers to create sprawling abstractions to solve problems that have simple brute force solutions?

    Yep, and it happens all the time. Programmers can write bad code in any architecture.
     
    sebas77, andreiagmu, Krajca and 10 others like this.
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Well, you say sprawling abstractions, and DOTS is well on its way to high levels of complexity in its API and abstractions.

    If you apply the Keep It Simple, Stupid (KISS) principle, 80% of your code won't need optimising and 20% of it will.

    Now if you apply the KISS principle, then we need super simple or BASIC DOTS or GOs that just work without lots of complexity.

    Combined with an easy way to profile and transition the other 20% to high-performance code.

    I'm guessing that even within the 20% needing optimisation the rule still applies, and 20% of that (4% of the total) will need ultra performance optimisations that may need more complex DOTS/BURST/JOBS code.

    To me it looks like this:
    • OOP: build a set of domain-like objects with enough methods to solve the problems of their roles in the game domain.
    • DOTS: build a larger set of systems/methods to solve all of the data problems in your game's domain.
    Only with DOTS it seems like you're building a high-speed train network of interacting systems that needs lights and junctions to optimise the flow, and a lot of container handling to load things on and off the trains.

    Whereas in OOP you build a road network and only add lanes/junctions when your profiler spots a traffic jam.
     
  4. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I don't think I understand the following points:
    1. why is it implied that you can easily fix late-dev performance issues in OOP, but not in DOTS?
    2. what is it about DOTS that makes devs more likely to "brute force" problems than in OOP?
    From my perspective, there is no reason to believe either of those points is true. For point #2, if the answer is "DOTS makes everything too fast, therefore people will care less about implementation quality", then wouldn't that make you opposed to anything that makes the engine more performant? That's kind of a good problem to have, if you ask me.

    But what I would like to say on this subject is that OOP makes it more likely that you'll end up with a code architecture that is not easily optimisable once you start detecting perf issues late in development. In real projects, it is common to end up concluding that large refactors would be necessary in order to optimize things in a meaningful way. And often we decide that we can't afford to take all this time to refactor everything, so the problem stays there forever. I've seen it happen countless times.

    But this is a problem that is much less present in DOTS. One of the biggest advantages of DOTS is that it makes everything ready for easy future optimization if you ever end up needing it. Your data is already laid out in an efficient manner, things are set up in advance for multithreading compatibility even if you haven't really thought about it, and the ECS makes sure every part of your game is very decoupled from every other part (which means: very easy to refactor).
     
    Last edited: Nov 8, 2021
  5. redwren

    redwren

    Joined:
    Aug 2, 2019
    Posts:
    69
    Why are you prematurely optimizing your DOTS/ECS code but not your MonoBehaviour code? You can happily write your ECS systems as single-threaded pure functions executed in sequence and promote the expensive ones to complex parallelized jobs. And if you end up needing to parallelize your MonoBehaviour code, you'll end up with the same (likely more) scheduling complexity as ECS scheduling.
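    A minimal sketch of that promotion path, using the Entities 0.x-era SystemBase API (the Velocity component here is hypothetical):

        using Unity.Entities;
        using Unity.Mathematics;
        using Unity.Transforms;

        // Hypothetical component for illustration.
        public struct Velocity : IComponentData { public float3 Value; }

        public class MoveSystem : SystemBase
        {
            protected override void OnUpdate()
            {
                float dt = Time.DeltaTime;

                // Start simple: a single-threaded, main-thread pass.
                Entities.ForEach((ref Translation t, in Velocity v) =>
                {
                    t.Value += v.Value * dt;
                }).Run();

                // If the profiler later flags this system, promotion is one
                // change: swap .Run() for .ScheduleParallel(); the lambda
                // body stays the same.
            }
        }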
     
    apkdev likes this.
  6. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    It seems you are bashing a technology you haven't tried using.
     
  7. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    I have tried using it, and I'm currently trying to work with it, but the abstractions and very poor documentation (lack of good examples), combined with the sheer complexity and volume of boilerplate code needed just to write a simple function, astound me.

    Maybe I need DOTS BASIC - 90% of the performance of DOTS with only 10% of the complexity.
     
  8. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    How would this be accomplished?
     
    SamOld likes this.
  9. redwren

    redwren

    Joined:
    Aug 2, 2019
    Posts:
    69
    We've had no issues mixing ECS and GameObjects in the same game to solve different problems. Unity explicitly encourages it. Just use the right tool for the job.
     
  10. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    1. Well, doesn't the very existence of DOTS show us that OOP can be bad for performance, due to poor cache utilisation [DOTS], lack of multi-threading [JOBS] and no inherent vectorisation [BURST]?

    2. Computing teaches us that if you want to hit a goal/solve a problem you can take two basic approaches:
    1. Brute force, i.e. lots of attempts/work: tends to use a simpler algorithm but way more processing power.
    2. A clever, refined approach: tends to use a more complex algorithm but way less processing power.
    *3. AI: not yet relevant to Unity DOTS/OOP game development.
     
  11. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    It's just meant as a joke, but I do think the gap between the DOTS API and the OOP API is big enough to park a great way to use DOTS and mix it with OOP, making it easier to transition between the two.
     
  12. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Why would that mean DOTS is harder to optimize than OOP?

    That doesn't answer the question. What is it about DOTS that would make people choose less clever solutions?
     
  13. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Really?

    What aspects are giving you a hard time? Conversion? Job dependencies? Parallel job safety? Deferred transforms? Physics? Sync points? Setup/teardown?

    Can you share a snippet of some of your code you think involves too much boilerplate?

    It is one thing to make general statements without any experience, but if you are actually trying to learn and struggling, we'd be more than happy to help! :)
     
  14. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Imagine a game where you have lots of vehicles moving over a road network, but the player's perspective is from a character in that city.

    With DOTS you have lots of cars and no problems.
    Then you trim down the number of cars a bit to hit your target frame rate on the hardware you're using.
    Upon release the game hits performance problems, as your PC happens to have more cores and a better GPU than the players'.

    Same scenario with GOs, but you hit performance issues earlier in the cycle and therefore implement an occlusion and vehicle-LODing system.
    Your game is released without any issues on lower-spec hardware.

    You can get around this inherent issue with good, dedicated low-spec hardware testing,
    and maybe stress testing. But I think with DOTS you can end up with games using a brute-force solution approach that will use more power and bandwidth than a smarter, more efficient OOP solution would.
     
  15. TheOtherMonarch

    TheOtherMonarch

    Joined:
    Jul 28, 2012
    Posts:
    791
    Have you heard of Big O notation? Everything has limits which depend on the algorithms used.
     
    Anthiese likes this.
  16. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    Then you're just accusing DOTS of being too good for performance?

    The kind of problem you described can happen regardless of whether you're using DOTS or OOP, if you don't test your game properly. You can't just count on your dev hardware always being in the exact performance sweet spot that will make you find out whether your game might have performance problems. The solution here is not to reject tech that performs better, but rather to just test things properly.
     
    Last edited: Nov 8, 2021
  17. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Is that a scale for the complexity of a mathematical problem? The thing is, we are talking about games, which are enclosed, solved systems, so does this really apply?

    E.g. in most game worlds the complexity is reduced, often using a pre-computed data set, e.g. lighting/navigation.

    Or how is Big O notation relevant to game design, OOP and DOTS in the context of this thread?
     
  18. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    What about code size or inertia? You have to admit that a one-line function call in OOP is easier to refactor than the equivalent boilerplate-bloated DOTS system and data.

    Or think of OOP as light on code but with lots of performance headroom, vs DOTS as code bloat but with very little performance headroom left.

    After all, the whole point of code is to make things easier for the programmer, something DOTS seems to have forgotten.
     
  19. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I've found that the amount of lines of code is pretty irrelevant when it comes to development speed & ease of refactoring. The thing that counts the most above all else is how well-architected your project is.

    Programmers are never limited by how fast they can physically type out letters on keyboards. We spend 99% of our time thinking, and 1% of our time writing code (maybe even less). The biggest issues when it comes to dev time are often:
    • optimization problems
    • bug fixing
    • refactors (due to poor scalability/optimizability/maintainability)
    • difficulty of adding/changing features as codebase grows in complexity
    • difficulty of understanding parts of the code that you're unfamiliar with
    A robust+modular+scalable+optimal+clean architecture will help to mitigate all of these problems, and DOTS makes it obvious & effortless to write such architectures. Good architectures are not impossible in OOP; but let's just say OOP makes it incredibly easy to make bad decisions, and often requires effort to make good ones. OOP's path of least resistance guides you & your team towards messy code that will seriously hurt your development process

    As mentioned previously, your #1 enemy as a programmer is a poor architecture. Since DOTS makes good architecture easier, I'd say DOTS provides more ease of use than OOP. It takes care of the biggest difficulty/threat of all.

    It's important to consider the long-term benefits of the tech when evaluating ease-of-use; not just the short-term. Because in real life, most of your time on a project will be in the "long-term" phase. The first few weeks spent on a project (when you're writing tons of code and everything seems to be super simple) are not very representative of what most of the development process will be like

    I would say it's quite the opposite. One of DOTS's biggest strengths is how everything is set up to be extremely optimizable and how it allows you to really go deep if you need to. With OOP, you have a lot less optimization potential
     
    Last edited: Nov 9, 2021
    andreiagmu, NotaNaN, RaL and 4 others like this.
  20. TheOtherMonarch

    TheOtherMonarch

    Joined:
    Jul 28, 2012
    Posts:
    791
    Picking the right algorithm is very relevant; it is the main driver of your performance. My point is that whether it is DOTS or OOP is only relevant in that Burst may give you one order of magnitude better performance. The main drivers will always be the algorithms. Picking the wrong algorithm can give you arbitrarily bad performance.

    ECS on the other hand gives you easy multithreading.
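    As a back-of-the-envelope illustration of that point (the counts below are hypothetical, not benchmarks):

        using System;

        class AlgorithmCost
        {
            static void Main()
            {
                const long n = 250_000;      // an entity count discussed later in this thread

                long bruteForce = n * n;     // O(n^2) all-pairs proximity tests: 62.5 billion
                long spatialHash = n * 8;    // O(n*k), ~8 candidate neighbours per cell: 2 million

                // At roughly 1 ns per test, that is ~62 s/frame versus ~2 ms/frame.
                Console.WriteLine($"{bruteForce:N0} vs {spatialHash:N0} tests per frame");
            }
        }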
     
    Last edited: Nov 8, 2021
    xVergilx likes this.
  21. TheOtherMonarch

    TheOtherMonarch

    Joined:
    Jul 28, 2012
    Posts:
    791
    To be fair, that is more C# than OOP.
     
  22. redwren

    redwren

    Joined:
    Aug 2, 2019
    Posts:
    69
    The equivalent of a function call in OOP is a function call in ECS. Nothing is preventing you from using functions.

    Putting methods on components is an antipattern because ECS emphasizes separating data from logic. A method call on a class instance can create a deep graph of state mutations across many classes and methods with unknown side effects. A system only changes the data it queries for and those queries are plainly stated in the system.
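    A minimal sketch of that separation, with hypothetical component names (Entities 0.x-era API); the only state the system can touch is what its query names:

        using Unity.Entities;

        // Data only; no methods on components.
        public struct Health : IComponentData { public float Value; }
        public struct RegenRate : IComponentData { public float PerSecond; }

        public class RegenSystem : SystemBase
        {
            protected override void OnUpdate()
            {
                float dt = Time.DeltaTime;

                // The query (Health + RegenRate) is stated up front; nothing
                // outside it is mutated, so there is no hidden call graph.
                Entities.ForEach((ref Health health, in RegenRate regen) =>
                {
                    health.Value += regen.PerSecond * dt;
                }).ScheduleParallel();
            }
        }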
     
    xVergilx likes this.
  23. TheOtherMonarch

    TheOtherMonarch

    Joined:
    Jul 28, 2012
    Posts:
    791
    I like ECS. However, a system can definitely trigger widespread changes across other systems.
     
    Last edited: Nov 8, 2021
    xVergilx likes this.
  24. Lurking-Ninja

    Lurking-Ninja

    Joined:
    Jan 20, 2015
    Posts:
    9,903
    Yeah, you hate and don't understand ECS, we get it.
     
    thelebaron likes this.
  25. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    I've seen lots of sub-optimal code that is then converted to DOTS, or DOTS examples that are sub-optimal and then take advantage of DOTS for a brute-force performance boost.

    Think of it like this: your code is a Ford Fiesta, and DOTS is a nitro injection system that you can strap on to make it go faster.
     
  26. TheOtherMonarch

    TheOtherMonarch

    Joined:
    Jul 28, 2012
    Posts:
    791
    Jobs may be like that. But ECS requires significant architectural changes to your code to even function.
     
    hippocoder, Krajca and Arowx like this.
  27. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    I bet adding a Nitro inject system is not that simple either.
     
  28. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    It's not the number of lines of code, it's the fact that in OOP you group your methods into objects; in DOTS you just have systems, which are about equal to the method calls in OOP.

    So in DOTS projects you probably have anywhere from 2 to 10 times the number of scripts, which can grow to be inherently more complex (more files). And how do you manage to keep track of where the data goes? Is there some kind of network diagram that Unity can show you, or do you need your own system map to keep track of everything?

    That's why I think DOTS could be easier to use if OOP methods could work like systems; maybe if we adopted a struct-based parameter convention for objects we could have a hybrid, more OOP-like DOTS syntax.

    And converting OOP code to DOTS would be as simple as creating an Entity parameter for method data and adding a [DOTS] tag to the method call. After all, everything else is boilerplate code in DOTS.
     
    Last edited: Nov 8, 2021
  29. colin_young

    colin_young

    Joined:
    Jun 1, 2017
    Posts:
    243
    The difference with DOTS is that you aren't/shouldn't be writing a kitchen-sink script. One script does one thing (I know, the sample code doesn't do that and stuffs a whole bunch of things into a single script file): the single responsibility principle. Each system operates on some component or set of components. You need to keep track of your systems, not your entities (in terms of thinking about the code design; obviously your code at some point needs to do something with entities). You aren't thinking about how to do "x" to an enemy entity, but about how to handle an entity with the "attacker" component attached to it.

    Movement can be shared across multiple types of entities. If I want to share my attack code across birds and dogs and cats, but have a different move for birds than for dogs and cats, I can do that in DOTS trivially. Trying to do that in OOP is either going to create a mess if you make the mistake of trying to use inheritance, or you end up creating some sort of dependency injection system, which, I'd argue, introduces the same boilerplate "problem" you are claiming DOTS has. It just has different syntax. In any system, at some point you need to explain to the underlying libraries how your code and your instances should be wired up to each other.

    All the "boilerplate" code is actually the code that tells DOTS which component/set of components you are going to do some work on. It's a more functional programming way of thinking. You _have_ to tell your functions _all_ of the parameters. You can't rely on being able to mutate state of an object arbitrarily.

    To use your cars-on-a-road-network example, I'm literally implementing that exact scenario in DOTS. I personally find it far easier to write and manage the code in DOTS than with GameObjects. And as far as optimization goes, it's dead simple. Cars immediately in view where it matters get a component that implements a full-blown physics simulation with individual wheel colliders, full collision detection, the works. Those that are farther away, where you won't notice, use a simplified movement system that just drags them along the lane geometry with an appropriate pivot point for cornering, and simplified collision detection. Cars that aren't visible just update their positions and don't have rendering at all (and may even forgo collision detection altogether). One system handles updating all the entities with the required components to group them into the correct bin.

    With GameObjects you've got all that logic jammed into one giant class, and, given the way most people seem to write code, it's probably mutating the state of the class all over the place.
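    A sketch of what that single binning system might look like, assuming hypothetical tag components, a placeholder distance threshold, and a viewer position read from elsewhere (Entities 0.x-era API):

        using Unity.Entities;
        using Unity.Mathematics;
        using Unity.Transforms;

        // Hypothetical LOD tier tags; a real project's names will differ.
        public struct CarTag : IComponentData { }
        public struct FullPhysicsTag : IComponentData { }
        public struct SimpleMoveTag : IComponentData { }

        public class CarLodBinningSystem : SystemBase
        {
            EndSimulationEntityCommandBufferSystem _ecbSystem;

            protected override void OnCreate()
            {
                _ecbSystem = World.GetOrCreateSystem<EndSimulationEntityCommandBufferSystem>();
            }

            protected override void OnUpdate()
            {
                float3 viewer = float3.zero; // e.g. read from a singleton camera entity
                var ecb = _ecbSystem.CreateCommandBuffer().AsParallelWriter();

                // One pass over all cars re-bins them by distance; the
                // structural changes are deferred through the command buffer.
                Entities
                    .WithAll<CarTag>()
                    .ForEach((Entity e, int entityInQueryIndex, in Translation t) =>
                    {
                        bool near = math.distance(t.Value, viewer) < 50f;
                        if (near)
                        {
                            ecb.AddComponent<FullPhysicsTag>(entityInQueryIndex, e);
                            ecb.RemoveComponent<SimpleMoveTag>(entityInQueryIndex, e);
                        }
                        else
                        {
                            ecb.AddComponent<SimpleMoveTag>(entityInQueryIndex, e);
                            ecb.RemoveComponent<FullPhysicsTag>(entityInQueryIndex, e);
                        }
                    }).ScheduleParallel();

                _ecbSystem.AddJobHandleForProducer(Dependency);
            }
        }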

    Yes, people may write inefficient code. The "right" way to fix a problem may be to build a Ferrari; sometimes strapping a nitro injection system to a Ford Fiesta is good enough to get the job done, and get it done sooner and under budget.
     
  30. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    A system shouldn't be the equivalent of a function call in OOP. A system represents a full and complete data transformation to be applied. This could be a single type of operation, or it could be many operations which happen in phases. I found that, starting out, it is a lot easier to have a small number of larger systems which do everything related to a particular kind of entity. With a little practice and experimentation, it becomes natural how to break up, refactor, and generalize systems.

    You'll eventually develop your own conventions and a "sense" for this. The trick is to start small. Really small. Like 48 hour jam small (that's what I did). Do a couple of those, and figure out what organization scheme works best for you, what naming conventions work well, and what granularity feels the most comfortable regarding systems.
     
    SamOld and apkdev like this.
  31. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    I found that doing that is preferable to breaking it up into components. In my opinion, you should think in terms of the "complete data transformation", as you called it: work more with archetypes than with particular components. Of course, there are many situations where your archetype can be a single component, and that's fine, but on the whole, for me, thinking about specific components is useless.
     
    DreamingImLatios likes this.
  32. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    The biggest problem I have is handling non-linear game mechanics. At some point you have N iterations with N times X random lookups. In my case, most of these lookups are in the same (other) archetype. The DOTS way would be to have one system write out data so you can work linearly on the next (other) archetype.

    So it's a case of N iterations with X random lookups versus 2*N iterations + a write + a read, so that two systems can work linearly. I kind of wish this worked, but it never did work out for me. The overhead was always way worse, and I'm not sure how anyone handles this, to be honest. Exchanging data between jobs puzzles me: how can it ever be faster than just taking the hit of the random lookup? It was the same kind of thing in OOP. How can computation be faster when you write and read more data just to calculate faster? It's a total pitfall IMO, especially with how slow memory is and how fast CPUs are nowadays. In any case, I'd really like to know how someone solved this and ended up being faster with that approach. And I'm not even talking about taking the hit of scheduling the next job. Having one big system split into several jobs that essentially work on the same query and data is, AFAIK, complete nonsense.
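    For concreteness, a sketch of the two options with hypothetical components (Entities 0.x-era API): option A takes the random lookup directly, option B is the write-out-then-process-linearly route described above:

        using Unity.Entities;

        public struct Target : IComponentData { public Entity Value; }
        public struct Damage : IComponentData { public float Value; }
        public struct Health : IComponentData { public float Value; }

        public class DirectLookupSystem : SystemBase
        {
            protected override void OnUpdate()
            {
                // Random-access view into another archetype's data.
                var healthLookup = GetComponentDataFromEntity<Health>(false);

                // Option A: take the cache miss and do the lookup in place,
                // single-threaded so shared targets cannot race.
                Entities.ForEach((in Target target, in Damage damage) =>
                {
                    var h = healthLookup[target.Value];
                    h.Value -= damage.Value;
                    healthLookup[target.Value] = h;
                }).Run();

                // Option B ("the DOTS way"): write (target, amount) pairs to
                // a container here and apply them linearly in a second system.
            }
        }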

    So, in that sense, it's kind of laughable to read about "brute forcing" a problem in DOTS. Brute forcing never works, not in OOP, not in DOTS. Maybe for password cracking.
     
    Krajca likes this.
  33. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    I'm having trouble figuring out what this means in practice. Do you have an example of where this would happen?
     
  34. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    To be more specific, think of how you would implement the spell-casting mechanics of RPGs like WoW, FFXIV, GW2, etc. I would say that, in general, a lot of high-level game mechanics are really non-linear, with lots of conditional branching.

    edit:
    To go into more detail, a player or spellcaster goes through these stages:
    - input which spell to cast
    - calculate whether the spell is allowed based on resources, distance, angles, etc. (you still stay in one archetype)
    - fire the spell, and write data back: cooldowns, spent resources, resource gains like combo points, or anything arbitrary
    Then things start to get problematic with how the spell is even modeled. As an entity, you'd have an instantiation and later a destruction (very expensive). I have now modeled a spell as pure blob data (this is the right and intended way, I think), but some creation data is still necessary: source, target, which spell, etc. I used to delay this creation data to a specific system, but now it's just fired where it happens and the spell is calculated.

    In the stage of the spell itself, things get even fuzzier.
    A spell has to acquire its targets, either single target or multi target.
    For every target, a hit result has to be rolled based on source and destination stats (random lookup).
    Then some conditions have to be checked, mostly buffs/combat effects that alter the behaviour. What if the player has a buff that makes the spell work differently in some cases, e.g. it can now hit more targets? Modern RPGs go crazy with these mechanics.
    Then the amount has to be calculated, again based on the stats of source and target (which can be cached when done in the same loop).
    Finally, any damage/heal amount has to be applied to the target's health. As this is in the spell loop, race conditions can happen when several spells hit the same target in one frame. This has to be managed separately.

    For some of these parts I have better solutions than I had a few weeks ago, but overall there are so many parts slowing this whole thing down; it's like nothing else I've ever worked on.
     
    Last edited: Nov 11, 2021
  35. colin_young

    colin_young

    Joined:
    Jun 1, 2017
    Posts:
    243
    What do you mean by "spell loop"? That sounds like you are trying to do the equivalent of a co-routine in ECS and possibly being stateful. Systems are stateless, and get all of their data from the input parameters. Anything that the system needs to know must be part of one of the components passed in. Any caching that needs to be done is via the components, so it isn't really caching, at least not how we normally think about it. It's just the data being given to the system.

    Taking a stab at how I think your mechanics are supposed to work (I'm assuming casting a spell is not instantaneous and is animated over several frames), I'd break this solution down like so:
    • PlayerSpellSystem: this handles the spell casting, checking if it is allowed, getting target (whatever that means, whether it is a direction or specific entity), additional buffs/combat effects. This system creates and spawns a spell entity and attaches the spell parameters component(s) to it.
    • SpellParametersComponent: all the data about the spell, including slots to hold the source and target entities.
    • SpellSystem: handles things that have spell components. Figures out if target is going to be hit and stores that in a component on the spell, assuming it takes more than a single update/frame to affect the target (this is kind of a Move component for magic). Adds the spell action component to the target at the appropriate time (either because it has moved far enough, or it was immediately applied). You use the buffs/combat effects data here to determine if you can have multiple targets or whatever.
    • SpellTargetActionComponent: holds the data about what you're going to do to the target, including a slot for the source, assuming you need to access the latest health info or something.
    • SpellActionSystem: this is where you apply the spell to the target. This is where you handle multiple spells hitting a target simultaneously (either because the effect takes place in a single update/frame or because another spell hits while an existing one is still acting). You're going to need some logic to figure out how to add up the effects here; it could simply be additive, or you could make it as complicated as you wish. This is where you start modifying the target's health, etc. Adds a SpellCleanup marker component to indicate it has finished its work.
    • SpellCleanUpSystem: This handles destroying the spell entities once they have hit their targets and are no longer needed, and removes the spell action component from the target when told to (use a marker component to signal this).
    Nothing in that seems like it should be affected by race conditions, including multiple spells, assuming you can calculate the net effect by simply summing up the individual effects. If you had non-linear effects (e.g. the power of a spell is stronger as the target's health declines) you might need to track the target's health stats at the start of the update/frame, instead of the instantaneous values, but that shouldn't cause any major trouble.

    Apologies if I've completely missed the mark on things. The entirety of my experience with that genre of game consists of having watched my kid play Zelda.
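    As a rough illustration of two of the pieces in that breakdown, with made-up fields (Entities 0.x-era API):

        using Unity.Entities;

        // Data about the spell, including slots for source and target.
        public struct SpellParameters : IComponentData
        {
            public Entity Source;
            public Entity Target;
            public int SpellId;
            public float Power;
        }

        // Marker added by the SpellActionSystem once the spell has finished.
        public struct SpellCleanup : IComponentData { }

        public class SpellCleanupSystem : SystemBase
        {
            EndSimulationEntityCommandBufferSystem _ecbSystem;

            protected override void OnCreate()
            {
                _ecbSystem = World.GetOrCreateSystem<EndSimulationEntityCommandBufferSystem>();
            }

            protected override void OnUpdate()
            {
                var ecb = _ecbSystem.CreateCommandBuffer().AsParallelWriter();

                // Destroy finished spell entities; the marker keeps the query cheap.
                Entities
                    .WithAll<SpellCleanup, SpellParameters>()
                    .ForEach((Entity e, int entityInQueryIndex) =>
                    {
                        ecb.DestroyEntity(entityInQueryIndex, e);
                    }).ScheduleParallel();

                _ecbSystem.AddJobHandleForProducer(Dependency);
            }
        }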
     
    Krajca likes this.
  36. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    Don't get me wrong, it's always nice to read someone else's thoughts about it but yes, you have missed the mark on several things. (Still, thank you for posting!)
    Not because of wrong logic, but because you are underestimating how expensive some of the things that are happening actually are, or overestimating how well Entities actually performs. Short answer: some things are really slow, and I go into detail in this post about which ones are so slow that you need to rethink the overall approach.

    It's not that I have problems with anything like 10k or even 100k spells in one frame. Well, I did with 100k, but I'm getting there, and the only test I'm using to get meaningful results is 250k cast spells in a single frame. 10k is just a blip. But is this blip 3ms or 0.3ms of frame budget? There's a massive difference, right?

    Of course the set of actions can be spread out over several frames, but that's more like desperation and not why I started this project. There's still an overall budget even if you spread it out, and the smaller it is, the better.

    I want to say 250k spells can be cast within a certain budget, at around or under 16.6ms. Maybe I'll move the goalposts and it ends up at 170k or something; there is a ceiling after all. Anyway, the reason I'm saying this is that the project started when DOTS Netcode hit. So think of massive scale: hundreds of players and NPCs. And think about server CPU, where every optimisation or lower CPU timing can lead to fewer servers overall and saved money.
    DOTS is the perfect use case here for scaling software.

    So, the first thing you've missed: iteration and schedule overhead. With 250k casters and a corresponding number of spells, every iteration is massive when you could avoid it. Of course it's not just a read iteration; those are fast, but if you're only reading, nothing is happening. So:
    Second thing: writing back such a vast amount of data, times 250k, times the number of jobs or write operations you have. The smaller the write operation, the worse it gets. Writing back just a bool/int on a component costs several ms for a single job. You need to get the most out of every write and not just write on every occasion. If you get into writing to and reading from buffers, say goodbye to performance when it means 1-2ms for every buffer. When writing starts to be a bottleneck, the alternatives are NativeContainers. The best for this, especially in parallel, is NativeStream. Even with NativeStream, writing 250k pieces of creation data still took 1-2ms.
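    For readers who haven't used it, a minimal NativeStream round trip might look like this (a standalone IJobParallelFor for brevity; the int payload stands in for a real creation-data struct):

        using Unity.Collections;
        using Unity.Jobs;

        struct WriteJob : IJobParallelFor
        {
            public NativeStream.Writer Writer;

            public void Execute(int index)
            {
                // One Begin/End pair per foreach index; the payload would be
                // a (target, amount)-style struct in practice.
                Writer.BeginForEachIndex(index);
                Writer.Write(index * 2);
                Writer.EndForEachIndex();
            }
        }

        // Usage:
        //   var stream = new NativeStream(count, Allocator.TempJob);
        //   new WriteJob { Writer = stream.AsWriter() }.Schedule(count, 64).Complete();
        //   var reader = stream.AsReader();
        //   for (int i = 0; i < count; i++)
        //   {
        //       int items = reader.BeginForEachIndex(i);
        //       for (int j = 0; j < items; j++) { int value = reader.Read<int>(); }
        //       reader.EndForEachIndex();
        //   }
        //   stream.Dispose();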

    Third thing: instantiation is the biggest. Instantiating even the simplest form of spell data leads to timings of 300+ms in the ECB. I tried pre-allocating the entity archetype (which you can read about here) with batched EntityManager.CreateEntity(archetype), which is faster, but it's still a massive amount of data that lives for one or two frames. You are also underestimating the cost of deleting it again; this is not free even with no tagging. Tagging is massive, as it involves moving chunks to another archetype. Anyone who has written events this way can tell you about it. My current solution is using blob data for the spells and only a small struct with unique creation data. It helped speed things up massively.

    There surely is a DOTS way, but you are thinking too much in Unity's way of DOTS. I don't fault you in the slightest; it's how my project looked in its first iteration: many small jobs, many iterations, new data written to components or buffers for the next job to work on. Exactly how Unity has taught us, through the API and examples, to use Entities.
    Overall, though, this failed. It's possible that with the jobs all vectorized you save 4x the processing time, but when the overhead of vectorization is, say, 8x higher, you still end up slower overall.
    In the end I don't want to impress anyone with how elegant or DOTS-way-like the code is. I'm here for raw performance, and Entities is a tool that doesn't fit everything. I said this on Discord once: I wish this were different. I'm the first to admit that having the elegant DOTS way would be a great thing, but it's just not the case in practice (yet).
    Maybe Entities will improve on every point I've written out here and start to get faster.
    Right now, the only things I use Entities and archetypes for are the spellcaster itself and buffs/effects that live for a certain period of time. Everything else looks pretty much like C code that uses the Burst compiler and jobs. In the end, I'm not that surprised that it organically matured into this.
    As this is a topic I'm interested in overall, I'll try to refine and rethink some of the approaches that got moulded into the code I have now.

    For anyone really interested: you can run limited tests without writing this whole spell system, with instantiation, reading of dynamic buffers, write tests, and all that is really needed. I'm quite stunned that most people here have not done such stress tests at 250k. Even Unity fails at this in their tests, using something like 1,000 or 10k elements. With such a low count you look in the profiler: oh, 0.1ms, all good, job's done. When in fact, no: it could be 0.01ms, you are not done, and you have failed to get good scalable performance.
     
  37. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    Isn't 250k a lot? I mean, really, why would you want 250k spells in one frame? That number of units/spellcasters alone would be heavy just computing their AI, not to mention other things. For me it is such a ridiculous number that it made me wonder if you remember that our PCs have finite computing resources. With such a number it doesn't matter what technology you use; you'll need to make some drastic optimizations. I think @colin_young has a good idea to follow, but your requirements, which you added after his post, are an edge case at best.

    That's why I prefer larger systems to smaller ones. Creating e.g. a resource-check entity is very heavy in comparison to just one if statement and then creating only those spell entities which had the resources to be cast. I almost always follow the pattern: get needed input data -> calculate with intermediary types of data -> write back only what has really changed. This allows me to partly avoid event-like entities.

    Of course, branching is a huge problem. I'm still not satisfied with my ideas for it. For now, if there is a simple if, I check whether it can be a math.select() statement, but if there are multiple choices I still use tags. The only splitting of systems I do now is to make this tagging lighter, e.g. my units now have separate AI entities that are much lighter, so the cost of copying is reduced significantly.
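    For reference, the math.select pattern being described here (Unity.Mathematics; the shield example is made up):

        using Unity.Mathematics;

        public static class BranchlessExample
        {
            // math.select(a, b, c) returns b when c is true, a otherwise,
            // so this is a branchless: hasShield ? shielded : damage.
            public static float ApplyShield(float damage, float shielded, bool hasShield)
            {
                return math.select(damage, shielded, hasShield);
            }
        }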

    Now, addressing the OP's question: yeah, it will, and at the same time it doesn't. Can I make this in less time just by brute-forcing it? Will it still meet the requirements of the project? I think a lot of the time the answer to both questions will be "yes". We don't live in an ideal world where everything can be made with amazing craftsmanship; we live in a world of money. Everything above "just barely good enough" will be cut. And to be fair, the technology stack doesn't matter: DOTS, OOP, everywhere there will be a calculation of gains and losses.
     
  38. colin_young

    colin_young

    Joined:
    Jun 1, 2017
    Posts:
    243
    Ha ha. Given that, yeah. That's several orders of magnitude more spells than I assumed for even an unreasonable case.

    Entities or no entities, any time you need to modify data is going to be expensive, so the best you can do is to minimize the cost. As much as high-level programming languages have evolved, there are still situations where we need to be aware of what's going on under the hood and how expensive it is, as you've noted. The amount of data you are talking about here sounds like it's going to exceed the CPU caches no matter what you do, and I wouldn't expect a general-purpose solution (which is what ECS/DOTS is) to be able to handle everything, at least not without some hand-holding. If most/all of the game data can fit into the cache, iteration should be effectively "free" (at least compared to the game logic), but if you have to fetch data into the cache multiple times for even a single system, that's going to get expensive quickly.

    I'm curious, have you tested the performance for various numbers of entities, and does it scale linearly? I.e. is 100K two times slower than 50K, and 200K two times slower than 100K? Or more? It would not surprise me if there were some sort of performance cliff you can fall off after a certain point.

    It sounds to me like you are updating:
    • 100Ks of sources
    • 100Ks of spells
    • 100Ks of targets
    Every frame. That's a lot of data no matter how you organize it. This is where I wonder if entity/component pooling might actually be useful, despite https://forum.unity.com/threads/is-object-pool-a-valid-optimization-pattern-in-ecs.728606/ or https://forum.unity.com/threads/object-pooling-in-dots.886789/#post-5828116. And I think you definitely need to minimize the number of times you iterate over the collections (i.e. minimize number of systems).

    Going back to the OP's question, your situation may be a case where my suggestion is in fact the "brute force" method. GameObjects/OOP would be a nightmare here. I mean the coding would be easy, but performance...

    I'll happily admit to not having stress-tested ECS, mostly because any performance issues I'm encountering are more likely due to unoptimized vegetation (10Ks of bushes with 1Ks of triangles each and no/poor LODs), rather than the few hundred vehicles I'm moving around (and I have the timings to prove it, or at least can easily get them again). And by "problems" I'm talking about 30+ FPS in the editor.
     
  39. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    How many spell types are there? I would group spell systems by type, potentially in two ways: graphical types and effect types.

    Then you would have a collection of active spell types, and for each type a set of sources and targets.

    With a type-based approach, the unique impact and graphical code for each type can have unique systems.

    Overlay the graphics and effects systems with a LOD/FOV system and you step away from how much activity is needed per frame, e.g. time to effect and animations.
     
  40. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Just to be clear, entities and components are designed to be relatively persistent, whereas for your use case "spells" have a lifecycle of less than a frame.
    Most people profile for their actual game's requirements and specific logic. They optimize specifically for their problem and not the general use case. A huge part of DoD is being very specific. What your tests have shown is that naïve intermediate writes to temporary memory on huge datasets have a cost, and you are better off doing larger operations with branches than reloading cache lines multiple times. It's all about using the right tools for the job, and that means a full understanding of the job, which none of us know and I'm not even sure you fully know.
    I'm curious: is this a problem with performance or code maintainability? I suspect performance, since you mentioned math.select. If it is performance, I may know a couple of bit tricks that could help you.
     
    Per-Morten likes this.
  41. Per-Morten

    Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    Just want to highlight this:
    People talk a lot about memory and cache optimizations when discussing DoD, but the core idea behind DoD isn't really about memory directly; memory and cache just happen to be a big, common bottleneck today. The idea of DoD is to analyze all the information you have (or can get) about your problem (what values you're working with: frequency, combinations, etc.; what hardware you're targeting; what access patterns you have; what context you're working in; what extra information you know about the domain; any data you can get, really) and then create a solution that works for that and takes advantage of all the information/data you have. Just using entities, or applying some other pattern often seen in DoD code, != DoD. You're supposed to make your decisions based on data about your specific problem.

    To give my thoughts on the original question: I don't think it's a net negative that "DOTS lets you be lazy". If stuff that didn't work with brute force before works with brute force now, then we've just lowered the barrier to creating that type of thing. Also, I don't see any guarantee that in a GO scenario you would hit performance problems and act on them that much earlier, or that fixing those problems in a GO scenario is that much easier than fixing them in a DOTS scenario. Also, in my (granted, limited) experience, fixing performance problems in straightforward brute-force code written in a more DoD style has been way easier than fixing performance problems in OOP code.
     
    Last edited: Nov 11, 2021
    NotaNaN, apkdev, thelebaron and 5 others like this.
  42. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    That's why it's a stress test; it's not something that would realistically happen in a/my game. At the end of the day my code is now more scalable than it was before by orders of magnitude, which means I can better target not only (my high-end) PC but mobile too.

    And I just wanted to point out that it's very easy to fall into the same trap with DOTS as with OOP/MonoBehaviour. You may have lowered the ceiling with a data-oriented approach, but you are still hovering high above the performance floor of what could be reached with proper programming. And I'd really like to make you guys on the forum and the Unity devs aware of where the problems reside, because that's really the only way we end up with something that truly is "performance by default".

    Another example, which I solved with Kornflaks on Discord: as spells are batched, they can hit the same target, which leads to race conditions on the Health component. The DOTS way is to write out data and then process it linearly. Which I did, and it falls pretty much into the realm of brute-forcing the problem: I wrote the minimum data to a NativeStream at the stage where I'd apply the damage to the health component, had a job read the NativeStream and bring the data into a NativeMultiHashMap (essentially sorting the data), and then read the NMH in another job that applied the damage amounts to the health components. (Still random access through the CDFE.)

    All this took around 10-12ms. Kornflaks suggested: why are you not trying Interlocked.Add? Which I did, and even in the literal worst case, when I write to the same target 250k times, it only stalls the worker thread for around 2.3ms overall. This may not win a prize for style, but it's so much better performance-wise.
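    A sketch of that approach, with health kept in a raw buffer rather than behind a component-held pointer so the example is self-contained (names are made up; Interlocked is supported by Burst):

        using System.Threading;
        using Unity.Burst;
        using Unity.Collections;
        using Unity.Collections.LowLevel.Unsafe;
        using Unity.Jobs;

        [BurstCompile]
        unsafe struct ApplyDamageJob : IJobParallelFor
        {
            [ReadOnly] public NativeArray<int> TargetIndex; // which target each hit strikes
            [ReadOnly] public NativeArray<int> Amount;

            [NativeDisableUnsafePtrRestriction]
            public int* Health; // shared across threads; atomics keep the writes safe

            public void Execute(int i)
            {
                // Atomic read-modify-write: correct even when many hits in
                // the batch land on the same target.
                Interlocked.Add(ref Health[TargetIndex[i]], -Amount[i]);
            }
        }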

    Sure, 250k is just my extreme case. It is quite linear when it gets to the higher end.

    250k -> 26ms
    100k -> 11ms
    50k -> 5.7ms
    10k -> 1.95ms
    100 -> 0.4ms

    Yeah, most mechanics in games have a short life cycle, be it my spells or any other mechanic in another game. That's why "event system" threads are on top every week, and it's obvious why: people cram their events into Entities, essentially brute-forcing. Their performance tanks and they don't know what to do, because their whole architecture is now screwed. No proper documentation exists, devs are puzzled about how to continue, and "oh well, Entities isn't that great" comments follow.
    If Unity is not careful, Entities will end up exactly like MonoBehaviour. I mean, great, they have solved the problem of being forced to write foreach Manager.Update code, and now another problem arises.

    Oh, I know fully well. ;)
     
  43. colin_young

    colin_young

    Joined:
    Jun 1, 2017
    Posts:
    243
    I guess it depends how you define "performance". ECS/DOTS is IMHO a massive improvement over GameObjects, and not just from a performance perspective.

    250k -> 26ms = 2.5X objects, 2.5X time
    100k -> 11ms = 2X objects, 2X time
    50k -> 5.7ms = 5X objects, 3X time
    10k -> 1.95ms = 100X objects, 4-5X time
    100 -> 0.4ms

    Definitely looks like more scalability at the lower end, suggesting we are paying a penalty in setup/teardown (whether that's loading a system, copying data to the cache or whatever), and at the upper end it's just the raw processing of the massive amounts of data.

    It's not perfect, but it probably covers 90% of the most common cases. I understand from a comment in another post that Unity is looking at improving times to load systems. And this is still experimental/alpha stage code.

    Hopefully discussions like this will lead to areas for Unity to investigate to make even more performance gains, and to best practices that can be shared to help us get the most performance out of ECS by using it appropriately. Maybe some static and runtime analysis tools to identify potentially problematic code sections.
     
  44. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    I think both. I can make a huge switch or use tagging in the case of branching behaviors. Both solutions seem "wrong", but tagging seems more in line with DOD principles. It also has performance issues, though. To mitigate the copying cost I prefer to split entities and have a "minimal data, high tagging frequency" entity and a "high data, no archetype change" one. But it's not ideal.
    The best performance I get is always outside of Entities: just a MonoBehaviour with the data it needs and some jobs to do something with it. But it's almost always a standalone solution that doesn't need to communicate with the rest of the scene.
    So my conclusion from that (and from e.g. the boids example) was that I need fewer, more packed systems that describe a whole behavior/feature/idea instead of generic ones like a "move system" or a "health system".

    But I want to hear what you've got, as I feel like I'm missing something.
     
  45. apkdev

    apkdev

    Joined:
    Dec 12, 2015
    Posts:
    263
    Actually, I think brute force solutions are genuinely better code in many cases, even if they're not optimal in computer science terms.

    The other day I wrote a (non-ECS but Burst-compiled) system that - in short - finds the optimal target for each object based on some conditions. It had really ugly computational complexity, but the implementation was barely a couple of lines, and anything closer to an optimal solution could easily explode into 10x as much code.

    Guess what, it runs in a fraction of a millisecond anyway. I'm very happy with both the code and with the performance. Without Burst, it would be way too slow for production. This kind of stuff happens to me all the time since I started using DOTS. Unity's Megacity looks very nice and all, but in most real-world cases I find myself working with tens or hundreds of entities at a time.

    If you really care about optimal performance that much, then half of your game should be in hand-written SIMD and the other half in compute shaders. I mean, sure, I enjoy beautiful data structures and optimal data flow as much as anyone, but in the end I need to get things done, and with DOTS it's really easy to make stuff completely disappear from the profiler window.

    My side project runs at hundreds of FPS at 4K resolution and it's not even ECS. I just write simple code, slap [BurstCompile] on top of a function here and there, and overall enjoy the zen of my workflow. Premature optimization isn't evil, but the reality is that you need to stop optimizing at some point.
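    Roughly what that looks like with Burst's direct-call support (Burst 1.5+: static methods and unmanaged parameter types only, which is also most of the answer to the question below):

        using Unity.Burst;
        using Unity.Mathematics;

        [BurstCompile]
        public static class FastMath
        {
            // Compiled by Burst and callable directly from managed code.
            [BurstCompile]
            public static void SumSquares(in float4 values, out float result)
            {
                result = math.csum(values * values);
            }
        }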
     
    RaL, bb8_1, DreamingImLatios and 5 others like this.
  46. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    Didn't realize that you can burst-compile a function. Are there any special requirements?
     
  47. unity-freestyle

    unity-freestyle

    Joined:
    Aug 26, 2015
    Posts:
    45
    Couldn't agree more. Kudos for the awesome reply.
     
    Per-Morten likes this.
  48. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    I'm more convinced now. This is an optimization that does not work for the general case but works for your use case. I find atomics quite stylish when used in the right places. I'm curious: did you have to dig into the internals of the Entities package to do this?

    Disclaimer: This may not result in any measurable speedup, or it may make things worse by costing too much CPU or breaking hardware prefetching. Always profile!

    So let's suppose you have an enum with options A, B, and C, and you have an array of these enums in a chunk or something. For this example we'll pretend your batch size is 4 enum elements, but in practice you want it to be something more like 64. And currently you are doing a switch on those enums with different logic for each case.

    Let's suppose you have the values in your enum batch as follows: C, A, B, A

    Now what you want to do is make a 2D bit array that looks like this:
    0101
    0010
    1000

    You can build that branchlessly using either a scatter approach, which is O(64) in a scalar fashion, or a gather approach, which is O(k * 64) where k is the number of options you care about. The latter looks worse complexity-wise, but it can be vectorized quite aggressively if your inputs allow for it.

    Now in a loop, you can use tzcnt to seek to all of case A, and in a separate loop do the same for case B. No more switch cases, and no more branch misses.
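    A scalar sketch of the trick with a 64-element batch and two of the cases (the per-case payloads are left as comments):

        using Unity.Mathematics;

        public static class EnumBatching
        {
            // cases[i]: 0 = A, 1 = B, 2 = C; exactly 64 elements per batch.
            public static void ProcessBatch(byte[] cases)
            {
                ulong maskA = 0, maskB = 0;

                // Scatter pass: set one bit per element for each case.
                for (int i = 0; i < 64; i++)
                {
                    maskA |= (cases[i] == 0 ? 1ul : 0ul) << i;
                    maskB |= (cases[i] == 1 ? 1ul : 0ul) << i;
                }

                // Seek all the A's, then all the B's: two tight loops with no
                // switch inside them, so no branch misses from mixed cases.
                for (ulong m = maskA; m != 0; m &= m - 1)
                {
                    int index = math.tzcnt(m);
                    // ... case-A logic on element `index`
                }
                for (ulong m = maskB; m != 0; m &= m - 1)
                {
                    int index = math.tzcnt(m);
                    // ... case-B logic on element `index`
                }
            }
        }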
     
  49. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    There's literally nothing more general in video games than something taking damage. I think this case is quite common, unless you're creating wasteful data to solve a very minor problem. Do you have a better idea for solving the problem of race conditions on the same target?

    No internals needed; the health comp now has an int pointer that I have to alloc/dispose manually. The only requirement of Interlocked is the int*, which, I'm sure, can be obtained without the manual alloc. I think there can be better solutions; this was a very hacky proof of concept. I'll try getting a pointer to the comp and using a sequential StructLayout next.
     
  50. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Damage might be a general concept, but what isn't so general is what causes it. Or at least, there are enough variants to warrant further DoD analysis. In my case, damage happens when a damage trigger overlaps a hurtbox, and my spatial query engine can dispatch thread-safe pairs of entities with overlapping shapes. Using that, I can safely touch the component data and decrement the health. That also gives me working change filters.
     
    Enzi likes this.