
If DOTS performance is mainly tied to the arrangement of data in memory then...

Discussion in 'Entity Component System' started by Arowx, Oct 19, 2021.

  1. PhilSA

    PhilSA

    Joined:
    Jul 11, 2013
    Posts:
    1,926
    some remarks:
    • you don't explain how things get converted to SOA at any point in this example
    • you don't explain how you will Burst managed types (or how you'll convert references to managed types in an object to something burstable)
    • you don't explain how you'll create jobs that hold managed types
    • you don't explain how you'll deal with virtual functions, polymorphism, delegates, lambdas, etc....
    • and so on
    this example avoids explaining 99.999% of the problem, and that's being generous
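    To make the managed-types bullets concrete, here's a minimal sketch (ManagedThing is a made-up stand-in class) of what happens the moment you put a managed type in a job: the job safety system rejects reference-type fields, and Burst can't compile access to managed objects at all:

    Code (CSharp):
    using Unity.Burst;
    using Unity.Jobs;

    public class ManagedThing { public int Value; } // a plain managed class

    [BurstCompile]
    struct BrokenJob : IJob
    {
        // This field is the problem: jobs may not contain reference types,
        // and Burst cannot compile access to managed objects.
        public ManagedThing Thing;

        public void Execute() => Thing.Value++;
    }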
     
    Last edited: Oct 24, 2021
  2. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    If you go the full reflection route (think checkboxes in a custom editor per field), you'd have to solve:
    - Keeping the reflection cost low enough for the editor to start play mode quickly;
    - How to lay out the data as components;
    - What archetype the resulting entity should have;
    - How to convert those archetypes into entities (currently done as subscene conversion);

    And in the end you'd still have to write systems, not Updates, meaning it's just a hybrid solution / conversion.
    But with an inherited issue: no control over execution order, or over how the systems code is structured.



    Note that it's entirely possible to author entities from MonoBehaviours directly, without the conversion process.

    That's actually what I do. However, it's pretty darn hard to convert managed data into unmanaged data directly.

    At some point, the layout of the data as components has to be coded manually,
    as does the archetype the resulting entity should have.

    Here's my current setup, in case you're interested:
    1. An EntityContainer [MonoBehaviour] gathers every attached MonoBehaviour implementing the IEntitySupplier interface / components from the GameObject hierarchy.

    The container stores an archetype hash based on all the component types (collected as a HashSet via the interface), plus the corresponding stable type hashes for linking at runtime.
    Component type gathering is performed in the editor only.

    2. When EntityContainer.OnEnable is called at runtime:
    - The archetype is taken from the lookup (constructed only once, based on the stable hashes).
    - The entity is created via EntityManager.CreateEntity(archetype);

    Creating the entity via an EntityArchetype covers the need to add components
    (since they come as part of the archetype -> less code / fewer .Add calls)

    - Then EntityContainer calls the .SetupEntity(EntityCommandBuffer ecb) method on each IEntitySupplier.
    In that method, each MonoBehaviour gathered by the EntityContainer writes its data via the EntityCommandBuffer.
    This is a non-blocking operation for jobs, and the ECB playback is Bursted.

    EntityContainer also holds an Entity "reference" to the resulting entity for future access, plus a bunch of utility methods for accessing / modifying the entity-side data.

    3. When OnDisable is called on the EntityContainer -> the entity is destroyed via ECB.
    The internal entity is set to invalid (in case something attempts to access it);

    So basically the entity exists only while the GameObject is active.
    And data is gathered and applied during ECB playback, only during a main-thread-only phase (e.g. before the SimulationGroup runs);
    meaning it does not invalidate dependencies for running jobs.
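    In code, the skeleton looks roughly like this (a simplified sketch: the exact IEntitySupplier shape is assumed from the description above, and the cached archetype lookup / deferred ECB playback are inlined for brevity):

    Code (CSharp):
    using System.Collections.Generic;
    using Unity.Collections;
    using Unity.Entities;
    using UnityEngine;

    // Assumed interface shape: each MonoBehaviour that contributes data implements this.
    public interface IEntitySupplier
    {
        IEnumerable<ComponentType> GetComponentTypes();           // editor-time gathering
        void SetupEntity(Entity entity, EntityCommandBuffer ecb); // runtime data writing
    }

    public class EntityContainer : MonoBehaviour
    {
        IEntitySupplier[] _suppliers;
        Entity _entity = Entity.Null;

        void Awake() => _suppliers = GetComponentsInChildren<IEntitySupplier>();

        void OnEnable()
        {
            var em = World.DefaultGameObjectInjectionWorld.EntityManager;

            // Real version: the archetype comes from a lookup keyed by the stable
            // hash, built once. Rebuilding it here keeps the sketch short.
            var types = new HashSet<ComponentType>();
            foreach (var s in _suppliers)
                foreach (var t in s.GetComponentTypes()) types.Add(t);
            var archetype = em.CreateArchetype(new List<ComponentType>(types).ToArray());

            _entity = em.CreateEntity(archetype);

            // Real version: the ECB is played back in a main-thread-only phase
            // (e.g. before the SimulationGroup), so running jobs aren't stalled.
            var ecb = new EntityCommandBuffer(Allocator.TempJob);
            foreach (var s in _suppliers) s.SetupEntity(_entity, ecb);
            ecb.Playback(em);
            ecb.Dispose();
        }

        void OnDisable()
        {
            var em = World.DefaultGameObjectInjectionWorld.EntityManager;
            if (_entity != Entity.Null && em.Exists(_entity))
                em.DestroyEntity(_entity); // real version destroys via ECB
            _entity = Entity.Null;         // invalidate in case of later access
        }
    }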

    Here's the exec order:
    -- Frame:
    - Spawn entities from MBs / propagate changes made on the MonoBehaviour side;
    - Run the Simulation Group;
    - Run MonoBehaviour Update;
    - ... repeat next frame ...

    Pros:
    - Benefits of modifying data from MonoBehaviours without stalling jobs;
    - Nice API for modifying components / buffers at runtime;
    - Does not rely on the conversion process, meaning it's possible to build without platform plugins (important because of UCB!);
    - No need to convert / create / store subscenes;
    - Works with any MonoBehaviour setup;
    - Simple to access the entity from the EntityContainer;
    - MonoBehaviours have access to the data after the simulation is complete, meaning no frame skips where that matters.

    Cons:
    - Slower than full conversion at generating entities (still faster than running MonoBehaviour logic, though);
    - Some extra memory cost (due to EntityContainer being a MonoBehaviour);
    - Sometimes tricky to figure out what's causing issues in playback / entity creation (but most of the common cases are covered with asserts);

    Overall, it's a "hybrid" "conversion". Works for me, and it's fast enough.
    I'd have posted this solution on GitHub ages ago, but it requires modifying the Entities package to avoid GC from archetype creation (specifically the ComponentType* methods);

    For the common cases, it's possible to author from a ScriptableObject data setup as well.
    Implement the IEntitySupplier interface for it. Store the hash + stable type hashes (components) as fields. Request an EntityArchetype, and you can generate as many entities as you need with the NativeArray<Entity> overload on EntityManager. Very fast and efficient performance-wise.
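    A minimal sketch of that ScriptableObject variant (names are made up; in the real thing the archetype would be rebuilt from the stored hashes rather than hardcoded):

    Code (CSharp):
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Transforms;
    using UnityEngine;

    [CreateAssetMenu]
    public class EntityBatchAsset : ScriptableObject
    {
        EntityArchetype _archetype; // requested once, reused for every spawn

        public void Spawn(EntityManager em, int count)
        {
            if (!_archetype.Valid)
                _archetype = em.CreateArchetype(ComponentType.ReadWrite<Translation>());

            // The NativeArray<Entity> overload creates the whole batch in one call,
            // far cheaper than creating the entities one by one.
            using (var entities = new NativeArray<Entity>(count, Allocator.Temp))
            {
                em.CreateEntity(_archetype, entities);
                // ...write per-entity data via the returned entities if needed...
            }
        }
    }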
     
    Last edited: Oct 22, 2021
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    The thing is, I would like Unity to run/use DOTS under the hood in real time, which would mean no conversion steps; more along the lines of remapping data within GameObjects to component and entity data in the editor, visualised in real time.

    So the editor would help users create batch-based DOTS 'systems' from combined GameObject variables; a dedicated batch-based API, with in-editor and code-editor support, would go a long way toward making this an easy process.

    Any issues with managed data would be highlighted by the editor and compiler as errors, with clear messages on how to remap/convert the data to more compatible types, or to fall back to a single-threaded batch-based method call.
     
    Last edited: Oct 22, 2021
  4. Lhawika

    Lhawika

    Joined:
    May 27, 2015
    Posts:
    53
    @Arowx I would be interested in knowing what your programming level is, and how much tool programming you've actually done. You're talking a lot about some pretty crazy tooling ideas, and it sounds delusional at this point.

    I don't have time to answer everything you've talked about in this thread, so just on one specific point:

    Consider the issue of how the compiler detects, understands and responds to "managed data errors".

    Have you tried to convert managed code to unmanaged code yourself, by hand? It's easy on paper, for small code snippets.
    But for a real code base, it means rewriting most of the code.
    I have this scripting language/compiler/interpreter experiment going on where I do lexical and syntactic analysis. It is pointless, just for the fun of it, because I like challenges (which is to say it is not a real-world case in a real commercial project).
    It was originally full managed code => classes, virtual methods, delegates, managed data types like strings, regex, etc.
    I was not happy with the performance of this system, so I went on updating it in order to "make it more data oriented" and "use unmanaged memory".
    It got messy: I tried to build bridges between managed and unmanaged, I burnt those bridges, and in the end what was left of the "managed code" was essentially nothing, except maybe its outer structure.
    I had to rewrite everything, change the algorithms, read a lot of material on the internet about the algorithms I needed, memory management, pointers in C#, etc. It was a bumpy road.

    The takeaway from this experience? I learned a lot, it took me waaaay more time than I initially expected, and I really don't think Unity could make such a "coder assistant" that would magically help us upgrade our code from managed to unmanaged.

    But maybe I'm just an idiot and you are a genius. If you actually think you can make such a tool, please do it, show it off, and if it is as magnificent as you describe it, we will buy it from you and you'll be rich.

    Honestly, if the people who have the money today (aka Unity, Microsoft, etc.) don't make such a tool, it is either useless or unimaginably hard to make.
     
    Krajca and Enzi like this.
  5. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Or maybe this is why DOTS is taking so long, so maybe we will have to wait for DOTS 1.0 before we can have a DOTSBatched API.

    Although I really don't understand why everyone is making such a big deal about managed and unmanaged data, as I'm sure it's all just bytes to our CPUs?

    It sounds like the C# language itself is getting in the way.
     
  6. Krajca

    Krajca

    Joined:
    May 6, 2014
    Posts:
    347
    It is. Managed memory is dynamically allocated, and it's not kept in line when you destroy something. So over time you get gaps of unused data in your memory, and those gaps get filled by other, unrelated allocations. So there is everything everywhere. That's an oversimplified picture, since you can't even be sure objects are allocated near each other in the first place. Additionally, you have the magic GarbageCollector, which governs when memory is deallocated. In short: your power over memory is almost nonexistent.

    The main benefit of unmanaged memory is that you can keep it defragmented, in larger chunks of the same type of data. That way, memory is kept in a more CPU-friendly manner. Fewer cache misses = better performance, etc.

    This is why you can't simply move data from one realm to the other, and why the current conversion system is the next best thing: it does exactly that, translating managed data into unmanaged, and it does so exactly how you tell it to.
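    For a concrete picture of the difference (a toy sketch, not anyone's actual code):

    Code (CSharp):
    using Unity.Collections;

    class BoidObject { public float X, Y, Z; }  // reference type: lives on the GC heap
    struct BoidData  { public float X, Y, Z; }  // value type: stored inline

    static class LayoutDemo
    {
        public static void Compare()
        {
            // Managed: the array holds 10000 references; each object sits wherever
            // the allocator/GC put it, so a linear loop can hop all over the heap.
            var managed = new BoidObject[10000];
            for (int i = 0; i < managed.Length; i++)
                managed[i] = new BoidObject();

            // Unmanaged: 10000 structs in one contiguous block; a linear loop
            // walks memory sequentially, which is what the CPU caches love.
            var unmanaged = new NativeArray<BoidData>(10000, Allocator.Persistent);
            for (int i = 0; i < unmanaged.Length; i++)
                unmanaged[i] = new BoidData { X = i };
            unmanaged.Dispose();
        }
    }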

    That's probably right. Since e.g. raycasts and navmesh already have batched APIs, I would assume more and more Unity built-in features will get them too. But that needs time: time to write it, but also to design a proper architecture for the engine.
     
  7. Lhawika

    Lhawika

    Joined:
    May 27, 2015
    Posts:
    53
    Oh boy, it is!

    Coming from C/C++ and trying to actually get performance out of C# is a real pain. So much so that I avoided doing "low level stuff" (basically, using unmanaged memory) for about a decade of using C# (and it took me 2-3 years to realize C# actually has pointers oO... that's how good they are at hiding them). I mean, it's not that it's THAT difficult to do; it's just that by choosing to live in the unmanaged world, almost all of the API that comes with C# goes away, which means most of the stuff I learnt in school or from the internet suddenly becomes useless.

    It is the very reason why Unity implements its own "NativeArray" and friends, and why List<> is not usable with DOTS.

    To be fair to C#, it runs decently by default, and each new version of the language seems to open new gates into the unmanaged world, making it easier for us to squeeze performance out of it (for example, the new function pointers we're getting with C# 9, which are great).
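    For instance (a tiny sketch of that C# 9 feature; requires an unsafe context):

    Code (CSharp):
    public static class FunctionPointerDemo
    {
        static int Square(int x) => x * x;

        public static unsafe int Run()
        {
            // C# 9 function pointer: a direct, non-allocating call to a static
            // method, unlike a delegate, which is a managed object the GC tracks.
            delegate*<int, int> fp = &Square;
            return fp(6); // 36
        }
    }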
     
    Krajca likes this.
  8. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    Read up on heap and stack memory. There are a lot of differences, all trading performance for the convenience of memory allocation, handling and safety checks.
    Reading/writing memory outside of the L1-L3 caches is a lot slower, like a factor of 100. Utilizing the L1-L3 caches and filling them up is what leads to these huge speedups. Also memory alignment: the heap doesn't care much about alignment. Your array could be spread out. Your second array could be packed together with your first, or it could be on the other side of memory.
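    A back-of-the-envelope way to see this yourself (a rough sketch; the exact numbers depend on the machine):

    Code (CSharp):
    using System;
    using System.Diagnostics;

    static class CacheDemo
    {
        public static void Run()
        {
            var data = new int[1 << 24]; // 64 MB, far bigger than L1-L3

            var sw = Stopwatch.StartNew();
            long sum = 0;
            for (int i = 0; i < data.Length; i++)
                sum += data[i]; // sequential: prefetcher + full cache lines
            Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms (sum {sum})");

            sw.Restart();
            sum = 0;
            for (int offset = 0; offset < 16; offset++)
                for (int i = offset; i < data.Length; i += 16)
                    sum += data[i]; // 64-byte stride: a new cache line per read
            Console.WriteLine($"strided:    {sw.ElapsedMilliseconds} ms (sum {sum})");
        }
    }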

    You really lack a lot of understanding of this topic, yet you make these statements, which puzzles a lot of other readers who are knowledgeable enough to see that what you say is largely bogus. I wonder if you ever realized this yourself, or are just having fun with it.

    And you seem to forget that this is a physical machine we are working with, right? There are physical bounds to all of this. It's not just magic. Memory doesn't operate at light speed; if it did, you would be right, it would all be just data in memory. But then again, we wouldn't even be having discussions about performance, because physical speed would have no limit.
     
  9. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    What does this have to do with managed and unmanaged?
     
  10. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    760
    Yes, I think you're mixing things up here: managed and unmanaged memory have little to do with heap and stack. Managed memory is actually always on the heap, but unmanaged memory can be on the heap or the stack. NativeArray, for example, is unmanaged, but its data is allocated on the heap.
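    All three cases, side by side (a minimal sketch; requires an unsafe context):

    Code (CSharp):
    using System;
    using System.Runtime.InteropServices;

    static class MemoryDemo
    {
        public static unsafe void Run()
        {
            int[] managedArr = new int[16]; // managed: always on the GC heap

            // Unmanaged, but on the heap: allocated and freed manually, no GC involved.
            int* unmanagedHeap = (int*)Marshal.AllocHGlobal(16 * sizeof(int));
            Marshal.FreeHGlobal((IntPtr)unmanagedHeap);

            // Unmanaged, on the stack: gone automatically when the method returns.
            int* unmanagedStack = stackalloc int[16];
        }
    }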
     
  11. Enzi

    Enzi

    Joined:
    Jan 28, 2013
    Posts:
    908
    What am I messing up? The difference in practice IS how unmanaged and managed code use the heap and the stack. The answer is actually a lot more nuanced: what can live on the stack is defined by the data type. Small int arrays in C# are allocated on the stack, for example; they don't have to be manually deleted in C#, whereas they would leak memory in C++ if you didn't. Int arrays which are too big are allocated on the heap. So, as you're also confused, maybe you should read up too.
    What you're calling managed memory makes no distinction between which data types you're actually using. If by managed memory you are specifically referring to reference types, you are correct, but that's only half the picture.

    Arowx said it's all just bytes to our CPU, when there are big differences. C++ heap usage is different from C# heap usage. When I said he should read up on it, I meant how C# uses the heap. Obviously every programming language handles this differently; as we are in C# land, I didn't have to explicitly write "read up on heap and stack memory for C#". Or maybe I would just finally like to see some reflection in Arowx's posts. I don't know. I only care up to a point.

    It's one of the most convoluted and complicated topics; I don't blame anyone for not knowing what it's all about. Just take the friendly reminder and read up on it. It'll help you build an understanding, and it helps you avoid stringing together a bunch of cool-sounding words into a garbled mess of a sentence that doesn't have any real meaning or make any definitive statement about what the machine is really doing.

    Everything.
    To play you for a moment: "How can anything be slow? It's just 0s and 1s, there are not even any 2s."
     
    NotaNaN, Mockarutan, Occuros and 4 others like this.
  12. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    760
    Managed memory is always on the heap and is managed by the GC, but unmanaged memory can be on both, and is not managed, as the name suggests.
    You first said "read up on heap and stack memory" and their differences, and less about "how unmanaged and managed are using heap and stack". That can give the impression that managed memory always lives on the heap (which is true) and unmanaged memory always lives on the stack (which is not true).
    C# arrays are always on the heap; when I tried it, I never got an array on the stack unless I used stackalloc.
     
  13. runner78

    runner78

    Joined:
    Mar 14, 2015
    Posts:
    760
    Edit: arrays of unmanaged types on the stack could be the case in newer C# compiler/runtime, but not in older ones. How exactly this applies to Unity's mono would have to be tried out.