Question System Optimization questions for Artificial Life Project [Extremely long post]

Discussion in 'Entity Component System' started by Naotagrey, May 20, 2021.

  1. Naotagrey

    Naotagrey

    Joined:
    Apr 17, 2016
    Posts:
    37
    I'm currently developing an Artificial Life simulation (non-DOTS, regular Unity) and have been following DOTS for a long time with the future of my project in mind.

    Recently, I started implementing some mechanics of my simulation in ECS/DOTS whenever I lack the motivation to work on my current build.

    I've posted a few questions here already and the community has been incredibly helpful so far. I've already learned a lot and my project is starting to form as I add core systems in.
    However, I still lack a lot of knowledge about the cost of using certain patterns/structures and what kind of info would help in figuring that out.

    Some info to give you all context and a concrete use case we can base this discussion around:
    Warning: I'm going in-depth; jump to the Actual Topic section if you don't care about context. This is also going to be very verbose, and I won't post a lot of code unless we're really diving into specifics or you request it to better comment on my implementation. Sorry for the long post :oops:


    Entity Archetypes (non-exhaustive):

    Agents: Represent a living agent that the other archetypes listed below are associated with.
    They have a lot of Components describing their state.

    Nodes: Hold values. Can be analogized to neurons, even if my use of them is a little more generic.
    Have both a [value] component and a [previousValue] component. They also have other components and tags to differentiate between different types and functions.

    Connections:
    Connect two Nodes via [InputNode] and [OutputNode] components, both containing references to Node entities. They are used to propagate values between Nodes and form networks (like a neural net). They also have a [connectionStrength] component deciding the strength of the connection between the Nodes.

    Modules: Hold Nodes. These read the world from the agent's perspective and write values to Nodes (senses), or read from Nodes to control the agent's behavior/characteristics (actions). Some modules don't interact with their agent directly, but instead produce more complex processes than would be possible with connections alone (example: a Memory Module saving an Input Node's value when another Input Node is high, and writing that value to its Output Node).

    As a result, each agent can have an arbitrary number and combination of modules, and the processing of their "brain" is handled by the corresponding systems in parallel, which ends up being far more efficient than my present MonoBehaviour implementation (obviously ahah).

    Here's an example of the kind of structure a particular agent could have (only an example, don't get caught up in the specifics):

    Key takeaways: Multiple Connections can have the same Node as Input or Output. Module Output Nodes can't be on the receiving end of a Connection, and vice versa. Free Nodes can be connected either way.

    Systems:

    These are some of the Systems I have implemented so far

    Node Preparation System: Each Node's [value] is written to its [previousValue], and then reset to its default value (a rough sketch of this one is included after this list).

    Connection Preparation System: The [previousValue] of each Connection's [inputNode] is read, multiplied by the Connection's [connectionStrength], and written to its [signal] component.

    Signal Propagation System: Reads each Connection's [signal], sums the ones targeting the same [OutputNode], and writes those sums to each Node's [value] component.

    Module-Specific Systems: Each kind of Module has its own System, essentially reading certain values from its InputNodes or its Agent's components (much like the Connection Preparation System), processing them (unique to each module kind), and writing to its OutputNodes or to its Agent's components.
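
    Just to make the first one concrete, here's roughly what the Node Preparation System looks like. This is a simplified sketch: the component name and the [defaultValue] field are placeholders, not my exact code.

    Code (CSharp):
    using Unity.Entities;

    public class NodePreparationSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            Entities.ForEach((ref NodeComponent node) =>
            {
                node.previousValue = node.value; // stash last frame's value for the connections to read
                node.value = node.defaultValue;  // reset so this frame's signals can be accumulated (defaultValue is an assumed field)
            }).ScheduleParallel();
        }
    }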


    Actual Topic:
    Sorry for the long long introduction, but I think it helps to understand the exact implementation.

    While I'm already seeing enormous improvements compared to my present implementation, I want to learn more about the process of optimizing systems and what kind of data structures are more suitable in certain situations.

    I have a few discussion points; I'd love to have your input on any of them:


    Does it make sense to have separated the Connection Preparation and the Signal Propagation Systems? In what situation would that not be the case anymore?

    I figured it would be better to do so, but I'm not sure why. Especially since the propagation system HAS to run after the connection system, maybe it would make more sense to combine them into the same system and just have the reading done by jobs that write the resulting signals into a NativeArray<float>?
    Is there a situation when one would be better?


    Which is better: GetComponent() or a job reading from entities? What conditions favor one over the other?

    Right now I'm using GetComponent<NodeComponent>(connection.NodeIn) in the Connection Preparation System, like this:

    Code (CSharp):
    [UpdateAfter(typeof(NodePreparationSystem))]
    public class ConnectionPreparationSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            Entities.ForEach((ref SignalComponent propagation, in ConnectionComponent connection) =>
            {
                if (!HasComponent<NodeComponent>(connection.NodeIn)) return;

                NodeComponent node = GetComponent<NodeComponent>(connection.NodeIn);

                propagation = new SignalComponent()
                {
                    value = node.previousValue * connection.strength
                };
            }).ScheduleParallel();
        }
    }

    I've also considered reading all Nodes and doing the processing in NativeArrays, but it feels a lot heavier somehow. I'm also not sure I understand the cost of creating and storing NativeArrays during jobs, and when to use that instead of random-access reads like GetComponent().

    How would that change under different situations, like:
    • A low number of Input Nodes and a high number of Connections (>10 outgoing connections per node)
    • A comparable number of both (between 0 and 3 outgoing connections per node)


    Can we improve performance by creating subgroups, given that Connections and Nodes of different Agents never mix or inter-connect?

    Is there actually any value in doing that, or does it just create added overhead for no benefit? My intuition tells me it could be very beneficial (smaller but more numerous NativeArrays?), but I have no idea whether that intuition is founded in the reality of how CPU cores store and access information.



    Considering my Signal Propagation System, is an index-based NativeMultiHashMap the right structure compared to a value-based NativeMultiHashMap?

    In the case of the Signal Propagation System, I want to read the signal of every Connection, then work out which signal is targeting which Node in order to find the total signal received by each Node.

    In this case, is it better to build a separate NativeArray containing the signals, then a NativeMultiHashMap of which indices of that first array refer to each Node?

    Compared to just directly building a NativeMultiHashMap of the signal values associated with their target Nodes?

    Is there a completely different approach I'm missing?
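
    To make the comparison concrete, here's a very rough sketch (not my actual code) of the second option, building the NativeMultiHashMap of signal values directly. It uses the component names from my earlier snippet plus an assumed [NodeOut] field on the connection:

    Code (CSharp):
    using Unity.Collections;
    using Unity.Entities;

    public class SignalPropagationSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            int connectionCount = GetEntityQuery(typeof(ConnectionComponent)).CalculateEntityCount();
            var signalsPerNode = new NativeMultiHashMap<Entity, float>(connectionCount, Allocator.TempJob);
            var writer = signalsPerNode.AsParallelWriter();

            // Pass 1: scatter every connection's signal under its output node.
            Entities.ForEach((in SignalComponent signal, in ConnectionComponent connection) =>
            {
                writer.Add(connection.NodeOut, signal.value);
            }).ScheduleParallel();

            // Pass 2: every node sums the signals addressed to it.
            Entities
                .WithReadOnly(signalsPerNode)
                .ForEach((Entity entity, ref NodeComponent node) =>
                {
                    if (signalsPerNode.TryGetFirstValue(entity, out float s, out var it))
                    {
                        do { node.value += s; } while (signalsPerNode.TryGetNextValue(out s, ref it));
                    }
                }).ScheduleParallel();

            Dependency = signalsPerNode.Dispose(Dependency);
        }
    }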


    What's the cost of accessing a NativeHashMap element by key vs looking up the index of a value in a NativeArray?


    In many cases, I need to build maps of which NativeArray indices refer to which Entities (say, when building an array of Input Nodes' values and figuring out which one is associated with which Module).

    Is building a NativeHashMap (using the entity.Index as key and the entityInQueryIndex as value) better than doing the opposite (NativeArray[entityInQueryIndex] = entity.Index)?

    Considering we then need to get the query index through either:
    Code (CSharp):
    var index1 = nativeArray.IndexOf(entity.Index);

    nativeHashMap.TryGetValue(entity.Index, out var index2);
    In both cases we need to check whether the value was found at all, but that part should be comparable.


    Again, sorry for the long post; I hope it's not going to scare off people who could've helped out :confused:
    I believe I've laid out a complex scenario in enough detail that we can have good discussions about the underlying data structures and the performance associated with each.

    Let me know if more information is needed on some points, I'll try my best to answer.
     
    varnon likes this.
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Wow. Lots of details for a very advanced use case! I'll try to help you out. :)

    First, nothing I say about performance is necessarily true. The profiler is the source of truth (assuming you use it right). :p

    Second, we need to talk about cache. It is kinda fundamental to this problem. Cache is designed to optimize for two use cases: spatial locality and temporal locality. Spatial locality means that adjacent memory addresses are likely to be used together. Temporal locality means that if a memory address is used, it is likely to be used again in the near future. These not only apply to data, but also to the instructions your CPU executes.

    ECS primarily takes advantage of spatial locality for data, and temporal locality for instructions. You get spatial locality of instructions by making your code less branchy, which is typically a good idea for other reasons anyways. But for temporal locality of data, you want to run as much work on the data as possible while you have it in the cache. And this is the part that trips everyone up: temporal locality is scoped to a job, not a system. So the question isn't "Same system?" but instead "Same job?", and if the answer is no, then whether or not they are even in the same system will have more to do with scheduling concerns and code maintainability than anything cache-related.

    So ECS is designed for data spatial locality by iterating through entities and components tightly packed together in memory. However, that only works when doing chunk iteration (such as an Entities.ForEach). The moment you have to look up data on another entity, you have to jump to a totally different region and spatial locality breaks. This is a "random access" and is a lot slower, but often necessary in a lot of situations. You can't eliminate it, but you can reduce it. Now what is more peculiar is that GetComponent (which uses ComponentDataFromEntity internally) actually performs two lookups. The first is to find where in memory the entity is actually located, and the second is actually getting the component. This means that if you were to look up two separate components on the same entity back to back, you would have three random accesses and one temporal locality cache hit for finding the same entity. Compare this to accessing a random NativeArray index, which only requires a single random access, and you may be able to think of clever optimizations using NativeArrays. NativeHashMaps are designed for reducing algorithmic complexity, and I can't recommend them if you can get the same algorithmic complexity using NativeArrays.
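
    To illustrate with the component names from your snippet, the GetComponent call inside your ForEach is roughly shorthand for this explicit form (just a sketch of the mechanism, not a change you need to make):

    Code (CSharp):
    // Roughly what GetComponent<NodeComponent>(entity) does under the hood: each indexer
    // access first resolves where the entity lives (chunk + index in chunk), then reads
    // the component from that chunk. That's the two lookups mentioned above.
    var nodeFromEntity = GetComponentDataFromEntity<NodeComponent>(isReadOnly: true);

    Entities
        .WithReadOnly(nodeFromEntity)
        .ForEach((ref SignalComponent propagation, in ConnectionComponent connection) =>
        {
            if (!nodeFromEntity.HasComponent(connection.NodeIn)) return;

            NodeComponent node = nodeFromEntity[connection.NodeIn]; // 2 random accesses
            propagation.value = node.previousValue * connection.strength;
        }).ScheduleParallel();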

    You are super close to thinking up what I would consider the optimal solution for this problem. What makes this problem so tricky is that the giant web of possible relationships is impossible to order in memory in a way that completely eliminates random access. This is a random access problem. But we still have one trick up our sleeve. Cache is small, but it isn't tiny. So unless a single agent has a ton of nodes and connections (megabytes worth), we can get at least full L2 cache spatial and temporal locality by packing all of an agent's nodes and connections together and evaluating them all at once. And if the data is less than a kilobyte, then we get the ultra fast L1 cache benefits.

    And the tool for that is dynamic buffers. Instead of nodes and connections being entities, they are going to be elements in an agent's dynamic buffers. And ideally, you want a single Entities.ForEach across all agents that evaluates all the nodes and connections in those dynamic buffers. Most likely, that algorithm will be small enough that you will get all four benefits of cache: spatial and temporal for both data and instructions.
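
    Here's a rough sketch of what I mean, with made-up element and field names and ignoring modules for now:

    Code (CSharp):
    using Unity.Entities;

    // Hypothetical buffer elements: each agent owns one buffer of nodes and one buffer of
    // connections whose indices point into that node buffer.
    public struct NodeElement : IBufferElementData
    {
        public float value;
        public float previousValue;
    }

    public struct ConnectionElement : IBufferElementData
    {
        public int inputNode;   // index into the agent's NodeElement buffer
        public int outputNode;  // index into the agent's NodeElement buffer
        public float strength;
    }

    public class AgentBrainSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            // One ForEach iteration per agent touches that agent's entire network while it is hot in cache.
            Entities.ForEach((DynamicBuffer<NodeElement> nodes,
                              DynamicBuffer<ConnectionElement> connections) =>
            {
                // Prepare: shift values into previousValue and reset.
                for (int i = 0; i < nodes.Length; i++)
                {
                    var n = nodes[i];
                    n.previousValue = n.value;
                    n.value = 0f;
                    nodes[i] = n;
                }

                // Propagate: each connection adds its weighted signal to its target node.
                for (int c = 0; c < connections.Length; c++)
                {
                    var con = connections[c];
                    var target = nodes[con.outputNode];
                    target.value += nodes[con.inputNode].previousValue * con.strength;
                    nodes[con.outputNode] = target;
                }
            }).ScheduleParallel();
        }
    }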

    This does come at a cost with regards to your modules interacting with your nodes. Your modules now have to not only look up the dynamic buffer of the entity but also the index in that dynamic buffer, which means those lookups require 3 random accesses instead of 2 with GetComponent. But you just got rid of a bunch of random accesses, so this is likely a much smaller price to pay in comparison.

    If these networks are dynamic, it might be a lot trickier to maintain dynamic buffers instead of connections and nodes as entities. There's a convenience vs performance trade-off here, and it's something we can get into more detail about if you wish.

    You could also try experimenting with shared components, but you will have to cache chunks and pay for lots of unused memory compared to dynamic buffers, so I doubt it is worth it unless network modifications are extremely common and scattered throughout the codebase.

    I believe I sort-of answered these to an extent. In any case, I think you may be barking up the wrong tree with these. They would make more sense if you had one giant network rather than a small isolated network per agent.
     
    Timboc likes this.
  3. Naotagrey

    Naotagrey

    Joined:
    Apr 17, 2016
    Posts:
    37
    First of all thanks a lot for the time you took to answer! I was a little scared that I went in too hard ahahah

    There's a lot of very interesting info

    Okay! Didn't know that. I think I better understand the difference between jobs and systems now.


    Oh yeah, I remember reading about DynamicBuffers, but even after going back and reading more, there are a few things I'm not sure I understand.

    First of all, is the cache level selection automatic, or do we have to work that out ourselves? If I'm not mistaken, cache sizes vary between systems/chips, so it seems like it would be complicated.

    In my specific case, it wasn't too uncommon for agents to reach ~250 connections + ~200 nodes. Although that depends a lot on the simulation parameters.

    If we consider that each connection contains 2 ints (node indices) + 1 float (connection strength) => 12 bytes/connection.
    Nodes each contain 2 floats (previous and current value) + 2 enums (type and activation function) => 12 bytes/node.

    Not sure if there is some extra header memory associated with dynamic buffers, but if we assume it's negligible: for small networks, we could have 25 nodes + 25 connections => 600 B,
    but for bigger networks like those previously mentioned, 250 nodes + 200 connections => 5.4 KB.
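
    (If I wanted to double-check that arithmetic, I guess something like this would do it. The layouts below are just my assumption, with byte-backed enums, which is what gives 12 bytes/node; int-backed enums would give 16.)

    Code (CSharp):
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Entities;
    using UnityEngine;

    public enum NodeType : byte { Free, ModuleInput, ModuleOutput }
    public enum Activation : byte { Linear, Sigmoid, Tanh }

    public struct NodeElement : IBufferElementData
    {
        public float value;           // 4
        public float previousValue;   // 4
        public NodeType type;         // 1
        public Activation activation; // 1 -> padded to 12 bytes
    }

    public struct ConnectionElement : IBufferElementData
    {
        public int inputNode;   // 4
        public int outputNode;  // 4
        public float strength;  // 4 -> 12 bytes
    }

    public static class SizeCheck
    {
        public static void Log()
        {
            Debug.Log(UnsafeUtility.SizeOf<NodeElement>());       // expect 12
            Debug.Log(UnsafeUtility.SizeOf<ConnectionElement>()); // expect 12
        }
    }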

    Secondly, Dynamic Buffers seem to be stored like components with entities; does that mean they are permanent? In my specific case, connections don't actively change during runtime (the number and strength of connections don't usually change during an agent's lifetime), but node values do. Would we have to rebuild/change the buffer every frame?

    Thirdly, DynamicBuffers seem to change their entity's archetype when they change size (or maybe I misunderstood).
    This wouldn't pose too much of a problem since the number of nodes and connections shouldn't change during an entity's lifetime (except in rare cases like user input), but since we have such a large span of possible combinations, maybe it would be possible to have predetermined sizes and jump to the next one if a buffer becomes too big (even if a certain number of slots end up unused)?

    Lastly, what if I want to attach supplemental data (like tags and such) to nodes that isn't needed for the actual processing, but is used in rarer cases (like identifying whether a node is a module's input or output, in order to determine legal connections when adding one)? What kind of form would that take? I know it would be very dependent on the specific use case, but I just want to know what kind of tools/structures are possible.
    I would identify 3 cases:
    • Additional info associated with each element (Ex: an evolution marker (int) on each node)
    • Lists of indices classifying each element (Ex: telling which nodes can receive/send connections)
    • Unique data attached to individual arbitrary elements (Ex: a label attached to a node by a user)

    Again, thanks a lot for your time and help; DOTS/ECS is an amazing paradigm that I'm eager to learn about and explore.
     
  4. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    It is automatic. The mechanism is more complex and you will still often get some benefits of L1 even if your data set doesn't totally fit.

    Try ushorts for the node indices. Then you'll have 8 bytes per connection.

    Dynamic Buffers are stored in chunk memory until they get too big, and then they are stored on the heap instead. There's a little header with each buffer which states where it is stored and how large it is. You can modify individual elements of a buffer without having to rewrite the whole thing. Additionally, for the data sizes you are thinking about, you may wish to use [InternalBufferCapacity(0)] in order to force buffers to allocate on the heap at all times, since they would likely never fit in chunks anyways.
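
    For example, a variant of the hypothetical connection element from my earlier sketch that combines both suggestions:

    Code (CSharp):
    using Unity.Entities;

    // ushort node indices (2 + 2 + 4 = 8 bytes per element) and [InternalBufferCapacity(0)]
    // so the buffer is always heap-allocated instead of eating chunk space.
    [InternalBufferCapacity(0)]
    public struct ConnectionElement : IBufferElementData
    {
        public ushort inputNode;   // index into the agent's node buffer
        public ushort outputNode;  // index into the agent's node buffer
        public float strength;
    }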

    You misunderstood.

    I would use a dynamic buffer of bit flags, such that each element corresponds to the node at the same index. I don't know how frequently used those other enums are but you may want to pull those out of the node buffer and into a node metadata buffer that also includes these bitflags.
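
    Something along these lines (made-up names), where element i describes the node at index i in the node buffer:

    Code (CSharp):
    using System;
    using Unity.Entities;

    [Flags]
    public enum NodeFlags : byte
    {
        None         = 0,
        ModuleInput  = 1 << 0,
        ModuleOutput = 1 << 1,
        UserLabeled  = 1 << 2,
    }

    [InternalBufferCapacity(0)]
    public struct NodeMetadataElement : IBufferElementData
    {
        public NodeFlags flags;
        public byte activationFunction; // rarely-used enums pulled out of the hot node buffer
    }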
     
  5. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,131
    EDIT: :) he posted before I did....(only saw after I sent my post, will leave it here...)

    I hope that @DreamingImLatios responds again; you will get a better answer from him than from me. I also haven't used DOTS for more than a year... but as far as I recall:
    Q1: automatic by the system, but like you calculated, this gives you an indication of whether you have a chance of fitting in certain caches (i.e. smaller data = higher chance)
    Q2: not sure I understand correctly, but if I did: yes, they are permanent and you would need to update them each frame if the values change (depending on your setup, ECS change-version filtering or other ways of tracking changes might be helpful)
    Q3: no, they do not change the archetype when they change size. When they fit in a chunk, they are allocated alongside the other components in the chunk; if they don't fit, only a small header/pointer gets stored with the other components and the buffer lives on the heap.
     
  6. Naotagrey

    Naotagrey

    Joined:
    Apr 17, 2016
    Posts:
    37
    Thanks a lot both :D!

    I see I still have a lot to learn, but with each passing day, I learn a little more :p
     
  7. DV_Gen

    DV_Gen

    Joined:
    May 22, 2021
    Posts:
    14
    First, super interesting project. It finally got me to register my account and stop full-time lurking. I hope to hear more about it. It sounds like we have had a lot of the same ideas.

    DreamingImLatios will be the one with better performance advice; I can't speak to some of those details. Conceptually though, I also agree with him. The first thought I had when reading through your material is that it seems like you are separating things based on how we conceptually think of them (nodes and connections) instead of in a way that actually works for the computer. This sort of thing threw me off too when I first started working with neural networks. I was expecting each neuron to be an important thing or class of its own. Instead, what I found was essentially layered regressions and array operations... which makes sense when you think about it.
    Your node preparation, connection preparation, and signal propagation systems, for example, all seem to conceptually make sense as separate things, but I think you will be better served treating them like a single operation.

    First, your node entity could take a dynamic buffer of inputs and sum them to get a new input value, then apply that input to a dynamic buffer of weights to get an output buffer ready. After all the nodes do this loop, another loop will be needed to take all the outputs and send them to the other entities, using another dynamic buffer of entity index and input slot pairs. I want to say there is a way to do it all in a single pass, but it's not coming to me at the moment.
    After these two jobs run, you can have other jobs to add / remove connections, change weights, etc.

    And then once I get to this point, I have to also start wondering if you even want an entity per node. It might be better to think of things as layers instead of individual nodes. It's been a while since I've done a neural network from scratch, but that is essentially what you are looking at. I'd have to think about the layer approach more. I would definitely look more at how neural networks are coded, and also statistical regressions; the two topics overlap a great deal. Really excited to hear more about this. I hope some of my thoughts were helpful.
     
    Naotagrey likes this.
  8. Naotagrey

    Naotagrey

    Joined:
    Apr 17, 2016
    Posts:
    37

    I'll have to give DynamicBuffers a try for sure; it seems like they could be very powerful and far better than the entity-based format I had.
    However, in my case, I'm developing a new genetic algorithm that has no layers, similar to NEAT. It's always going to be a big mess (also a nice challenge in visualization and graph drawing algorithms ahah).

    I figure what I might try is to have each agent hold a DynamicBuffer of nodes and another of connections, instead of them being entities. That would already be a big improvement IMO.

    I'm still stuck in the old OOP mentality :oops: I still have a lot to learn ahaha
     
    DV_Gen likes this.