[RESOLVED] Best way to process Non-linear graph with ECS (EX: Cloth simulation)

Discussion in 'Entity Component System' started by Mr-Mechanical, Mar 29, 2020.

  1. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Imagine this:
    -Each Entity has a position and velocity component

    -Connections join 2 entities and modify their velocity components accordingly; however, all connections are processed in parallel.

    How do you process this with ECS? Each velocity component may need to be accessed by multiple connections at the same time (as they are all processed in parallel).

    2-Figure2-1 (1).png
    I've thought about this problem for a very long time; however, I don't have a clean, performant solution.

    EXTREME props towards anyone that can figure this out. The main problem here is that the velocity component may be manipulated by multiple connections at the same time.
     
  2. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    I've been struggling with this for a number of months, and I hope this is something ECS can handle. I have designed systems that accommodate this problem, but they have BAD performance. Any design suggestions will be greatly appreciated.
     
  3. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    5,203
    In theory you could use ComponentDataFromEntity to describe the references... But ultimately, if you are writing a cloth system, I don't think that one entity per vertex is the way to go. I would have each entity describe a piece of cloth instead. Multiple DynamicBuffers are a reasonable structure. Or maybe even have the system manage the internal simulation state in a custom data structure. Ultimately, accessing another element by index is faster than accessing it via ComponentDataFromEntity.

    Data-oriented design is not about solving every problem with the same structure. With DOTS it is easy to create & manage your own data structures that are accessible from high-performance jobs.

    Unity.Physics is a good example of this. Collision detection is inherently a random access pattern. That's why rigidbodies during simulation are represented in a tightly packed array. What ECS gives you is that it can efficiently load & communicate that data between systems. But it doesn't mean you have to try to fit everything directly into an entity.
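
    As a rough sketch of the "piece of cloth per entity" idea (the component names here are hypothetical, just to illustrate the layout):

    Code (CSharp):

    // Hypothetical layout: one entity per cloth piece; particles and
    // connections live in DynamicBuffers instead of per-vertex entities.
    public struct ClothParticle : IBufferElementData
    {
        public float3 Position;
        public float3 Velocity;
    }

    public struct ClothConnection : IBufferElementData
    {
        public int ParticleA;   // indices into the ClothParticle buffer,
        public int ParticleB;   // not Entity references
        public float RestLength;
    }

    A job processing one cloth entity can then resolve its connections with plain index lookups into the particle buffer, avoiding ComponentDataFromEntity random access entirely.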
     
    snacktime, SamOld and Mr-Mechanical like this.
  4. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,255
    There are two techniques I know of for parallelizing this:
    1) For a grid cloth, run one pass over the vertically oriented connections, and then a second pass over the horizontally oriented connections.
    2) Use a multibox like I describe here: https://github.com/Dreaming381/Latios-Framework/blob/master/Physics/Physics/Types/CollisionLayer.cs
    Except with the multibox, instead of using it as a broadphase like I do, you do one pass over all buckets other than the cross-bucket and then a second pass on the cross-bucket. This is probably the technique I will use when I get around to cloth as it works with arbitrarily shaped meshes.
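
    For technique 1, the point is that each column owns all of its vertical springs, so columns can be processed in parallel while the springs within a column are resolved sequentially (and likewise per row for the horizontal pass). A rough sketch, assuming a Width x Height grid stored in flat arrays (all names hypothetical):

    Code (CSharp):

    [BurstCompile]
    struct VerticalSpringsJob : IJobParallelFor
    {
        public int Width, Height;
        [ReadOnly] public NativeArray<float3> Positions;
        // Each parallel index owns a whole column, so writes never overlap.
        [NativeDisableParallelForRestriction]
        public NativeArray<float3> Velocities;

        public void Execute(int column)
        {
            for (int row = 0; row + 1 < Height; row++)
            {
                int a = row * Width + column;
                int b = (row + 1) * Width + column;
                // Resolve the spring between particles a and b here;
                // both indices belong only to this column.
            }
        }
    }

    A second, symmetric job then handles one row per parallel index for the horizontal springs.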
     
    Mr-Mechanical likes this.
  5. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Thank you for your response. I assume my main downfall is trying to use entities as a container too much. My approach before was to use a "buffer" of 2 force entities per connection entity. Then there is a DynamicBuffer for each particle that sums all the force entities connecting to that particle (each referencing a force entity that is also referenced by a connection entity). Though I'm not sure if entities are supposed to be randomly accessed like this.

    Code (CSharp):

    [BurstCompile]
    struct AddBeamForces : IJobForEachWithEntity<Velocity>
    {
        [ReadOnly]
        public BufferFromEntity<BeamForce> lookup;
        [ReadOnly]
        public ComponentDataFromEntity<Force> forces;

        public void Execute(Entity entity, int index, ref Velocity velocity)
        {
            // Sum every beam force attached to this particle.
            var buffer = lookup[entity];
            float3 totalForce = new float3(0, 0, 0);
            for (int i = 0; i < buffer.Length; i++)
            {
                totalForce += forces[buffer[i].Value].Value;
            }
            velocity.Value = velocity.Value + totalForce; // mass is 1
        }
    }

    The performance result I'm concerned about is the uneven distribution of workload for this AddBeamForces job. The job sizes are either very small or very large. This is probably because my design is sort of hacky. Screenshot (274).png
    I'm sure a better design is certainly possible. Perhaps using arrays as a "buffer", as suggested, could alleviate this problem. I'm going to think about the best way to revise the job above. Perhaps represent forces as an array instead of entities, and store indices instead of entity references in the connections.
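
    One possible shape for that revision (everything here is a hypothetical sketch, with forces in a flat array and per-particle index ranges instead of entity references):

    Code (CSharp):

    [BurstCompile]
    struct AddBeamForcesByIndex : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float3> Forces;     // one entry per connection end
        [ReadOnly] public NativeArray<int2> ForceRanges;  // (start, count) per particle
        public NativeArray<float3> Velocities;

        public void Execute(int particle)
        {
            int2 range = ForceRanges[particle];
            float3 total = float3.zero;
            for (int i = 0; i < range.y; i++)
            {
                total += Forces[range.x + i];
            }
            Velocities[particle] += total; // mass is 1
        }
    }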

    Thank you so much for your help. Hopefully I can get to the bottom of this soon.
     
  6. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Thank you for your physics code, which I can use for reference. This is greatly appreciated. In my case all the constraints are resolved in an averaging manner (Jacobi) rather than a sequential manner (Gauss-Seidel); this graph coloring suggestion would work very well for a Gauss-Seidel solver. I'll see if maybe I can apply the same concept to the Jacobi solver I am using. In my case graph coloring seems too difficult for me, though.
     
    Last edited: Mar 29, 2020
  7. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,255
    In that case there's the third option of double-buffering.

    I wouldn't use that codebase as much of a reference other than how to set up and use a multibox broadphase. It's pretty outdated and I hope to have a new version up soon.
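
    A minimal sketch of what double-buffering could look like for an averaging solver (hypothetical job; the key property is that every index reads only the previous buffer and writes only its own slot, so there are no write conflicts):

    Code (CSharp):

    [BurstCompile]
    struct JacobiStepJob : IJobParallelFor
    {
        [ReadOnly] public NativeArray<float3> OldVelocities;
        public NativeArray<float3> NewVelocities;

        public void Execute(int i)
        {
            // Read neighbours from OldVelocities, write only index i.
            // Neighbour lookup omitted for brevity.
            NewVelocities[i] = OldVelocities[i]; // + averaged corrections
        }
    }

    After each step, swap the two arrays and repeat for the next iteration.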
     
    Mr-Mechanical likes this.
  8. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Is there a way to ensure the jobs are split into small sizes with Entities.ForEach? Maybe that would fix my performance problem. The issue here is that some job sizes are really large, as shown above, while others are small, causing a lot of idling on most CPU cores during the AddBeamForces job.
     
  9. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,255
    Entities.ForEach works at chunk granularity. You would have to limit the number of entities per chunk. I typically find the little bubbles fill up with other jobs in large enough projects where performance actually matters. Looking at the timeline screenshot, I don't think you are at the point where you need to worry yet.
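
    For comparison, with a hand-written IJobParallelFor the granularity is explicit: the second argument to Schedule is the inner-loop batch count, so smaller batches give the scheduler more work items to spread across cores (the job and count names here are hypothetical):

    Code (CSharp):

    // 16 indices per batch instead of one batch per chunk:
    JobHandle handle = someClothJob.Schedule(particleCount, 16);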
     
    Mr-Mechanical likes this.
  10. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    The processing time is misleading, as the system has to run at 2000 Hz. So I have 0.5 ms of simulation time to spend per frame.
     
  11. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Perhaps the job workload will even out if I increase the number of particles.
     
  12. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,255
    Is that 2000Hz synchronized or just 40 iterations at 50 Hz?
     
  13. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    Basically, all systems execute at 2000 Hz via an adjusted FixedUpdate() that calls the systems manually.
     
  14. SamOld

    SamOld

    Joined:
    Aug 17, 2018
    Posts:
    333
    Joachim_Ante already alluded to this, but I would suggest moving away from trying to treat this as an ECS problem. If you want to insist on having each vertex be an entity, then treat it as a three-stage problem.
    1. A system reads out the vertex entities into your own data structure that's better suited to the simulation problem.
    2. A job runs the simulation in your own structure.
    3. Another system copies the results back to your entities.
    I've never written a cloth simulation so I don't know what the best structure would be, but it sounds like you know what you're talking about with that already.

    This is how the Unity Physics system works, and it's often a good idea to do something like this when you need to process some form of many to many interaction. This should be considered any time the overhead of copying to and from the other structure might be smaller than the efficiency boost you get from having a structure that's better suited to your algorithm.

    This also makes your code more modular, which I consider to be a bonus. If you do it this way, you end up having a jobified cloth simulation module that's completely independent from the ECS. Maybe you'll want to use that later on data that's coming from a different source. For example, in some cases you may want to store a whole piece of cloth as one component, and in others you may want each vertex to be its own entity. A single modular engine could process both.
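
    The three stages might look roughly like this (the system and array names are hypothetical; only the gather stage is shown, and the scatter stage is its mirror image):

    Code (CSharp):

    public class ClothSimulationSystem : SystemBase
    {
        NativeArray<float3> positions; // simulation-owned flat structure

        protected override void OnUpdate()
        {
            // 1. Gather: copy the vertex entities into the flat array.
            var pos = positions;
            Entities.ForEach((int entityInQueryIndex, in Translation t) =>
            {
                pos[entityInQueryIndex] = t.Value;
            }).ScheduleParallel();

            // 2. Simulate: run the cloth solver jobs over the flat arrays.
            // 3. Scatter: copy the results back onto the entities.
        }
    }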
     
    Mr-Mechanical likes this.
  15. Mr-Mechanical

    Mr-Mechanical

    Joined:
    May 31, 2015
    Posts:
    507
    As I increase the workload, the CPU work becomes more evenly distributed, and I am closer to my performance target than expected. My benchmark was actually very broken. Testing with a build, I get results 10-30 percent better than very carefully hand-optimized code. The code snippet above actually works great, to my surprise. I really appreciate all of the feedback and I am happy to resolve this issue.
     
    SamOld likes this.
  16. tonsillectomy

    tonsillectomy

    Joined:
    Jun 2, 2015
    Posts:
    19
    Nice thread. Some quick feedback: I would encourage trying out the Gauss-Seidel graph colouring technique described here: https://github.com/rpandya1990/Gauss-seidel-Parallel-Implementation (I mean the red and black one).

    - You could ditch the entity per vertex approach and instead store the velocities in a NativeArray<float3>
    - Each vertex updates its velocity relative to its neighbouring vertices (up, down, left and right)
    - Iterate through the array so that you first update the even vertices (0, 2, 4, 6, 8, etc.) and then calculate the odd vertices (1, 3, 5, 7, 9, etc.) with the newly updated even vertices
    - The even vertices read only from the odd vertices and vice versa, so it's safe to multithread the calculations using IJobParallelFor
    - You could think of this Gauss-Seidel approach as how modern TV monitors work, versus how old CRT monitors update the picture (Jacobi)
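
    On a 2D grid, the split can be sketched with checkerboard parity by (x + y), which is the standard red/black colouring (names here are hypothetical):

    Code (CSharp):

    [BurstCompile]
    struct ColourPassJob : IJobParallelFor
    {
        public int Width;
        public int Parity; // 0 = "red" pass, 1 = "black" pass
        [NativeDisableParallelForRestriction]
        public NativeArray<float3> Velocities;

        public void Execute(int index)
        {
            int x = index % Width, y = index / Width;
            if (((x + y) & 1) != Parity) return;
            // Update Velocities[index] from its 4 neighbours; all of them
            // have the opposite colour, so none are written in this pass.
        }
    }

    Schedule once with Parity = 0, complete it, then schedule again with Parity = 1.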

    Happy coding! :)
     
    Mr-Mechanical likes this.