Resolved Thread safe strategies for handling race conditions?

Discussion in 'Entity Component System' started by RamblingRoot, Aug 8, 2021.

  1. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    I've been looking for resources and strategies to handle some basic race conditions, but I'm finding it difficult to find much of anything that isn't dated. Hoping the forums can offer some help.

    Basic explanation of the system: a fluid sim using cellular automata methods on a world grid.
    I have three specific types of basic race conditions in the system:

    1. Bool check: if the previous value or the current value is true, set to true. Basically, I'm checking whether a physics object is "inFluid", a simple check of whether any of the world grid entities overlapping the object contain fluid. The race condition occurs when two threads check two different world grid entities at the same time, one in fluid and one not: if both threads write their new values at the same time, the flag can still end up false. I'm less concerned about this one, since the race is extremely rare and I have a few theories on other thread-safe ways to check whether something is in fluid.

    2. Iterative addition: value += newValue. I need to transfer fluid from one entity to another so physics objects can "push" and interact with fluid. The race condition occurs when multiple threads transfer fluid from different source blocks to the same destination block at the same time.

    3. Byte OR: byte |= value. To let physics objects interact with the fluid sim, I use a byte to record whether flow is allowed through each side of a world grid fluid block (8 if top flow is blocked, 4 for left, 2 for right, and 1 for bottom). The race condition occurs when multiple threads perform an OR on the same byte to turn bits on. I'm seeing small amounts of fluid leak through physics objects where they shouldn't. This happens when two threads each try to flip a different bit from 0 to 1 at the same time and only one write survives, so instead of a block having fluid flow blocked on both the top and right sides, only the top is blocked.

    I've tried using NativeQueues to store and track each of these operations and then perform them on a single thread in a thread-safe way (not in parallel), and it fixed all the bugs, but at a higher cost than I would like. So I'm hoping to find some way of doing these operations in parallel that is still thread safe. I'm hoping there are newer API methods or preferred strategies that people currently use for these sorts of race conditions, but I haven't found anything clear or recent; most of what comes up basically says not to have race conditions in the first place. These iterative operations are essential to how my fluid sim works, though, and I'm not sure how else to do it.
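    To make the third race concrete, here is a minimal single-threaded sketch that simulates the bad interleaving (plain C#, not Unity code; the bit values follow the scheme above, 8 = top and 2 = right):

```csharp
// byte |= value is really three steps: read, OR, write. If two threads
// both read before either writes, one write clobbers the other.
byte flags = 0;

byte readA = flags;          // "thread A" reads 0, wants to block top (8)
byte readB = flags;          // "thread B" reads 0, wants to block right (2)

flags = (byte)(readA | 8);   // A writes 8
flags = (byte)(readB | 2);   // B writes 2, clobbering A's bit

// flags is now 2, not 10: the top side is no longer marked as blocked,
// which is exactly the kind of leak described above.
```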
     
  2. IgreygooI

    IgreygooI

    Joined:
    Mar 13, 2021
    Posts:
    48
    1. Take advantage of System.Threading.Interlocked and use atomic operations on each cell. (I haven't tried this on a cell-by-cell basis, so I'm not sure about the cost.)

    2. Divide the world grid into chunks and protect each chunk with a read-write lock, so that either one writer or multiple readers can access the cells within a chunk at any given time.

    These should fix any incorrect results produced by the data race, but they are still nondeterministic: depending on thread timing, they might produce different results.
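    A rough sketch of option 1, assuming the flow-blocked flags are widened from byte to int (Interlocked only operates on int/long). The atomic OR uses a CompareExchange retry loop; treat this as a plain .NET sketch rather than Unity-verified job code:

```csharp
using System.Threading;

static class AtomicCellOps
{
    // Case 2: iterative addition on an int cell. (For float fluid amounts
    // you would need a CompareExchange loop over the raw bits instead.)
    public static void AtomicAdd(ref int cell, int amount)
        => Interlocked.Add(ref cell, amount);

    // Case 3: atomic OR. Retry until our read wasn't clobbered by another
    // thread in between. (.NET 5+ also has Interlocked.Or built in.)
    public static void AtomicOr(ref int flags, int bits)
    {
        int seen;
        do
        {
            seen = flags;
        } while (Interlocked.CompareExchange(ref flags, seen | bits, seen) != seen);
    }
}
```

    With this, two threads flipping 8 and 2 on the same cell always end with both bits set, at the cost of one contended atomic per write.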

    Are you trying to do everything here inside a Burst-compiled job, or?
     
  3. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    I don't know anything about System.Threading.Interlocked, but it sounds like it could be what I'm looking for, so I'll look into it. Atomic operations would be great; I'll have to experiment.

    The problem with splitting the world into chunks is that entities on the border of each chunk may need to read/write their neighbors in a different chunk. That could be solved by running the sim update in a checkered pattern (think of a chessboard where you update the black chunks first, then the white chunks, using two different jobs). However, there is an additional problem in my specific use case. The physics objects I'm talking about can be very large, potentially spanning the entire screen and crossing multiple chunks. For the 2nd case I described, to handle fluid that is completely covered by a solid portion of a physics object, I move the entire amount to the outside border of the object. So for an entire row of a section of the physics object, any number of blocks could be trying to transfer fluid to the same border block, potentially located several chunks away.

    I don't think I care too much about determinism; I just can't allow race conditions. And since each of these operations is iterative in nature and doesn't depend on the current value to determine what is being added (or which bit is being turned on), I don't see how the final result could be nondeterministic for my specific cases?

    And yes, I'm using Burst-compiled IJobParallelFor jobs for the cases mentioned.
     
  4. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    I should also clarify, for the 3rd case of flipping byte values: in a separate job I reset all of the flow-blocked byte values to zero, so the job with the race condition only ever flips bits on (when flow in a particular direction should be blocked), never off. Since this job handles physics object interactions with the world grid fluid sim, the flow-blocked byte needs to be reset after each update to account for movement of the physics object.
     
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,223
    This would imply you are writing the bool result to the physics object. In that case, you should be iterating through physics objects and, for each physics object, checking all of its cells.

    If for whatever reason you need to write to both the physics object and the grid entity, the solution I typically use involves a "cross-bucket". I explain the concept of it here: https://github.com/Dreaming381/Lati...rt 2 - Build Collision Layer 1.md#multiboxing

    You could store in a NativeArray how much a cell gives to each of its neighbors. Then a second job adds that all up.

    There's not enough information on your data access patterns here for me to give you a suggestion.
     
  6. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    For case 1, your suggestion makes sense. I was originally trying to save processing time by including all physics objects in the same job, since there can be a large number of them. Although even moving to this approach wouldn't remove the potential for a race condition.

    Let's see if I can provide more context (this will shed some light on the byte issue as well). My game is a grid-based 2D sandbox game, which also has the ability to detach sections of blocks as separate physics objects that are no longer bound to the world grid (each has its own separate 2D grid of blocks). So when I say physics object, I essentially mean the same 2D grid block system and possible shapes, but detached to a separate free-moving 2D grid, self-contained to the physics object, instead of being locked to the world grid. I have a whole system set up that allows this and works very well. However, it obviously complicates my world grid physics systems. There is now the possibility of multiple physics objects entering the same fluid block on the world grid simultaneously (a max of four can partly enter the same fluid grid block), and the detached physics objects need to tell the fluid system where water can flow, which is what the flow-blocked byte is for. Most of this would be thread safe if I moved to a per-physics-object job, but there is a special case where the physics object sits perfectly on the world grid (has an x or y offset of zero). I actually think I could design around this issue and move to a per-physics-object job architecture for particular parts, possibly solving the byte race condition.

    So, since a physics object is really its own 2D grid world of blocks (just much smaller, and moving freely relative to the locked world grid), to check whether it is inFluid I have to check each block in the physics object grid against its nearest world grid block, in parallel, which is what produces the race condition. Multiple blocks of the same physics object grid could be reading and writing that object's inFluid bool at the same time. Hope this makes sense.

    For case 2, I actually already do this in the overall fluid sim, and it's thread safe; my problem here comes from getting physics objects to interact with the existing grid fluid system. Part of that interaction involves multiple grid blocks potentially transferring their fluid to the same destination blocks. Imagine a large physics object that completely covers four fluid grid blocks in the same row. My logic wants to send all the fluid in those blocks to the fluid blocks on either side of the physics object, meaning four source blocks could try to transfer fluid to two destination blocks at the same time. To complicate things, the destination block is highly variable and depends on the physics object, which can change size and move relative to the world fluid grid. Multiple physics objects can also send fluid to the same block on the world grid (that part could be solved by moving to a per-physics-object job as you mentioned, but it won't solve the first issue).

    There are more complications I'm not mentioning: I have an entire physics system built in that produces all the forces from interaction with the fluid system (flow, buoyancy, displacement, mass adjustments for fluid in and above a physics object in certain cases), and these all need the system to know the relative locations of other physics objects. I don't want to write a novel, so I'll stop there, but it sounds like moving sections of the jobs to be per physics object could remove some of the race conditions, hopefully without too much extra cost.

    I'll try changing the approach to per physics object and then check out the System.Threading.Interlocked option.

    So far I'm getting some promising leads, so thanks.
     
  7. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,223
    Thanks for the context. I see why you are getting tripped up. Ultimately, your job is too granular. You really want to be iterating over physics objects, not the individual cells, especially since you are writing the result to the physics object as a whole. Try something like this (pseudocode):
    Code (CSharp):
    public void Execute(int i)
    {
        var physicsObject = getObjectAtIndex(i);
        bool isInFluid = false;
        foreach (var cell in physicsObject)
        {
            isInFluid |= checkCellAgainstFluid(cell);
        }
        writeResult(i, isInFluid);
    }
    For this, I would store the fluid that needs to exit the cells into a NativeMultiHashMap or a NativeArray<UnsafeHashMap>. Then you can iterate by key to do the summing in a separate job.
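    The shape of that two-phase pattern, sketched here with plain .NET types for illustration (in Unity it would be the NativeMultiHashMap's ParallelWriter in the first job and a per-key summing job after it; the names below are placeholders, not the container's actual API):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class FluidTransfer
{
    public static void ApplyTransfers(
        float[] cells, (int dest, float amount)[] transfers)
    {
        // Phase 1: every source cell records a (destination, amount) pair.
        // Only appends happen here, so there is no read-modify-write race.
        var pending = new ConcurrentBag<(int dest, float amount)>();
        Parallel.ForEach(transfers, t => pending.Add(t));

        // Phase 2: iterate by destination key and sum, one key at a time,
        // so each cell is only ever written by this loop.
        foreach (var group in pending.GroupBy(t => t.dest))
            cells[group.Key] += group.Sum(t => t.amount);
    }
}
```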
     
  8. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    Wouldn't "isInFluid |= checkCellAgainstFluid(cell)" still potentially produce race conditions, since it both reads and writes isInFluid?

    The hashmap suggestion is interesting; I'll have to look into that as well. I've never really used a hashmap before, but I get the concept. I'm curious, though: why a hashmap specifically, as opposed to something like a NativeQueue or some other native container?
     
  9. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    Oh, never mind, I misread: the multiple physics objects would run in parallel, not the cells of the objects. Makes sense, I could try that.

    Although it's the cells within a physics object that would benefit most from multithreading. I also do heavier calculations on a per-physics-object basis that require checking other properties of the object's cells against the world grid, things like fluid force effects (flow, buoyancy, etc.), which if done on a single thread for a large object would be fairly heavy on the CPU. The variability in object size could also leave everything hung up waiting on a single thread that is handling a very large physics object.

    All of this really needs to happen during FixedUpdate() as well, since it interacts with the Unity physics system, so hanging up waiting on a thread would cause bad feedback and ruin the FPS.
     
  10. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,223
    Profile this first and make sure it is a real issue. If the problem is that checking whether a single physics object cell is inside a fluid cell is too expensive on a large physics object, use a parallel job that stores the result of that calculation per cell, then a second job that combines the results into the larger physics object. But the former approach (one thread per object) can early-out as soon as a cell is found in fluid, which may be faster for large physics objects.
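    The early-out mentioned here is just returning from the per-object cell scan as soon as one cell tests positive. A minimal sketch, with a HashSet of fluid cell indices standing in for the real grid lookup:

```csharp
using System.Collections.Generic;

static class FluidCheck
{
    // Early-out: stop scanning an object's cells as soon as one is in
    // fluid; the remaining cells are never checked.
    public static bool IsInFluid(int[] objectCells, HashSet<int> fluidCells)
    {
        foreach (var cell in objectCells)
            if (fluidCells.Contains(cell))
                return true;
        return false;
    }
}
```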

    Anyways, profile. And pick your battles carefully.
     
  11. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    A lot of trade-offs with this, for sure. I could always try out and profile/test everything, but my logic is already pretty complicated and interdependent, so changing it for tests would be a lot of work. I'd like to try to figure out the likely best option ahead of time, ha.

    So when you say a second job combines the results into the larger physics object, I assume that has to run on a single thread to avoid race conditions. What would be the best way to have a performant single-threaded job that is also Bursted? I've heard you can call .Run() on a job instead of Schedule() to make it run on a single thread until it finishes; would that be the best way?
     
  12. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    Or, I guess, it might make more sense to have an IJobParallelFor job that runs once per physics object and then iterates linearly over each cell inside that object. I think this is what you were saying.
     
  13. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,223
    You shouldn't need to change anything. Just open up the profiler window, switch to the timeline view, press play, set up a complex scene, then enable profiler capture for a little bit. Then stop the game. Now you have a view of what all your worker threads are doing and how well-distributed your job scheduling is.

    Yes it was.
     
  14. RamblingRoot

    RamblingRoot

    Joined:
    Apr 2, 2020
    Posts:
    24
    So I've managed to get a solution working just by adding the change values and the index of the component to a NativeQueue, then iterating through it on a single thread to apply the changes in order and avoid race conditions. Similar to what I tried originally, but I had done something incorrectly back then, so I thought it was too slow. It seems to be performing well enough now. I may still come back to this to optimize further, but for now I'm happy with it.
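    For reference, the shape of that queue-and-drain approach, sketched with plain .NET types (in Unity this would be a NativeQueue<T>.ParallelWriter in the parallel job and a single-threaded job draining it; the types here are illustrative):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

struct CellChange
{
    public int Index;    // which cell to modify
    public float Delta;  // amount of fluid to add (or subtract)
}

static class ChangeQueue
{
    public static void Apply(float[] cells, CellChange[] changes)
    {
        var queue = new ConcurrentQueue<CellChange>();

        // Parallel phase: threads only enqueue; enqueueing is thread safe.
        Parallel.ForEach(changes, c => queue.Enqueue(c));

        // Single-threaded phase: drain the queue and mutate the cells,
        // so the += read-modify-write is never contended.
        while (queue.TryDequeue(out var c))
            cells[c.Index] += c.Delta;
    }
}
```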

    Thanks for the help.