
Question: Fast way to clear memory in parallel?

Discussion in 'Burst' started by Zergling103, Feb 11, 2023.

  1. Zergling103

    Can UnsafeUtility.MemClear be called within a Burst-compiled job to quickly clear memory? If not, what other options are there?

    What I need:

    I have a bunch of large NativeArray<T>s that are used as temporary storage to accumulate values, where:
    • These arrays are used on every simulation tick to accumulate values for other jobs.
    • The arrays will rarely or never change size from tick to tick.
    • The contents of the array must be reset to 0 at the start of each tick (before the arrays are used for accumulation for this tick).

    What I tried or considered:
    • Creating temporary arrays with the NativeArrayOptions.ClearMemory option.
      This locks up the main thread, with most of the time spent allocating and zeroing out the memory. Worker threads also waste time freeing the memory. This allocation and deallocation is undesirable for this use case.
    • Using persistent allocation and writing an IJobParallelFor job that sets the array elements to default(T).
      This has the advantage that clearing is threaded and scheduled along with other jobs. However, assigning 0 to each element individually is inefficient.
    • Using persistent allocation and clearing the arrays with UnsafeUtility.MemClear.
      While this removes the time spent on allocation and deallocation, zeroing out the memory still takes substantial time and freezes the main thread.
    • Adding data that signals to downstream jobs that the array contents should be discarded. I could write branches in jobs that check a boolean deciding whether to read the array contents or start from 0. This would bloat the code and potentially slow each execution of the job by a tiny amount.

    What I am hoping for but can't find:
    Ideally, I am hoping to find or create a job-like memory clearing process that:
    • Uses low-level memory clearing operations like UnsafeUtility.MemClear (presumably this uses memclr or similar).
    • For sufficiently large arrays, splits the clearing operation among multiple threads, similar to IJobParallelFor. Perhaps the array is split into N chunks which are cleared by N threads.
    • Can be scheduled like a job, and used as a dependency for other jobs.
    • Does not execute on, or stall, the main thread whatsoever.
    Any help would be appreciated, thank you!
     
  2. Zuntatos

    Be wary of potential scaling problems: on consumer PCs, 2-4 threads doing memclear/memset may already saturate RAM bandwidth, and adding more threads won't help. If at all possible, it may be best to do the clearing in parallel with some compute work.
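One possible reading of "compute work" here: overlap the bandwidth-bound clear with a job that does a lot of arithmetic per byte it loads, so the two don't compete for the same resource. This is a minimal sketch assuming an int accumulation buffer; ClearAccumulatorJob, HeavyMathJob and OverlapExample are hypothetical names, and the math loop is only filler standing in for ALU-heavy work. Both jobs depend only on dependsOn and not on each other, so the job system is free to run them on different workers at the same time.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

// Bandwidth-bound: writes zeros over the whole accumulation buffer.
[BurstCompile]
public struct ClearAccumulatorJob : IJob
{
    public NativeArray<int> accumulator;

    public unsafe void Execute()
    {
        UnsafeUtility.MemClear(accumulator.GetUnsafePtr(), (long)accumulator.Length * sizeof(int));
    }
}

// Hypothetical ALU-bound job: many math operations per element loaded,
// so it needs comparatively little memory bandwidth.
[BurstCompile]
public struct HeavyMathJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> input;
    [WriteOnly] public NativeArray<float> output;

    public void Execute(int i)
    {
        float x = input[i];
        for (int k = 0; k < 32; k++)
            x = math.sin(x) * math.cos(x) + 0.5f;
        output[i] = x;
    }
}

public static class OverlapExample
{
    // The two jobs touch different arrays and neither depends on the other,
    // so the clear and the math can run concurrently.
    public static JobHandle ScheduleOverlapped(NativeArray<int> accumulator,
                                               NativeArray<float> input,
                                               NativeArray<float> output,
                                               JobHandle dependsOn = default)
    {
        JobHandle clear = new ClearAccumulatorJob { accumulator = accumulator }.Schedule(dependsOn);
        JobHandle compute = new HeavyMathJob { input = input, output = output }
            .Schedule(input.Length, 64, dependsOn);
        return JobHandle.CombineDependencies(clear, compute);
    }
}
```

Whether the overlap actually pays off depends on the machine, so it is worth profiling with and without it.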
     
  3. Zergling103

    Can you clarify what you mean by "compute work"? Given that all of this is work done by a computer, that doesn't narrow it down much. :p
     
  4. kdchabuk

    I found UnsafeUtility.MemClear to be very fast and, as far as I can tell, it works fine in a Burst-compiled job, i.e. off the main thread. For example:
    Code (CSharp):
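    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Jobs;

    [BurstCompile]
    struct MemClearJob<T> : IJob where T : unmanaged
    {
        public NativeArray<T> array;

        public unsafe void Execute()
        {
            // Zero the whole backing buffer in one call, on a worker thread.
            UnsafeUtility.MemClear(array.GetUnsafePtr(), (long)array.Length * sizeof(T));
        }
    }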
    Do you mean that it freezes the main thread when you call Complete(), since the jobs are not done yet? Parallel jobs may help a bit, but memory speed will likely be your bottleneck.

    If you can't give the memory more time to clear, then I think option 4 (signaling downstream jobs to discard stale data) would be the next best.
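For the parallel version asked about in the opening post, one option is to split the buffer into fixed-size byte ranges and have an IJobParallelFor clear one range per index with UnsafeUtility.MemClear. This is a minimal sketch under stated assumptions: ParallelMemClearJob, ParallelMemClear and the chunkBytes parameter are hypothetical names, and the chunk size is something you would tune by profiling. Reinterpreting the array as bytes keeps the job non-generic; as far as I know the reinterpreted view shares the original array's safety handle, so the returned JobHandle can serve as the dependency for jobs that later use the original array.

```csharp
using System;
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;

// Clears one contiguous byte range of the buffer per Execute(index) call.
[BurstCompile]
public struct ParallelMemClearJob : IJobParallelFor
{
    // Each index writes a whole range of bytes, not just element [index],
    // so the per-index write restriction is disabled.
    [NativeDisableParallelForRestriction]
    public NativeArray<byte> bytes;

    public int chunkBytes;

    public unsafe void Execute(int chunkIndex)
    {
        long start = (long)chunkIndex * chunkBytes;
        long count = Math.Min(chunkBytes, bytes.Length - start); // last chunk may be shorter
        byte* basePtr = (byte*)bytes.GetUnsafePtr();
        UnsafeUtility.MemClear(basePtr + start, count);
    }
}

public static class ParallelMemClear
{
    // Hypothetical helper: schedules the clear and returns a handle that
    // downstream jobs can depend on. Never touches the main thread beyond scheduling.
    public static JobHandle Schedule<T>(NativeArray<T> array, int chunkBytes, JobHandle dependsOn = default)
        where T : struct
    {
        NativeArray<byte> bytes = array.Reinterpret<byte>(UnsafeUtility.SizeOf<T>());
        int chunkCount = (bytes.Length + chunkBytes - 1) / chunkBytes;
        var job = new ParallelMemClearJob { bytes = bytes, chunkBytes = chunkBytes };
        return job.Schedule(chunkCount, 1, dependsOn);
    }
}
```

Given Zuntatos' point about bandwidth, large chunks are probably preferable: a handful of workers may already saturate memory, so many tiny chunks mostly add scheduling overhead.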
     
    Last edited: Feb 11, 2023
  5. kdchabuk

    Since I don't know how you are accumulating these values, here's a hypothetical "signal" mechanism. I'll assume your array elements are each a sum, counting up, stored as a uint, and that you only need 31 of the 32 bits. You can use the remaining bit as a flag for stale data using bitwise operations (&, <<, >>).

    Then you can encode tick & 1 into the most-significant bit and pass the frame number into each relevant job. If tick & 1 doesn't match that bit, you know it is old data (from the previous frame): reset it to 0 and set that bit to the current tick & 1 (so other computations don't reset it again in the same frame). When you're done and want your sum, bitwise-AND it with the mask 0x7fffffff to ignore that bit.

    Here's some code to get across the idea:
    Code (CSharp):
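    public uint ResetIfStale(uint value, uint tick)
    {
      // The top bit stores tick & 1 from the tick that last wrote this value.
      if ((tick & 1) != (value >> 31))
        value = (tick & 1) << 31; // stale: restart from 0, stamped with this tick's parity
      return value;
    }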
    Burst can inline this function and it doesn't require extra memory reads so the performance cost would be very small.
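A slightly fuller sketch of the same idea, showing how an accumulation step and the final read might use the flag bit. StaleFlagAccumulator, Add and Read are hypothetical names, and the sketch assumes each 31-bit sum never overflows into the flag bit.

```csharp
using System.Runtime.CompilerServices;

public static class StaleFlagAccumulator
{
    // Lower 31 bits hold the sum; the top bit holds tick & 1 of the last writer.
    const uint ValueMask = 0x7fffffffu;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint ResetIfStale(uint value, uint tick)
    {
        if ((tick & 1u) != (value >> 31))
            value = (tick & 1u) << 31; // stale: sum 0, stamped with this tick's parity
        return value;
    }

    // Replaces values[i] += x: data left over from the previous tick counts as 0.
    // Like +=, this is not atomic; each slot must have a single writer at a time.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void Add(ref uint slot, uint x, uint tick)
    {
        slot = ResetIfStale(slot, tick) + x;
    }

    // Returns this tick's sum with the flag bit masked off; a stale slot reads as 0.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint Read(uint slot, uint tick)
    {
        return ResetIfStale(slot, tick) & ValueMask;
    }
}
```

One caveat of a single parity bit: a slot that is skipped for an entire tick looks fresh again two ticks later, so this only works if every slot is written or reset every tick.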
     
  6. Zergling103

    Sorry, I should have been clearer. I assumed that UnsafeUtility.MemClear could not be used in a job, so I was calling it on the main thread. Otherwise, the problems I outlined would not apply: I'd treat the clearing like any other job and not call Complete() until the next frame (the jobs would execute while the game is rendering and other main-thread code is running).
    If it is safe to use in a job thread, then the problem is resolved and I have my solution. I'll try implementing it and let you know!
     
  7. Zergling103

    Something along these lines is what I was hinting at. I'd have to add special handling for the first read-modify-write operation on each element (which takes the form of values[i] += x) and make sure I move that special handling around if the job pipelines change, or else make every read-write operation perform this check.

    After developing other parts of the job pipeline, I discovered a second possibility: assigning 0 to the elements when they are used for the last time in a given tick.
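The "assign 0 on last use" approach can be folded into whichever job reads the accumulated values last. A minimal sketch, assuming a float buffer; ConsumeAndResetJob and results are hypothetical names. It only works if this job really is the last reader in the tick, the next tick's accumulating jobs must depend on its handle, and the buffer must start out zeroed (the default NativeArrayOptions.ClearMemory allocation does this once at creation).

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical last consumer of the accumulation buffer in a tick. It copies
// each sum out and then zeroes the slot, so the next tick can start
// accumulating with values[i] += x without a separate clearing pass.
[BurstCompile]
public struct ConsumeAndResetJob : IJobParallelFor
{
    public NativeArray<float> accumulator;          // read, then reset to 0
    [WriteOnly] public NativeArray<float> results;  // this tick's sums

    public void Execute(int i)
    {
        results[i] = accumulator[i];
        accumulator[i] = 0f;
    }
}
```

Compared with a separate clear pass, this writes each element while its cache line is already loaded for the read, so it should avoid a second full sweep over the buffer.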
     
    Last edited: Feb 15, 2023
  8. vectorized-runner

    Why do you need to clear the memory, though, when you can just use the array length for accumulation?
     
  9. Zergling103

    If I am interpreting you correctly, you are suggesting that when I want to accumulate values, instead of adding to an element I should increment an index and set the value at that index, then sum all of the array values at the end, and reset the index to 0 at the start?

    This isn't a practical solution in my case. There are thousands of memory locations where values are summed, and this approach would multiply that memory footprint by the number of values summed per location. I also believe it is more efficient to add to an existing memory location than to add multiple memory locations together.
     
  10. SF_FrankvHoof

    It is safe to use in a job, as long as there are no race conditions on that chunk of memory (i.e. the memory is 'free' to write to at that point in time).

    Edit: a [ClearOnJobCompletion] attribute might be nice for the devs to add, though (like [DeallocateOnJobCompletion], but clearing the memory to default(T) instead of freeing it).
     
  11. Zergling103

    Yes, of course, though this is a general rule and isn't specific to this case.