C# Jobs: Loosening Safety Checks

Discussion in 'Data Oriented Technology Stack' started by james7132, Feb 23, 2018.

  1. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    I know full well this is unsafe, but working directly with pointers is much faster than going through the outward-facing Native* APIs. For example, the seemingly trivial job of adding a constant value to every element in a NativeArray can be rewritten as follows:

    Code (CSharp):
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe; // GetUnsafePtr extension method
    using Unity.Jobs;

    public struct AddValueJob : IJob {
      public int Start;
      public int Count;
      public float Value;
      public NativeArray<float> ValueArray;

      public unsafe void Execute() {
        // Raw pointer to the first element of the range; no safety checks from here on.
        float* vPtr = (float*)ValueArray.GetUnsafePtr() + Start;
        float* end = vPtr + Count;
        while (vPtr < end) {
          *vPtr++ += Value;
        }
      }
    }
    This change should remove the overhead of calling the indexer on the NativeArray, bounds checking, memcpying each value out of the array, computing Base + Offset for every element in the range, and the read/write checks on every access to the array. Admittedly, this is about as unsafe as it gets, but it is definitely much faster: I've seen a near order-of-magnitude speedup in certain jobs built like this. Safety checks such as bounds checking can be applied once outside the loop, or even in the constructor of the job, to offset the lack of safety in this approach.
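
    A minimal sketch of applying the bounds check once instead of per element. RangeCheck.Validate is a hypothetical helper, not part of Unity; the idea is to call it once (for example before scheduling the job, or at the top of Execute) so the pointer loop itself can stay check-free.

```csharp
using System;

public static class RangeCheck {
  // Hypothetical helper: validates the range [start, start + count) against
  // a buffer of the given length, once, before any raw pointer access.
  public static void Validate(int start, int count, int length) {
    // Written as "start > length - count" to avoid integer overflow in start + count.
    if (start < 0 || count < 0 || start > length - count) {
      throw new ArgumentOutOfRangeException(
        nameof(start),
        $"Range [{start}, {start}+{count}) does not fit in a buffer of length {length}.");
    }
  }
}
```

    For the job above this would be called as RangeCheck.Validate(Start, Count, ValueArray.Length).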

    Right now IJobParallelFor takes a single index on Execute, which is called once per index within each batch. That doesn't allow us to do something like this over the whole range of a batch.

    I would like to be able to parallelize jobs like this more easily by having an alternative IJobParallelFor that is given a start/count or start/end pair and is only called once per batch. Alternatively, can someone clarify whether future Burst compiler optimizations will produce similar code (i.e. inlining the Execute function and removing the iteration overhead)?
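
    To make the request concrete, here is a rough sketch of the shape such an interface could take. Everything in it is an assumption for illustration: IBatchedFor, AddValueBatch and BatchRunner are hypothetical names, a plain float[] stands in for NativeArray<float>, and the runner is single-threaded where a real job system would hand each batch to a worker thread.

```csharp
using System;

// Hypothetical interface: Execute receives a whole batch instead of a single index.
public interface IBatchedFor {
  void Execute(int start, int count);
}

public struct AddValueBatch : IBatchedFor {
  public float Value;
  public float[] Values; // plain array standing in for NativeArray<float>

  public void Execute(int start, int count) {
    int end = start + count;
    for (int i = start; i < end; i++) {
      Values[i] += Value;
    }
  }
}

public static class BatchRunner {
  // Single-threaded stand-in for a scheduler: splits [0, length) into batches
  // and calls Execute exactly once per batch.
  public static void Run<T>(ref T job, int length, int batchSize)
      where T : struct, IBatchedFor {
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    for (int start = 0; start < length; start += batchSize) {
      job.Execute(start, Math.Min(batchSize, length - start));
    }
  }
}
```

    With a length of 10 and a batch size of 4, Execute is called three times, with (start, count) of (0, 4), (4, 4) and (8, 2).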

    I see there's a Unity.Jobs.LowLevel.Unsafe namespace, and it seems possible to implement our own custom batch jobs with it, but there doesn't seem to be adequate documentation on how to use any of it at this time.

    ------------------------------------------------------

    I've also been looking at creating a NativeStack, a fixed-capacity stack built on an unmanaged array of elements. Its implementation hinges on an atomic counter built using an unmanaged pointer to a single integer and static methods from the Interlocked class: https://msdn.microsoft.com/en-us/library/system.threading.interlocked_methods(v=vs.110).aspx.

    However, each time I try to launch a job that includes one, it reports that the job uses unsafe pointers and thus cannot be used in Jobs. I see the NativeContainer example in the documentation uses IntPtr, which is also what I am using for references to both the array and the counter. Is this expected behavior? Here's a link to the source code for the counter: https://github.com/james7132/Danmak.../Assets/DanmakU/Runtime/Core/AtomicCounter.cs. It has a few bugs in it (I just noticed I'm not using the right alignment for its allocation). Is it because the struct itself is marked unsafe?
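
    For readers unfamiliar with the idea, here is a minimal managed sketch of a fixed-capacity stack whose pushes claim slots through an Interlocked counter. It is not the linked AtomicCounter and not a NativeContainer: FixedStack is a hypothetical name, a managed int[] stands in for the unmanaged buffer, and it assumes that pushes may run concurrently but pops only happen after all pushes have finished.

```csharp
using System;
using System.Threading;

// Hypothetical managed stand-in for a fixed-capacity stack with an atomic counter.
public sealed class FixedStack {
  private readonly int[] _items;
  private int _count;

  public FixedStack(int capacity) {
    _items = new int[capacity];
  }

  // May briefly overshoot during a failed push, so clamp to the capacity.
  public int Count => Math.Min(Volatile.Read(ref _count), _items.Length);

  // Safe to call from several threads at once, provided nothing pops concurrently.
  public bool TryPush(int value) {
    // The atomic increment hands every caller a unique slot index.
    int index = Interlocked.Increment(ref _count) - 1;
    if (index >= _items.Length) {
      Interlocked.Decrement(ref _count); // stack was full: undo the claim
      return false;
    }
    _items[index] = value;
    return true;
  }

  // Single-threaded only: call after all pushing threads have completed.
  public bool TryPop(out int value) {
    if (_count <= 0) {
      value = default;
      return false;
    }
    value = _items[--_count];
    return true;
  }
}
```

    Tracking a single end index is what keeps this simple: each push needs only one atomic operation on the success path.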
     
    Last edited: Feb 23, 2018
  2. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,671
  3. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    I do agree that this is indeed something that shouldn't be done in the common case; however, that makes me that much more curious as to what optimizations are made for Burst or IL2CPP that would produce code of similar or better performance than this.

    On the other topic: is there any clarification on my second question about implementing our own NativeContainers?
    Is the error simply from marking the struct itself as unsafe?
     
  4. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,671
    Creating custom containers like NativeStack etc. is fine. I just recommend having complete unit test coverage for any code like that. The containers are usually implemented with unsafe code, and they are responsible for providing the job safety guarantees, so it's important to test all cases where behaviour is as expected and also to have test coverage verifying that incorrect usage throws the expected exceptions.
     
  5. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    https://docs.unity3d.com/2018.1/Doc...LowLevel.Unsafe.NativeContainerAttribute.html

    Following the example code above, I'm getting an error saying "InvalidOperationException: DestroyDanmaku.ActiveCount.m_DisposeSentinel is not a value type. Job structs may not contain any reference types."

    I can see why this is normally an issue, but I'm not sure how to get around it: you need a DisposeSentinel reference in Dispose to properly check the disposal status, as seen in the example. Is there a special exception case that needs to be met for this to be ignored, or has the methodology changed since that example was written?
     
  6. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,671
  7. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    This is amazing! Thanks for providing this information ahead of time!
     
  8. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    For those interested, since the above Gist doesn't give a complete code example, I was able to get the batched version of IJobParallelFor (what I'm calling IJobBatchedFor) working: https://github.com/james7132/DanmakU/blob/develop/Assets/DanmakU/Runtime/IJobBatchedFor.cs.

    I did indeed see a notable speedup (2-4x) from removing the overhead of allocating stack frames for multiple calls into Execute. There's likely a negligible gain if the Job system could merge overlapping ranges or predictively adjust the ranges (in multiples of the batch size) based on previous executions of the same job.
     
    Last edited: Feb 26, 2018
  9. OswaldHurlem

    OswaldHurlem

    Joined:
    Jan 6, 2017
    Posts:
    40
    I'm curious what your intention is in creating a NativeStack which is accessible by multiple threads?
     
  10. james7132

    james7132

    Joined:
    Mar 6, 2015
    Posts:
    110
    I'm writing a pooled system for bullets in bullet hell games. The internal structure is a Structure of Arrays, where a bullet is identified by its index within the arrays, and each array corresponds to one field of the bullet. There's an active count of bullets: if the index of a bullet is less than the active count, it is alive. Destroying a bullet involves decrementing the active count and copying the last living bullet into the index of the bullet that was destroyed.
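
    A minimal sketch of that swap-remove scheme, assuming a hypothetical two-field pool (BulletPool, with X and Y arrays) rather than the actual bullet layout:

```csharp
using System;

// Hypothetical two-field bullet pool stored as a structure of arrays.
public sealed class BulletPool {
  public readonly float[] X;
  public readonly float[] Y;

  // Bullets at indices [0, ActiveCount) are alive.
  public int ActiveCount { get; private set; }

  public BulletPool(int capacity) {
    X = new float[capacity];
    Y = new float[capacity];
  }

  public int Spawn(float x, float y) {
    if (ActiveCount >= X.Length) throw new InvalidOperationException("Pool is full.");
    int index = ActiveCount++;
    X[index] = x;
    Y[index] = y;
    return index;
  }

  public void Destroy(int index) {
    if (index < 0 || index >= ActiveCount) {
      throw new ArgumentOutOfRangeException(nameof(index));
    }
    // Shrink the live range, then move the last living bullet into the hole.
    int last = --ActiveCount;
    X[index] = X[last];
    Y[index] = Y[last];
  }
}
```

    One consequence of this layout is that destroying a bullet changes the index of another one, so a batch of queued indices can go stale mid-flush. Processing the queued indices from highest to lowest is one way to avoid that.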

    For performance reasons, I queue up the to-be-destroyed bullets in a Queue<int> instead of iterating through and checking an "IsDestroyed" field. Before each update, I flush the queue, destroying all queued bullets in one go. This was done entirely on the main thread.

    As I added bounds checking to keep bullets within a reasonable play area, I needed to do an O(n) search across a potentially large pool of bullets and destroy the ones that are out of bounds. It's a seemingly cheap operation, but from my testing it's generally faster when done as an IJobParallelFor. This poses the issue that Queue is both inaccessible from Jobs and not thread-safe.

    I then turned the destruction of bullets into a job that does the aforementioned linear destruction check. As it shuffles values around, it's not safe to do entirely from a parallelized job, so it runs as a one off job. This was expensive, but not as expensive as running the bounds checking on a single thread. To shrink the size of this job, I need a stack or queue that I could safely build up from multiple threads.

    I certainly could have waited for the Unity ECS and the NativeQueue implementation they have, as it definitely could resolve my issues, but where's the fun in that? I know the maximum count of bullets that could be pushed into the stack/queue, so a fixed-capacity version is fine by me. A NativeQueue built on a circular buffer is more difficult to build in a lock-free way as it needs to track both a start and an end pointer, whereas a NativeStack built on a fixed-size buffer and an atomically updated end pointer just needs to track one. That's why I wanted to build a NativeStack. Thus far, I've mostly succeeded... in crashing Unity or breaking my existing code. As previously said, a full test suite that hits all the possible edge cases is highly desirable, and that is what I'm working on now.

    This has further uses, as I currently do a linear pass to check whether any bullets need collision checking: a parallel job checks if a bullet's AABB intersects known colliders, then a linear pass on the main thread decides which ones to make Physics queries for (via *Cast calls). If I could queue up only the ones that need to make queries, that linear pass would no longer be needed. I heard that physics queries within jobs may also be available in a later beta, so this entire setup may be reduced even further depending on its implementation.
     
    Last edited: Feb 27, 2018
  11. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,356
    Probably a simpler way to handle the out-of-bounds checking would be a spatial hash, or more generally some operation that checks whether the bullet is out of bounds at the moment you move it. You are already touching the object to move it, so the extra work of a hash or distance check is going to be cheap compared to searching later.
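
    A rough sketch of the second suggestion, folding the bounds test into the move step. BulletMove.MoveAndCheck and all of its parameters are hypothetical names, and the play area is assumed to be an axis-aligned rectangle:

```csharp
public static class BulletMove {
  // Advances one bullet by its velocity and reports whether it has left
  // the play area, so no separate search pass is needed afterwards.
  public static bool MoveAndCheck(
      ref float x, ref float y,
      float vx, float vy, float dt,
      float minX, float minY, float maxX, float maxY) {
    x += vx * dt;
    y += vy * dt;
    return x < minX || x > maxX || y < minY || y > maxY;
  }
}
```

    The caller can then record the bullet's index for destruction whenever the method returns true.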