
Question NativeArray of ParallelWriters?

Discussion in 'Entity Component System' started by BogdanM, Aug 17, 2022.

  1. BogdanM

    BogdanM

    Joined:
    Sep 3, 2017
    Posts:
    8
    I am writing a rendering system using the Entities library, and one of my jobs partitions all entities that need rendering into specific queues (based on renderer type and LoD). Many entities get culled, so these queues are needed because I have a dynamic number of entities to draw. Once the queues are ready, I use .ToArray() to fill a ComputeBuffer and call Graphics.DrawMeshInstancedIndirect. (It's very fast.)
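    For context, the flow I'm describing is roughly this (a hedged sketch, all names hypothetical; assumes one NativeQueue<Matrix4x4> per renderer type / LoD combination):

    ```csharp
    // Rough sketch of the queue -> ComputeBuffer -> DMII flow described above.
    // renderQueue, instanceBuffer, argsBuffer, mesh, material, bounds are
    // all created elsewhere; names are illustrative only.
    NativeArray<Matrix4x4> instances = renderQueue.ToArray(Allocator.Temp);
    instanceBuffer.SetData(instances); // ComputeBuffer bound to the instancing shader
    argsBuffer.SetData(new uint[] {
        mesh.GetIndexCount(0), (uint)instances.Length, 0, 0, 0
    });
    Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
    instances.Dispose();
    ```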

    Now my issue is having to send an unknown number of writers to the parallel job. Sending one NativeQueue<T>.ParallelWriter is fine, but I cannot create a NativeArray<NativeQueue<T>.ParallelWriter>, because it is already marked as a NativeContainer and nesting native containers is not allowed. The workaround I use now is a WritersWrapper struct that contains 20 parallel writers and has a this[index] accessor, essentially faking an array... but this is very cumbersome to keep updated, as I need to adjust the code based on the data volume.
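    For anyone wondering what the "fake array" workaround looks like, here is a hedged sketch (field and type names hypothetical): a plain struct holding a fixed number of writers is valid job data, but the count is baked in at compile time.

    ```csharp
    using System;
    using Unity.Collections;

    // Sketch of the WritersWrapper idea: N writers as plain fields,
    // indexed through a switch. Every new queue means editing this struct.
    struct WritersWrapper
    {
        public NativeQueue<int>.ParallelWriter w0, w1, w2; // ... up to w19

        public NativeQueue<int>.ParallelWriter this[int index]
        {
            get
            {
                switch (index)
                {
                    case 0: return w0;
                    case 1: return w1;
                    case 2: return w2;
                    // ... one case per field
                    default: throw new IndexOutOfRangeException();
                }
            }
        }
    }
    ```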

    What can be a solution for a dynamic number of ParallelWriters?

    Note 1: I tried using NativeParallelMultiHashMap to push my data at specific indices... but extracting the data back with a 'while' loop on the enumerator is extremely slow. (There is no efficient GetValuesArrayForKey.)

    Note 2: If I remember correctly, the Nordeus/Unite Austin demo used some unsafe code to handle pointers for this kind of enqueuing... but that was in 2017; I'm hoping there is a cleaner solution now.
     
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Normally I would recommend NativeStream or UnsafeStream, but both have a pitfall in that they won't scale well as the number of different renderer and LOD combinations increases.

    NativeParallelMultiHashMap has TryGetFirstValue() and TryGetNextValue(). That's pretty fast in Burst.
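    For reference, that lookup pattern looks roughly like this (a minimal sketch; map, queueIndex, and results are hypothetical names):

    ```csharp
    // Draining all values stored under one key of a NativeParallelMultiHashMap.
    // Intended to run inside Burst-compiled code, where this loop is cheap.
    if (map.TryGetFirstValue(queueIndex, out int item, out var iterator))
    {
        do
        {
            results.Add(item); // e.g. results is a NativeList<int>
        }
        while (map.TryGetNextValue(out item, ref iterator));
    }
    ```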

    There's this which you can stuff inside a NativeArray: https://github.com/Dreaming381/Lati...re/Core/Containers/UnsafeParallelBlockList.cs

    But what I am really wondering is: why aren't you using the Hybrid Renderer? I know there are reasons, like bad Android performance, but DrawMeshInstancedIndirect (DMII) has the same problem.
     
  3. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,131
    I used to use entities (key) with a dynamic buffer (values) on them. This was back in 2019 or so, when NativeParallelMultiHashMap had performance issues (clearing and iterating values).

    It should still be a sound solution, as you can just reinterpret the buffers with .AsNativeArray().
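    A minimal sketch of that buffer-per-key idea, assuming one entity per draw queue (DrawElement, queueEntity, and computeBuffer are hypothetical names):

    ```csharp
    using Unity.Entities;
    using Unity.Collections;
    using Unity.Mathematics;

    // One buffer element per instance to draw; jobs append to the buffer.
    public struct DrawElement : IBufferElementData
    {
        public float4x4 Matrix;
    }

    // Later, on the main thread, the buffer can be viewed without a copy
    // and uploaded straight to the GPU:
    DynamicBuffer<DrawElement> buffer = EntityManager.GetBuffer<DrawElement>(queueEntity);
    NativeArray<DrawElement> asArray = buffer.AsNativeArray();
    computeBuffer.SetData(asArray);
    ```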
     
  4. BogdanM

    BogdanM

    Joined:
    Sep 3, 2017
    Posts:
    8
    Just by looking at the streams documentation, I don't see how they're more useful, as I'd need multiple writers for those streams as well in order to segregate the data properly.

    Regarding the NPMHM, that is the approach I took with TryGetNextValue(), and it went from 0.3 ms (with the NativeQueue.ToArray() approach) to 3 ms, which is why I consider it slow.

    Regarding the Hybrid Renderer: I use a high volume of entities that I keep extremely packed, around 650 entities per chunk, and I can't afford LocalToWorld components or other 'general-purpose' components. On top of that, I render with a custom shader that handles the mesh animation (pretty much based on the Nordeus presentation).

    The Custom Native Containers do seem promising, so thank you for that. One must handle the unsafe parts of memory management, but it might just be the only approach for such custom needs.
     
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    Completely reasonable. But just as a heads up, in a future version of Entities, chunks will be capped with a max capacity of 128 entities.
     
  6. BogdanM

    BogdanM

    Joined:
    Sep 3, 2017
    Posts:
    8
    Could you point me to some documentation on this? It sounds very strange that they would limit optimization on this front. (unless the chunk will be resized down to reduce waste)
     
  7. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    3,983
    It was stated somewhere here on the forums. It is to simplify the implementation of "Enabled Components" and make them more performant. In practice, few people have entities small enough to pack chunks with more than 128 entities.
     
  8. BogdanM

    BogdanM

    Joined:
    Sep 3, 2017
    Posts:
    8
    Coming back to this in case someone runs into the same issue.
    I did find a very performant solution in the end, using a custom native container, but it needed some changes to the API.

    Essentially, the custom native container I made simply holds a pointer to the first element of an array of parallel writers, and when a thread wants to enqueue, it also has to provide the index of the queue (ParallelWriter) it wants to enqueue into. This did not work right away: the array of writers was still allocated on the main thread, and the ForEach passed only my pointer-wrapper container to the worker threads, which led to race conditions.

    The solution was to pass the thread index (which is injected into the container struct via the [NativeSetThreadIndex] attribute) down to the actual ParallelWriter in the array when calling Enqueue, but that overload is not currently available in the Collections package.

    Code (CSharp):

    /// <summary>
    /// Wrapper for an array of CustomNativeQueue parallel writers. Array
    /// (de)allocation must be managed outside of this struct.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    [StructLayout(LayoutKind.Sequential)]
    [NativeContainer]
    [BurstCompatible(GenericTypeArguments = new[] { typeof(int) })]
    unsafe struct ParallelWriterArray<T> where T : struct
    {
        [NativeDisableUnsafePtrRestriction]
        CustomNativeQueue<T>.ParallelWriter* m_Start;
        int m_Count;

        [NativeSetThreadIndex]
        int m_ThreadIndex;

    #if ENABLE_UNITY_COLLECTIONS_CHECKS
        AtomicSafetyHandle m_Safety;
        [NativeSetClassTypeToNullOnSchedule]
        DisposeSentinel m_DisposeSentinel;
    #endif

        public ParallelWriterArray(CustomNativeQueue<T>.ParallelWriter[] array)
        {
            m_Start = null;
            m_Count = 0;
            m_ThreadIndex = 0;

    #if ENABLE_UNITY_COLLECTIONS_CHECKS
            DisposeSentinel.Create(out m_Safety, out m_DisposeSentinel, 0, Allocator.Persistent);
    #endif

            if (array != null)
            {
                m_Start = (CustomNativeQueue<T>.ParallelWriter*)UnsafeUtility.AddressOf(ref array[0]);
                m_Count = array.Length;
            }
        }

        public void Dispose()
        {
    #if ENABLE_UNITY_COLLECTIONS_CHECKS
            DisposeSentinel.Dispose(ref m_Safety, ref m_DisposeSentinel);
    #endif
            m_Start = null;
            m_Count = 0;
        }

        public void EnqueueAtIndex(int index, T element)
        {
            if (m_Start != null && index < m_Count)
            {
    #if ENABLE_UNITY_COLLECTIONS_CHECKS
                AtomicSafetyHandle.CheckReadAndThrow(m_Safety);
    #endif
                UnsafeUtility.ArrayElementAsRef<CustomNativeQueue<T>.ParallelWriter>(m_Start, index).Enqueue(element, m_ThreadIndex);
            }
            else
            {
                throw new IndexOutOfRangeException("ParallelWriterArray: trying to access a nonexistent ParallelWriter");
            }
        }
    }

    CustomNativeQueue is a copy-paste of Unity.Collections.NativeQueue with an added overload of the ParallelWriter's Enqueue:

    Code (CSharp):

    public void Enqueue(T value)
    {
    #if ENABLE_UNITY_COLLECTIONS_CHECKS
        AtomicSafetyHandle.CheckWriteAndThrow(m_Safety);
    #endif
        NativeQueueBlockHeader* writeBlock = NativeQueueData.AllocateWriteBlockMT<T>(m_Buffer, m_QueuePool, m_ThreadIndex);
        UnsafeUtility.WriteArrayElement(writeBlock + 1, writeBlock->m_NumItems, value);
        ++writeBlock->m_NumItems;
    }

    public void Enqueue(T value, int threadIndex)
    {
    #if ENABLE_UNITY_COLLECTIONS_CHECKS
        AtomicSafetyHandle.CheckWriteAndThrow(m_Safety);
    #endif
        NativeQueueBlockHeader* writeBlock = NativeQueueData.AllocateWriteBlockMT<T>(m_Buffer, m_QueuePool, threadIndex);
        UnsafeUtility.WriteArrayElement(writeBlock + 1, writeBlock->m_NumItems, value);
        ++writeBlock->m_NumItems;
    }
    Notice that the only difference is that the thread index is passed in as a parameter instead of being read from the member field.
     
    Last edited: Sep 29, 2022