ECS and Voxel engine

Discussion in 'Entity Component System' started by RemiArturowicz, Feb 8, 2019.

  1. illinar

    illinar

    Joined:
    Apr 6, 2011
    Posts:
    863
    VERY interesting results here...

    No chunks, straight up NativeArray<int>, converting 3D index to 1D and reading an int 1 million times: 1.7ms.

    HashMap with 1,000,000 capacity and 100,000 int64 keys stored (the keys also converted from 3D positions), 1 million reads: 2ms

    In a job.

    Almost no difference. Now imagine this with chunks: if you have chunks, the HashMap comes out faster. So I'm going for the hashmap as the easiest, simplest, and potentially fastest option.
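
    For illustration, here is a minimal sketch of the kind of read test described above, assuming the indices are converted on the fly (names and sizes are my assumptions, not the actual benchmark code):

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;

    [BurstCompile]
    struct ReadVoxelsJob : IJob
    {
        [ReadOnly] public NativeArray<int> Voxels;      // flat voxel data, Size.x * Size.y * Size.z entries
        [ReadOnly] public NativeArray<int3> Positions;  // 1,000,000 positions to read
        public int3 Size;                               // world size in voxels
        public NativeArray<int> Result;                 // keeps the reads from being optimized away

        public void Execute()
        {
            int sum = 0;
            for (int i = 0; i < Positions.Length; i++)
            {
                int3 p = Positions[i];
                int index = p.x + Size.x * (p.y + Size.y * p.z); // 3D -> 1D
                sum += Voxels[index];
            }
            Result[0] = sum;
        }
    }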

    EDIT: Actually, HashMap performance will vary a lot based on its size: the bigger it is, the slower it gets, and some other factors are in play too. I can't reproduce the 2ms result in a new test; the best I get is 4-8ms.
     
    Last edited: Dec 28, 2019
    NotaNaN likes this.
  2. illinar

    illinar

    Joined:
    Apr 6, 2011
    Posts:
    863
    To put it into perspective though (!), accessing a managed array 1 million times takes 31ms on my budget PC. So it's pretty impressive what HashMap and NativeArray "can do" in a job.

    So with Burst even this monstrosity is as good as your standard "array[ i ]" :
    VoxelBuffers[ChunkBuffers[ChunkBufferEntity][GetChunkIndex(chunkPosition)].ChunkEntity][GetVoxelIndex(voxelPosition)].Value;


    Would you like a ridiculous number? Without Burst the 2ms on the NativeArray becomes 130ms, so all the credit goes to Burst.

    EDIT: I can't make the NativeHashMap that fast again; NativeArray is the fastest now. The idea is to have just one array with a world offset and to move data within it when new chunks are loaded. That should be easy in my case: I will iterate over objects and move their voxels, so I don't need to read and move all the "air" voxels. This can be split across many frames, so it should be smooth.
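
    A minimal sketch of that "one array plus a world offset" idea, with all names being my assumptions rather than the actual implementation:

    Code (CSharp):
    using Unity.Collections;
    using Unity.Mathematics;

    public struct ScrollingVoxelVolume
    {
        public NativeArray<int> Voxels; // one flat array covering the currently loaded area
        public int3 Size;               // size of the loaded area in voxels
        public int3 WorldOrigin;        // world position of the array's (0,0,0) corner

        // Convert an absolute world voxel position to a flat index into Voxels.
        public int ToIndex(int3 worldPos)
        {
            int3 p = worldPos - WorldOrigin;
            return p.x + Size.x * (p.y + Size.y * p.z);
        }

        public int Read(int3 worldPos) => Voxels[ToIndex(worldPos)];
        public void Write(int3 worldPos, int value) => Voxels[ToIndex(worldPos)] = value;
    }

    When new chunks are loaded, WorldOrigin would be shifted and only the voxels of tracked objects rewritten at their new indices, which is the part that can be spread over several frames.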
     
    Last edited: Dec 28, 2019
    andywatts, Sarkahn and NotaNaN like this.
  3. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    My idea turned out to be a bust, more or less. I created a small test just to see whether it would work and how fast it would be. The idea was to take a massive list of block operations at arbitrary world positions and divide them up by chunk index.

    I started with a million of these:

    Code (CSharp):
    [System.Serializable]
    public struct ChunkOperation
    {
        public int3 Pos;
        public int Value;
    }

    They were initialized to completely random world positions within a range of 10x10x10 chunks. I passed them into a processing job:

    Code (CSharp):
        var workWriter = _workMap.AsParallelWriter();

        Entities
            .WithoutBurst()
            .ForEach((in DynamicBuffer<ChunkWork> workBuffer) =>
            {
                inputDeps = new DivideWork
                {
                    ChunkSize = ChunkSize,
                    Operations = workBuffer.Reinterpret<ChunkOperation>().AsNativeArray(),
                    WorkWriter = workWriter
                }.Schedule(workBuffer.Length, 256, inputDeps);
            }).Run();
    The job calculates each operation's chunk index and stuffs it into a NativeMultiHashMap:

    Code (CSharp):
    [BurstCompile]
    struct DivideWork : IJobParallelFor
    {
        [WriteOnly]
        public NativeMultiHashMap<int3, ChunkOperation>.ParallelWriter WorkWriter;

        [ReadOnly]
        public NativeArray<ChunkOperation> Operations;

        public int3 ChunkSize;

        public void Execute(int index)
        {
            var work = Operations[index];
            var worldPos = work.Pos;

            var chunkIndex = (int3)math.floor(worldPos / ChunkSize);
            // Convert world position to local chunk position
            work.Pos = (worldPos % ChunkSize + ChunkSize) % ChunkSize;

            WorkWriter.Add(chunkIndex, work);
        }
    }
    The idea being that from there I could easily write the sorted operations to their respective chunks in parallel... but just writing to the map already takes ~12ms. To make sure it was actually the parallel write to the NMHM slowing things down, I tried doing the same thing but pushing the results to a NativeArray instead, and it drops to ~1.5ms.

    Kind of a bummer that the NMHM is so slow for this purpose, but it seems the idea of taking arbitrary, limitless 3D positions and sorting them nicely into buckets is not as simple as I was hoping - who'd have thought! Unfortunately the math required to avoid using the NMHM is beyond me, so I'll have to rethink it... or just say screw it and settle for always writing to chunks in sequence.
     
    Last edited: Dec 30, 2019
    Antypodish, NotaNaN and illinar like this.
  4. illinar

    illinar

    Joined:
    Apr 6, 2011
    Posts:
    863
    You can also try a 1D long key instead of int3. Getting the hash of an int3 should be the bottleneck when it's used as the key in the NMHM.
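
    For example, a minimal sketch of packing an int3 chunk index into a single long key (the 21-bit split is an assumption; any packing that covers your coordinate range works):

    Code (CSharp):
    using Unity.Mathematics;

    public static class ChunkKey
    {
        // Pack a signed chunk index into one 64-bit key, 21 bits per axis (~+/- 1,000,000 range per axis).
        public static long Pack(int3 chunkIndex)
        {
            const long mask = (1L << 21) - 1;
            return ((long)chunkIndex.x & mask)
                 | (((long)chunkIndex.y & mask) << 21)
                 | (((long)chunkIndex.z & mask) << 42);
        }
    }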

    It is an interesting idea though. Intuitively I doubt that sorting overhead is smaller than random access or cache miss overhead, but my intuition is wrong about DOTS all the time.

    So if the NMHM is still slow with long keys (or local int keys), you can still convert all your operations to 1D indexes, and sorting them in a native array becomes very easy in theory. A native list, rather. The bummer is that you can't have a native array of native lists.

    Otherwise you'd have a bunch of lists:
    NativeArray[NativeList[0-1000], NativeList[1000-2000], etc];
    Then you'd convert the voxel position to its 1D index, divide it by 1000, and you'd know which list to put it into:

    buckets[voxelOperation.voxelIndex / 1000].Add(voxelOperation);


    Then you'd go through the buckets in a parallel job. I'm pretty sure, though, that all these extra write operations will be a much bigger overhead than just reading and writing the voxel data directly in random order, cache misses and all.
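
    A rough single-threaded sketch of that bucketing, done with plain NativeArrays since a NativeArray of NativeLists isn't possible (all names and the per-bucket layout are my assumptions):

    Code (CSharp):
    using Unity.Collections;

    public static class OperationBuckets
    {
        // Scatter operations (given as 1D voxel indices) into contiguous per-bucket
        // ranges of `sorted`. `bucketStarts` must be zero-initialized, one entry per bucket.
        public static void Sort(NativeArray<int> voxelIndices, int bucketSize,
                                NativeArray<int> bucketStarts, NativeArray<int> sorted)
        {
            // Pass 1: count how many operations fall into each bucket.
            for (int i = 0; i < voxelIndices.Length; i++)
                bucketStarts[voxelIndices[i] / bucketSize]++;

            // Pass 2: exclusive prefix sum turns counts into start offsets.
            int running = 0;
            for (int b = 0; b < bucketStarts.Length; b++)
            {
                int count = bucketStarts[b];
                bucketStarts[b] = running;
                running += count;
            }

            // Pass 3: scatter every index into its bucket's range.
            var cursor = new NativeArray<int>(bucketStarts.Length, Allocator.Temp);
            cursor.CopyFrom(bucketStarts);
            for (int i = 0; i < voxelIndices.Length; i++)
            {
                int b = voxelIndices[i] / bucketSize;
                sorted[cursor[b]++] = voxelIndices[i];
            }
            cursor.Dispose();
        }
    }

    Each bucket then occupies the range [bucketStarts[b], bucketStarts[b + 1]) of `sorted` (the last bucket ends at the array's length), so a parallel job could process one bucket per index. Whether that actually beats random writes is exactly the kind of base premise worth benchmarking first.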
     
    Last edited: Dec 30, 2019
    NotaNaN and Sarkahn like this.
  5. Sarkahn

    Sarkahn

    Joined:
    Jan 9, 2013
    Posts:
    440
    Well, for the sake of completeness I wrote the second half of the "DivideWork" approach and a linear version for comparison. Both cases operate in an ideal scenario where all the chunks have already been initialized, so they can just do their work.

    Linear version:

    Code (CSharp):
    [BurstCompile]
    struct PerformWork : IJob
    {
        [ReadOnly]
        public NativeArray<ChunkOperation> Operations;

        [ReadOnly]
        public NativeHashMap<int3, Entity> ChunkMap;

        [WriteOnly]
        public BufferFromEntity<ChunkData> ChunkDataFromEntity;

        public int3 ChunkSize;

        public void Execute()
        {
            for( int i = 0; i < Operations.Length; ++i )
            {
                var work = Operations[i];
                var worldPos = work.Pos;

                var chunkIndex = (int3)math.floor(worldPos / ChunkSize);
                // Convert world position to local chunk position
                int3 p = (worldPos % ChunkSize + ChunkSize) % ChunkSize;
                int localIndex = p.x + ChunkSize.x * (p.y + ChunkSize.y * p.z);

                var chunkData = ChunkDataFromEntity[ChunkMap[chunkIndex]];
                chunkData[localIndex] = work.Value;
            }
        }
    }
    Parallel version:

    Code (CSharp):
    [BurstCompile]
    struct DivideWork : IJobParallelFor
    {
        [WriteOnly]
        public NativeMultiHashMap<int3, ChunkOperation>.ParallelWriter WorkWriter;

        [ReadOnly]
        public NativeArray<ChunkOperation> Operations;

        public int3 ChunkSize;

        public void Execute(int index)
        {
            var work = Operations[index];
            var worldPos = work.Pos;

            var chunkIndex = (int3)math.floor(worldPos / ChunkSize);
            // Convert world position to local chunk position
            work.Pos = (worldPos % ChunkSize + ChunkSize) % ChunkSize;

            WorkWriter.Add(chunkIndex, work);
        }
    }

    [BurstCompile]
    struct PerformWorkParallel : IJobNativeMultiHashMapVisitKeyValue<int3, ChunkOperation>
    {
        [NativeDisableParallelForRestriction]
        public BufferFromEntity<ChunkData> ChunkDataFromEntity;

        [ReadOnly]
        public NativeHashMap<int3, Entity> ChunkMap;

        public int3 ChunkSize;

        public void ExecuteNext(int3 key, ChunkOperation work)
        {
            var p = work.Pos;

            int localIndex = p.x + ChunkSize.x * (p.y + ChunkSize.y * p.z);
            var chunkData = ChunkDataFromEntity[ChunkMap[key]];
            chunkData[localIndex] = work.Value;
        }
    }
    Complete source of my tests:
    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Collections.LowLevel.Unsafe;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;
    using UnityEngine.Profiling;
    using Random = UnityEngine.Random;


    [System.Serializable]
    public struct ChunkData : IBufferElementData
    {
        public int Value;
        public static implicit operator int(ChunkData c) => c.Value;
        public static implicit operator ChunkData(int v) => new ChunkData { Value = v };
    }

    [System.Serializable]
    public struct ChunkWork : IBufferElementData
    {
        public ChunkOperation Value;
        public static implicit operator ChunkOperation(ChunkWork c) => c.Value;
        public static implicit operator ChunkWork(ChunkOperation v) => new ChunkWork { Value = v };
    }

    [System.Serializable]
    public struct ChunkOperation
    {
        public int3 Pos;
        public int Value;
    }

    public class ProcessHugeBuffersTest : JobComponentSystem
    {
        const int ChunkSizeX = 16;
        const int ChunkSizeY = 16;
        const int ChunkSizeZ = 16;
        static int3 ChunkSize => new int3(ChunkSizeX, ChunkSizeY, ChunkSizeZ);

        const int ChunkCountX = 10;
        const int ChunkCountY = 10;
        const int ChunkCountZ = 10;


        NativeHashMap<int3, Entity> _chunkMap;
        NativeMultiHashMap<int3, ChunkOperation> _workMap;

        void CreateWork()
        {
            _workMap = new NativeMultiHashMap<int3, ChunkOperation>(WorkCount, Allocator.Persistent);

            int maxX = ChunkSizeX * ChunkCountX;
            int maxY = ChunkSizeY * ChunkCountY;
            int maxZ = ChunkSizeZ * ChunkCountZ;

            var workEntity = EntityManager.CreateEntity(typeof(ChunkWork));
            //EntityManager.SetName(workEntity, "Chunk Work");
            var workBuffer = EntityManager.GetBuffer<ChunkWork>(workEntity);
            workBuffer.ResizeUninitialized(WorkCount);
            for (int i = 0; i < WorkCount; ++i)
            {
                var operation = new ChunkOperation
                {
                    Value = 5,
                    Pos = new int3(Random.Range(0, maxX - 1), Random.Range(0, maxY - 1), Random.Range(0, maxZ - 1)),
                };

                workBuffer[i] = operation;
            }
        }

        void CreateChunks()
        {
            _chunkMap = new NativeHashMap<int3, Entity>(ChunkCountX * ChunkCountY * ChunkCountZ, Allocator.Persistent);

            for (int x = 0; x < ChunkCountX; ++x)
                for (int y = 0; y < ChunkCountY; ++y)
                    for (int z = 0; z < ChunkCountZ; ++z)
                    {
                        int3 index = new int3(x, y, z);
                        var e = EntityManager.CreateEntity(typeof(ChunkData));
                        //EntityManager.SetName(e, $"Chunk {index.ToString()}");
                        var data = EntityManager.GetBuffer<ChunkData>(e);
                        data.ResizeUninitialized(ChunkSizeX * ChunkSizeY * ChunkSizeZ);
                        for (int i = 0; i < data.Length; ++i)
                            data[i] = 0;

                        _chunkMap[index] = e;
                    }
        }

        protected override void OnCreate()
        {
            CreateChunks();
            CreateWork();
        }

        protected override void OnDestroy()
        {
            _chunkMap.Dispose();
            _workMap.Dispose();
        }

        [BurstCompile]
        struct DivideWork : IJobParallelFor
        {
            [WriteOnly]
            public NativeMultiHashMap<int3, ChunkOperation>.ParallelWriter WorkWriter;

            [ReadOnly]
            public NativeArray<ChunkOperation> Operations;

            public int3 ChunkSize;

            public void Execute(int index)
            {
                var work = Operations[index];
                var worldPos = work.Pos;

                var chunkIndex = (int3)math.floor(worldPos / ChunkSize);
                // Convert world position to local chunk position
                work.Pos = (worldPos % ChunkSize + ChunkSize) % ChunkSize;

                WorkWriter.Add(chunkIndex, work);
            }
        }

        [BurstCompile]
        struct PerformWorkParallel : IJobNativeMultiHashMapVisitKeyValue<int3, ChunkOperation>
        {
            [NativeDisableParallelForRestriction]
            public BufferFromEntity<ChunkData> ChunkDataFromEntity;

            [ReadOnly]
            public NativeHashMap<int3, Entity> ChunkMap;

            public int3 ChunkSize;

            public void ExecuteNext(int3 key, ChunkOperation work)
            {
                var p = work.Pos;

                int localIndex = p.x + ChunkSize.x * (p.y + ChunkSize.y * p.z);
                var chunkData = ChunkDataFromEntity[ChunkMap[key]];
                chunkData[localIndex] = work.Value;
            }
        }

        [BurstCompile]
        struct PerformWork : IJob
        {
            [ReadOnly]
            public NativeArray<ChunkOperation> Operations;

            [ReadOnly]
            public NativeHashMap<int3, Entity> ChunkMap;

            [WriteOnly]
            public BufferFromEntity<ChunkData> ChunkDataFromEntity;

            public int3 ChunkSize;

            public void Execute()
            {
                for( int i = 0; i < Operations.Length; ++i )
                {
                    var work = Operations[i];
                    var worldPos = work.Pos;

                    var chunkIndex = (int3)math.floor(worldPos / ChunkSize);
                    // Convert world position to local chunk position
                    int3 p = (worldPos % ChunkSize + ChunkSize) % ChunkSize;
                    int localIndex = p.x + ChunkSize.x * (p.y + ChunkSize.y * p.z);

                    var chunkData = ChunkDataFromEntity[ChunkMap[chunkIndex]];
                    chunkData[localIndex] = work.Value;
                }
            }
        }

        const int WorkCount = 10000000;

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            inputDeps = DoWorkParallel(inputDeps);

            //inputDeps = DoWorkLinear(inputDeps);

            return inputDeps;
        }

        JobHandle DoWorkParallel(JobHandle inputDeps)
        {
            var workMap = _workMap;
            var workWriter = workMap.AsParallelWriter();
            var chunkMap = _chunkMap;

            Entities
                .WithoutBurst()
                .ForEach((in DynamicBuffer<ChunkWork> workBuffer) =>
                {
                    inputDeps = new DivideWork
                    {
                        ChunkSize = ChunkSize,
                        Operations = workBuffer.Reinterpret<ChunkOperation>().AsNativeArray(),
                        WorkWriter = workWriter
                    }.Schedule(workBuffer.Length, 1024, inputDeps);
                }).Run();

            inputDeps = new PerformWorkParallel
            {
                ChunkDataFromEntity = GetBufferFromEntity<ChunkData>(false),
                ChunkMap = chunkMap,
                ChunkSize = ChunkSize
            }.Schedule(_workMap, 512, inputDeps);

            inputDeps = Job.WithCode(() =>
            {
                workMap.Clear();
            }).Schedule(inputDeps);

            return inputDeps;
        }

        JobHandle DoWorkLinear(JobHandle inputDeps)
        {
            var chunkMap = _chunkMap;
            Entities
                .WithoutBurst()
                .WithReadOnly(_chunkMap)
                .ForEach((in DynamicBuffer<ChunkWork> workBuffer) =>
                {
                    inputDeps = new PerformWork
                    {
                        ChunkSize = ChunkSize,
                        Operations = workBuffer.Reinterpret<ChunkOperation>().AsNativeArray(),
                        ChunkDataFromEntity = GetBufferFromEntity<ChunkData>(false),
                        ChunkMap = chunkMap,
                    }.Schedule(inputDeps);
                }).Run();

            return inputDeps;
        }

    }

    The results are from the profiler attached to a standalone build.

    10000 (Ten thousand operations):
    Linear: ~2.3ms
    Parallel: Divide ~0.1ms, Work ~0.1ms, ~0.2ms total

    100000 (One hundred thousand Operations):
    Linear: ~4.2ms
    Parallel: Divide ~1.15ms, Work ~1.15ms, ~3ms total

    1000000 (One million operations):
    Linear: ~41ms
    Parallel: Divide ~12.5ms, Work ~40ms, ~52.5ms total

    10000000 (Ten million operations):
    Linear: ~408ms
    Parallel: Divide ~116ms, Work ~443ms, ~559ms total

    ...doh. Literally the opposite of what I was going for: the parallel version performs better at low operation counts and gets worse as it scales up.

    But how can I use a 1D key in my NHM? The chunk index is meant to be "infinite" on the x and z axes, so converting that to a 1D key (without collisions) in the hash map is exactly the problem I need the hash map itself to solve, right?
     
    Last edited: Dec 30, 2019
    NotaNaN likes this.
  6. illinar

    illinar

    Joined:
    Apr 6, 2011
    Posts:
    863
    A 1D index can work even with an infinite world if you take an area inside that world. Your conversion method will have to be aware of that area's size and location. Something like this:

    Code (CSharp):
    public static int GetVoxelIndex(int3 pos)
    {
        pos += areaOffset; // Offset from the world's 0
        return pos.x + areaSizeX * (pos.y + areaSizeY * pos.z);
    }

    If the player gets close to the edge of that area you would need to move it and reindex the voxels, or re-add them to the hashmaps, but you have plenty of time for that, so there's no real performance impact, just some added complexity.


    Could someone just benchmark this:

    Case #1: 1,000,000 sequential reads/writes from 1,000,000 item array. Indices taken from an array of precomputed sequential indices.
    Case #2: 1,000,000 random reads/writes from 1,000,000 item array. Indices taken from an array of precomputed random indices.

    From my benchmarks I learned that one should benchmark the base premise first: for example, that the HashMap really is much slower than the NativeArray, or that cache misses really do hurt when reading randomly from a big array in this case.
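
    In case someone wants to try it, a minimal sketch of that comparison (names are my assumptions); the only difference between the two cases is how the Indices array was filled:

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    [BurstCompile]
    struct IndexedReadWriteJob : IJob
    {
        public NativeArray<int> Values;              // the 1,000,000 item array being read and written
        [ReadOnly] public NativeArray<int> Indices;  // precomputed indices: sequential for case #1, shuffled for case #2

        public void Execute()
        {
            for (int i = 0; i < Indices.Length; i++)
            {
                int index = Indices[i];
                Values[index] = Values[index] + 1; // one read and one write per element
            }
        }
    }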
     
    Last edited: Dec 31, 2019
    Sarkahn and NotaNaN like this.