How to solve the 2D or 3D sub array algorithm problem with DOTS?

Arowx · Oct 20, 2019

So you have a 2D or 3D array of data and your algorithm works on neighbouring chunks of data around the current position.

Ideally I would use a struct that contains an array of neighbours and populate it when I build the dataset.

In DOTS that array needs to be a NativeArray, which cannot contain a sub-array. How do you get around this problem?

Ideally DOTS would have 2D and 3D NativeArrays and allow the current array element to have access to a sub-region around it that would fit within the cache.

Is there a way to do this with chunks, some way to specify what data is streamed into the cache and the order of that data in memory for optimum bandwidth?

DreamingImLatios · Oct 20, 2019

For something like this, I usually double buffer the full array and pass it to every job if I'm doing some kind of kernel operation. Otherwise, I will parallelize in columns, and then in rows in a dependent job. If this doesn't meet your needs, then you are going to have to elaborate more regarding your specific problem.

Arowx · Oct 21, 2019

DreamingImLatios said: ↑

For something like this, I usually double buffer the full array and pass it to every job if I'm doing some kind of kernel operation. Otherwise, I will parallelize in columns, and then in rows in a dependent job. If this doesn't meet your needs, then you are going to have to elaborate more regarding your specific problem.

Imagine you need to work on a sub-grid of 3x3 or more tiles within a 2D array. The algorithm could be for a board game, procedural generation, a roguelike or texture processing. The data of a set of neighbouring tiles is processed for the centre tile.

DreamingImLatios · Oct 21, 2019

That's a 3x3 kernel operation, so in that case I would go with the double buffer strategy. If you aren't sure what I mean by "kernel", it is a term I am borrowing from image processing.
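A minimal sketch of the double buffer strategy for a 3x3 kernel, assuming a row-major 1D layout (index = y * width + x). Plain managed arrays stand in for NativeArrays here so the example runs outside Unity, and the names KernelSketch, SumNeighbours and Run are hypothetical, not from this thread:

```csharp
using System;

public static class KernelSketch
{
    // Sums each cell's 3x3 neighbourhood (including the cell itself), reading
    // only from `read` and writing only to `write`. Neighbours that fall
    // outside the grid count as 0.
    public static void SumNeighbours(int[] read, int[] write, int width, int height)
    {
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                {
                    for (int dx = -1; dx <= 1; dx++)
                    {
                        int nx = x + dx, ny = y + dy;
                        if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                            sum += read[ny * width + nx];
                    }
                }
                write[y * width + x] = sum;
            }
        }
    }

    // Runs several passes, swapping the roles of the two buffers after each one.
    public static int[] Run(int[] initial, int width, int height, int passes)
    {
        int[] read = (int[])initial.Clone();
        int[] write = new int[initial.Length];
        for (int p = 0; p < passes; p++)
        {
            SumNeighbours(read, write, width, height);
            (read, write) = (write, read); // last output becomes the next input
        }
        return read;
    }
}
```

Because a pass never writes to the buffer it reads from, every cell sees the previous pass's values regardless of the order in which cells are processed. That property is what makes the pattern safe to split across parallel jobs.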

Kmiecis · Oct 21, 2019

Well, there is always the option of writing an extension that accesses a 1D DynamicBuffer as 2D by indexing it as y * width + x, provided you always know the width. The same goes for 3D. Or you can take a look at, for example, the NativeString64 struct and do some unsafe coding with fixed to build your own custom indexed struct, if you happen to know the length will be constant, of course.

tarahugger · Oct 21, 2019

Further to what Kmiecis mentioned:

Code (CSharp):

using Unity.Mathematics;

namespace Unity.Collections
{
    public static class SpatialIndexingUtility
    {
        // Converts a flat index back into (x, y, z); z is the fastest-varying axis.
        public static int3 Get3DIndices(int idx, int3 size)
        {
            int yzLength = size.y * size.z;
            int x = idx / yzLength;
            idx -= x * yzLength;
            int y = idx / size.z;
            int z = idx % size.z;
            return new int3(x, y, z);
        }

        // Converts (x, y, z) into a flat index; inverse of Get3DIndices.
        public static int GetIndex(int x, int y, int z, int3 size)
        {
            return x * (size.y * size.z) + y * size.z + z;
        }

        // 2D overload: indices.x is x, indices.y is used as z, and y is taken as 0.
        public static int GetIndex(int2 indices, int3 size)
        {
            return indices.x * (size.y * size.z) + 0 * size.z + indices.y;
        }

        public static int GetIndex(int3 indices, int3 size)
        {
            return indices.x * (size.y * size.z) + indices.y * size.z + indices.z;
        }
    }
}

or something like this.

Code (CSharp):

using System;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Collections;
using Unity.Entities;
using Unity.Mathematics;

namespace Unity.Collections
{
    public unsafe struct NativeArray3D<T> : IDisposable where T : struct
    {
        [NativeDisableUnsafePtrRestriction]
        private void* _ptr;
        private int _yzLength;

        public int3 Length;
        public int3 Extents;

        [NativeDisableParallelForRestriction, NativeDisableContainerSafetyRestriction]
        public NativeArray<T> Internal;

        public NativeArray3D(int3 size, Allocator allocator) : this(size.x, size.y, size.z, allocator) { }

        public NativeArray3D(int x, int y, int z, Allocator allocator) : this(x, y, z, new NativeArray<T>(x * y * z, allocator)) { }

        public NativeArray3D(int x, int y, int z, DynamicBuffer<T> buffer) : this(x, y, z, buffer.AsNativeArray()) { }

        public NativeArray3D(int3 size, DynamicBuffer<T> buffer) : this(size.x, size.y, size.z, buffer.AsNativeArray()) { }

        public NativeArray3D(int3 size, NativeArray<T> arr) : this(size.x, size.y, size.z, arr) { }

        public NativeArray3D(int3 size, void* ptr, Allocator allocator) : this(size.x, size.y, size.z, ConvertExistingDataToNativeArray(size, ptr, allocator)) { }

        private static NativeArray<T> ConvertExistingDataToNativeArray(int3 size, void* ptr, Allocator allocator)
        {
            return NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray<T>(ptr, size.x * size.y * size.z, allocator);
        }

        public NativeArray3D(int x, int y, int z, NativeArray<T> arr)
        {
            _ptr = arr.GetUnsafePtr();
            _yzLength = y * z;
            Internal = arr;
            Length = new int3(x, y, z);
            Extents = Length / 2;
        }

        public ref T this[int i]
            => ref UnsafeUtilityEx.ArrayElementAsRef<T>(_ptr, i);

        public ref T this[int x, int y, int z]
            => ref UnsafeUtilityEx.ArrayElementAsRef<T>(_ptr, GetIndex(x, y, z));

        public ref T this[int2 indices]
            => ref UnsafeUtilityEx.ArrayElementAsRef<T>(_ptr, GetIndex(indices));

        public ref T this[int3 indices]
            => ref UnsafeUtilityEx.ArrayElementAsRef<T>(_ptr, GetIndex(indices));

        public ref T this[float3 indices, int boxSize]
            => ref UnsafeUtilityEx.ArrayElementAsRef<T>(_ptr, GetIndex(indices, boxSize));

        public int GetIndex(float3 indices, int boxSize) => GetIndex((int3)math.round((indices - (boxSize / 2f)) / boxSize));

        public int GetIndex(int x, int y, int z) => x * _yzLength + y * Length.z + z;

        public int GetIndex(int2 indices) => indices.x * _yzLength + 0 * Length.z + indices.y;

        public int GetIndex(int3 indices) => indices.x * _yzLength + indices.y * Length.z + indices.z;

        public int3 GetIndices(int idx)
        {
            int x = idx / (_yzLength);
            idx -= (x * _yzLength);
            int y = idx / Length.z;
            int z = idx % Length.z;
            return new int3(x, y, z);
        }

        public void Dispose()
        {
            Internal.Dispose();
        }
    }
}

Regarding your actual question about chunks and cache efficiency: I'd love to know how this approach performs compared to some of the other ideas mentioned.

I am storing the internal 1D array inside a DynamicBuffer, which can then be used within jobs by pulling it out and converting it appropriately. Since you only have the pointer, you still need to know the size, but you can pass that into the job separately or store it in another component. In terms of efficiency, it's probably in the same boat as DynamicBuffers/NativeArrays as far as cache behaviour goes.

You could also consider storing it as a ChunkComponent and share it across all the entities within a chunk.

starikcetin · Oct 21, 2019

@tarahugger Put this on github please. It is valuable.

Razmot · Oct 21, 2019

That's very practical for prototyping and non-critical code, but it can be detrimental to performance in loops:

- If you do a classic for (x) { for (y) { for (z) } }, you end up recalculating the x term for each y, and recalculating the x and y terms for each z.

- Alternatively, you can take an X slice and then iterate over Y and Z only. You really need to think about how the 1D array is organised for fast iteration, depending on your needs.

- Another trick:
public static readonly int3 IDX_MUL = int3(1, SIZE, SIZE * SIZE);
int3 v = whatever;
int idx = csum(v * IDX_MUL); // csum is the component sum, so v.x + v.y + v.z, and it's a Burst intrinsic operation

- Last one:
[MethodImpl(MethodImplOptions.AggressiveInlining)] can have a huge effect on Burst performance.
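The first two points can be sketched as follows, assuming the x-major layout used earlier in the thread (index = x * sizeY * sizeZ + y * sizeZ + z). Plain arrays are used so the example runs outside Unity, and the names LoopIndexSketch, SumNaive and SumHoisted are hypothetical. Whether the hoisted version is measurably faster depends on the compiler, so treat it as something to profile rather than a guaranteed win:

```csharp
public static class LoopIndexSketch
{
    // Naive: the full index expression is evaluated for every element.
    public static long SumNaive(int[] data, int sx, int sy, int sz)
    {
        long total = 0;
        for (int x = 0; x < sx; x++)
            for (int y = 0; y < sy; y++)
                for (int z = 0; z < sz; z++)
                    total += data[x * (sy * sz) + y * sz + z];
        return total;
    }

    // Hoisted: the x term is computed once per x slice and the y term once
    // per row, so the inner loop only adds z to a precomputed base offset.
    public static long SumHoisted(int[] data, int sx, int sy, int sz)
    {
        long total = 0;
        int yz = sy * sz;
        for (int x = 0; x < sx; x++)
        {
            int sliceBase = x * yz;
            for (int y = 0; y < sy; y++)
            {
                int rowBase = sliceBase + y * sz;
                for (int z = 0; z < sz; z++)
                    total += data[rowBase + z];
            }
        }
        return total;
    }
}
```

With z as the innermost loop, both versions walk memory contiguously for this layout. Putting x innermost instead would stride by sizeY * sizeZ elements per step.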

Arowx · Oct 22, 2019

OK, trying this with just a simple 1D NativeArray, the problem is the job size. It works fine until you hit a job size boundary: e.g. if you set your job size to 1024, you can access i+1 up to index 1023, then it fails because another job will have the range starting at 1024.

The error message mentions double buffering but no link to documentation on how to do this. It sounds like the solution is to duplicate data multiple times into offset read buffers that align to the current index range, is that right?

So for a 2D range where you check four neighbours you will need 4 offset copies of the dataset. Is there not some mechanism for ensuring the range of data you need is available from an array as per my original question?

tertle · Oct 22, 2019

Double buffering is a very common technique. Basically you just duplicate your array. Google it but TLDR: have 1 array you read from then another array you write to, this way you avoid corrupting your read data.
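A minimal sketch of how this might look for the 1D i+1 case, assuming the standard IJobParallelFor interface and the [ReadOnly] / [WriteOnly] attributes from Unity.Collections. The job and field names (NeighbourSumJob, Input, Output, NeighbourSumRunner) are hypothetical, and the code only compiles inside a Unity project with the Jobs, Collections and Burst packages:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
public struct NeighbourSumJob : IJobParallelFor
{
    // Marked read-only, so any index may be read from any batch,
    // including i - 1 and i + 1 across batch boundaries.
    [ReadOnly] public NativeArray<int> Input;

    // Each Execute call writes only to its own index i.
    [WriteOnly] public NativeArray<int> Output;

    public void Execute(int i)
    {
        int left = i > 0 ? Input[i - 1] : 0;
        int right = i < Input.Length - 1 ? Input[i + 1] : 0;
        Output[i] = left + Input[i] + right;
    }
}

public static class NeighbourSumRunner
{
    // Runs one pass from `a` into `b`. The caller swaps the two arrays
    // before the next pass so the previous output becomes the new input.
    public static void RunOnce(NativeArray<int> a, NativeArray<int> b)
    {
        var job = new NeighbourSumJob { Input = a, Output = b };
        job.Schedule(a.Length, 1024).Complete();
    }
}
```

Only two arrays are needed regardless of how many neighbours are read. Offset copies of the dataset are not required, because the read-only input can be indexed freely.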

Arowx · Oct 22, 2019

tertle said: ↑

Double buffering is a very common technique. Basically you just duplicate your array. Google it but TLDR: have 1 array you read from then another array you write to, this way you avoid corrupting your read data.

By breaking my program down into two jobs I was able to make the data sets read-only, which enabled the data to be accessed across job boundaries.

Job1 reads A and updates B and C
Job2 reads B and C and updates A

Nyanpas · Oct 24, 2019

I have used 1D-arrays to get surrounding tiles by knowing the offsets from the width/height/depth of the area and using that to calculate which tiles are neighbouring. I don't know if it is a heretical solution but it works in a single for-loop at least.
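A minimal sketch of that offset idea for a 2D grid, assuming a row-major layout (index = y * width + x). The names NeighbourOffsetSketch, BuildOffsets and CountNeighbours are hypothetical, and border cells are simply skipped here, which is one of several possible edge policies:

```csharp
public static class NeighbourOffsetSketch
{
    // Flat-index offsets of the 8 neighbours of a cell in a row-major grid.
    public static int[] BuildOffsets(int width)
    {
        return new[]
        {
            -width - 1, -width, -width + 1,
            -1,                  1,
            width - 1,   width,  width + 1,
        };
    }

    // Counts non-zero neighbours for every interior cell in a single flat loop.
    // Border cells are skipped (left as 0) because their offsets would wrap
    // into an adjacent row or fall outside the array.
    public static int[] CountNeighbours(int[] grid, int width, int height)
    {
        int[] offsets = BuildOffsets(width);
        int[] result = new int[grid.Length];
        for (int i = 0; i < grid.Length; i++)
        {
            int x = i % width, y = i / width;
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1)
                continue;

            int count = 0;
            foreach (int offset in offsets)
            {
                if (grid[i + offset] != 0)
                    count++;
            }
            result[i] = count;
        }
        return result;
    }
}
```

A common alternative to skipping the border is to pad the grid with a one-cell frame of empty tiles, so every real tile is an interior cell and the loop needs no edge check.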
