VoxelEngine in ECS slower than OOP

WolvBoar · Aug 20, 2019

Hello. I'm the author of this thread: https://forum.unity.com/threads/dynamicbuffer-is-awfully-slow.728663/ (I'm posting under a different account because I'm writing from work). If you look at the linked thread, you will see that I had performance problems with dynamic buffers.
The scenario looked like this:
- World entity: a dynamic buffer with 128x128x128 elements, 2 bytes each.
- Chunk entities: 8x8x8 chunks, each with a dynamic buffer of 16x16x16 elements, 240 bytes per element.
Every frame I use the World entity's data to fill each chunk's dynamic buffer with render data (vertices, triangles).
I'm not rendering it yet. The problem is that without any access to a dynamic buffer I get more than 1000 fps. Note that I'm still creating those 240 bytes per element every frame, just not storing them in a dynamic buffer. If I add a single access to the World dynamic buffer in a job that iterates over all 8x8x8 chunks, performance is cut in half. Adding a write to any of the chunks' dynamic buffers drops performance to 2-4 fps. Now some clarification:
1. Why are you processing this every frame?
For benchmarking. I have a pretty solid voxel engine written in OOP and I benchmarked it the same way. Currently my OOP engine handles the same amount of data while holding a constant 300-400 fps, pushing data to the MeshRenderer and MeshCollider every frame, with light, UV, culling and water updates.
2. Why dynamic buffers instead of one entity per voxel?
That would require more memory, because a position would have to be stored for each voxel, and I need direct access to neighbouring voxels for water updates, light propagation and culling.

At first I thought something in the dynamic buffer implementation was slow, but then something came to mind. When I'm using OOP and access an element in a List/HashSet/Dictionary, I get a reference to that element and it is copied into the CPU cache. I'm not sure, it is only a theory, but in ECS, if a job gets a NativeList, isn't the entire NativeList copied into the CPU cache? That would explain the drop in performance. My laptop's CPU has a 4 MB cache. The World entity's dynamic buffer is 4 MB (128x128x128x2), so it already takes up the entire cache, before counting the chunk dynamic buffers, which are ~0.3 MB each. This forces the CPU to swap huge arrays of data in and out of the cache. If this is true, then I have some problems getting around it.
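
To put numbers on the sizes mentioned above, here is a quick back-of-the-envelope calculation. It is written in Python purely for the arithmetic and uses only the dimensions stated in the scenario (a 128x128x128 world at 2 bytes per element, and 8x8x8 chunks of 16x16x16 elements at 240 bytes each). The variable names are made up for illustration.

```python
# Sizes taken from the scenario described in the post.
WORLD_DIM = 128          # world buffer is 128 x 128 x 128 voxels
WORLD_ELEM_BYTES = 2     # 2 bytes per world voxel
CHUNK_DIM = 16           # each chunk buffer is 16 x 16 x 16 elements
CHUNK_ELEM_BYTES = 240   # 240 bytes of render data per element
CHUNKS_PER_AXIS = 8      # 8 x 8 x 8 chunks

MIB = 1024 * 1024

world_bytes = WORLD_DIM ** 3 * WORLD_ELEM_BYTES
chunk_bytes = CHUNK_DIM ** 3 * CHUNK_ELEM_BYTES
chunk_count = CHUNKS_PER_AXIS ** 3
all_chunks_bytes = chunk_bytes * chunk_count

print(f"world buffer:       {world_bytes / MIB:.2f} MiB")
print(f"one chunk buffer:   {chunk_bytes / MIB:.2f} MiB")
print(f"all {chunk_count} chunks:     {all_chunks_bytes / MIB:.2f} MiB")
```

By these dimensions, a single chunk buffer comes out at roughly 0.94 MiB rather than ~0.3 MB, and all 512 chunk buffers together at about 480 MiB. If every element really is rewritten each frame, that is far more data per frame than a 4 MB cache can hold, whether or not the 4 MB world buffer is involved.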

I need access to surrounding voxels. This means I need to know which voxel is to the right, left, top, etc., because I need that for culling, water and light propagation. The way I do it now is super easy to work with, since I have access to the entire voxel array, but as we can see it is not memory friendly. Any ideas how to work around it?
I could store voxels in per-chunk dynamic buffers, but then how should I go about accessing neighbouring chunks to pass light/water between them?
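
One common way to handle the per-chunk question is to keep addressing voxels by world coordinate and translate that into a chunk index plus a local index inside the chunk, so a neighbour lookup crosses chunk boundaries transparently. The following is a minimal, language-neutral sketch in Python, not Unity API code; `locate`, `neighbours` and the x-fastest index layout are assumptions made for illustration, and only the 16 and 8 dimensions come from the post.

```python
CHUNK_DIM = 16                            # voxels per chunk edge (from the post)
CHUNKS_PER_AXIS = 8                       # chunks per world edge (from the post)
WORLD_DIM = CHUNK_DIM * CHUNKS_PER_AXIS   # 128 voxels per world edge

# Offsets of the six face neighbours: +x, -x, +y, -y, +z, -z.
NEIGHBOUR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def locate(x, y, z):
    """Map a world voxel coordinate to (chunk_index, local_index).

    Returns None when the coordinate lies outside the world.
    """
    if not (0 <= x < WORLD_DIM and 0 <= y < WORLD_DIM and 0 <= z < WORLD_DIM):
        return None
    # divmod splits each axis into "which chunk" and "where inside the chunk".
    cx, lx = divmod(x, CHUNK_DIM)
    cy, ly = divmod(y, CHUNK_DIM)
    cz, lz = divmod(z, CHUNK_DIM)
    chunk_index = cx + CHUNKS_PER_AXIS * (cy + CHUNKS_PER_AXIS * cz)
    local_index = lx + CHUNK_DIM * (ly + CHUNK_DIM * lz)
    return chunk_index, local_index


def neighbours(x, y, z):
    """Return the location of each face neighbour, in NEIGHBOUR_OFFSETS order."""
    return [locate(x + dx, y + dy, z + dz) for dx, dy, dz in NEIGHBOUR_OFFSETS]


# A voxel on the +x face of chunk 0 has its +x neighbour in chunk 1.
print(locate(15, 0, 0))
print(neighbours(15, 0, 0)[0])
```

With this mapping, light or water propagation code only asks for the neighbour of a world coordinate; whether the result lives in the same chunk buffer or an adjacent one falls out of the index arithmetic.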

WolvBoar · Aug 20, 2019

That theory didn't hold up. I removed the 4 MB dynamic buffer and left only the smaller ones. Performance is still awfully low. Any ideas why? Is a dynamic buffer really that slow? So as not to duplicate the previous thread: any ideas how to create a voxel world with the ability to access neighbours without dynamic buffers?

SeriousHatArthur · Aug 22, 2019

Anyone? Seriously, it can be simplified even further. One 16*16*16 for loop in an IJobParallelFor (256 batched in 4). One preallocated read-only NativeArray with 1 element. Try accessing it from this for loop. You can do multiple math operations and the fps doesn't even move, but merely accessing this native array, even with only one element, causes a drop in performance.

Joachim_Ante (Unity Technologies) · Aug 22, 2019

WolvBoar said: "I'm not sure, it is only a theory, but in ECS, if a job gets a NativeList, isn't the entire NativeList copied into the CPU cache?"

NativeList or DynamicBuffer is not copied when scheduling a job. The data itself is referenced.

It's difficult to say what exactly is slow, without actually profiling it.

If you think it is something specific to DynamicBuffer, then I suggest making a small benchmark, comparing it to a more direct implementation with arrays, and posting the project folder here so we can reproduce it and look into it.

Enzi · Aug 22, 2019

I've never been too invested in voxel programming, but I have experience with spatial algorithms. The biggest bottleneck is random access, and the only way to work around it is to have a sequence of processes/jobs that don't end in random access, as silly as that sounds.

In case you are not aware, you are not writing ECS code; it's more like OOP inside a job. The first thing I see is that you have 3 for loops across x, y, z which are not multi-threaded. On top of that, though I can't verify this without seeing the rest of the code, the array is probably not laid out for the (known) sequence in which it gets accessed. This can't be fast because it basically runs on 1 thread.
How many voxels are you testing with?
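
To illustrate the point about layout versus access sequence, here is a small sketch (Python, not Unity code) that compares two nestings of the x, y, z loops over a 16x16x16 chunk stored as a flat array. The x-fastest layout and the function names are assumptions; the original code was not posted, so this shows the general effect rather than what the job in question actually does.

```python
from collections import Counter

DIM = 16  # chunk edge length used in the thread


def flat_index(x, y, z):
    """Flat array index for an assumed layout where x varies fastest in memory."""
    return x + DIM * (y + DIM * z)


def access_order_x_innermost():
    """Loop nesting that matches the layout: z outer, y middle, x inner."""
    return [flat_index(x, y, z)
            for z in range(DIM) for y in range(DIM) for x in range(DIM)]


def access_order_z_innermost():
    """Loop nesting that fights the layout: x outer, y middle, z inner."""
    return [flat_index(x, y, z)
            for x in range(DIM) for y in range(DIM) for z in range(DIM)]


def typical_stride(order):
    """Most common distance (in elements) between consecutive accesses."""
    strides = Counter(abs(b - a) for a, b in zip(order, order[1:]))
    return strides.most_common(1)[0][0]


print(typical_stride(access_order_x_innermost()))
print(typical_stride(access_order_z_innermost()))
```

With x innermost, the loop walks the array one element at a time. With z innermost, most steps jump 256 elements, which at 240 bytes per element is 61,440 bytes between consecutive accesses. This only addresses access order; the multi-threading point above is a separate matter.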

What you can also do is get rid of all these buffers and use components instead. Have a "voxel chunk" entity with a buffer of "voxel" entities so you can process them in parallel.
