DynamicBuffer is awfully slow

Discussion in 'Data Oriented Technology Stack' started by SeriousHatArthur, Aug 15, 2019 at 9:47 PM.

  1. SeriousHatArthur

    SeriousHatArthur

    Joined:
    Thursday
    Posts:
    18
    Hello. This may be a bug on my end, but I see a huge decrease in performance when using DynamicBuffers.

    Code (CSharp):
    [BurstCompile]
    struct RenderJob : IJobForEachWithEntity<ChunkData>
    {
        [ReadOnly] public DynamicBuffer<Voxel> Voxels;
        [NativeDisableParallelForRestriction] public BufferFromEntity<Vertex> VerticleData;
        [NativeDisableParallelForRestriction] public BufferFromEntity<Triangle> TriangleData;

        public void Execute(Entity entity, int index, ref ChunkData chunkData)
        {
            if (chunkData.HasChanged)
            {
                int3 size = chunkData.Size;
                int3 position = chunkData.Position;
                var verticleData = VerticleData[entity];
                var triangleData = TriangleData[entity];
                verticleData.Clear();
                triangleData.Clear();

                var voxelArray = Voxels.AsNativeArray();

                for (int x = 0; x < size.x; x++)
                {
                    for (int y = 0; y < size.y; y++)
                    {
                        for (int z = 0; z < size.z; z++)
                        {
                            int flatPosition = (position.x + x) + size.x * ((position.y + y) + size.z * (position.z + z));
                            if (voxelArray[flatPosition].Id != 0)
                            {
                                //int triangleCount = verticleData.Length;
                                //verticleData.Add(new Vertex() { Value = new Vector3(x + 0f, y + 0f, z + 0f), });
                                /*verticleData.Add(new Vertex() { Value = new Vector3(x + 0f, y + 1f, z + 0f), });
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 1f, y + 1f, z + 0f), });
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 1f, y + 0f, z + 0f), });*/

                                /*triangleData.Add(new Triangle() { Value = triangleCount });
                                triangleData.Add(new Triangle() { Value = triangleCount + 1 });
                                triangleData.Add(new Triangle() { Value = triangleCount + 2 });

                                triangleData.Add(new Triangle() { Value = triangleCount });
                                triangleData.Add(new Triangle() { Value = triangleCount + 2 });
                                triangleData.Add(new Triangle() { Value = triangleCount + 3 });*/

                                /*triangleCount = verticleData.Length;
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 1f, y + 0f, z + 1f), });
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 1f, y + 1f, z + 1f), });
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 0f, y + 1f, z + 1f), });
                                verticleData.Add(new Vertex() { Value = new Vector3(x + 0f, y + 0f, z + 1f), });*/

                                /*triangleData.Add(new Triangle() { Value = triangleCount });
                                triangleData.Add(new Triangle() { Value = triangleCount + 1 });
                                triangleData.Add(new Triangle() { Value = triangleCount + 2 });

                                triangleData.Add(new Triangle() { Value = triangleCount });
                                triangleData.Add(new Triangle() { Value = triangleCount + 2 });
                                triangleData.Add(new Triangle() { Value = triangleCount + 3 });*/
                            }
                        }
                    }
                }
                chunkData.HasChanged = true;
                chunkData.ReadyToRender = true;
            }
        }
    }
    As you can see, most of it is commented out. If I leave in only the line if (voxelArray[flatPosition].Id != 0), performance drops from 1600+ fps to 300-400 fps. With everything uncommented, the full code drops to 2 fps. What can cause this? Is there a thread lock somewhere that I don't know about, or is this performance drop expected?
     
  2. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    1,542
    Firstly, a side note: you have this. Is HasChanged meant to be set to false?

    if (chunkData.HasChanged)
    {
    // ..
    chunkData.HasChanged = true;
    }

    Anyway onto the question, I don't see anything that would cause you issues. Are you benching this in editor or runtime?
    While runtime would be much faster, I don't see why you'd drop to 2fps unless

    a) size is huge
    b) you have a lot of chunkData
     
  3. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,312
    Nothing to do with DynamicBuffer, AsNativeArray returns a NativeArray which is what you are reading from.

    How many voxel chunks are there, and what are the dimensions of a voxel chunk? My guess would be that you are indexing into that array more than you think, or underestimating the cost of random array access.
     
  4. SeriousHatArthur

    @tertle I know it should be false, but I set it like that for benchmarking. I'm testing in the editor.
    @tertle @snacktime The world is 128x128x128; each voxel chunk iterates over 16x16x16.
    DynamicBuffer drops the fps lower than NativeArray does, but NativeArray also causes a pretty big fps drop. It is weird, because similar code not written in ECS runs much faster, even on a single core :|
     
  5. SeriousHatArthur

    I mean, it isn't needed for my project, because I can't see a scenario where I would need the full world updated every frame. Most likely it will be a single chunk update when something changes, which will not cause any issues, but I didn't expect such a performance drop and was wondering if it is something that I did. I started learning ECS today, so it is highly possible. Also, my CPU is pretty old but should still handle this well: an overclocked i7-2600K.
     
  6. tertle

    Well it is up to 41,943,040 adds with safety checks in a single frame if you iterate the entire thing.

    If you compile to a build, what is the performance like (safety checks are removed)?

    Also that said, are you sure it's this job causing the performance issue not something reacting to it? What are you doing with the chunk data? Applying it to a mesh?
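The 41,943,040 figure can be reproduced from the fully uncommented job: each solid voxel emits two quads, which is 8 vertex adds plus 12 triangle-index adds. The sketch below is plain C# arithmetic only; the class and method names are hypothetical, and it assumes the worst case where every voxel in the world is solid.

```csharp
public static class AddCount
{
    // Each solid voxel in the posted job emits two quads:
    // 2 * 4 vertices plus 2 * 6 triangle indices.
    public const int VertexAddsPerVoxel = 2 * 4;
    public const int IndexAddsPerVoxel = 2 * 6;

    // Worst case: every voxel in the volume is solid.
    public static long MaxAdds(int sizeX, int sizeY, int sizeZ)
    {
        long voxels = (long)sizeX * sizeY * sizeZ;
        return voxels * (VertexAddsPerVoxel + IndexAddsPerVoxel);
    }
}
```

AddCount.MaxAdds(128, 128, 128) evaluates to 41,943,040, and the same calculation for 32x32x32 gives 655,360.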
     
  7. SeriousHatArthur

    Just tested - around 2 fps.
    Reduced to 32x32x32 ~ 40 fps :/
    No, I'm not applying it to any mesh. I removed the other systems. Also, I can comment out all the "adding" and leave only voxelArray[flatPosition].Id != 0, and performance still drops. Not as much, but the difference with and without this line is around 30x-40x in the editor.
     
    Last edited: Aug 15, 2019 at 11:57 PM
  8. tertle

    Is Burst enabled? Can you post a screenshot of the profiler (timeline)?
     
  9. SeriousHatArthur

  10. snacktime

    You aren't comparing apples to oranges, I don't think. You say NativeArray is faster, but the line you specifically mentioned is already reading from a NativeArray. That it was obtained from a buffer makes no difference. So there is something else that's creating the difference.

    Also, the reason I asked how many chunks rather than dimensions is that dimensions alone are ambiguous. I don't know if the unit of measurement is a Unity meter or a voxel chunk.
     
  11. tertle

    Expand the jobs tab.

    Also, a side note: you should use float3, not Vector3.
     
  12. SeriousHatArthur

    I was using Vector3 because later (this is currently disabled) I use unsafe code to copy memory directly from the DynamicBuffers to Mesh.vertices etc. I could try float3 and check whether it changes anything.
     
  13. SeriousHatArthur

    Changed Vector3 to float3 - no noticeable performance difference.
     
  14. snacktime

    I can't remember whether, when you hit a sync point, the entire job time shows up under "waiting for job group"; @tertle might remember that. The reason I mention this is that maybe you have a sync point hitting way too soon, so it stalls the main thread for almost the entire job run time.

    But that aside, your job is obviously really expensive. So I'm going back to my first guess on this.
     
  15. tertle

    OK, so what I've realized after testing your code is that either something is going really wrong, or you are literally creating a 2048x2048x2048 world (128x128x128 chunks of 16x16x16) and trying to update it all in a single frame (a whopping 8.6 billion voxels).

    The job is super quick. I can update a section of 128x128x128 (2.1mill voxels) in a fraction of a ms even with safety system on.
     
  16. SeriousHatArthur

    128x128x128 is the number of voxels. There are 8x8x8 chunks.
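The two readings of "128x128x128" differ by a factor of 4096, which is easy to check. A minimal sketch in plain C#, with hypothetical helper names, comparing a world of 128 voxels per axis split into 16-voxel chunks against a world of 128 chunks per axis:

```csharp
public static class WorldLayout
{
    // Number of chunks along one axis when the world size is given in voxels.
    public static int ChunksPerAxis(int worldSizeInVoxels, int chunkSize)
    {
        return worldSizeInVoxels / chunkSize;
    }

    // Total voxel count for a cubic world with the given chunks per axis.
    public static long TotalVoxels(int chunksPerAxis, int chunkSize)
    {
        long side = (long)chunksPerAxis * chunkSize;
        return side * side * side;
    }
}
```

With 16-voxel chunks, 128 voxels per axis gives 8 chunks per axis (512 chunk entities, 2,097,152 voxels in total), while 128 chunks per axis gives 2048 voxels per axis and 8,589,934,592 voxels in total.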
     
  17. SeriousHatArthur

    Something must be wrong, because here I'm adding only 2 faces and not even pushing them to rendering. I have similar code that runs on only 2 threads without ECS, and it was pushing 256x256x128 voxels with rendering and all 6 sides, which means an additional 4 groups of triangle and vertex adds.
     
  18. SeriousHatArthur

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;

    [DisableAutoCreation]
    [UpdateAfter(typeof(VoxelGenerationSystem))]
    public class VoxelRenderingSystem : JobComponentSystem
    {
        private Entity voxelWorldEntity;

        protected override void OnCreate()
        {
            voxelWorldEntity = GetSingletonEntity<VoxelWorldData>();
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            var job = new RenderJob()
            {
                Voxels = GetBufferFromEntity<Voxel>(true)[voxelWorldEntity],
                VerticleData = GetBufferFromEntity<Vertex>(false),
                TriangleData = GetBufferFromEntity<Triangle>(false),
            };
            var handler = job.Schedule(this, inputDeps);
            return handler;
        }

        [BurstCompile]
        struct RenderJob : IJobForEachWithEntity<ChunkData>
        {
            [ReadOnly] public DynamicBuffer<Voxel> Voxels;
            [NativeDisableParallelForRestriction] public BufferFromEntity<Vertex> VerticleData;
            [NativeDisableParallelForRestriction] public BufferFromEntity<Triangle> TriangleData;

            public void Execute(Entity entity, int index, ref ChunkData chunkData)
            {
                if (chunkData.HasChanged)
                {
                    int3 size = chunkData.Size;
                    int3 position = chunkData.Position;
                    var verticleData = VerticleData[entity];
                    var triangleData = TriangleData[entity];
                    verticleData.Clear();
                    triangleData.Clear();

                    var voxelArray = Voxels.AsNativeArray();
                    for (int z = 0; z < size.z; z++)
                    {
                        for (int y = 0; y < size.y; y++)
                        {
                            for (int x = 0; x < size.x; x++)
                            {
                                int flatPosition = (position.x + x) + size.x * ((position.y + y) + size.z * (position.z + z));
                                if (voxelArray[flatPosition].Id != 0)
                                {
                                    // First quad: 4 vertices, 6 indices.
                                    int triangleCount = verticleData.Length;
                                    verticleData.Add(new Vertex() { Value = new float3(x + 0f, y + 0f, z + 0f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 0f, y + 1f, z + 0f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 1f, y + 1f, z + 0f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 1f, y + 0f, z + 0f), });

                                    triangleData.Add(new Triangle() { Value = triangleCount });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 1 });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 2 });

                                    triangleData.Add(new Triangle() { Value = triangleCount });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 2 });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 3 });

                                    // Second quad: 4 vertices, 6 indices.
                                    triangleCount = verticleData.Length;
                                    verticleData.Add(new Vertex() { Value = new float3(x + 1f, y + 0f, z + 1f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 1f, y + 1f, z + 1f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 0f, y + 1f, z + 1f), });
                                    verticleData.Add(new Vertex() { Value = new float3(x + 0f, y + 0f, z + 1f), });

                                    triangleData.Add(new Triangle() { Value = triangleCount });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 1 });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 2 });

                                    triangleData.Add(new Triangle() { Value = triangleCount });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 2 });
                                    triangleData.Add(new Triangle() { Value = triangleCount + 3 });
                                }
                            }
                        }
                    }
                    chunkData.HasChanged = true;
                    chunkData.ReadyToRender = true;
                }
            }
        }
    }


    public struct Voxel : IBufferElementData
    {
        public short Id;
    }

    public struct Vertex : IBufferElementData
    {
        public float3 Value;
    }

    public struct Triangle : IBufferElementData
    {
        public int Value;
        public static implicit operator int(Triangle e) { return e.Value; }
    }
    And the world creation:
    Code (CSharp):
    using Unity.Entities;
    using Unity.Mathematics;
    using UnityEngine;

    public class SystemManager : MonoBehaviour
    {
        //public GameObject Prefab;
        //public Dictionary<int3, MeshFilter> Meshes = new Dictionary<int3, MeshFilter>();

        void Awake()
        {
            Application.targetFrameRate = 30;
            var world = new World("VoxelWorld");
            ScriptBehaviourUpdateOrder.UpdatePlayerLoop(world);
            World.Active = world;

            var entityManager = world.EntityManager;
            var chunkRendererType = ComponentType.ReadWrite<ChunkData>();
            var voxelWorldType = ComponentType.ReadWrite<VoxelWorldData>();

            var worldArchetype = entityManager.CreateArchetype(voxelWorldType);
            var chunkRendererArchetype = entityManager.CreateArchetype(chunkRendererType);

            // One entity holds the voxel buffer for the whole world.
            var worldEntity = entityManager.CreateEntity(worldArchetype);

            entityManager.SetComponentData<VoxelWorldData>(worldEntity, new VoxelWorldData() { Size = new int3(128, 128, 128) });
            var buffer = entityManager.AddBuffer<Voxel>(worldEntity);

            // One entity per 16x16x16 chunk, each with its own vertex and triangle buffers.
            for (int x = 0; x < 128; x += 16)
            {
                for (int y = 0; y < 128; y += 16)
                {
                    for (int z = 0; z < 128; z += 16)
                    {
                        var chunkRenderer = entityManager.CreateEntity(chunkRendererArchetype);
                        entityManager.SetComponentData<ChunkData>(chunkRenderer, new ChunkData() { Position = new int3(x, y, z), Size = new int3(16, 16, 16), HasChanged = true });
                        entityManager.AddBuffer<Vertex>(chunkRenderer);
                        entityManager.AddBuffer<Triangle>(chunkRenderer);
                    }
                }
            }

            var groupSystem = world.GetOrCreateSystem<SimulationSystemGroup>();
            var generationSystem = world.GetOrCreateSystem<VoxelGenerationSystem>();
            var renderSystem = world.GetOrCreateSystem<VoxelRenderingSystem>();
            //var renderBridgeSystem = world.GetOrCreateSystem<VoxelRenderBridgeSystem>();
            //renderBridgeSystem.OnReadyToRender += OnReadyToRender;

            groupSystem.AddSystemToUpdateList(renderSystem);
            groupSystem.AddSystemToUpdateList(generationSystem);
            //groupSystem.AddSystemToUpdateList(renderBridgeSystem);

            groupSystem.SortSystemUpdateList();
        }
    }
    I set targetFrameRate for testing; it was previously set to 3000 :| This code gives me 2 fps in a build.
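A side observation on the index math in the job, offered as a sketch rather than a diagnosis: flatPosition is computed with the chunk's Size (16) as the stride, while Position ranges up to 112 in a 128x128x128 world. With 16-wide strides the largest index the formula can produce is 127 + 16 * (127 + 16 * 127) = 34,671, so different chunks would read overlapping parts of the buffer. If the voxel buffer is laid out by world dimensions (an assumption, since VoxelGenerationSystem is not shown), the strides would normally be the world's. The helper below is hypothetical and assumes an x-fastest, then y, then z layout.

```csharp
public static class VoxelIndex
{
    // Flattens world-space voxel coordinates into an index for an
    // x-fastest, then y, then z layout. The strides come from the
    // dimensions of the whole array, not from the chunk being iterated.
    public static int Flatten(int x, int y, int z, int worldSizeX, int worldSizeY)
    {
        return x + worldSizeX * (y + worldSizeY * z);
    }
}
```

For a 128x128x128 world, VoxelIndex.Flatten(127, 127, 127, 128, 128) is 2,097,151, the last element of a 2,097,152-element buffer. In the job this would mean passing the world size in alongside the chunk position.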
     
  19. snacktime

    There is no world where several billion random array accesses won't produce something similar to what you are getting, ECS or no ECS.
     
  20. SeriousHatArthur

    @snacktime "The job is super quick. I can update a section of 128x128x128 (2.1mill voxels) in a fraction of a ms even with safety system on." Doesn't this mean that I am actually doing something wrong here? :/
     
  21. tertle

    That was me. After my Unity crashed when I tried the 2048x version, it slowed down a lot, so I'm not sure what happened the first time. I can definitely replicate your issue now.

    I don't think it's the access so much as it's just too costly to add up to 41,943,040 times, which is what you seem to have concluded.

    My advice instead: resize to the maximum array size up front.

    Code (CSharp):
    if (chunkData.HasChanged)
    {
        int3 size = chunkData.Size;
        verticleData.Clear();
        // Reserve the worst case up front: 8 vertices per voxel.
        verticleData.ResizeUninitialized(size.x * size.y * size.z * 8);
        var count = 0;

        for (int z = 0; z < size.z; z++)
        {
            for (int y = 0; y < size.y; y++)
            {
                for (int x = 0; x < size.x; x++)
                {
                    int flatPosition = (position.x + x) + size.x * ((position.y + y) + size.z * (position.z + z));
                    if (voxelArray[flatPosition].Id != 0)
                    {
                        // After the resize, Length is the maximum size, so base the indices on count.
                        int triangleCount = count;
                        verticleData[count] = new Vertex() { Value = new float3(x + 0f, y + 0f, z + 0f), };
                        verticleData[count + 1] = new Vertex() { Value = new float3(x + 0f, y + 1f, z + 0f), };
                        verticleData[count + 2] = new Vertex() { Value = new float3(x + 1f, y + 1f, z + 0f), };
                        verticleData[count + 3] = new Vertex() { Value = new float3(x + 1f, y + 0f, z + 0f), };
                        // ...
                        count += 8;
                    }
                }
            }
        }
        // ...
    }
    Then trim it

    Code (CSharp):
    verticleData.RemoveRange(count, verticleData.Length - count);
    Have not tested but I suspect it'll be much better performance.
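The reserve, write-by-index, then trim pattern suggested above can be tried outside Unity. The following is a minimal sketch in plain C#, where a managed array stands in for the DynamicBuffer and Array.Resize stands in for RemoveRange; the class name, method name, and placeholder payload are all hypothetical.

```csharp
using System;

public static class PreallocatedFill
{
    // Writes a fixed number of values per non-zero voxel into a
    // worst-case-sized array, then trims it to what was actually written.
    public static float[] Collect(short[] voxelIds, int writesPerVoxel)
    {
        // Worst case: every voxel is solid.
        var output = new float[voxelIds.Length * writesPerVoxel];
        int count = 0;

        for (int i = 0; i < voxelIds.Length; i++)
        {
            if (voxelIds[i] == 0)
            {
                continue; // empty voxel, nothing to emit
            }

            for (int w = 0; w < writesPerVoxel; w++)
            {
                output[count + w] = i; // placeholder payload: the voxel index
            }
            count += writesPerVoxel;
        }

        // Drop the unused tail so the length matches the data written.
        Array.Resize(ref output, count);
        return output;
    }
}
```

The point of the design is that the inner loop never grows the container, so no reallocation or length bookkeeping happens per write; the single resize at the end pays that cost once.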
     
  22. SeriousHatArthur

    We went from 2 to 3 fps! :D Tomorrow I will test non-ECS code on the same task to confirm how it does, but I'm almost sure that 32x32x32 voxels updated every frame, without any mesh creation or rendering, should not bring an i7-2600K down to 40 fps through random array access alone :/
     
  23. SeriousHatArthur

    Also, this gives me ~1000 fps at 128x128x128:
    Code (CSharp):
    for (int z = 0; z < size.z; z++)
    {
        for (int y = 0; y < size.y; y++)
        {
            for (int x = 0; x < size.x; x++)
            {
                int flatPosition = (position.x + x) + size.x * ((position.y + y) + size.z * (position.z + z));
            }
        }
    }
    and this gives me ~30 fps at 128x128x128:
    Code (CSharp):
    for (int z = 0; z < size.z; z++)
    {
        for (int y = 0; y < size.y; y++)
        {
            for (int x = 0; x < size.x; x++)
            {
                int flatPosition = (position.x + x) + size.x * ((position.y + y) + size.z * (position.z + z));
                verticleData.Add(new Vertex() { Value = new float3(x + 0f, y + 0f, z + 0f), });
            }
        }
    }
     
  24. tertle

    When I built my voxel engine last year in ECS, I only ever updated 1 chunk/frame, I think (it might have been 4, but it was limited at the least).

    It was all done with buffers, but the limit wasn't actually because of an issue like this; just uploading mesh data to a mesh is too costly to do for more than 4 chunks per frame if you don't want fps dips.
     
    Last edited: Aug 16, 2019 at 1:17 AM
  25. SeriousHatArthur

    I will look into it for one more day and then just drop it; I don't need to update it every frame. Can you share some info about how you stored voxels in your engine? If it was one entity per chunk, how did you manage chunk-vs-chunk visibility (to cull the faces between 2 chunks)?
     
  26. tertle

    Haven't really looked at it in 9 months but from memory everything was stored in their own chunks of 32x32x32 or I may have changed it to 64x64x16 at some point (I wasn't building much depth to project so I thought it might be more efficient.)

    This was all stored in buffers on separate entities.

    As for chunk/chunk visibility, I did not bother. I used larger chunks and decided the extra complexity of removing the faces of touching chunks was not worth the small performance gains. Later I added back face culling (as that video demonstrates) which coincidentally removes half the chunk/chunk faces anyway so it was even less of a concern.
     
  27. SeriousHatArthur

    Thanks for the insight :) Small update: 90 fps at 128x128x128. I just cast the DynamicBuffers to NativeArrays and went from 2 fps to 90 fps. Now 32x32x32, instead of running at 30 fps without rendering, runs at 200 fps WITH rendering through a MeshFilter and creating a new mesh every frame :|

    Now it is more like what I would expect: pushing >1 million triangles onto the screen and iterating over 262,144 voxels, creating ~5 million random inserts into an array, while maintaining 30 fps :D
     
    Last edited: Aug 16, 2019 at 1:26 AM
  28. SeriousHatArthur

    Small bump. Maybe someone will find something new in this case. Like I wrote in the post above, it runs much faster after casting to a NativeArray, but it is still slow. On my laptop with a quad-core CPU I'm getting around 17 fps with 32x32x32 voxels and 2 writes to the DynamicBuffer per voxel. If I remove only those DynamicBuffer calls, the game suddenly runs at 1000 fps.
     
  29. elcionap

    elcionap

    Joined:
    Jan 11, 2016
    Posts:
    87
    Did increasing the capacity of the buffer before adding help your performance?
    Something like this before your loop:

    Code (CSharp):
    var numWritesPerVoxel = 2;
    var maxWrites = size.x * size.y * size.z * numWritesPerVoxel;
    var writesAvailable = verticleData.Capacity - verticleData.Length;

    if (maxWrites > writesAvailable) {
        verticleData.Capacity += maxWrites - writesAvailable;
    }

    []'s
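The same reserve-before-writing check can be exercised with a standard .NET List<T>, whose Capacity property is settable. This is only an analog of the snippet above, not Unity's DynamicBuffer API, and the helper name is hypothetical.

```csharp
using System.Collections.Generic;

public static class Reserve
{
    // Grows Capacity only when the remaining room is smaller than the
    // number of writes about to happen. Returns true if it had to grow.
    public static bool EnsureRoom<T>(List<T> list, int maxWrites)
    {
        int writesAvailable = list.Capacity - list.Count;
        if (maxWrites <= writesAvailable)
        {
            return false; // already enough room, no reallocation
        }

        list.Capacity += maxWrites - writesAvailable;
        return true;
    }
}
```

After a call such as Reserve.EnsureRoom(list, maxWrites), the next maxWrites calls to Add fit in the existing storage, so the list does not reallocate inside the loop.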
     
  30. SeriousHatArthur

    It didn't. I think I found another thing that was throttling the performance. IJobForEachWithEntity was launching per-chunk jobs, which caused all of my voxel chunk entities to be updated on a single thread. I used a different type of job, and now I get around 20 fps for 128x128x128 on this 4-core laptop.
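The post does not say which job type replaced IJobForEachWithEntity, so the following is not that code. It is a plain .NET analog using Parallel.For, sketching the general idea of handing each voxel chunk to its own parallel iteration instead of processing all of them on one thread. The class and method names are hypothetical.

```csharp
using System.Threading.Tasks;

public static class ParallelChunks
{
    // Counts solid voxels per chunk, one chunk per parallel iteration.
    // Each iteration writes only to its own slot in the result array,
    // so no locking is needed.
    public static int[] CountSolid(short[][] chunks)
    {
        var counts = new int[chunks.Length];

        Parallel.For(0, chunks.Length, c =>
        {
            int solid = 0;
            foreach (short id in chunks[c])
            {
                if (id != 0)
                {
                    solid++;
                }
            }
            counts[c] = solid;
        });

        return counts;
    }
}
```

The per-slot output mirrors the thread's setup, where each chunk entity owns its own vertex and triangle buffers, so iterations never write to shared data.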