
Burst/SIMD performs significantly slower than regular C# in Editor

Discussion in 'Data Oriented Technology Stack' started by Arycama, Dec 16, 2018.

  1. Arycama

    Arycama

    Joined:
    May 25, 2014
    Posts:
    82
    I'm using Unity 2018.3b12, and Burst SIMD instructions do not seem to work correctly in the Editor. Adding them makes my code extremely slow compared to a C# Parallel.For loop.

    Generating a 128^3 noise volume using a C# Parallel.For loop took 400ms in the Editor and 200ms in a build.
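For anyone reproducing these numbers, a Parallel.For baseline like this can be timed with a Stopwatch. The following is only a minimal sketch: `ParallelForTiming`, `FillAndTime`, and `Evaluate` are hypothetical names, and `Evaluate` is a trivial stand-in for the per-point noise function, not the code from this post.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParallelForTiming
{
    // Hypothetical stand-in for the per-point noise evaluation.
    private static float Evaluate(int index) => index * 0.5f;

    // Fills size^3 values with Parallel.For and returns the elapsed milliseconds.
    public static long FillAndTime(int size, out float[] values)
    {
        // A local is used because an out parameter cannot be captured by the lambda.
        var result = new float[size * size * size];

        var stopwatch = Stopwatch.StartNew();
        Parallel.For(0, result.Length, i => result[i] = Evaluate(i));
        stopwatch.Stop();

        values = result;
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Calling `ParallelForTiming.FillAndTime(128, out var values)` times a 128^3 volume. The same Stopwatch pattern can wrap `Schedule(...)` followed by `Complete()` to time a job.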

    Converting this to an IJobParallelFor increased the Editor time to 1800ms, but resulted in a 2x speedup in builds. I assume the increase in Editor time is mostly due to the safety checks.

    I then converted the noise function to use Unity.Mathematics and the Burst Inspector showed it generating assembly code with SIMD instructions. However, running it in Editor (With Burst Compilation and Jobs Enabled) resulted in the code now taking 5500ms, over ten times slower than the C# Parallel.For loop.

    Benchmarks (Editor/Build)
    Parallel For: 400ms/200ms
    Parallel Job: 1800ms/100ms
    Mathematics: 5500ms/100ms.

    So it seems that the editor is not using the burst-generated code at all, and executes regular C# without any SIMD functions. Is this the intended functionality, or could there be something in my code which is causing the Editor not to use Burst/SIMD, except when in builds?

    Toggling "Jobs/Enable Burst Compilation" or "Jobs/Use Burst Jobs" also doesn't have any effect on Editor performance. Is this possibly ignored when using IJobParallelFor?

    Thanks for any help.

    Relevant code is below:

    Terrain Cell, builds a noise volume using a Parallel job, then runs it through a marching cubes algorithm to generate a mesh.
    Code (CSharp):
    using Unity.Collections;
    using Unity.Jobs;
    using UnityEngine;

    public class NewTerrainCell : TerrainCellBase
    {
        [SerializeField]
        private int divisions = 32;

        [SerializeField, Tooltip("Contains information for generating noise values")]
        private FractalNoiseData fractalNoise;

        private void Start()
        {
            var length = divisions + 3;
            var interval = size / divisions;

            // One density value per grid point
            var densityLength = length * length * length;
            var densities = new NativeArray<float>(densityLength, Allocator.TempJob);
            var hash = new NativeArray<int>(Noise.hash, Allocator.TempJob);

            // Schedule the noise job and block until it has finished
            var densityJob = new ValueNoise3DJob(densities, hash, transform.position, length, interval, fractalNoise);
            var handle = densityJob.Schedule(densityLength, 32);
            handle.Complete();

            var meshDescriptor = MarchingCubesSimple.Generate(densities.ToArray(), interval, length);

            var mesh = meshDescriptor.CreateMesh();
            GetComponent<MeshFilter>().sharedMesh = mesh;

            densities.Dispose();
            hash.Dispose();
        }
    }
    Noise function, executed once for each point in the 3d grid.
    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using static Unity.Mathematics.math;

    [BurstCompile]
    public struct ValueNoise3DJob : IJobParallelFor
    {
        [WriteOnly]
        private NativeArray<float> densities;

        [ReadOnly]
        private NativeArray<int> hash;

        [ReadOnly]
        private int size, octaves;

        [ReadOnly]
        private float interval, frequency, amplitude, lacunarity, gain;

        [ReadOnly]
        private float3 position;

        public ValueNoise3DJob(NativeArray<float> densities, NativeArray<int> hash, float3 position, int size, float interval, FractalNoiseData noiseData)
        {
            this.hash = hash;
            this.densities = densities;
            this.size = size;
            this.octaves = noiseData.Octaves;
            this.interval = interval;
            this.position = position;

            frequency = noiseData.Frequency;
            amplitude = noiseData.Amplitude;
            lacunarity = noiseData.Lacunarity;
            gain = noiseData.Gain;
        }

        const int hashMask = 255;
        const float hashRecip = 1f / 255 * 2;

        public void Execute(int index)
        {
            // Convert the flat index into 3D grid coordinates (x varies fastest)
            var x = index % size;
            var y = (index / size) % size;
            var z = index / (size * size);

            var point = position + float3(x, y, z) * interval - interval;

            // Fractal noise
            var density = -point.y;
            var freq = frequency;
            var ampl = amplitude;

            for (var i = 0; i < octaves; i++)
            {
                var p = point * freq;
                var pi = int3(floor(p));

                // Fractional position inside the lattice cell
                var t = p - pi;
                pi &= hashMask;

                var i1 = pi + 1;

                // Hash the eight corners of the lattice cell
                var h0 = hash[pi.x];
                var h1 = hash[i1.x];
                var h00 = hash[h0 + pi.y];
                var h10 = hash[h1 + pi.y];
                var h01 = hash[h0 + i1.y];
                var h11 = hash[h1 + i1.y];
                var h000 = hash[h00 + pi.z];
                var h100 = hash[h10 + pi.z];
                var h010 = hash[h01 + pi.z];
                var h110 = hash[h11 + pi.z];
                var h001 = hash[h00 + i1.z];
                var h101 = hash[h10 + i1.z];
                var h011 = hash[h01 + i1.z];
                var h111 = hash[h11 + i1.z];

                // Smooth
                t = (3 - 2 * t) * t * t;

                var input0 = float4(h000, h010, h001, h011);
                var input1 = float4(h100, h110, h101, h111);

                // Trilinear interpolation: along x, then y, then z
                var res0 = lerp(input0, input1, t.x);
                var res1 = lerp(res0.xz, res0.yw, t.y);
                var res2 = lerp(res1.x, res1.y, t.z);

                density += (res2 * hashRecip - 1) * ampl;

                ampl *= gain;
                freq *= lacunarity;
            }

            densities[index] = density;
        }
    }
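The index-to-coordinate step at the top of Execute can be checked in isolation. This is only a sketch in plain C#: `GridIndex`, `ToCoords`, and `ToIndex` are hypothetical names that do not appear in the original code.

```csharp
public static class GridIndex
{
    // Flat index -> (x, y, z), in the same order as Execute: x varies fastest, z slowest.
    public static (int x, int y, int z) ToCoords(int index, int size)
    {
        var x = index % size;
        var y = (index / size) % size;
        var z = index / (size * size);
        return (x, y, z);
    }

    // Inverse mapping: (x, y, z) -> flat index.
    public static int ToIndex(int x, int y, int z, int size)
    {
        return x + y * size + z * size * size;
    }
}
```

With divisions = 32 the grid length is 35, so `GridIndex.ToCoords(1260, 35)` gives (0, 1, 1) and `GridIndex.ToIndex(0, 1, 1, 35)` gives 1260 again.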
     
  2. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,646
    You can check in the Profiler window whether your jobs are using Burst.
    Open the Profiler window and switch to Timeline view instead of Hierarchy. Jobs running with Burst should be suffixed with (Burst).

    If they are not, your jobs are not running with Burst. By default, when the Burst package is installed, jobs run with Burst enabled in the Editor. These performance results are unexpected, so most likely some configuration is preventing Burst from running in the Editor.
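The Editor settings can also be logged from a script, in addition to the Profiler check. This is a rough sketch, not an official diagnostic. It assumes the `JobsUtility.JobCompilerEnabled` and `JobsUtility.JobDebuggerEnabled` properties are available in the Unity version in use and that they mirror the toggles in the Jobs menu. `BurstSettingsLogger` is a hypothetical component name.

```csharp
using Unity.Jobs.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical helper: attach to any GameObject to log the job settings on start.
public class BurstSettingsLogger : MonoBehaviour
{
    private void Start()
    {
        // Assumption: this reflects whether jobs are compiled by Burst rather than run in Mono.
        Debug.Log($"Job compiler enabled: {JobsUtility.JobCompilerEnabled}");

        // Assumption: this reflects whether the job debugger (safety checks) is active.
        Debug.Log($"Job debugger enabled: {JobsUtility.JobDebuggerEnabled}");
    }
}
```

If the first value logs as false, the jobs are most likely running as regular managed code, which would match a missing (Burst) suffix in the Timeline.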
     
  3. Arycama

    Arycama

    Joined:
    May 25, 2014
    Posts:
    82
    Thanks for the response.

    I checked the Profiler in Timeline view. If I Schedule the job and then Complete it in the same frame, I get this result, with no (Burst) suffix:

    [Image: Unity_2018-12-17_22-39-53.png]

    From what I understand, even if I'm attempting to get the result in the same frame, Burst should still work and use all cores to complete the task. My CPU usage in Task Manager goes to 100% (Intel i7-4710MQ quad-core CPU with Hyper-Threading), so at least the threading part is working correctly. It is strange, though, that only two of the seven worker threads are being used.

    If I use a simple coroutine, "while (!job.IsCompleted) yield return null;", then none of the worker threads show any activity in the Profiler at all. The task takes about the same time to complete (nearly 9 seconds in this case), but the rest of Unity stays responsive. For reference again, this task takes around 400ms using a C# Parallel.For loop in the Editor.

    When compiling a regular (non-parallel) job, it correctly shows up in the Profiler with the (Burst) suffix. Is there an issue with the way I am using IJobParallelFor? (All the relevant code is in the first post.) The job has the [BurstCompile] attribute and appears to work fine in builds, where I gain a lot of performance.