Question: Understanding profiler information

Discussion in 'C# Job System' started by ATMLVE, Aug 25, 2023.

  1. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    I'm working on optimizing the terrain generation of a procedural planet system. It uses LOD quadtrees, so the meshes are frequently updated. Main questions are bolded. I'll post the job code below, but for now I'm mostly looking for conceptual help and leads to follow.

    I've implemented a job system to calculate each vertex's position, because my original method was so slow. It's still slow, but I'm trying to interpret what the profiler is telling me.

    Below is what the profiler looks like on a heavy frame when I'm recalculating everything. I have the settings higher than I'd like so I can see what's really chewing up memory:



    In the middle section (the main thread), the big labeled blocks amid all the color are mostly job-related, such as JobHandle and WaitForJobGroupID, but there is also garbage collection. Why aren't the jobs actually running there? I see the large garbage collection, and I realize it's pretty brutal to run this many jobs when each one needs its own array. Because it's running quads, each quad has a set number of vertices depending on its detail level, so I think I should be able to reuse the same native array many times. Is that the case, and would it help? I've worked some with lists and arrays and know that instantiating tons of arrays is generally a big no-no.
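    On the reuse question, here is a minimal plain-C# sketch of the idea. Every quad at a given detail level has the same vertex count, so arrays can be kept per length and handed out again instead of being allocated per job. `VertexBufferPool` is a hypothetical name. The managed arrays stand in for what, in Unity, would more likely be `NativeArray<Vector3>` buffers created once with `Allocator.Persistent` and disposed when their owner is destroyed.

```csharp
using System.Collections.Generic;

// Hypothetical pool that hands out arrays keyed by length, so repeated requests for the
// same vertex count reuse an existing array instead of allocating a new one.
public sealed class VertexBufferPool<T>
{
    private readonly Dictionary<int, Stack<T[]>> _free = new Dictionary<int, Stack<T[]>>();

    // Number of arrays actually allocated so far (handy for checking that reuse happens).
    public int Allocations { get; private set; }

    // Returns a free array of exactly `length` elements, allocating only when none is available.
    public T[] Rent(int length)
    {
        if (_free.TryGetValue(length, out Stack<T[]> stack) && stack.Count > 0)
            return stack.Pop();

        Allocations++;
        return new T[length];
    }

    // Hands an array back so a later Rent of the same length can reuse it.
    // The caller must not touch the array after returning it.
    public void Return(T[] buffer)
    {
        if (!_free.TryGetValue(buffer.Length, out Stack<T[]> stack))
        {
            stack = new Stack<T[]>();
            _free[buffer.Length] = stack;
        }
        stack.Push(buffer);
    }
}
```

    A buffer should only go back to the pool after the job using it has completed. Jobs that are in flight at the same time each need their own buffer, which is why the pool holds a stack per length rather than a single array.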

    In either case, each block not running my little batches of jobs is either empty or full of blue script job stuff.

    Another question: further down I see 10 workers, but why is only Worker 2 doing anything, and why is it doing it so choppily? Below is a zoomed-in section at an arbitrary point:



    Is there theoretically a way my code could be factored so that all those little yellow-brown blocks on Worker 2 run back to back and across several more threads, or am I misunderstanding what I should be able to accomplish?

    The code that runs the job, once for every single quad:
    Code (CSharp):
    var vertexArray = new Unity.Collections.NativeArray<Vector3>(vertices, Unity.Collections.Allocator.TempJob);

    var job = new EvaluateVertexPointJob
    {
        jobVertices = vertexArray,
        transformMatrix = transformMatrix,
        quadIndex = quadIndex,
        planetSize = 50
    };
    var jobHandle = job.Schedule(vertices.Length, vertices.Length);
    jobHandle.Complete();
    vertexArray.CopyTo(vertices);
    vertexArray.Dispose();


    The code for the job alone is below. Self-conscious disclaimer: I didn't write most of this, and a lot of it has been mercilessly butchered just to make the code do the math it should be doing and output something without errors. I refactored the several different scripts I had and squashed them all into one supermonster, with complicated references replaced by hardcoded placeholders, just to see if I could get anything out of it.
    Code (CSharp):
    using System;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;

    public struct EvaluateVertexPointJob : IJobParallelFor
    {
        public NativeArray<Vector3> jobVertices;
        public Matrix4x4 transformMatrix;
        public int quadIndex;
        public float planetSize;

        public void Execute(int i)
        {
            var pointOnCube = transformMatrix.MultiplyPoint(Presets.quadTemplateVertices[quadIndex][i]);
            var pointOnUnitSphere = pointOnCube.normalized * planetSize;

            //Debug.Log("int i = " + i);
            //Debug.Log("jobVertices.Length = " + jobVertices.Length);

            int[] Source = {
                151, 160, 137, 91, 90, 15, 131, 13, 201, 95, 96, 53, 194, 233, 7, 225, 140, 36, 103, 30, 69, 142,
                8, 99, 37, 240, 21, 10, 23, 190, 6, 148, 247, 120, 234, 75, 0, 26, 197, 62, 94, 252, 219, 203,
                117, 35, 11, 32, 57, 177, 33, 88, 237, 149, 56, 87, 174, 20, 125, 136, 171, 168, 68, 175, 74, 165,
                71, 134, 139, 48, 27, 166, 77, 146, 158, 231, 83, 111, 229, 122, 60, 211, 133, 230, 220, 105, 92, 41,
                55, 46, 245, 40, 244, 102, 143, 54, 65, 25, 63, 161, 1, 216, 80, 73, 209, 76, 132, 187, 208, 89,
                18, 169, 200, 196, 135, 130, 116, 188, 159, 86, 164, 100, 109, 198, 173, 186, 3, 64, 52, 217, 226, 250,
                124, 123, 5, 202, 38, 147, 118, 126, 255, 82, 85, 212, 207, 206, 59, 227, 47, 16, 58, 17, 182, 189,
                28, 42, 223, 183, 170, 213, 119, 248, 152, 2, 44, 154, 163, 70, 221, 153, 101, 155, 167, 43, 172, 9,
                129, 22, 39, 253, 19, 98, 108, 110, 79, 113, 224, 232, 178, 185, 112, 104, 218, 246, 97, 228, 251, 34,
                242, 193, 238, 210, 144, 12, 191, 179, 162, 241, 81, 51, 145, 235, 249, 14, 239, 107, 49, 192, 214, 31,
                181, 199, 106, 157, 184, 84, 204, 176, 115, 121, 50, 45, 127, 4, 150, 254, 138, 236, 205, 93, 222, 114,
                67, 29, 24, 72, 243, 141, 128, 195, 78, 66, 215, 61, 156, 180
            };
            const int RandomSize = 256;
            const double Sqrt3 = 1.7320508075688772935;
            const double Sqrt5 = 2.2360679774997896964;
            int[] _random;

            /// Skewing and unskewing factors for 2D, 3D and 4D,
            /// some of them pre-multiplied.
            const double F2 = 0.5 * (Sqrt3 - 1.0);

            const double G2 = (3.0 - Sqrt3) / 6.0;
            const double G22 = G2 * 2.0 - 1;

            const double F3 = 1.0 / 3.0;
            const double G3 = 1.0 / 6.0;

            const double F4 = (Sqrt5 - 1.0) / 4.0;
            const double G4 = (5.0 - Sqrt5) / 20.0;
            const double G42 = G4 * 2.0;
            const double G43 = G4 * 3.0;
            const double G44 = G4 * 4.0 - 1.0;

            /// <summary>
            /// Gradient vectors for 3D (pointing to mid points of all edges of a unit
            /// cube)
            /// </summary>

            //Debug.Log("jobVertices.Length = "+jobVertices.Length);
            //Debug.Log("i = "+i);

            if (i < jobVertices.Length && i > 1)
            {
                jobVertices[i] = CalculatePointOnPlanet(pointOnUnitSphere, planetSize);

            }

            Vector3 CalculatePointOnPlanet(Vector3 pointOnUnitSphere, float planetRadius)
            {
                float firstLayerValue; // store the first layer outside the main for loop so that it can be used for subsequent layers
                float elevation;

                firstLayerValue = EvaluateSimpleNoise(pointOnUnitSphere); // this does what the main for loop below does, just separately to be stored separately
                elevation = firstLayerValue;

                return (1 + elevation) * planetRadius * pointOnUnitSphere;
            }

            float EvaluateSimpleNoise(Vector3 point)
            {
                float noiseValue = 0;
                float frequency = 0.2f;
                float amplitude = 1;

                // this code iterates layers of noise on top of each other. It starts by defining a placeholder variable v for iteration.
                // v takes the current point of the unit sphere, multiplies by frequency and the offset within the noise, then evaluates that point within the noise function to get a height.
                // that height, from -1 to 1, is converted to a value of 0 to 1, then multiplied by amplitude.
                for (int j = 0; j < 2; j++)
                {
                    float v = Evaluate(point * frequency);
                    noiseValue += (v + 1) * 0.5f * amplitude;
                    frequency *= 0.2f; // when roughness is greater than 1, the frequency will increase with each filter, adding finer detail (I think)
                    amplitude *= 0.2f; // amplitude is a modifier assigned to each point. When persistence is less than 1, the amplitude decreases with each layer.

                    /// <summary>
                    /// Generates value, typically in range [-1, 1]
                    /// </summary>
                    float Evaluate(UnityEngine.Vector3 point)
                    {
                        Randomize(0);

                        int[][] Grad3 =
                        {
                            new[] {1, 1, 0}, new[] {-1, 1, 0}, new[] {1, -1, 0},
                            new[] {-1, -1, 0}, new[] {1, 0, 1}, new[] {-1, 0, 1},
                            new[] {1, 0, -1}, new[] {-1, 0, -1}, new[] {0, 1, 1},
                            new[] {0, -1, 1}, new[] {0, 1, -1}, new[] {0, -1, -1}
                        };

                        double x = point.x;
                        double y = point.y;
                        double z = point.z;
                        double n0 = 0, n1 = 0, n2 = 0, n3 = 0;

                        // Noise contributions from the four corners
                        // Skew the input space to determine which simplex cell we're in
                        double s = (x + y + z) * F3;

                        // for 3D
                        int i = FastFloor(x + s);
                        int j = FastFloor(y + s);
                        int k = FastFloor(z + s);

                        double t = (i + j + k) * G3;

                        // The x,y,z distances from the cell origin
                        double x0 = x - (i - t);
                        double y0 = y - (j - t);
                        double z0 = z - (k - t);

                        // For the 3D case, the simplex shape is a slightly irregular tetrahedron.
                        // Determine which simplex we are in.
                        int i1, j1, k1; // Offsets for second corner of simplex in (i,j,k) coords
                        int i2, j2, k2; // Offsets for third corner of simplex in (i,j,k) coords

                        if (x0 >= y0)
                        {
                            if (y0 >= z0)
                            {
                                // X Y Z order
                                i1 = 1;
                                j1 = 0;
                                k1 = 0;
                                i2 = 1;
                                j2 = 1;
                                k2 = 0;
                            }
                            else if (x0 >= z0)
                            {
                                // X Z Y order
                                i1 = 1;
                                j1 = 0;
                                k1 = 0;
                                i2 = 1;
                                j2 = 0;
                                k2 = 1;
                            }
                            else
                            {
                                // Z X Y order
                                i1 = 0;
                                j1 = 0;
                                k1 = 1;
                                i2 = 1;
                                j2 = 0;
                                k2 = 1;
                            }
                        }
                        else
                        {
                            // x0 < y0
                            if (y0 < z0)
                            {
                                // Z Y X order
                                i1 = 0;
                                j1 = 0;
                                k1 = 1;
                                i2 = 0;
                                j2 = 1;
                                k2 = 1;
                            }
                            else if (x0 < z0)
                            {
                                // Y Z X order
                                i1 = 0;
                                j1 = 1;
                                k1 = 0;
                                i2 = 0;
                                j2 = 1;
                                k2 = 1;
                            }
                            else
                            {
                                // Y X Z order
                                i1 = 0;
                                j1 = 1;
                                k1 = 0;
                                i2 = 1;
                                j2 = 1;
                                k2 = 0;
                            }
                        }

                        // A step of (1,0,0) in (i,j,k) means a step of (1-c,-c,-c) in (x,y,z),
                        // a step of (0,1,0) in (i,j,k) means a step of (-c,1-c,-c) in (x,y,z),
                        // and
                        // a step of (0,0,1) in (i,j,k) means a step of (-c,-c,1-c) in (x,y,z),
                        // where c = 1/6.

                        // Offsets for second corner in (x,y,z) coords
                        double x1 = x0 - i1 + G3;
                        double y1 = y0 - j1 + G3;
                        double z1 = z0 - k1 + G3;

                        // Offsets for third corner in (x,y,z)
                        double x2 = x0 - i2 + F3;
                        double y2 = y0 - j2 + F3;
                        double z2 = z0 - k2 + F3;

                        // Offsets for last corner in (x,y,z)
                        double x3 = x0 - 0.5;
                        double y3 = y0 - 0.5;
                        double z3 = z0 - 0.5;

                        // Work out the hashed gradient indices of the four simplex corners
                        int ii = i & 0xff;
                        int jj = j & 0xff;
                        int kk = k & 0xff;

                        // Calculate the contribution from the four corners
                        double t0 = 0.6 - x0 * x0 - y0 * y0 - z0 * z0;
                        if (t0 > 0)
                        {
                            t0 *= t0;
                            int gi0 = _random[ii + _random[jj + _random[kk]]] % 12;
                            n0 = t0 * t0 * Dot(Grad3[gi0], x0, y0, z0);
                        }

                        double t1 = 0.6 - x1 * x1 - y1 * y1 - z1 * z1;
                        if (t1 > 0)
                        {
                            t1 *= t1;
                            int gi1 = _random[ii + i1 + _random[jj + j1 + _random[kk + k1]]] % 12;
                            n1 = t1 * t1 * Dot(Grad3[gi1], x1, y1, z1);
                        }

                        double t2 = 0.6 - x2 * x2 - y2 * y2 - z2 * z2;
                        if (t2 > 0)
                        {
                            t2 *= t2;
                            int gi2 = _random[ii + i2 + _random[jj + j2 + _random[kk + k2]]] % 12;
                            n2 = t2 * t2 * Dot(Grad3[gi2], x2, y2, z2);
                        }

                        double t3 = 0.6 - x3 * x3 - y3 * y3 - z3 * z3;
                        if (t3 > 0)
                        {
                            t3 *= t3;
                            int gi3 = _random[ii + 1 + _random[jj + 1 + _random[kk + 1]]] % 12;
                            n3 = t3 * t3 * Dot(Grad3[gi3], x3, y3, z3);
                        }

                        // Add contributions from each corner to get the final noise value.
                        // The result is scaled to stay just inside [-1,1]
                        return (float)(n0 + n1 + n2 + n3) * 32;
                    }

                    void Randomize(int seed)
                    {
                        _random = new int[RandomSize * 2];

                        if (seed != 0)
                        {
                            // Shuffle the array using the given seed
                            // Unpack the seed into 4 bytes then perform a bitwise XOR operation
                            // with each byte
                            var F = new byte[4];
                            UnpackLittleUint32(seed, ref F);

                            for (int i = 0; i < Source.Length; i++)
                            {
                                _random[i] = Source[i] ^ F[0];
                                _random[i] ^= F[1];
                                _random[i] ^= F[2];
                                _random[i] ^= F[3];

                                _random[i + RandomSize] = _random[i];
                            }

                        }
                        else
                        {
                            for (int i = 0; i < RandomSize; i++)
                                _random[i + RandomSize] = _random[i] = Source[i];
                        }
                    }

                    double Dot(int[] g, double x, double y, double z)
                    {
                        return g[0] * x + g[1] * y + g[2] * z;
                    }

                    static int FastFloor(double x)
                    {
                        return x >= 0 ? (int)x : (int)x - 1;
                    }

                    /// <summary>
                    /// Unpack the given integer (int32) to an array of 4 bytes in little endian format.
                    /// If the length of the buffer is too small, it will be resized.
                    /// </summary>
                    /// <param name="value">The value.</param>
                    /// <param name="buffer">The output buffer.</param>
                    static byte[] UnpackLittleUint32(int value, ref byte[] buffer)
                    {
                        if (buffer.Length < 4)
                            Array.Resize(ref buffer, 4);

                        buffer[0] = (byte)(value & 0x00ff);
                        buffer[1] = (byte)((value & 0xff00) >> 8);
                        buffer[2] = (byte)((value & 0x00ff0000) >> 16);
                        buffer[3] = (byte)((value & 0xff000000) >> 24);

                        return buffer;
                    }

                }

                noiseValue = Mathf.Max(0, noiseValue) * 0.03f;
                return noiseValue;
            }
        }
    }
    346.  

    Oh, and here's what the hierarchy looks like on this frame, though I don't know if it's of any use:
     
    Last edited: Aug 25, 2023
  2. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    So, I didn't dive super deep into your code, but I'll give you some pointers to things you should look at because they look suspicious.

    In your Randomize function (which, according to the hierarchy view, is the one taking the most time), you're allocating a lot of memory (see the GC Alloc column in the hierarchy view). You're allocating 512 ints (RandomSize * 2) per call to Randomize, which runs on every Evaluate call, and you're always passing in the same seed. Either preallocate and fill that array beforehand, or create buffers that the jobs can reuse. The same goes for Source and Grad3.
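    Here is a plain-C# sketch of the "preallocate it once" option. The permutation table is built a single time in a static field initializer rather than inside `Randomize` on every `Evaluate` call. `PermutationTable` is a hypothetical name. Instead of copying the thread's hard-coded `Source` list, this version shuffles 0..255 with a seeded `System.Random`, which is an assumption. For a Burst job, the finished table would more likely be copied once into a `NativeArray<int>` and passed to the job as a field.

```csharp
using System;

// Hypothetical holder for a noise permutation table that is built exactly once.
public static class PermutationTable
{
    public const int Size = 256;

    // Doubled table (512 entries) so lookups like Perm[i + Perm[j]] with i, j in 0..255
    // never need an explicit wrap-around.
    public static readonly int[] Perm = Build(seed: 0);

    private static int[] Build(int seed)
    {
        var p = new int[Size];
        for (int i = 0; i < Size; i++) p[i] = i;

        // Fisher-Yates shuffle driven by a seeded System.Random (an assumption; the
        // forum code used a fixed, hard-coded Source table instead).
        var rng = new Random(seed);
        for (int i = Size - 1; i > 0; i--)
        {
            int k = rng.Next(i + 1);
            (p[i], p[k]) = (p[k], p[i]);
        }

        // Second half repeats the first half.
        var doubled = new int[Size * 2];
        for (int i = 0; i < doubled.Length; i++) doubled[i] = p[i & (Size - 1)];
        return doubled;
    }
}
```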

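    Your lambdas/local functions should also be static, with all their arguments passed in explicitly. A lambda that captures variables heap-allocates a closure object (more GC allocs); a local function captures through a struct unless it is converted to a delegate, but marking it static guarantees it captures nothing at all.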
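    Here is a small plain-C# illustration of what `static` buys on a local function. The class and method names are hypothetical. The compiler rejects any reference to the enclosing method's locals, so everything the helper needs has to arrive through its parameters. Static local functions require C# 8 or later.

```csharp
public static class LocalFunctionDemo
{
    // Sums `layers` scaled samples, shrinking frequency and amplitude each layer.
    public static float LayeredSum(float point, int layers)
    {
        float total = 0f;
        float frequency = 0.2f;
        float amplitude = 1f;

        for (int layer = 0; layer < layers; layer++)
        {
            total += Sample(point, frequency, amplitude);
            frequency *= 0.2f;
            amplitude *= 0.2f;
        }
        return total;

        // Because of `static`, referring to `frequency` or `amplitude` directly here would be
        // a compile error; the values must be passed in as arguments.
        static float Sample(float p, float freq, float amp) => p * freq * amp;
    }
}
```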

    You should see a speed-up once you've gotten rid of all your GC allocations. At that point you should also be able to use Burst on the job, which will net you an even greater speed-up.

    Regarding why you're only using one thread: could it be because you're scheduling with a batch size of vertices.Length? Does it improve if you change your schedule call to job.Schedule(vertices.Length, 1)?
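    Here is a rough plain-C# model of why the batch size matters. The `BatchModel` helper is hypothetical and ignores Unity's actual work-stealing details. The second argument to `IJobParallelFor`'s `Schedule`, `innerloopBatchCount`, splits the index range into chunks, and a chunk is the smallest unit a worker thread picks up. With the batch size equal to the array length there is exactly one chunk, so at most one worker can ever run it.

```csharp
using System;
using System.Collections.Generic;

public static class BatchModel
{
    // Splits [0, length) into consecutive ranges of at most batchSize indices,
    // roughly how a parallel-for job is divided into units that workers can take.
    public static List<(int Start, int End)> Batches(int length, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<(int Start, int End)>();
        for (int start = 0; start < length; start += batchSize)
            batches.Add((start, Math.Min(start + batchSize, length)));
        return batches;
    }
}
```

    A batch size of 1 spreads the work as thinly as possible but pays scheduling overhead on every index. As a general rule, larger batches suit cheap iterations and small batches suit expensive ones, so the value is worth profiling for per-vertex noise.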

    Also, if you enable call stacks for memory allocations in the profiler, you can see where memory is allocated from in the timeline view. That makes it much easier to track down your allocations.
     
    Last edited: Aug 25, 2023
  3. CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    3,721
    Agreed on the local functions: those should be static. And you definitely should not allocate that "Source" array in every Execute call; make that static too! In fact, make it a native array to avoid the GC altogether.

    If you want to do Burst a favor, be consistent and replace all the managed [] arrays with native collection types. The same goes for math types: don't use Vector3 in a Burst-compiled job; use float3 from the Unity.Mathematics package, and likewise for all math operations (Mathf.Max() => math.max()). And especially that random! Why did you make your own in the first place? Could you use just any kind of random? If so, use Unity.Mathematics.Random instead of yours, especially since yours may call Array.Resize on the array.

    The same goes for "FastFloor" and "Dot": try replacing them with the appropriate math functions and check which is faster. Even if yours are faster than the Mathf and Vector3 implementations, chances are high that the (Burst-compatible) Mathematics implementations of these methods put yours to shame. ;)
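    Before swapping `FastFloor` out, note that the version posted in the thread returns the wrong value for negative whole numbers. For x = -2.0, `(int)x` is already -2, so the extra `- 1` yields -3. The plain-C# comparison below checks it against `Math.Floor` and shows one corrected variant. The class name is hypothetical. Inside a Burst job, `math.floor` from Unity.Mathematics would take the place of `Math.Floor`.

```csharp
using System;

public static class FloorCheck
{
    // The helper as posted in the thread: truncate, then subtract 1 for any negative input.
    public static int FastFloorAsPosted(double x) => x >= 0 ? (int)x : (int)x - 1;

    // Corrected variant: step down only when truncation actually rounded toward zero.
    public static int FastFloorFixed(double x)
    {
        int truncated = (int)x;
        return x < truncated ? truncated - 1 : truncated;
    }

    // Reference result from the framework's floor.
    public static int Reference(double x) => (int)Math.Floor(x);
}
```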

    But the most performance critical aspect could easily be this:

    Code (CSharp):
    var jobHandle = job.Schedule(vertices.Length, vertices.Length);
    jobHandle.Complete();

    You are starting background jobs and forcing them to complete immediately. Instead, you should aim to schedule the jobs and, if you can afford to receive the results a couple of frames later, poll the job handle (JobHandle.IsCompleted) and call Complete() once it reports true. If you need the results within the same frame, schedule in Update() and call Complete() in LateUpdate() to allow (hopefully ample) time for the jobs to run in the background while other main-thread Update() calls execute.
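    Here is the ordering behind that advice, shown as a plain .NET analogue rather than Unity code. `Task.Run` stands in for scheduling, `Wait()` stands in for `JobHandle.Complete()`, and the hypothetical `FrameDriver` methods stand in for `Update()` and `LateUpdate()`. It only illustrates starting the work early, doing other main-thread work, and blocking as late as possible. It does not model Unity's scheduler or safety system.

```csharp
using System.Threading.Tasks;

// Plain .NET analogue of "schedule in Update, complete in LateUpdate".
public sealed class FrameDriver
{
    private readonly float[] _heights; // reused every frame instead of reallocated
    private Task _pending;

    public FrameDriver(int vertexCount) => _heights = new float[vertexCount];

    // "Update": start the heavy work and return immediately.
    public void Update()
    {
        _pending = Task.Run(() =>
        {
            for (int i = 0; i < _heights.Length; i++)
                _heights[i] = i * 0.5f; // placeholder for the real per-vertex work
        });
    }

    // Runs on the calling thread while the task progresses in the background.
    public int DoOtherWork(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    // "LateUpdate": block only now, as late as possible, then hand out the results.
    public float[] LateUpdate()
    {
        _pending?.Wait();
        _pending = null;
        return _heights;
    }
}
```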
     
  4. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Thanks, all, for your help! I've implemented some of your suggestions to an extent and have already seen a good bit of improvement. You were exactly right about my batch size; I have everything running on all ten threads now. I've made everything static too.

    To answer the bit about things like Source and _random: those come from a noise script that I didn't write, and I also hardcoded some things just to make it run (which is why, for instance, it generates the same random numbers every time). For real optimization there, I'll need to dive into that on my own sometime.

    The profiler now looks quite different, but I'm even more confused about what it's telling me. Right now the hierarchy says a huge chunk of time is spent just waiting:


    Why isn't that evident in the timeline? It still seems like it's doing something constantly: the idle gaps on the main thread are filled with work across all the job threads:
     
  5. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    I would be surprised if you actually needed to do much more optimization after you get rid of the final GC allocations and begin to use Burst. That thing is a beast, and it can easily give 10x performance improvements just by enabling it on a job. Obviously, if you need to optimize further, there's probably stuff you can do, but in my experience, just enabling Burst solves 98% of my performance problems.

    The hierarchy window only shows the hierarchy of one thread at a time, and the default thread is the main thread. In the timeline you can see that the main thread idles 11 times, for around 250 ms each: 11 × 250 ms ≈ 2750 ms, which is close to the ~2802 ms the hierarchy window reports as main-thread idle time.
     
    Last edited: Aug 25, 2023
  6. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    I believe I got rid of a lot of the GC allocations. Unfortunately, when I try to add [BurstCompile] I get array index out of bounds exceptions. This doesn't make sense to me, since all the other code is identical and works without it. Is there anything about Burst compilation that could cause this, that I should look into?
     
  7. CodeSmile

    CodeSmile

    Joined:
    Apr 10, 2014
    Posts:
    3,721
    Since you use a parallel job, the index passed into Execute is the one you are supposed to use for your arrays, and no other. You can, however, disable this check with an attribute whose name I can't recall; I think it's NativeDisableParallelForRestriction, which goes on the array fields.

    Note that in that case you are responsible for ensuring there won't be any read/write race conditions, where sometimes one thread writes an index before another reads it and other times it's the other way around, leading to different results.
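    One common way to avoid that race without `NativeDisableParallelForRestriction` is to turn the scatter (each triangle adding into its three vertices) into a gather. Compute one normal per triangle, build a vertex-to-triangle adjacency once, then have iteration `v` sum only the normals of the triangles touching vertex `v`, so every iteration writes a single slot. In a Unity job, the shared inputs would be fields marked `[ReadOnly]`, which may be read at any index. The sketch below is plain C# with `Parallel.For` and a hypothetical `Vec3` struct in place of `Vector3`/`float3`, and it ignores the edge-fan filtering used later in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical minimal vector type standing in for Vector3 / float3.
public readonly struct Vec3
{
    public readonly float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public static Vec3 operator +(Vec3 a, Vec3 b) => new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    public static Vec3 operator -(Vec3 a, Vec3 b) => new Vec3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);

    public static Vec3 Cross(Vec3 a, Vec3 b) =>
        new Vec3(a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X);

    public Vec3 Normalized()
    {
        float len = MathF.Sqrt(X * X + Y * Y + Z * Z);
        return len > 0f ? new Vec3(X / len, Y / len, Z / len) : this;
    }
}

public static class GatherNormals
{
    public static Vec3[] Compute(Vec3[] vertices, int[] triangles)
    {
        int triCount = triangles.Length / 3;

        // Pass 1: one unit normal per triangle; iteration t writes only faceNormals[t].
        var faceNormals = new Vec3[triCount];
        Parallel.For(0, triCount, t =>
        {
            Vec3 a = vertices[triangles[3 * t]];
            Vec3 b = vertices[triangles[3 * t + 1]];
            Vec3 c = vertices[triangles[3 * t + 2]];
            faceNormals[t] = Vec3.Cross(b - a, c - a).Normalized();
        });

        // Vertex -> triangles adjacency, built once on a single thread.
        var adjacency = new List<int>[vertices.Length];
        for (int v = 0; v < vertices.Length; v++) adjacency[v] = new List<int>();
        for (int t = 0; t < triCount; t++)
            for (int corner = 0; corner < 3; corner++)
                adjacency[triangles[3 * t + corner]].Add(t);

        // Pass 2 (gather): iteration v reads shared data freely but writes only normals[v].
        var normals = new Vec3[vertices.Length];
        Parallel.For(0, vertices.Length, v =>
        {
            var sum = new Vec3(0f, 0f, 0f);
            foreach (int t in adjacency[v]) sum += faceNormals[t];
            normals[v] = sum.Normalized();
        });
        return normals;
    }
}
```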

    Burst is really worth any extra refactoring you may need to do!
     
  8. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    NativeDisableParallelForRestriction sounds like the correct one, yeah, but I thought that check would also trigger if you didn't use Burst?

    A quick way to check whether that's the cause is to temporarily turn off the various safety systems, like the Jobs Debugger and Burst Safety Checks (note: you should turn these back on when you're done investigating; they're really helpful for catching errors).
     
  9. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    This was my issue. I was using a Vector3[][] in the job, passing in a value for the first index to get its inner array. The second index was already looping through the main iterator i, so I just copied that inner array into a native array, passed it into the job, and it worked. And oh my goodness, you weren't kidding: it runs like lightning, and there's still optimizing to do.
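    For anyone hitting the same wall, there is another option besides copying one inner array per job. Flatten the whole jagged array once into a single contiguous array plus a start offset per template. That maps onto one long-lived `NativeArray` that a job can read from a `[ReadOnly]` field at `offset + i`. Below is a plain-C# sketch with a hypothetical `FlatTemplates<T>` type.

```csharp
using System;

// Hypothetical flattened replacement for a jagged T[][]: all inner arrays laid end to end.
public sealed class FlatTemplates<T>
{
    public readonly T[] Data;      // every template's elements, back to back
    public readonly int[] Offsets; // Offsets[q] = index in Data where template q starts
    public readonly int[] Lengths; // Lengths[q] = element count of template q

    public FlatTemplates(T[][] jagged)
    {
        Offsets = new int[jagged.Length];
        Lengths = new int[jagged.Length];

        int total = 0;
        for (int q = 0; q < jagged.Length; q++)
        {
            Offsets[q] = total;
            Lengths[q] = jagged[q].Length;
            total += jagged[q].Length;
        }

        Data = new T[total];
        for (int q = 0; q < jagged.Length; q++)
            Array.Copy(jagged[q], 0, Data, Offsets[q], jagged[q].Length);
    }

    // Equivalent of jagged[quadIndex][i]; a job would receive Data plus the one offset it needs.
    public T Get(int quadIndex, int i) => Data[Offsets[quadIndex] + i];
}
```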

    This is definitely case solved for now, although to get this to work I tore my code apart, and now I have to build everything back up again. I may run into more finicky errors like this iterator issue; if so, I'll be back. Thanks, all, for your help.
     
  10. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Progress update, I've refactored everything to work with jobs and results are very nice.

    My main loop, which runs once per second, before and after jobs/Burst:



    GC allocation is still a bit high, but the actual lag is way down. At lower terrain resolutions the hitch is imperceptible. One major issue is a memory leak, so performance gets worse over time, but I still have to hunt that down.

    These are my surface normals jobs:
    Code (CSharp):
    1. using Unity.Burst;
    2. using Unity.Collections;
    3. using Unity.Jobs;
    4. using Unity.VisualScripting;
    5. using UnityEngine;
    6.  
    7. [BurstCompile]
    8. public struct SurfaceNormalsJob : IJobParallelFor
    9. {
    10.     public NativeArray<int> jobTrianglesA;
    11.     public NativeArray<int> jobTrianglesB;
    12.     public NativeArray<int> jobTrianglesC;
    13.  
    14.     public NativeArray<Vector3> jobPointA;
    15.     public NativeArray<Vector3> jobPointB;
    16.     public NativeArray<Vector3> jobPointC;
    17.  
    18.     public NativeArray<int> jobEdgeFansVertexIndicesA;
    19.     public NativeArray<int> jobEdgeFansVertexIndicesB;
    20.     public NativeArray<int> jobEdgeFansVertexIndicesC;
    21.  
    22.     public NativeArray<Vector3> jobNormals;
    23.  
    24.     public void Execute(int i) // i loops through the vertices
    25.     {
    26.         Vector3 triangleNormal = SurfaceNormalFromIndices(jobPointA[i], jobPointB[i], jobPointC[i]); // calculate the normal of this triangle
    27.  
    28.         // Explanation with previous code
    29.         {
    30.             // This would check if the edgefansIndices entry at the index of the first vertex of the triangle was 0 (aka if the vertex was inside the chunk and not at the edge)
                // If it was, then we would just grab the normals' index at that vertex, and apply the normal to that vertex.
                // However, within this Burst-compiled job, I cannot access whatever index I want from normals,
                // so instead, below, I take the extra step of checking whether the current iterator i is equal to the vertex currently being considered. If it is, I apply it.
                // An issue with this method, though, is that it doesn't work.

                //if (edgefansIndices[vertexIndexA] == 0) edgefansVertexIndicesA[i] = edgefansIndices[vertexIndexA];
                //{
                //    normals[vertexIndexA] += triangleNormal;
                //}
                //if (edgefansIndices[vertexIndexB] == 0)
                //{
                //    normals[vertexIndexB] += triangleNormal;
                //}
                //if (edgefansIndices[vertexIndexC] == 0)
                //{
                //    normals[vertexIndexC] += triangleNormal;
                //}
            }

            if (jobEdgeFansVertexIndicesA[i] == 0)
            {
                jobNormals[jobTrianglesA[i]] += triangleNormal;
            }
            if (jobEdgeFansVertexIndicesB[i] == 0)
            {
                jobNormals[jobTrianglesB[i]] += triangleNormal;
            }
            if (jobEdgeFansVertexIndicesC[i] == 0)
            {
                jobNormals[jobTrianglesC[i]] += triangleNormal;
            }

            static Vector3 SurfaceNormalFromIndices(Vector3 jobPointA, Vector3 jobPointB, Vector3 jobPointC)
            {
                // Face normal of triangle ABC: the normalized cross product of two of its edges
                Vector3 sideAB = jobPointB - jobPointA;
                Vector3 sideAC = jobPointC - jobPointA;
                Vector3 side = Vector3.Cross(sideAB, sideAC);
                return side.normalized;
            }
        }
    }

    public struct SurfaceBorderNormalsJobBorder : IJobParallelFor
    {
        public NativeArray<int> jobTrianglesA;
        public NativeArray<int> jobTrianglesB;
        public NativeArray<int> jobTrianglesC;

        public NativeArray<Vector3> jobPointA;
        public NativeArray<Vector3> jobPointB;
        public NativeArray<Vector3> jobPointC;

        public int quadRes;

        public NativeArray<Vector3> jobNormals;

        public void Execute(int i) // i indexes triangles: jobTrianglesA/B/C[i] are the three vertex indices of triangle i
        {
            if (i < jobTrianglesA.Length)
            {
                int vertexIndexA = jobTrianglesA[i];
                int vertexIndexB = jobTrianglesB[i];
                int vertexIndexC = jobTrianglesC[i];

                Vector3 triangleNormal = SurfaceNormalFromIndices(jobPointA[i], jobPointB[i], jobPointC[i]); // calculate the normal of this triangle

                // Previous code
                {
                    // This would check if the edgefansIndices entry at the index of the first vertex of the triangle was 0 (aka if the vertex was inside the chunk and not at the edge)
                    // If it was, then we would just grab the normals' index at that vertex, and apply the normal to that vertex.
                    // However, within this Burst-compiled job, I cannot access whatever index I want from normals,
                    // so instead, below, I take the extra step of checking whether the current iterator i is equal to the vertex currently being considered. If it is, I apply it.
                    // An issue with this method, though, is that it doesn't work.

                    //if (edgefansIndices[vertexIndexA] == 0) edgefansVertexIndicesA[i] = edgefansIndices[vertexIndexA];
                    //{
                    //    normals[vertexIndexA] += triangleNormal;
                    //}
                    //if (edgefansIndices[vertexIndexB] == 0)
                    //{
                    //    normals[vertexIndexB] += triangleNormal;
                    //}
                    //if (edgefansIndices[vertexIndexC] == 0)
                    //{
                    //    normals[vertexIndexC] += triangleNormal;
                    //}
                }

                // A vertex is on the border if it is in the first/last column or the first/last row of the (quadRes + 1) x (quadRes + 1) grid
                if (vertexIndexA >= 0 && (vertexIndexA % (quadRes + 1) == 0 ||
                    vertexIndexA % (quadRes + 1) == quadRes ||
                    (vertexIndexA >= 0 && vertexIndexA <= quadRes) ||
                    (vertexIndexA >= (quadRes + 1) * quadRes && vertexIndexA < (quadRes + 1) * (quadRes + 1))))
                {
                    jobNormals[jobTrianglesA[i]] += triangleNormal;
                }
                if (vertexIndexB >= 0 && (vertexIndexB % (quadRes + 1) == 0 ||
                    vertexIndexB % (quadRes + 1) == quadRes ||
                    (vertexIndexB >= 0 && vertexIndexB <= quadRes) ||
                    (vertexIndexB >= (quadRes + 1) * quadRes && vertexIndexB < (quadRes + 1) * (quadRes + 1))))
                {
                    jobNormals[jobTrianglesB[i]] += triangleNormal;
                }
                if (vertexIndexC >= 0 && (vertexIndexC % (quadRes + 1) == 0 ||
                    vertexIndexC % (quadRes + 1) == quadRes ||
                    (vertexIndexC >= 0 && vertexIndexC <= quadRes) ||
                    (vertexIndexC >= (quadRes + 1) * quadRes && vertexIndexC < (quadRes + 1) * (quadRes + 1))))
                {
                    jobNormals[jobTrianglesC[i]] += triangleNormal;
                }

                static Vector3 SurfaceNormalFromIndices(Vector3 jobPointA, Vector3 jobPointB, Vector3 jobPointC)
                {
                    // Face normal of triangle ABC: the normalized cross product of two of its edges
                    Vector3 sideAB = jobPointB - jobPointA;
                    Vector3 sideAC = jobPointC - jobPointA;
                    Vector3 side = Vector3.Cross(sideAB, sideAC);
                    return side.normalized;
                }
            }
        }
    }

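A note on the per-triangle loops above: they add into jobNormals[jobTrianglesA[i]], an index other than i. By default Unity's job safety system only lets an IJobParallelFor write to the index range it is currently processing, and two triangles that share a vertex could add into the same element from different worker threads. One common workaround is to do the accumulation in a single-threaded IJob so only one thread writes to the normals ([NativeDisableParallelForRestriction] lifts the check, but not the race). The sketch below shows that serial loop in plain C#, assuming the same (quadRes + 1) x (quadRes + 1) row-major vertex grid as the border test in SurfaceBorderNormalsJobBorder. System.Numerics.Vector3 stands in for UnityEngine.Vector3, and BorderNormals, IsBorderVertex and AccumulateBorderNormals are hypothetical names.

```csharp
using System.Numerics;

// Hypothetical helpers; System.Numerics.Vector3 stands in for UnityEngine.Vector3.
public static class BorderNormals
{
    // True if the vertex lies on the outer ring of a (quadRes + 1) x (quadRes + 1) row-major grid:
    // the same four cases (first/last column, first/last row) tested in SurfaceBorderNormalsJobBorder.
    public static bool IsBorderVertex(int vertexIndex, int quadRes)
    {
        int side = quadRes + 1;
        if (vertexIndex < 0 || vertexIndex >= side * side) return false;
        int column = vertexIndex % side;
        int row = vertexIndex / side;
        return column == 0 || column == quadRes || row == 0 || row == quadRes;
    }

    // Face normal of triangle ABC: normalized cross product of two edges.
    static Vector3 FaceNormal(Vector3 a, Vector3 b, Vector3 c)
    {
        return Vector3.Normalize(Vector3.Cross(b - a, c - a));
    }

    // Serial accumulation (what a single-threaded IJob would do): each triangle adds its face
    // normal to every border vertex it touches, then each non-zero sum is normalized.
    public static Vector3[] AccumulateBorderNormals(Vector3[] vertices, int[] triangles, int quadRes)
    {
        var normals = new Vector3[vertices.Length];
        for (int t = 0; t + 2 < triangles.Length; t += 3)
        {
            int a = triangles[t], b = triangles[t + 1], c = triangles[t + 2];
            Vector3 n = FaceNormal(vertices[a], vertices[b], vertices[c]);
            if (IsBorderVertex(a, quadRes)) normals[a] += n;
            if (IsBorderVertex(b, quadRes)) normals[b] += n;
            if (IsBorderVertex(c, quadRes)) normals[c] += n;
        }
        for (int v = 0; v < normals.Length; v++)
        {
            if (normals[v] != Vector3.Zero) normals[v] = Vector3.Normalize(normals[v]);
        }
        return normals;
    }
}
```

Folding the four border cases into a row/column test also removes the repeated modulo expressions, so the same helper can be called for vertices A, B and C.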
     
  11. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    Quick note: you haven't Burst-compiled your SurfaceBorderNormalsJobBorder. An easy way to check whether a job is Burst-compiled (without reading the code) is to look at the Timeline view in the Profiler and see whether that job has a greenish tint.

    Also, now that you've gotten your code a bit more under control, you can switch your math to Unity.Mathematics. You can use NativeArray.Reinterpret<float3>() to view your existing Vector3 arrays as float3 arrays.

    I don't have much else to say without seeing the surrounding code. Regarding the memory leak: if it's a leak from the NativeArrays, you can find where each leaked array was allocated by enabling Full Stack Traces:

    [Screenshot: the Full Stack Traces option]
     
  12. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Thanks for your reply. I'd originally tried putting [BurstCompile] on that job and it said I only needed it once per file, but I must be misremembering, because when I tried again later it worked. The snippet I posted uses Burst for all jobs. As for the memory leak, the fix was silly simple: I just wasn't disposing of some arrays I wasn't even using.

    For the math, can you point me toward some info on specifically what you mean? And are you saying that if I use NativeArray.Reinterpret<float3>() on my allocated native arrays, the jobs will run faster?
     
  13. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    I have another question. I have a MonoBehaviour script that generates a bunch of arrays once, and then it's done.

    Then, on my once-per-second update, I check which chunks need updating and run all the code I've been working on here. By far the largest performance hit right now is allocating the native arrays for the jobs: for every chunk I need to generate or update, I allocate new native arrays from those original MonoBehaviour arrays.

    Can I boost performance by making those initial arrays native arrays from the get-go, and then making copies of them? Or would that not help, since I'm still allocating a new native array either way? And if so, is there anything else I can do for this use case?
     
  14. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    If you change all the places you use Vector3 in your jobs to float3, and change your math functions to use Unity.Mathematics.math, you might see a speed improvement. When you change your job's NativeArray<Vector3> 'input parameters' to NativeArray<float3>, you'll probably get a compiler error where you assign them; at that point you can call NativeArray.Reinterpret<float3>() on the Vector3 array. Here's an example:
    Code (CSharp):
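    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;

    public class ReinterpretExample : MonoBehaviour
    {
        const int N = 1024; // example element count

        void Update()
        {
            var myVector3Array = new NativeArray<Vector3>(N, Allocator.TempJob);
            JobHandle handle = new MyFloat3Job
            {
                // Same memory, viewed as float3 (both types are three consecutive floats)
                MyFloat3Array = myVector3Array.Reinterpret<float3>()
            }
            .Schedule();

            // ...

            handle.Complete();
            myVector3Array.Dispose(); // disposing the original also releases the reinterpreted view
        }
    }

    [BurstCompile]
    struct MyFloat3Job
        : IJob
    {
        public NativeArray<float3> MyFloat3Array;

        public void Execute()
        {
            // ...
        }
    }
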
    Note: be a bit careful with NativeArray.Reinterpret, as it reinterprets the underlying memory. In my experience it's no problem to reinterpret from Vector3 to float3 (or vice versa), since both are three consecutive floats, but if you're reinterpreting between other types you need to make sure their sizes and memory layouts actually match. A better alternative is to create your NativeArray<float3> from the start, if that's an option for you.

    Hard to say without more info, but in general, pre-allocating and reusing memory is good. Also, do you need to make copies of your arrays?
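Since each quad's vertex count depends only on its detail level, one way to pre-allocate and reuse is to keep free buffers per length and hand them back out instead of allocating a fresh array per chunk. When many chunks are scheduled at once, every in-flight job needs its own buffer, which is why this sketch rents and returns buffers rather than sharing one per size. It is plain C# with managed arrays, and BufferPool is a hypothetical name; in Unity the same idea would hold NativeArray buffers created with Allocator.Persistent (disposed once, for example in OnDestroy), assuming a buffer is only returned after the job that used it has completed.

```csharp
using System.Collections.Generic;

// Hypothetical pool: free buffers are kept per element count (e.g. per LOD detail level),
// so chunks of the same detail level reuse arrays instead of allocating new ones.
public sealed class BufferPool<T>
{
    private readonly Dictionary<int, Stack<T[]>> free = new Dictionary<int, Stack<T[]>>();

    public int AllocationCount { get; private set; }

    // Hands out a free buffer of this length, allocating only when none is available.
    // Contents are left over from the previous user, so callers must overwrite every element.
    public T[] Rent(int length)
    {
        if (free.TryGetValue(length, out Stack<T[]> stack) && stack.Count > 0)
            return stack.Pop();
        AllocationCount++;
        return new T[length];
    }

    // Call only after the work that used the buffer has finished (in Unity: after Complete()).
    public void Return(T[] buffer)
    {
        if (!free.TryGetValue(buffer.Length, out Stack<T[]> stack))
        {
            stack = new Stack<T[]>();
            free.Add(buffer.Length, stack);
        }
        stack.Push(buffer);
    }
}
```

With this, a hundred chunks at the same detail level processed one batch after another allocate only as many buffers as are in flight at the same time.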
     
  15. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Thank you. After some necessary changes I'm now sitting waiting for jobs again, so my work isn't nearly done. In a similar vein, how can I use the same NativeArray across multiple jobs at once? I have an array that I initialize as read-only and only read from in the job; I never write to it. But no matter what I try, I get the same error: "The previously scheduled job A writes to the Unity.Collections.NativeArray`1[System.Int32] B"

    Also, for my jobs I'm creating empty native arrays to pass into the job, which fills them with values as its output. Is that sort of thing 'correct practice', or do I have the wrong idea?
     
  16. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Sorry for the double post but let me lay out my example:

    I have a large terrain mesh made of many chunks for the LOD system. As the player moves, small chunks get deleted and new, smaller chunks get created: every second, the code checks whether any chunks should be updated and updates them as needed (in a real scenario it would be based on how far the player moves, but that's beside the point).

    For my job, right now every chunk updates in series: I update all of one chunk, then the next chunk, then the next, and so on. This is slow because every chunk needs its vertices calculated, then its triangles, then its normals. It was actually fine before, but after reintegrating the noise calculation and optimizing everything else so much, the time it takes to calculate the vertices is now a huge bottleneck (at least I think that's what this is showing):


    Below is my vertices job scheduler, with comments. A lot goes into the job, but most of it is actually hardcoded data. I would love to schedule all of these at once, then complete them all once they've all started. The problem is that I can't do that with NativeArrays because of the issue in my last post. The only thing I'm actually modifying is jobVertices, which starts out blank; it seems silly that I even need to create an array and pass it in for that purpose. Every other array I only need to read, not write to.

    Code (CSharp):
    public void StartVerticesJob()
    {
        vertexArray = new NativeArray<Vector3>(chunkVertices.Length, Allocator.TempJob);
        vertexQuadArray = new NativeArray<Vector3>(Presets.quadTemplateVertices[quadIndex], Allocator.TempJob);

        var verticesJob = new EvaluateVertexPointJob
        {
            jobVertices = vertexArray,                          // *always starts blank
            jobQuadTemplateVertices = vertexQuadArray,          // one of 16 template arrays, chosen per chunk. Could be one array for everything, plus telling the job which region to read
            transformMatrix = transformMatrix,                  // depends on chunk position; I believe this is what ultimately determines where on the unit cube these vertices are
            _random = terrainFace.planetScript._random,         // *hardcoded array
            Source = terrainFace.planetScript.Source,           // *hardcoded array
            Grad3 = terrainFace.planetScript.Grad3,             // *hardcoded array
            noiseFilterValues = terrainFace.noiseFilterValues,  // *hardcoded array of the noise filter values; would be static during real gameplay
            planetSize = terrainFace.planetScript.planetSize,   // *hardcoded variable
            valueCount = terrainFace.noiseFilterValuesCount     // *hardcoded variable
        };

        jobHandlerVertices = verticesJob.Schedule(Mathf.Max(chunkVertices.Length, terrainFace.planetScript._random.Length), 512);
    }

    This is the actual job, the 3D noise calculator. I haven't converted anything to float3 / Unity.Mathematics yet:
    Code (CSharp):
    using System;
    using Unity.Collections;
    using Unity.Jobs;
    using Unity.Mathematics;
    using UnityEngine;
    using Unity.Burst;

    [BurstCompile]
    public struct EvaluateVertexPointJob : IJobParallelFor
    {
        public NativeArray<Vector3> jobVertices;
        [ReadOnly] public NativeArray<Vector3> jobQuadTemplateVertices;
        public Matrix4x4 transformMatrix;
        public float planetSize;

        [ReadOnly] public NativeArray<float> noiseFilterValues;
        public int valueCount;

        [ReadOnly] public NativeArray<int> Source;
        [ReadOnly] public NativeArray<int> Grad3;
        public NativeArray<int> _random;

        public void Execute(int i)
        {
            if (i < jobVertices.Length)
            {
                var pointOnCube = transformMatrix.MultiplyPoint(jobQuadTemplateVertices[i]);
                // Note: this is already scaled by planetSize, and CalculatePointOnPlanet multiplies by planetRadius (= planetSize) again
                var pointOnUnitSphere = pointOnCube.normalized * planetSize;

                const int RandomSize = 256;

                // Skewing and unskewing factors for 3D simplex noise
                const double F3 = 1.0 / 3.0;
                const double G3 = 1.0 / 6.0;

                // Rewrites the shared _random table on every iteration, which is why _random can't be [ReadOnly]
                Randomize(0, _random, Source);

                jobVertices[i] = CalculatePointOnPlanet(pointOnUnitSphere, planetSize, _random, Grad3, noiseFilterValues, valueCount);

                static float3 CalculatePointOnPlanet(float3 pointOnUnitSphere, float planetRadius, NativeArray<int> _random, NativeArray<int> Grad3, NativeArray<float> noiseFilterValues, int valueCount)
                {
                    float elevation = EvaluateSimpleNoise(pointOnUnitSphere, _random, Grad3, noiseFilterValues, valueCount); // sum of all enabled noise layers

                    //if (noiseFilterValues[0] == 1)
                    //{
                    //    // this below is a small if/else statement. If useFirstLayerAsMask is true, then mask takes the value of firstLayerValue,
                    //    // which is the current iteration i but just on the first layer only. If not, then mask is set to 1 to give it no effect.
                    //    //float mask = (noiseFilterValues[j].useFirstLayerAsMask) ? firstLayerValue : 1;
                    //    elevation += EvaluateSimpleNoise(pointOnUnitSphere, _random, Grad3, noiseFilterValues);// * mask;
                    //}

                    return (1 + elevation) * planetRadius * pointOnUnitSphere;
                }

                static float EvaluateSimpleNoise(Vector3 point, NativeArray<int> _random, NativeArray<int> Grad3, NativeArray<float> noiseFilterValues, int valueCount)
                {
                    float firstLayerValue = 1;
                    float elevation = 0;
                    float noiseValue = 0;

                    for (int k = 0; k < noiseFilterValues.Length / valueCount; k++) // one iteration per noise layer; layer 0 is also kept in firstLayerValue for masking
                    {
                        int n = 0;
                        int iterator = k * valueCount;
                        float layerEnabled = noiseFilterValues[iterator]; n++;
                        if (layerEnabled == 1)
                        {
                            float useFirstLayerAsMask = noiseFilterValues[iterator + n]; n++;
                            float numLayers = noiseFilterValues[iterator + n]; n++;
                            float noiseStrength = noiseFilterValues[iterator + n]; n++;
                            float baseRoughness = noiseFilterValues[iterator + n]; n++;
                            float noiseRoughness = noiseFilterValues[iterator + n]; n++;
                            float persistence = noiseFilterValues[iterator + n]; n++;
                            float minValue = noiseFilterValues[iterator + n]; n++;
                            float weight = noiseFilterValues[iterator + n];
                            Vector3 center = new Vector3(0, 0, 0);

                            noiseValue = 0;
                            float frequency = baseRoughness / 100;
                            float amplitude = 1;

                            // This code stacks layers of noise on top of each other. For each layer, the point on the unit sphere
                            // is multiplied by frequency and offset by center, then evaluated in the noise function to get a height.
                            // That height, from -1 to 1, is converted to a value from 0 to 1, then multiplied by amplitude.
                            for (int j = 0; j < numLayers; j++)
                            {
                                float v = Evaluate((point * frequency) + center, _random, Grad3);
                                noiseValue += (v + 1) * 0.5f * amplitude;
                                frequency *= noiseRoughness / 50; // when noiseRoughness / 50 is greater than 1, the frequency increases with each layer, adding finer detail
                                amplitude *= persistence / 50; // when persistence / 50 is less than 1, the amplitude decreases with each layer
                            }

                            noiseValue = Mathf.Max(0, noiseValue - minValue) * noiseStrength / 100;

                            if (k == 0)
                            {
                                firstLayerValue = noiseValue;
                            }
                            else if (useFirstLayerAsMask == 1)
                            {
                                noiseValue = noiseValue * firstLayerValue;
                            }

                            // Only enabled layers contribute (noiseValue is not reset for disabled layers)
                            elevation += noiseValue;
                        }
                    }
                    return elevation;
                }

                //static float EvaluateRigidNoise(Vector3 point, NativeArray<int> _random, NativeArray<int> Grad3)
                //{
                //    float noiseValue = 0;
                //    float frequency = baseRoughness / 100;
                //    float amplitude = 1;
                //    weight = 1; // create a weight for detail

                //    // this code iterates layers of noise on top of each other. It starts by defining a placeholder variable v for iteration.
                //    // v takes the current point of the unit sphere, multiplies by frequency and the offset within the noise, then evaluates that point within the noise function to get a height.
                //    // that height, from -1 to 1, is converted to a value of 0 to 1, then multiplied by amplitude.
                //    for (int j = 0; j < numLayers; j++)
                //    {
                //        //for rigid noise, instead of converting -1to1 to 0to1, it takes the absolute value, subtracts it from 1, then squares it to get peakier values
                //        float v = 1 - Mathf.Abs(Evaluate((point * frequency) + center, _random, Grad3));
                //        v *= v;
                //        v *= weight; // weight and v should increase in detail the higher up they are. 2:30 in ep 04
                //        weight = Mathf.Clamp01(v * weight / 50); // multiply by the weight but ensure it stays between 0 and 1

                //        noiseValue += v * amplitude;
                //        frequency *= noiseRoughness; // when roughness is greater than 1, the frequency will increase with each filter, adding finer detail (I think)
                //        amplitude *= persistence; // amplitude is a modifier assigned to each point. When persistence is less than 1, the amplitude decreases with each layer.
                //    }
                //    noiseValue = Mathf.Max(0, noiseValue - minValue);
                //    return noiseValue * noiseStrength / 100;
                //}

                static float Evaluate(float3 point, NativeArray<int> _random, NativeArray<int> Grad3)
                {
                    double x = point.x;
                    double y = point.y;
                    double z = point.z;
                    double n0 = 0, n1 = 0, n2 = 0, n3 = 0; // Noise contributions from the four corners

                    // Skew the input space to determine which simplex cell we're in
                    double s = (x + y + z) * F3;

                    // for 3D
                    int i = FastFloor(x + s);
                    int j = FastFloor(y + s);
                    int k = FastFloor(z + s);

                    double t = (i + j + k) * G3;

                    // The x,y,z distances from the cell origin
                    double x0 = x - (i - t);
                    double y0 = y - (j - t);
                    double z0 = z - (k - t);

                    // For the 3D case, the simplex shape is a slightly irregular tetrahedron.
                    // Determine which simplex we are in.
                    int i1, j1, k1; // Offsets for second corner of simplex in (i,j,k) coords
                    int i2, j2, k2; // Offsets for third corner of simplex in (i,j,k) coords

                    if (x0 >= y0)
                    {
                        if (y0 >= z0)
                        {
                            // X Y Z order
                            i1 = 1;
                            j1 = 0;
                            k1 = 0;
                            i2 = 1;
                            j2 = 1;
                            k2 = 0;
                        }
                        else if (x0 >= z0)
                        {
                            // X Z Y order
                            i1 = 1;
                            j1 = 0;
                            k1 = 0;
                            i2 = 1;
                            j2 = 0;
                            k2 = 1;
                        }
                        else
                        {
                            // Z X Y order
                            i1 = 0;
                            j1 = 0;
                            k1 = 1;
                            i2 = 1;
                            j2 = 0;
                            k2 = 1;
                        }
                    }
                    else
                    {
                        // x0 < y0
                        if (y0 < z0)
                        {
                            // Z Y X order
                            i1 = 0;
                            j1 = 0;
                            k1 = 1;
                            i2 = 0;
                            j2 = 1;
                            k2 = 1;
                        }
                        else if (x0 < z0)
                        {
                            // Y Z X order
                            i1 = 0;
                            j1 = 1;
                            k1 = 0;
                            i2 = 0;
                            j2 = 1;
                            k2 = 1;
                        }
                        else
                        {
                            // Y X Z order
                            i1 = 0;
                            j1 = 1;
                            k1 = 0;
                            i2 = 1;
                            j2 = 1;
                            k2 = 0;
                        }
                    }

                    // A step of (1,0,0) in (i,j,k) means a step of (1-c,-c,-c) in (x,y,z),
                    // a step of (0,1,0) in (i,j,k) means a step of (-c,1-c,-c) in (x,y,z), and
                    // a step of (0,0,1) in (i,j,k) means a step of (-c,-c,1-c) in (x,y,z),
                    // where c = 1/6.

                    // Offsets for second corner in (x,y,z) coords
                    double x1 = x0 - i1 + G3;
                    double y1 = y0 - j1 + G3;
                    double z1 = z0 - k1 + G3;

                    // Offsets for third corner in (x,y,z) coords
                    double x2 = x0 - i2 + F3;
                    double y2 = y0 - j2 + F3;
                    double z2 = z0 - k2 + F3;

                    // Offsets for last corner in (x,y,z) coords
                    double x3 = x0 - 0.5;
                    double y3 = y0 - 0.5;
                    double z3 = z0 - 0.5;

                    // Work out the hashed gradient indices of the four simplex corners
                    int ii = i & 0xff;
                    int jj = j & 0xff;
                    int kk = k & 0xff;

                    // Calculate the contribution from the four corners
                    double t0 = 0.6 - x0 * x0 - y0 * y0 - z0 * z0;
                    if (t0 > 0)
                    {
                        t0 *= t0;
                        int gi0 = _random[ii + _random[jj + _random[kk]]] % 12;
                        n0 = t0 * t0 * Dot(Grad3[gi0 * 3], Grad3[gi0 * 3 + 1], Grad3[gi0 * 3 + 2], x0, y0, z0);
                    }

                    double t1 = 0.6 - x1 * x1 - y1 * y1 - z1 * z1;
                    if (t1 > 0)
                    {
                        t1 *= t1;
                        int gi1 = _random[ii + i1 + _random[jj + j1 + _random[kk + k1]]] % 12;
                        n1 = t1 * t1 * Dot(Grad3[gi1 * 3], Grad3[gi1 * 3 + 1], Grad3[gi1 * 3 + 2], x1, y1, z1);
                    }

                    double t2 = 0.6 - x2 * x2 - y2 * y2 - z2 * z2;
                    if (t2 > 0)
                    {
                        t2 *= t2;
                        int gi2 = _random[ii + i2 + _random[jj + j2 + _random[kk + k2]]] % 12;
                        n2 = t2 * t2 * Dot(Grad3[gi2 * 3], Grad3[gi2 * 3 + 1], Grad3[gi2 * 3 + 2], x2, y2, z2);
                    }

                    double t3 = 0.6 - x3 * x3 - y3 * y3 - z3 * z3;
                    if (t3 > 0)
                    {
                        t3 *= t3;
                        int gi3 = _random[ii + 1 + _random[jj + 1 + _random[kk + 1]]] % 12;
                        n3 = t3 * t3 * Dot(Grad3[gi3 * 3], Grad3[gi3 * 3 + 1], Grad3[gi3 * 3 + 2], x3, y3, z3);
                    }

                    // Add contributions from each corner to get the final noise value.
                    // The result is scaled to stay just inside [-1,1]
                    return (float)(n0 + n1 + n2 + n3) * 32;
                }

                static void Randomize(int seed, NativeArray<int> _random, NativeArray<int> Source)
                {
                    if (seed != 0)
                    {
                        // "Shuffle" the table using the given seed:
                        // unpack the seed into 4 bytes, then XOR every entry with each byte
                        //var F = new byte[4];
                        //NativeArray<byte> F = new NativeArray<byte>(4, Allocator.TempJob);
                        //UnpackLittleUint32(seed, F);

                        byte buffer1 = (byte)(seed & 0x00ff);
                        byte buffer2 = (byte)((seed & 0xff00) >> 8);
                        byte buffer3 = (byte)((seed & 0x00ff0000) >> 16);
                        byte buffer4 = (byte)((seed & 0xff000000) >> 24);

                        for (int i = 0; i < Source.Length; i++)
                        {
                            _random[i] = Source[i] ^ buffer1;
                            _random[i] ^= buffer2;
                            _random[i] ^= buffer3;
                            _random[i] ^= buffer4;

                            _random[i + RandomSize] = _random[i];
                        }
                    }
                    else
                    {
                        for (int i = 0; i < RandomSize; i++)
                            _random[i + RandomSize] = _random[i] = Source[i];
                    }
                }

                static double Dot(int g0, int g1, int g2, double x, double y, double z)
                {
                    return g0 * x + g1 * y + g2 * z;
                }

                static int FastFloor(double x)
                {
                    // (int) truncates toward zero, so step down by one only when x is negative and not already a whole number
                    int xi = (int)x;
                    return x < xi ? xi - 1 : xi;
                }

                // Unpack the given integer (int32) to an array of 4 bytes in little-endian format.
                // If the buffer is too small, it will be resized.
                // value: the value. buffer: the output buffer.
                //static NativeArray<byte> UnpackLittleUint32(int value, NativeArray<byte> buffer)
                //{
                //    //if (buffer.Length < 4)
                //    //    Array.Resize(ref buffer, 4);

                //    buffer[0] = (byte)(value & 0x00ff);
                //    buffer[1] = (byte)((value & 0xff00) >> 8);
                //    buffer[2] = (byte)((value & 0x00ff0000) >> 16);
                //    buffer[3] = (byte)((value & 0xff000000) >> 24);

                //    return buffer;
                //}
            }
        }

    }
     
    Last edited: Sep 7, 2023
  17. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    I'll look into your posts more deeply later. To fix this error, though, add the [ReadOnly] attribute to the native arrays in your job that you only read from, like this:

    Code (CSharp):
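    using Unity.Burst;
    using Unity.Collections;
    using Unity.Jobs;

    [BurstCompile]
    public struct EvaluateVertexPointJob
        : IJobParallelFor
    {
        // ...
        // Only read in the job, so any worker may safely read any index.
        [ReadOnly]
        public NativeArray<int> Source;

        // ...

        public void Execute(int i)
        {
            // ...
        }
    }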
     
  18. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Thanks! I swear I tried that and it didn't work. Whoops: it turns out I'm also writing to _random in there, so I'll need to figure out how to fix that. I've updated my code in post #16 with the [ReadOnly]s.

    Edit: If I have 100 small native arrays that I create, work on, and then copy into a main array (which I'm doing, and it shows up in the profiler), could it be better to have one large array that I allocate and copy only once? Or is the cost driven by the total number of elements allocated and copied, so that the number of arrays they're spread across doesn't matter?
     
    Last edited: Sep 7, 2023
  19. Per-Morten

    Joined:
    Aug 23, 2019
    Posts:
    109
    Instead of copying, are you able to write directly to the main array, through NativeArray.GetSubArray for example?
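    A minimal sketch of the "one large array" idea, assuming every chunk at a given resolution has the same vertex count and that vertices are float3: allocate one buffer for all chunks, fill it in a single IJobParallelFor, and use NativeArray<T>.GetSubArray(start, length) afterwards to hand each chunk a view of its slice without copying. As far as I know, a sub-array shares its parent's safety handle, so scheduling several writer jobs on different slices of one array at the same time is flagged by the safety checks unless the jobs are chained; that is why this sketch runs one job over the whole buffer. EvaluateAllChunksJob, SharedVertexBuffer, and the placeholder math are hypothetical, and the batch size of 64 is arbitrary.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical job: evaluates the vertices of every chunk in one pass over a shared array.
[BurstCompile]
public struct EvaluateAllChunksJob : IJobParallelFor
{
    public int VerticesPerChunk;
    public NativeArray<float3> Vertices; // length = chunkCount * VerticesPerChunk

    public void Execute(int i)
    {
        int chunk = i / VerticesPerChunk; // which chunk this vertex belongs to
        int local = i % VerticesPerChunk; // vertex index inside that chunk
        // Placeholder standing in for the real per-vertex noise evaluation.
        Vertices[i] = new float3(chunk, local, 0f);
    }
}

public static class SharedVertexBuffer
{
    // One allocation and one job for all chunks; the caller must Dispose() the result.
    public static NativeArray<float3> EvaluateAll(int chunkCount, int verticesPerChunk, Allocator allocator)
    {
        var all = new NativeArray<float3>(chunkCount * verticesPerChunk, allocator);
        var job = new EvaluateAllChunksJob { VerticesPerChunk = verticesPerChunk, Vertices = all };
        job.Schedule(all.Length, 64).Complete();
        return all;
    }

    // A chunk's vertices as a view into the shared buffer: no allocation, no copy.
    public static NativeArray<float3> ChunkSlice(NativeArray<float3> all, int chunk, int verticesPerChunk)
    {
        return all.GetSubArray(chunk * verticesPerChunk, verticesPerChunk);
    }
}
```

    The slice is only a view, so it stays valid only as long as the parent buffer is alive and must not be disposed on its own.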

    Also, how many vertices do you normally have in a chunk?

    It's still also a bit unclear to me what is actually debug code, and what's not in the EvaluateVertexPointJob. For instance, you still call Randomize with 0, is that just for debugging? Why do you do the i < jobVertices.Length, are you ever scheduling more "iterations" than jobVertices.Length?

    Also, do you need those random numbers to be the same each time in Evaluate for the algorithm to function properly, or do you just need a random number? (if so, unity.mathematics.random is probably a better bet, then you don't need the array either). The noiseFilterValues, is that just a struct that you've put into multiple floats?
    I'm not familiar with the problem you're working on at all, so these might be stupid questions; they're just the questions I ask myself when reading your code.
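    If the noise only needs a deterministic, seed-dependent permutation table (classic Perlin/simplex implementations shuffle the values 0..size-1 and store them twice, which is what the `_random[i + RandomSize] = _random[i]` lines above appear to do), a sketch of building that table once from a seed with Unity.Mathematics.Random could look like this. PermutationTable.Build and the table size used in the usage are assumptions, not taken from noise.cs, and this has not been checked against that file's exact output.

```csharp
using Unity.Collections;
using Unity.Mathematics;

public static class PermutationTable
{
    // Builds a doubled permutation table: entries [0, size) are a seeded shuffle of 0..size-1,
    // and entries [size, 2 * size) repeat them so lookups like perm[i + 1] never need wrapping.
    public static NativeArray<int> Build(uint seed, int size, Allocator allocator)
    {
        var perm = new NativeArray<int>(size * 2, allocator);
        for (int i = 0; i < size; i++)
            perm[i] = i;

        // Unity.Mathematics.Random requires a non-zero seed.
        var rng = new Random(seed == 0 ? 1u : seed);

        // Fisher-Yates shuffle: the same seed always produces the same table.
        for (int i = size - 1; i > 0; i--)
        {
            int j = rng.NextInt(0, i + 1); // upper bound is exclusive
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }

        // Duplicate the first half into the second half.
        for (int i = 0; i < size; i++)
            perm[i + size] = perm[i];

        return perm;
    }
}
```

    Built once (for example with Allocator.Persistent) and passed into the job as a [ReadOnly] field, the table no longer has to be written inside Execute at all.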

    If I were you, I would dig deeper into what inside EvaluateVertexPointJob is taking the time (assuming that's the job taking the most time). You can create your own profiler markers and pass them into the job, like this:

    Code (CSharp):
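    using Unity.Burst;
    using Unity.Jobs;
    using Unity.Profiling;
    using UnityEngine;

    // Update() is only called on MonoBehaviours, so the class derives from one.
    public class MyClass : MonoBehaviour
    {
        // Created once and reused; the name is what shows up in the Profiler timeline.
        static readonly ProfilerMarker MyProfilerMarker = new ProfilerMarker("MyProfilerMarkerName");

        void Update()
        {
            // Pass the marker into the job as a field, then schedule and wait for it.
            new MyJob
            {
                Marker = MyProfilerMarker
            }.Schedule().Complete();
        }

        [BurstCompile]
        public struct MyJob : IJob
        {
            public ProfilerMarker Marker;

            public void Execute()
            {
                // Everything until the end of this scope is attributed to the marker.
                using var _ = Marker.Auto();
                // ... work to measure ...
            }
        }
    }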
    Create multiple markers for the different sections of your code that you want to profile. I would create a marker per static function you have in the EvaluateVertexPointJob.
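    Applied to a parallel vertex job, that could look like the sketch below: one static marker per stage, each passed into the job and opened around the matching call. The stage names and the helpers SampleNoise and ApplyFilters are hypothetical placeholders, not functions from this thread. Opening markers once per vertex adds overhead of its own, so the absolute times will be inflated; they are mostly useful for comparing stages against each other.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Profiling;

public static class ProfiledVertexJobMarkers
{
    // One marker per stage you want to see separately in the Profiler.
    public static readonly ProfilerMarker Noise = new ProfilerMarker("Vertex.SampleNoise");
    public static readonly ProfilerMarker Filter = new ProfilerMarker("Vertex.ApplyFilters");
}

[BurstCompile]
public struct ProfiledVertexJob : IJobParallelFor
{
    public ProfilerMarker NoiseMarker;
    public ProfilerMarker FilterMarker;

    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output;

    public void Execute(int i)
    {
        float value;
        using (NoiseMarker.Auto())
            value = SampleNoise(Input[i]);

        using (FilterMarker.Auto())
            value = ApplyFilters(value);

        Output[i] = value;
    }

    // Hypothetical stand-ins for the real static math functions.
    static float SampleNoise(float x) => x * 0.5f;
    static float ApplyFilters(float v) => v + 1f;
}
```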
     
    Last edited: Sep 7, 2023
  20. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    No idea, I'll look into it!


    That's fair, and since you asked them, I'm happy to answer. I just might learn something.

    "Also, how many vertices do you normally have in a chunk?"
    (resolution + 1) squared. I've been testing at a resolution of 16, so 289 vertices. It would be nice to run that in real time, but I can probably go as low as a resolution of 4 (only 25 vertices). At a reasonable update interval, one update goes through several hundred chunks.
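    As a quick check of those numbers: a chunk with resolution r has r + 1 vertices per side, so (r + 1)² vertices in total. The hypothetical helper below computes that, plus where chunk c starts if all chunks are stored back to back in one flat array, and the row-major index of a vertex inside a chunk (the row-major layout is an assumption).

```csharp
public static class ChunkMath
{
    // Vertices in one chunk: (resolution + 1) per side, squared.
    public static int VerticesPerChunk(int resolution) => (resolution + 1) * (resolution + 1);

    // Index of a chunk's first vertex when all chunks share one flat array.
    public static int ChunkOffset(int chunkIndex, int resolution) => chunkIndex * VerticesPerChunk(resolution);

    // Flat index of vertex (x, y) within a chunk, stored row by row.
    public static int VertexIndex(int x, int y, int resolution) => y * (resolution + 1) + x;
}
```

    For example, a hypothetical update touching 300 chunks evaluates 300 × 289 = 86,700 vertices at resolution 16, versus 300 × 25 = 7,500 at resolution 4.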

    "It's still also a bit unclear to me what is actually debug code, and what's not in the EvaluateVertexPointJob. For instance, you still call Randomize with 0, is that just for debugging?"
    This is a good question, and I'm still working through this part. The noise.cs file I'm using has a few quirks: I'm not entirely sure how it works, and it clearly wasn't cleaned up before release, since most of the variables declared at the top aren't even used. Ideally I would pass in my own generated seed here, but I haven't gotten that far.

    "Why do you do the i < jobVertices.Length, are you ever scheduling more "iterations" than jobVertices.Length?"
    Yes, and maybe you can teach me something here. I only need one iteration per vertex, but some of the native arrays I pass in are larger: for instance, _random, Source, and Grad3 (all from noise.cs) have more than 25 elements, so at a resolution of 4 the job otherwise throws index-out-of-bounds errors at me. I tried a few things to solve this and ended up bootlegging it this way.
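    A likely explanation, hedged because the full job isn't shown here: Execute(i) only runs for i in [0, arrayLength), where arrayLength is the first argument to Schedule, so lookup tables that are larger than the vertex count are not a problem by themselves. What does fail is writing to a NativeArray that is not [ReadOnly] at any index other than the current one from an IJobParallelFor; the safety system reports that as an IndexOutOfRangeException for the job's restricted range. Since Randomize writes to _random, that would fit the errors. The sketch below fills its tables before scheduling, marks them [ReadOnly], and schedules exactly one iteration per vertex, so no `i < jobVertices.Length` check is needed. VertexFromTablesJob, its fields, and the placeholder lookup are hypothetical.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct VertexFromTablesJob : IJobParallelFor
{
    // Lookup tables may be any length; [ReadOnly] allows reading any index from any thread.
    [ReadOnly] public NativeArray<int> Perm;
    [ReadOnly] public NativeArray<float3> UnitSpherePoints;

    // Written only at index i, which is what IJobParallelFor allows by default.
    public NativeArray<float3> Vertices;

    public void Execute(int i)
    {
        float3 p = UnitSpherePoints[i];
        // Placeholder lookup standing in for the real noise; wraps into the table.
        int h = Perm[i % Perm.Length];
        Vertices[i] = p * (1f + h / 1000f);
    }
}

public static class VertexJobScheduling
{
    public static JobHandle Schedule(NativeArray<int> perm, NativeArray<float3> points, NativeArray<float3> vertices)
    {
        var job = new VertexFromTablesJob { Perm = perm, UnitSpherePoints = points, Vertices = vertices };
        // One iteration per vertex: Execute never sees i >= vertices.Length.
        return job.Schedule(vertices.Length, 32);
    }
}
```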

    "Also, do you need those random numbers to be the same each time in Evaluate for the algorithm to function properly, or do you just need a random number? (if so, unity.mathematics.random is probably a better bet, then you don't need the array either)."
    Same answer as the previous one: I'm not quite sure what this 3D noise script actually needs...

    "The noiseFilterValues, is that just a struct that you've put into multiple floats?"
    Yes. Following Sebastian Lague's approach, the noise filters have lots of values, and there can be an arbitrary number of filters. I pack all of this into a single array of floats (I haven't finalized this, so I'm not passing all of them yet):


    I'll get started on this too, thanks!
     
  21. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Excellent news: after marking lots of arrays [ReadOnly], I was able to restructure the vertex jobs so that all of them are started first and all of them are finished afterwards (rather than one by one like before). In deep profiling, the time for this job alone went from 60-90 ms down to around 6-7 ms. The frame hitch during an update at my high resolution of 16 is now about as imperceptible as it used to be at 4 (roughly one-twelfth the number of vertices: 25 vs. 289), and I still have optimizations left to do.
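    For anyone finding this later, the change described above amounts to the pattern sketched below: schedule every chunk's job first while collecting the handles, and only then wait on all of them, so worker threads can run chunks side by side instead of the main thread blocking after each one. Marking shared inputs [ReadOnly] is what makes this legal: the safety system lets any number of scheduled jobs read the same read-only array at once, but refuses to schedule a job that writes to an array another in-flight job is using without a dependency between them. ScaleJob and BatchScheduler are hypothetical, and each output array is assumed to be no longer than the shared input.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
public struct ScaleJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> SharedInput; // read by every chunk's job at once
    public float Scale;
    public NativeArray<float> Output;                 // one separate array per chunk

    public void Execute(int i) => Output[i] = SharedInput[i] * Scale;
}

public static class BatchScheduler
{
    public static void RunAll(NativeArray<float> sharedInput, NativeArray<float>[] outputs)
    {
        var handles = new NativeArray<JobHandle>(outputs.Length, Allocator.Temp);

        // Phase 1: schedule everything without waiting on anything.
        for (int c = 0; c < outputs.Length; c++)
        {
            var job = new ScaleJob { SharedInput = sharedInput, Scale = c + 1, Output = outputs[c] };
            handles[c] = job.Schedule(outputs[c].Length, 64);
        }

        // Hand the batched jobs to the workers now; useful if main-thread work happens before waiting.
        JobHandle.ScheduleBatchedJobs();

        // Phase 2: wait once for all of them.
        JobHandle.CompleteAll(handles);
        handles.Dispose();
    }
}
```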
     
  22. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    Now for some bad news: I'm getting a fairly regular crash, seemingly when I start a lot of jobs at once. However, this only happens in the editor: I can trigger it without fail within seconds in Play Mode, but I can't reproduce it in a build.

    Edit - this was a bug and is fixed in the latest version:
    https://issuetracker.unity3d.com/product/unity/issues/guid/ECSB-424

    Native Crash Reporting
    =================================================================
    Got a UNKNOWN while executing native code. This usually indicates
    a fatal error in the mono runtime or one of the native libraries
    used by your application.
    =================================================================

    =================================================================
    Managed Stacktrace:
    =================================================================
    at <unknown> <0xffffffff>
    at Unity.Jobs.LowLevel.Unsafe.JobsUtility:ScheduleParallelFor_Injected <0x000c2>
    at Unity.Jobs.LowLevel.Unsafe.JobsUtility:ScheduleParallelFor <0x00062>
    at Unity.Jobs.IJobParallelForExtensions:Schedule <0x000fa>
    at Chunk:StartVerticesJob <0x007e2>
    at TerrainFace:UpdateTree <0x00262>
    at Planet:UpdateMesh <0x0007a>
    at <PlanetGenerationLoop>d__29:MoveNext <0x0011a>
    at UnityEngine.SetupCoroutine:InvokeMoveNext <0x00081>
    at <Module>:runtime_invoke_void_object_intptr <0x000ae>
    =================================================================
    Received signal SIGSEGV
    Obtained 32 stack frames
    0x00007ff639878717 (Unity) `anonymous namespace'::ScheduleManagedJobParallelFor_Internal
    0x00007ff639877828 (Unity) ScheduleManagedJobParallelFor
    0x00007ff638d5db6e (Unity) JobsUtility_CUSTOM_ScheduleParallelFor_Injected
    0x000001975d8f4883 (Mono JIT Code) (wrapper managed-to-native) Unity.Jobs.LowLevel.Unsafe.JobsUtility:ScheduleParallelFor_Injected (Unity.Jobs.LowLevel.Unsafe.JobsUtility/JobScheduleParameters&,int,int,Unity.Jobs.JobHandle&)
    0x000001975d8f4743 (Mono JIT Code) Unity.Jobs.LowLevel.Unsafe.JobsUtility:ScheduleParallelFor (Unity.Jobs.LowLevel.Unsafe.JobsUtility/JobScheduleParameters&,int,int)
    0x000001975d8f3cdb (Mono JIT Code) Unity.Jobs.IJobParallelForExtensions:Schedule<EvaluateVertexPointJob> (EvaluateVertexPointJob,int,int,Unity.Jobs.JobHandle)
    0x000001975d8f3a63 (Mono JIT Code) Chunk:StartVerticesJob () (at C:/Users/ATMLVE/Space Game 1/Assets/Scripts/Planet Generation/Chunk.cs:349)
    ..... trimmed
     
    Last edited: Sep 11, 2023
  23. ATMLVE

    Joined:
    Jun 11, 2023
    Posts:
    35
    I have a conceptual question I may as well ask here. My vertex job runs Execute(int i), and inside it are some math methods. In my code I run several versions of this job that are almost, but not quite, the same. They all do the same math, but the data I send in is formatted slightly differently, or there is one extra array for some data I need. Again, the math itself is entirely the same.

    The easiest approach was to copy and paste the code into a separate job and make the changes I needed. Lazy, but fine, except now I have four copies of the same job with slight differences, and since I'm iterating on the math itself, it's a drag to copy and paste every change four times.

    Is there a way to write one set of methods once and have all four jobs call them? The things I've tried so far have given me errors.
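    A minimal sketch of one common way to do this: put the math in a static class of static methods that take and return only blittable values (float, int, float3, or NativeArrays passed in as parameters), and call those from each job's Execute. Burst compiles static methods reached from a Burst job together with the job, so the helper needs nothing job-specific. Likely causes of errors with this approach are helpers that touch managed objects (classes, strings, managed arrays) or read non-readonly static fields, which Burst rejects. NoiseMath, its placeholder formula, and the two job structs are hypothetical.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Shared math, written once. Only blittable parameters; no managed objects or mutable statics.
public static class NoiseMath
{
    public static float Evaluate(float3 p, NativeArray<float> settings)
    {
        // Placeholder formula standing in for the real noise and filters.
        return settings[0] * math.sin(p.x) + settings[1];
    }

    public static float3 Displace(float3 pointOnSphere, NativeArray<float> settings)
    {
        return pointOnSphere * (1f + Evaluate(pointOnSphere, settings));
    }
}

// Variant A: plain vertex positions.
[BurstCompile]
public struct DisplaceJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float3> Points;
    [ReadOnly] public NativeArray<float> Settings;
    public NativeArray<float3> Vertices;

    public void Execute(int i) => Vertices[i] = NoiseMath.Displace(Points[i], Settings);
}

// Variant B: the same math, plus an extra output array.
[BurstCompile]
public struct DisplaceWithHeightJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float3> Points;
    [ReadOnly] public NativeArray<float> Settings;
    public NativeArray<float3> Vertices;
    public NativeArray<float> Heights;

    public void Execute(int i)
    {
        float h = NoiseMath.Evaluate(Points[i], Settings);
        Heights[i] = h;
        Vertices[i] = Points[i] * (1f + h);
    }
}
```

    With this layout, a change to NoiseMath.Evaluate reaches every job variant at once, and each job struct only describes its own inputs and outputs.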