Question Some Questions about Jobs, Burst and ComputeBuffers

Discussion in 'Entity Component System' started by SuperFranTV, Nov 29, 2021.

  1. SuperFranTV

    SuperFranTV

    Joined:
    Oct 18, 2015
    Posts:
    140
    Hello all,

    I'm currently experimenting with jobs, Burst and ComputeBuffers to get the most performance.
    My project generates large voxel worlds for now, but only the voxels inside the render distance are shown.

    Now the questions:

    1. What's more performant, or what would you do?
    I have two paths I can choose:

    Path 1: Create a large NativeArray with millions of structs inside (like generating a whole world) and save it to an external file for later loading.

    Path 2: Every x frames, calculate all voxels inside the render distance and store them temporarily in a NativeArray.

    2. Which job type should I use, and what batchCount?
    I only have two jobs that run in Update (I know that running them every frame is worse).

    In Update I create a NativeArray sized for the maximum number of voxels.

    The first job samples a noise value from Unity.Mathematics.noise.snoise for every voxel in the array, marks the voxel as solid or not, and then stores the index of each neighboring voxel. This job runs as IJobParallelFor with a batchCount of 1, because I don't see any difference when I change the batchCount.

    The second job checks whether each voxel is solid, then checks each neighbor, and creates the matrix data that is sent to the ComputeBuffer and later used in DrawMeshInstancedProcedural. It is also an IJobParallelFor.

    I schedule them in Update like this:

    Code (CSharp):
    matricesJob = new SetMatrices { voxels = voxels, matrices = matrices }.ScheduleParallel(voxels.Length * 6, batchCount, matricesJob);
    matricesJob.Complete();
    In short:
    2.1 Is there a better way than using IJobParallelFor for a huge number of calls?
    2.2 What's the right method to call a job of type x?
    2.2.1 Is it better to create a new JobHandle in Update every time I need to call the job, or to create a global JobHandle and assign the job to it?
    2.3 How does the innerloopBatchCount change the performance of a job?

    3. Using [BurstCompile(FloatPrecision.Low, FloatMode.Fast)], but where?
    I read about it, and FloatPrecision.Low is currently not supported; it uses Medium instead.

    3.1 Are there any other options to make the code run faster?
    3.2 Where can I place BurstCompile?
    3.3 Is it possible to place it over the whole script so that it applies to everything inside the script?

    That's all for now.

    If something is unclear, let me know and I'll fix it.

    Thanks, all. I'm interested in learning more about this; I'm a practical learner and learn best from testing and looking at examples.
     
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,271
    That seems to depend on what you want to do with your game.

    The job type to use depends on your needs, but I'm guessing you want either IJobFor, or IJobParallelForBatch.

    What do you mean by "calls"?
    You don't "call a job". You create it, schedule it, and then force it to complete later.
    JobHandles are structs and live either on the stack or as members of your objects. You can even stuff them inside of NativeArrays and friends. A new JobHandle is generated whenever you schedule a job.
    It changes the distribution of items from the outer loop and the inner loop. The loop control logic around the inner loop is less expensive than the outer loop, but more items in the inner loop makes it harder for threads to evenly distribute the work. For larger quantities, 16 is usually a safe number in the inner loop.
    The attribute goes above the first line of the job struct definition. That's the only place you really need that attribute.
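
    A minimal sketch of the points above, under stated assumptions: `SolidJob`, `SolidJobRunner`, the `Size` field, the 0.05 noise scale and the zero threshold are all hypothetical names and values, not taken from the thread. It shows the attribute on the job struct, and a job being created, scheduled and completed rather than "called":

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical job: marks each voxel solid (1) or empty (0) from 3D simplex noise.
// The Burst attribute goes directly above the job struct.
[BurstCompile(FloatPrecision.Standard, FloatMode.Fast)]
public struct SolidJob : IJobFor
{
    public int Size;                 // voxels per axis (assumed cubic volume)
    public NativeArray<byte> Solid;  // one flag per voxel, flat index

    public void Execute(int index)
    {
        // Decode the flat index into x, y, z.
        int y = index / (Size * Size);
        int x = (index / Size) % Size;
        int z = index % Size;

        float n = noise.snoise(new float3(x, y, z) * 0.05f); // assumed noise scale
        Solid[index] = (byte)(n > 0f ? 1 : 0);               // assumed threshold
    }
}

public static class SolidJobRunner
{
    // Creates the job, schedules it across worker threads, then waits for it.
    public static NativeArray<byte> Run(int size, int batchCount = 16)
    {
        var solid = new NativeArray<byte>(size * size * size, Allocator.TempJob);
        var job = new SolidJob { Size = size, Solid = solid };

        // Scheduling returns a new JobHandle; 'default' means no dependency.
        JobHandle handle = job.ScheduleParallel(solid.Length, batchCount, default);
        handle.Complete();

        return solid; // the caller is responsible for calling Dispose()
    }
}
```

    The default batch count of 16 follows the suggestion above; whether a different value is faster for a given workload is something to measure in the Profiler.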
     
  3. SuperFranTV

    In my case I want to generate huge voxel maps. Now that multithreading (jobs) exists, is there a new way to save data to a file and load it later?

    So currently IJobParallelFor is my job for doing some calculation millions of times at the same time?
    What's the difference from IJobParallelForBatch?

    By "calls" I mean the number of times a job should run or be scheduled.

    I know, "call" is my shorthand for this xD
    But I have seen different ways of doing this, like:

    Code (CSharp):
    matricesJob = new SetMatrices { voxels = voxels, matrices = matrices }.Schedule(voxels.Length * 6, batchCount, matricesJob);
    matricesJob.Complete();
    or

    Code (CSharp):
    var job = new VelocityJob()
    {
        deltaTime = Time.deltaTime,
        position = position,
        velocity = velocity
    };
    JobHandle jobHandle = job.Schedule(position.Length, 64);
    jobHandle.Complete();
    Is one of these the better way, or is there no difference between them? I have also seen other variants of setting up a job.

    So putting it over methods like Update, or over global structs that just define something, would be useless?

    Another question:
    If I create a NativeArray that only lives until job.Complete, is that better for memory allocation, or should I create the NativeArray just once in OnEnable? I ask because mobile phones have much less memory for such things.

    -------------

    A small additional question: is there a faster way to reply with quotes, like in your post?
     
  4. DreamingImLatios

    https://docs.unity3d.com/ScriptReference/Unity.IO.LowLevel.Unsafe.AsyncReadManager.html
    IJobFor is better than IJobParallelFor. IJobParallelForBatch is for coarse culling or temp memory batching operations.
    There's not a real difference.
    Typically. There is one exception. If you put it over a class containing static methods that only use basic and Unity.Mathematics types, and you also put it over those methods, then with Burst 1.5 and later you get DirectCall Burst optimizations for those functions, which is useful. This only applies if the functions are called from outside a job. They are always Bursted inside a Burst job.
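
    A small sketch of that exception, assuming Burst 1.5 or later. `VoxelMath` and `FaceCenter` are hypothetical names; the shape (attribute on both the static class and the method, structs passed by `in`/`out`) mirrors the direct-call example in the Burst manual:

```csharp
using Unity.Burst;
using Unity.Mathematics;

// Hypothetical utility class: the attribute is on the class AND on the method.
[BurstCompile]
public static class VoxelMath
{
    // Only basic and Unity.Mathematics types are used, passed by reference.
    [BurstCompile]
    public static void FaceCenter(in float3 voxel, in float3 offset, out float3 center)
    {
        center = voxel + offset;
    }
}
```

    Called from ordinary managed code such as a MonoBehaviour's Update, `VoxelMath.FaceCenter(...)` is the case where the direct-call path applies; inside a Burst job the same method is compiled by Burst anyway.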

    Unfortunately, this depends on a lot of factors you will need to decide for yourself.

    Highlight text and there's an option to reply to it.
     
  5. SuperFranTV

    Is there a guide on how to use this? I only need it for one file, but the file contains a lot of data.

    You mean IJobFor is better than IJobParallelFor?

    Graphics.DrawMeshInstancedProcedural currently takes 14 GB of RAM for only 100*100*100 voxels.
    One voxel is 6 quads, one for each direction, but I only draw the quads that can be seen, so the amount is much smaller than the maximum (100*100*100 voxels).
    The RAM is used by the ComputeBuffer, I think, right? If I create a ComputeBuffer with a larger size than the number of objects I fill into it, does the ComputeBuffer take more RAM?
     
  6. DreamingImLatios

    I am not aware of one nor have I used this API yet.
    IJobFor does everything IJobParallelFor does but has more scheduling options. I think it is also more likely to have Burst loop logic optimizations since it is intended to replace IJobParallelFor.
    A compute buffer can only hold 1GB and sometimes less before Unity just gives up silently (I'm still investigating if the latter is just my machine or a Unity issue). There are a lot of techniques for compressing voxel representations and submitting the correct data to the GPU. But aside from 3d unit-snapped structure placement mechanics, I usually do freeform stuff so someone else will have to chime in or you can search this subforum for other threads on the topic.
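
    For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the buffer is allocated as element count times stride no matter how many elements are actually written, and uses the float3x4 matrices and int indexes mentioned elsewhere in this thread; `BufferSizing` and its members are hypothetical helper names:

```csharp
using System;

// Hypothetical helper for estimating buffer sizes in bytes.
public static class BufferSizing
{
    public const int Float3x4Stride = 12 * sizeof(float); // 12 floats = 48 bytes
    public const int IntStride = sizeof(int);             // 4 bytes

    // Worst case: every voxel of a cubic volume draws all 6 quads.
    public static long WorstCaseFaces(int size) => (long)size * size * size * 6;

    // Assumed allocation size: one stride per element, filled or not.
    public static long Bytes(long elementCount, int strideBytes) => elementCount * strideBytes;
}
```

    With these assumptions, 100*100*100 voxels give 6,000,000 worst-case quads, which is 288,000,000 bytes as float3x4 matrices or 24,000,000 bytes as int indexes. Both are far below 14 GB, so it may be worth checking in the Memory Profiler where the rest is going.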
     
  7. SuperFranTV

    So should I use IJobFor with ScheduleParallel, or only Schedule?

    ---

    To test my performance, I left only Graphics.DrawMeshInstancedProcedural inside Update, fed with the data it needs.

    To judge how good my code is so far, I compare it against Minecraft, which also loads 128³ voxels in all directions and currently runs better on my laptop than my Unity version.

    By voxel I mean 6 quads, one per direction, of which only the visible quads are rendered.

    On my low-end notebook, Minecraft runs at more than 100 fps, but my Unity solution runs below 25 fps. Why?
    I only use a basic shader and the one line for DrawMeshInstanced. Is Unity's code so bad?
     
  8. DreamingImLatios

    IJobFor.ScheduleParallel is what replaces IJobParallelFor.Schedule.

    Probably not. What algorithm are you using to only render quads that can be seen?
     
  9. SuperFranTV

    And IJobFor.Schedule is slower?

    After it fills an array with all voxels, I check for each voxel whether each neighbor is solid and add the neighbor's index to an array with a maximum size of 6. Later, when I create the meshes themselves, I only check for each side whether the neighbor index is present or empty.
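
    As a rough illustration of that kind of neighbor test (a sketch only, not the actual project code; `FaceCulling`, `CountVisibleFaces` and the flat `bool[]` layout are assumptions), a quad counts as visible when the cell on that side is empty or lies outside the volume:

```csharp
using System;

// Hypothetical helper: counts the quads that would be drawn for a cubic volume.
public static class FaceCulling
{
    // Offsets to the six axis-aligned neighbors of a voxel.
    static readonly (int x, int y, int z)[] Dirs =
    {
        (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1), (-1, 0, 0), (1, 0, 0)
    };

    // solid is a flat array indexed as y * size * size + x * size + z.
    public static int CountVisibleFaces(bool[] solid, int size)
    {
        int visible = 0;
        for (int i = 0; i < solid.Length; i++)
        {
            if (!solid[i]) continue; // empty voxels draw nothing

            int y = i / (size * size);
            int x = (i / size) % size;
            int z = i % size;

            foreach (var d in Dirs)
            {
                int nx = x + d.x, ny = y + d.y, nz = z + d.z;
                bool inside = nx >= 0 && nx < size
                           && ny >= 0 && ny < size
                           && nz >= 0 && nz < size;

                // Visible if the neighbor is outside the volume or not solid.
                if (!inside || !solid[ny * size * size + nx * size + nz]) visible++;
            }
        }
        return visible;
    }
}
```

    For a fully solid 2*2*2 block this counts 24 quads instead of 48, since every face between two solid voxels is skipped.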
     
  10. SuperFranTV

    Some additional questions.

    I looked into how I can make my StructuredBuffer much smaller.
    I found RWStructuredBuffer, RWBuffer, Buffer and StructuredBuffer.

    What's the difference? If I replace my StructuredBuffer with RWStructuredBuffer, it doesn't work.
    RWBuffer is 32-bit, I know, and is used for int, float and so on.
     
  11. DreamingImLatios

    That one is single-threaded.
     
  12. SuperFranTV

    I see you have a lot of experience with all of this. I have something strange in my shader .hlsl text file, which is used inside Shader Graph.

    Code (CSharp):
    static const float3x3 r[6] = {
        float3x3(1.0f, 0.0f, 0.0f,  0.0f, -1.192093E-07f, -1.0f,  0.0f, 1.0f, -1.192093E-07f), //Up
        float3x3(-10.0f, 0.0f, 0.0f,  0.0f, -1.192093E-07f, 1.0f,  0.0f, 1.0f, -1.192093E-07f), //Down
        float3x3(1.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f,  0.0f, 0.0f, 1.0f), //Forward
        float3x3(-1.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f,  0.0f, 0.0f, -1.0f), //Back
        float3x3(-1.192093E-07f, 0.0f, 1.0f,  0.0f, 1.0f, 0.0f,  -1.0f, 0.0f, -1.192093E-07f), //Left
        float3x3(-1.192093E-07f, 0.0f, -1.0f,  0.0f, 1.0f, 0.0f,  1.0f, 0.0f, -1.192093E-07f) //Right
    };

    static const float3 DirectionVector[6] = {
        float3(0, 0.5, 0), //Up
        float3(0, -0.5, 0), //Down
        float3(0, 0, -0.5), //Forward
        float3(0, 0, 0.5), //Back
        float3(-0.5, 0, 0), //Left
        float3(0.5, 0, 0) //Right
    };

    void ConfigureProcedural () {
        #if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
            int i = _Indexes[unity_InstanceID];
            int y = i / (128 * 128);
            int x = (i - y * 128 * 128) / 128;
            int z = i - y * 128 * 128 - x * 128;
            int d = (i % 6);
            float3 v = float3(x, y, z) + DirectionVector[d];
            float3x3 rot = r[d];
            float3x4 m = float3x4(rot._m00, rot._m01, rot._m02, rot._m03, rot._m10, rot._m11, rot._m12, rot._m13, rot._m20, v.x, v.y, v.z);
            float4x4 mx = float4x4(m._m00, m._m01, m._m02, m._m03, m._m10, m._m11, m._m12, m._m13, m._m20, m._m21, m._m22, m._m23, 0.0, 0.0, 0.0, 1.0);
            unity_ObjectToWorld = mx;
        #endif
    }
    I'm totally new to this and need a shorter way. It gives me the error:
    Code (CSharp):
    invalid subscript '_m03'
    If I use c0 or c1 instead, I get the same error.

    But if I do this:

    Code (CSharp):
    float3x4 m = _Matrices[unity_InstanceID];
    unity_ObjectToWorld._m00_m01_m02_m03 = m._m00_m01_m02_m03;
    unity_ObjectToWorld._m10_m11_m12_m13 = m._m10_m11_m12_m13;
    unity_ObjectToWorld._m20_m21_m22_m23 = m._m20_m21_m22_m23;
    unity_ObjectToWorld._m30_m31_m32_m33 = float4(0.0, 0.0, 0.0, 1.0);
    it works fine...

    _Matrices is a StructuredBuffer of float3x4.
    _Indexes is a Buffer of ints.

    If I get this working, I will send much less data to the GPU through the buffers.

    Why do I need such performance? I want to build my base structure as well as possible, because I'm adding more features on top of it that will cost performance overall, for example more realistic shaders with moving grass, water and so on.


    Another thing that is not important at the moment:

    In the future I want to change from
    Code (CSharp):
    Buffer<int> _Indexes;
    to
    Code (CSharp):
    Buffer<short> _Indexes; //stride of 2
    Buffer<byte> _Indexes; //stride of 1
    This makes it possible to send less data, because I'm only sending values from 0 to 255, not more. I read that Buffer is faster than StructuredBuffer for this case. Is that right? And what's the difference from an RWBuffer or RWStructuredBuffer?
     
    Last edited: Nov 30, 2021
  13. DreamingImLatios

    Yes, I have experience with shaders. But this is the DOTS forum, not the graphics forum.
     
  14. SuperFranTV

    Last edited: Nov 30, 2021