How to draw mesh in the job system?

Discussion in 'Data Oriented Technology Stack' started by laurentlavigne, Jan 21, 2018.

  1. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    Following this post, I did a quick test, and you will be amazed at what happens next!!!

    Rendering 14k triangles eats up what? 300ms! In the past it would take 1ms, maybe, on a wet day, so I'm definitely doing something wrong.
    What's the right way to do this sort of thing?

    (And too bad we can't pass a mesh into a job; I'd love to render things from there. Will there be a way to render from a job, or is exiting to a sync point something that has to be? I'm thinking that render calls must always be issued from the main thread for Unity to do its magic.)



    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using Unity.Jobs;
    using Unity.Collections;
    using UnityEngine.Jobs;
    using System;

    public class JobRender : MonoBehaviour
    {

        public Mesh mesh;

        struct RenderJob : IJobParallelFor
        {
            [ReadOnly] public float time;
            [WriteOnly] public NativeArray<Matrix4x4> output;
            [ReadOnly] public float scale;

            public void Execute(int i)
            {
                var pos = new Vector3(Mathf.Sin(time + (float)i / 100f), Mathf.Cos(time + (float)i / 100f), (float)i / 100f) * scale;
                var rot = Quaternion.AngleAxis(time * 10 + i, pos);
                output[i] = Matrix4x4.TRS(pos, rot, Vector3.one);
            }
        }

        NativeArray<Matrix4x4> output;
        void OnEnable()
        {
            StartCoroutine(Compute());
        }

        public int computeSize = 1000000, batchSize = 100;
        public float scale = 3;
        public Material mat;
        IEnumerator Compute()
        {
            while (true)
            {
                var time = Time.time;
                output = new NativeArray<Matrix4x4>(computeSize, Allocator.Persistent, NativeArrayOptions.None);
                var job = new RenderJob()
                {
                    time = time,
                    output = output,
                    scale = scale
                };
                var handleCalculate = job.Schedule(computeSize, batchSize);

                yield return new WaitWhile(() => handleCalculate.IsCompleted);

                handleCalculate.Complete();

                Graphics.DrawMeshInstanced(mesh, 0, mat, output.ToArray(), 1023);
                output.Dispose();
            }
        }
    }
     
  2. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    Ah, WaitWhile, you tripped me up again... False alarm.

    In fact I'd like to make a request: deprecate either WaitUntil or WaitWhile. We don't need both, because we can always negate a boolean. Having both is akin to having another version of "if" called "ifNot"...
     
    alexzzzz likes this.
  3. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    Not a false alarm after all, even with a proper wait on the end of the job...
    I need another set of eyes because I don't understand this profiler. What I see is that gfx.ProcessCommand takes up 227ms when the job finishes. Why is that? Graphics.DrawMeshInstanced is called every frame, so if graphics were the cause of this 227ms, it would show up on every frame, not only when the job finishes.
    Also, why is the CustomYieldInstruction taking up 216ms? And those GC spikes shouldn't happen, or more likely I don't understand all this allocation business. In my mind, if I allocate the NativeArray once, make it persistent, and keep overwriting it in the job, there should be zero allocation... well, except that Joachim said the whole struct gets copied over to the job system. Is that the GC spike I'm seeing?


    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using Unity.Jobs;
    using Unity.Collections;
    using UnityEngine.Jobs;
    using System;

    public class JobRender : MonoBehaviour
    {

        public Mesh mesh;

        struct RenderJob : IJobParallelFor
        {
            [ReadOnly] public float time;
            [WriteOnly] public NativeArray<Matrix4x4> output;
            [ReadOnly] public float scale;

            public void Execute(int i)
            {
                var pos = new Vector3(Mathf.Sin(time + (float)i / 100f), Mathf.Cos(time + (float)i / 100f), (float)i / 100f) * scale;
                var rot = Quaternion.AngleAxis(time * 10 + i, pos);
                output[i] = Matrix4x4.TRS(pos, rot, Vector3.one);
            }
        }

        NativeArray<Matrix4x4> output;
        void OnEnable()
        {
            output = new NativeArray<Matrix4x4>(computeSize, Allocator.Persistent);
            StartCoroutine(Compute());
        }

        private void Update()
        {
            Graphics.DrawMeshInstanced(mesh, 0, mat, matrices, 1023);
        }

        public int computeSize = 1000000, batchSize = 100;
        public float scale = 3;
        public Material mat;
        Matrix4x4[] matrices;
        IEnumerator Compute()
        {
            while (true)
            {
                var time = Time.time;
                var job = new RenderJob()
                {
                    time = time,
                    output = output,
                    scale = scale
                };
                var handleCalculate = job.Schedule(computeSize, batchSize);

                yield return new WaitUntil(() => handleCalculate.IsCompleted);

                handleCalculate.Complete();
                matrices = output.ToArray();
            }
        }

        void OnDisable()
        {
            output.Dispose();
        }
    }
     
  4. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    The spikes and slowdowns come from matrices = output.ToArray(); it allocates on the heap, even if I pre-allocate matrices.
    How do I get around that?

    PS: I'm sure there is a better way to do this, judging by the examples at Unite, but if what I'm doing is valid, it would be great if DrawMeshInstanced took a NativeArray as a parameter.
     
  5. Carpe-Denius

    Carpe-Denius

    Joined:
    May 17, 2013
    Posts:
    804
    I haven't looked into the job system yet so I can't help you with that, but why do you have an endless while loop? Could it be that it gets calculated too often between frames?
     
  6. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,366
    The usual idiom for ToArray is that it creates a new array, so the spike might come from the allocation there. Have you tried the copy methods instead? Those should not create garbage.
     
  7. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    I think you're right. If they use the default collection ToArray, this is the implementation... seriously
    Code (CSharp):
    public T[] ToArray()
    {
        T[] destinationArray = new T[this._size];
        Array.Copy(this._items, 0, destinationArray, 0, this._size);
        return destinationArray;
    }
    an overload like this would help a lot:
    Code (CSharp):
    // Copies into a caller-supplied array instead of allocating a new one
    public void ToArray(T[] destinationArray)
    {
        Array.Copy(this._items, 0, destinationArray, 0, this._size);
    }
     
  8. LennartJohansen

    LennartJohansen

    Joined:
    Dec 1, 2014
    Posts:
    2,292
    You need to wait until they add a NativeArray overload to the function.
     
  9. snacktime

    snacktime

    Joined:
    Apr 15, 2013
    Posts:
    2,366
    NativeArray.CopyTo looks like it would work.
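
To make the difference between the two approaches concrete, here is a minimal sketch in plain C#. It uses `List<float>` from the .NET base library as a stand-in for the source container, which is an assumption made so the example runs outside Unity; `NativeArray<T>` is a different type, and the class and method names (`CopyDemo`, `Snapshot`, `SnapshotInto`) are hypothetical. Only the allocate-versus-reuse pattern is illustrated.

```csharp
using System;
using System.Collections.Generic;

// Stand-in demo: List<T> plays the role of the source container.
// Unity's NativeArray<T> is a different type; only the
// allocate-vs-reuse pattern is shown here.
public static class CopyDemo
{
    // Allocating path: every call returns a brand-new managed array,
    // which becomes garbage as soon as the caller drops it.
    public static float[] Snapshot(List<float> source)
    {
        return source.ToArray();
    }

    // Non-allocating path: copies into a caller-owned buffer that is
    // created once and reused on every call.
    public static void SnapshotInto(List<float> source, float[] buffer)
    {
        if (buffer.Length < source.Count)
            throw new ArgumentException("buffer is too small", nameof(buffer));
        source.CopyTo(buffer); // List<T>.CopyTo(T[]) copies starting at index 0
    }
}
```

Calling `Snapshot` once per frame creates one new array per frame, whereas `SnapshotInto` keeps writing into the same buffer, which is the behaviour the thread is hoping to get from a copy method on the native container.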
     
  10. Peter77

    Peter77

    Joined:
    Jun 12, 2013
    Posts:
    4,013
    Below you can find a modified version of your example. It now has zero per-frame allocations (besides Unity-internal GC issues) and runs in 2.5ms on hardware from 2008.

    The matrix array copy from native to managed memory is still "inefficient", but according to a few answers in the forum, UT is working on that.

    I wanted to implement rendering of more than 1023 instances, but noticed that the API design doesn't let me do that easily. UT could easily address these issues, though:
    1. NativeArray.CopyTo lacks "startIndex" and "count" parameters. I might want to copy only a specific segment of a source array to a specific segment of a destination array.
    2. Graphics.DrawMeshInstanced lacks a "startIndex" parameter for offsetting into the "matrices" array. That's why I didn't implement rendering of more than 1023 instances: I would have needed to split arrays or copy more data around.
    Microsoft has been using this "startIndex" and "count" pattern for quite a while, for example in DrawIndexedPrimitives, and it has proved very useful to me. I'm not sure why UT doesn't provide overloads like that in their API too.

    Code (CSharp):
    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;
    using Unity.Jobs;
    using Unity.Collections;
    using UnityEngine.Jobs;
    using System;

    public class JobRender : MonoBehaviour
    {
        struct RenderJob : IJobParallelFor
        {
            [ReadOnly] public float time;
            [WriteOnly] public NativeArray<Matrix4x4> output;
            [ReadOnly] public float scale;

            public void Execute(int i)
            {
                var pos = new Vector3(Mathf.Sin(time + (float)i / 100f), Mathf.Cos(time + (float)i / 100f), (float)i / 100f) * scale;
                var rot = Quaternion.AngleAxis(time * 10 + i, pos);
                output[i] = Matrix4x4.TRS(pos, rot, Vector3.one);
            }
        }

        public Mesh mesh;
        public int computeSize = 100000, batchSize = 100;
        public float scale = 3;
        public Material mat;

        NativeArray<Matrix4x4> output;
        JobHandle handleCalculate;
        Matrix4x4[] matrices = new Matrix4x4[1024];

        void OnEnable()
        {
            // Allocate both buffers once so nothing is allocated per frame
            output = new NativeArray<Matrix4x4>(computeSize, Allocator.Persistent);
            matrices = new Matrix4x4[computeSize];
        }

        void OnDisable()
        {
            output.Dispose();
        }

        void Update()
        {
            var job = new RenderJob()
            {
                time = Time.time,
                output = output,
                scale = scale
            };

            handleCalculate = job.Schedule(computeSize, batchSize);
        }

        void LateUpdate()
        {
            // Always call Complete() before reading the output on the main thread;
            // it returns right away if the job has already finished
            handleCalculate.Complete();

            // Copy matrices back from native to managed memory
            // !!! THIS IS "INEFFICIENT" !!!
            output.CopyTo(matrices);

            // WHY DOES DrawMeshInstanced NOT HAVE A startingIndex PARAMETER INTO matrices???
            Graphics.DrawMeshInstanced(mesh, 0, mat, matrices, 1023);
        }
    }
    [Attachments: profiler_timeline.png, profiler_hierarchy.png]
     
    Last edited: Jan 22, 2018
    dreamerflyer and laurentlavigne like this.
  11. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    I like how you separated job completion from making the matrices.
     
  12. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    I copied your code over and I'm getting 33ms on an i5 8400 at a compute size of 100,000!
    What's your compute size?


    [edit] I dropped the compute size to 1000 and I'm getting 20ms in LateUpdate. This makes no sense, so I ran a deep scan (cool word).

    Why is this even showing up on the main thread?
    Timeline it, enhance.

    Jobs aren't even showing up. What kind of voodoo is that? (Still the same script as yours, Peter.)
    Re-copies the script, you never know.

    Now the jobs are here (some bug in the profiler).
     
    Last edited: Jan 22, 2018
  13. Peter77

    Peter77

    Joined:
    Jun 12, 2013
    Posts:
    4,013
    I set it to 1024, because it also renders 1024 cubes only. Your initial post was about how slow rendering is:
    Rendering isn't slow, it's the job related code that seems to be slow.
     
  14. alexzzzz

    alexzzzz

    Joined:
    Nov 20, 2010
    Posts:
    1,404
    You won't believe - https://github.com/dotnet/csharplang/issues/882 :eek:
     
    MNNoxMortem likes this.
  15. SugoiDev

    SugoiDev

    Joined:
    Mar 27, 2013
    Posts:
    234
    I think the WaitUntil and WaitWhile might be getting their names from Rx
    It confuses me too, just as much as Rx does sometimes.


    Oh! :eek:
     
  16. Necromantic

    Necromantic

    Joined:
    Feb 11, 2013
    Posts:
    115
    I agree on both points regarding startIndex and count. There is a Slice extension method that gives you a slice of the NativeArray, but you still have to make a copy. You can't even copy into one persistent managed array that you refill in this particular case, because source and target have to be the same size, so the remainder can be a problem.

    Just quick and dirty test code:
    Code (CSharp):
    private void LateUpdate()
    {
        handleCalculate.Complete();

        int amount = computeSize / 1023;
        int remainder = computeSize % 1023;
        if (remainder != 0)
            ++amount;

        for (int i = 0; i < amount; ++i)
        {
            int index = i * 1023;
            int count = (i == amount - 1 && remainder != 0) ? remainder : 1023;
            Matrix4x4[] matrices = output.Slice(index, count).ToArray();
            Graphics.DrawMeshInstanced(mesh, 0, mat, matrices);
        }
    }
    I also tried segmenting up the jobs so you have an array of jobs and outputs etc. all handling batches of 1023 matrices. I still have to analyze the difference in overhead for both these workarounds.

    I also don't see a reason not to let us pass NativeArray to those methods, so no copying is needed since the internal elements can be accessed already. It's probably just a matter of updating the API. Just needs a lot of overloaded methods for wherever Unity accepts arrays. ;)
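
The batch arithmetic in the test code above can also be written as a small helper. What follows is a sketch in plain C#, not Unity API: `InstanceBatches`, `Split`, and `Fill` are hypothetical names, and the source is assumed to be an ordinary managed array, so the native-to-managed copy discussed in this thread is not covered. It shows how "startIndex" and "count" ranges let one fixed-size buffer be reused even for the shorter final batch.

```csharp
using System;
using System.Collections.Generic;

public static class InstanceBatches
{
    // Per-call instance limit used throughout the thread.
    public const int MaxPerDraw = 1023;

    // Yields (start, count) ranges that cover [0, total) in chunks of at
    // most maxPerBatch; only the final range may be shorter.
    public static IEnumerable<(int start, int count)> Split(int total, int maxPerBatch = MaxPerDraw)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        if (maxPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerBatch));
        for (int start = 0; start < total; start += maxPerBatch)
            yield return (start, Math.Min(maxPerBatch, total - start));
    }

    // Copies one range into a reusable buffer. The caller passes
    // range.count on to the draw call, so stale entries beyond it
    // are never read.
    public static void Fill<T>(T[] source, (int start, int count) range, T[] buffer)
    {
        Array.Copy(source, range.start, buffer, 0, range.count);
    }
}
```

In a LateUpdate like the one above, each range would be copied into a single `Matrix4x4[1023]` buffer and drawn with the five-argument `Graphics.DrawMeshInstanced(mesh, 0, mat, buffer, range.count)` call already used earlier in the thread, which avoids allocating a new array per batch.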
     
    Last edited: Feb 1, 2018
    laurentlavigne likes this.
  17. OswaldHurlem

    OswaldHurlem

    Joined:
    Jan 6, 2017
    Posts:
    40
    Last edited: Feb 14, 2018
  18. Peter77

    Peter77

    Joined:
    Jun 12, 2013
    Posts:
    4,013
    It looks like they're solving this issue with NativeArray.Slice (available since beta 6?). Great addition, Unity! Now we just need the various other APIs to accept that slice :)
     
  19. laurentlavigne

    laurentlavigne

    Joined:
    Aug 16, 2012
    Posts:
    2,015
    how does slice work?
    Edit: I remember reading the docu a week or so ago, it makes a slice of a native array... useful when things support native, which I bet is why jobs is delayed.
     
    Last edited: Feb 10, 2018
  20. OswaldHurlem

    OswaldHurlem

    Joined:
    Jan 6, 2017
    Posts:
    40