
Job System for mesh skinning: lots of small idle in-between tasks

Discussion in 'C# Job System' started by ahmidou, Apr 19, 2018.

  1. ahmidou

    ahmidou

    Joined:
    Sep 17, 2012
    Posts:
    87
    Hi,
    I'm testing the Job System with a custom skinning deformer. It's working well, with ~1.4 ms for the deform function on 7 threads with Burst on.
    Then I duplicated the character and got a massive frame rate drop. When I looked at the profiler, I noticed tons of very small idle tasks between each character evaluation. There are also a lot of Profiler.DeserializeThreadData samples, but I guess those won't happen when the profiler is closed:

    [Attached image: upload_2018-4-19_20-21-23.png (profiler screenshot)]


    Here's my LateUpdate function:

    Code (CSharp):
    // Note: boneMatrices is only used for its Length; the results are written to m_xfo.
    Matrix4x4[] boneMatrices = new Matrix4x4[skin.bones.Length];
    for (int i = 0; i < boneMatrices.Length; i++)
        m_xfo[i] = skin.bones[i].localToWorldMatrix * bindPose[i];

    deformedMesh = new DeformedMesh()
    {
        vertexCount = mesh.vertexCount,
        maxBonesperVertex = quality,
        vertices = m_vertices,
        outVertices = m_outVertices,
        normals = m_normals,
        weights = m_weights,
        boneID = m_boneID,
        xfo = m_xfo
    };

    // Schedule the parallel job in batches of 1000 vertices, then block until it finishes.
    int vCount = deformedMesh.vertexCount;
    m_JobHandle = deformedMesh.Schedule(vCount, 1000);
    m_JobHandle.Complete();

    // Copy the results back to managed arrays (ToArray allocates a new array each call).
    meshOutput.vertices = deformedMesh.outVertices.ToArray();
    meshOutput.normals = deformedMesh.normals.ToArray();
    meshOutput.RecalculateBounds();
    Am I missing something?
    Thanks
     
  2. ahmidou

    ahmidou

    Joined:
    Sep 17, 2012
    Posts:
    87
    BTW, I've put this initialization
    Code (CSharp):
    deformedMesh = new DeformedMesh()
    {
        vertexCount = mesh.vertexCount,
        maxBonesperVertex = quality,
        vertices = m_vertices,
        outVertices = m_outVertices,
        normals = m_normals,
        weights = m_weights,
        boneID = m_boneID,
        xfo = m_xfo
    };
    in the LateUpdate function because I followed the job-system-cookbook examples. Is there currently even a tiny overhead in doing so? I noticed that it works exactly the same if it is done just once at start.
     
  3. timjohansson

    timjohansson

    Unity Technologies

    Joined:
    Jul 13, 2016
    Posts:
    473
    Looks to me like the big problem is that you do too much on the main thread and wait for the jobs to complete before continuing.
    The pattern of Schedule(); Complete(); means the main thread will wait for the job to complete before doing anything else. If you look at the main thread in the profiler, you'll see that it is doing a lot of stuff after the jobs, presumably converting NativeArrays to managed arrays and updating bounds, before the next character starts.

    If you split the update to first schedule the jobs for all characters and then complete + update them after all have been scheduled it should be a little bit better, but still not perfect since the main thread conversion looks more expensive than deformation.
    You should also reduce the cost of the conversion + update to make it really good. You could start by copying to a cached existing managed array instead of creating a new one every time (use CopyTo instead of ToArray).
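
    A minimal sketch of the two suggestions above (schedule everything first, then complete and apply; copy into a cached managed array with CopyTo), assuming a hypothetical manager that owns every character. `SkinningManager`, `Character`, and `OffsetJob` are illustrative names that do not appear in the thread, the job body is only a stand-in for the real skinning deformer, and the meshes are assumed to be readable.

```csharp
using System.Collections.Generic;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Stand-in for the real deformer job (hypothetical): offsets each vertex.
public struct OffsetJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<Vector3> vertices;
    public NativeArray<Vector3> outVertices;
    public Vector3 offset;

    public void Execute(int i)
    {
        outVertices[i] = vertices[i] + offset;
    }
}

// Hypothetical per-character state owned by the manager.
public class Character
{
    public Mesh meshOutput;
    public NativeArray<Vector3> vertices;
    public NativeArray<Vector3> outVertices;
    public Vector3[] managedVertices;   // allocated once, reused every frame
    public JobHandle handle;
}

public class SkinningManager : MonoBehaviour
{
    public List<Mesh> meshes = new List<Mesh>();   // assumed to be assigned in the Inspector
    readonly List<Character> characters = new List<Character>();

    void Start()
    {
        foreach (Mesh mesh in meshes)
        {
            Vector3[] source = mesh.vertices;
            characters.Add(new Character
            {
                meshOutput = mesh,
                vertices = new NativeArray<Vector3>(source, Allocator.Persistent),
                outVertices = new NativeArray<Vector3>(source.Length, Allocator.Persistent),
                managedVertices = new Vector3[source.Length],
            });
        }
    }

    void LateUpdate()
    {
        // Phase 1: schedule every character's job without waiting on any of them.
        foreach (Character c in characters)
        {
            var job = new OffsetJob
            {
                vertices = c.vertices,
                outVertices = c.outVertices,
                offset = Vector3.up * Mathf.Sin(Time.time),
            };
            c.handle = job.Schedule(c.vertices.Length, 1000);
        }
        JobHandle.ScheduleBatchedJobs();   // ask the worker threads to start now

        // Phase 2: complete each job and apply its result on the main thread.
        foreach (Character c in characters)
        {
            c.handle.Complete();
            c.outVertices.CopyTo(c.managedVertices);   // no per-frame allocation
            c.meshOutput.vertices = c.managedVertices;
            c.meshOutput.RecalculateBounds();
        }
    }

    void OnDestroy()
    {
        // Finish outstanding jobs before releasing the native memory they use.
        foreach (Character c in characters)
        {
            c.handle.Complete();
            c.vertices.Dispose();
            c.outVertices.Dispose();
        }
    }
}
```

    With this layout, the main thread only starts waiting once all jobs are queued, so the workers can run one character's job while the main thread applies another character's result.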
     
  4. M_R

    M_R

    Joined:
    Apr 15, 2015
    Posts:
    559
    Code (CSharp):
    m_JobHandle = deformedMesh.Schedule(vCount, 1000);
    m_JobHandle.Complete();
    don't Complete() the job right after scheduling it. that will block the main thread until the job is completed, and you won't get parallelism.

    you should instead schedule the job early in the frame (e.g. Update) and complete it later (e.g. LateUpdate)

    if you can allow one frame latency, you can keep around the job handle for a frame, then do
    Code (CSharp):
    // in your mono behaviour

    JobHandle jobHandle;

    void Update() {
        jobHandle.Complete(); // previous frame job
        jobHandle = new YourJob {...}.Schedule(...);
    }
     
  5. ahmidou

    ahmidou

    Joined:
    Sep 17, 2012
    Posts:
    87
    Hi Tim,
    So this means I need some sort of global character manager to schedule the non-deformation parts, like updating matrices and bounds, right?

    This almost doubled the performance, thanks!
     
  6. ahmidou

    ahmidou

    Joined:
    Sep 17, 2012
    Posts:
    87
    Doing so nicely packed the jobs at the beginning of the frame.
    The NativeArray transfer (even with CopyTo()) is definitely the bottleneck: it's 10 times more expensive than the deformation, which is a bit surprising. Shouldn't this be done in linear time?
    Couldn't this be parallelized too?

    Thanks
     
  7. timjohansson

    timjohansson

    Unity Technologies

    Joined:
    Jul 13, 2016
    Posts:
    473
    Having a manager for it is probably a good idea if you're using MonoBehaviours, but it is not required. You could also use multiple systems in ECS, use Update / LateUpdate in MonoBehaviours, or allow one frame of latency and apply the last frame's result before scheduling the current one, etc.
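
    As a rough illustration of the one-frame-latency option, here is a sketch of a single MonoBehaviour that applies the previous frame's result before scheduling the current job. `LatentDeformer` and `ScaleJob` are hypothetical names, and the job body is a placeholder rather than the skinning code discussed in this thread.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Placeholder job (hypothetical): scales each vertex around the origin.
public struct ScaleJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<Vector3> vertices;
    public NativeArray<Vector3> outVertices;
    public float scale;

    public void Execute(int i)
    {
        outVertices[i] = vertices[i] * scale;
    }
}

[RequireComponent(typeof(MeshFilter))]
public class LatentDeformer : MonoBehaviour
{
    Mesh mesh;
    NativeArray<Vector3> vertices;
    NativeArray<Vector3> outVertices;
    Vector3[] managedVertices;   // cached managed buffer, reused every frame
    JobHandle handle;
    bool hasResult;              // false until the first job has been scheduled

    void Start()
    {
        mesh = GetComponent<MeshFilter>().mesh;   // per-instance copy of the mesh
        managedVertices = mesh.vertices;
        // Persistent allocation, because the data outlives a single frame.
        vertices = new NativeArray<Vector3>(managedVertices, Allocator.Persistent);
        outVertices = new NativeArray<Vector3>(managedVertices.Length, Allocator.Persistent);
    }

    void LateUpdate()
    {
        // Finish the job scheduled last frame; it has had a whole frame to run.
        handle.Complete();
        if (hasResult)
        {
            outVertices.CopyTo(managedVertices);
            mesh.vertices = managedVertices;
            mesh.RecalculateBounds();
        }

        // Schedule this frame's job; its result is applied next frame.
        var job = new ScaleJob
        {
            vertices = vertices,
            outVertices = outVertices,
            scale = 1f + 0.1f * Mathf.Sin(Time.time),
        };
        handle = job.Schedule(vertices.Length, 1000);
        hasResult = true;
    }

    void OnDestroy()
    {
        handle.Complete();
        if (vertices.IsCreated) vertices.Dispose();
        if (outVertices.IsCreated) outVertices.Dispose();
    }
}
```

    The trade-off is that the displayed mesh always lags the inputs by one frame, which may or may not be acceptable for skinning.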