IJobParallelForTransform, 15000 transforms, executed on single job thread, any hints?

Discussion in 'Entity Component System' started by jakub-gemrot, Jun 25, 2018.

  1. jakub-gemrot

    Joined:
    Nov 1, 2014
    Posts:
    16
    Hi!

    I'm playing with the Job System and have implemented a simple scenario in which cubes fall and are respawned once they hit a certain Y level. I have two jobs, the second dependent on the first. Both reference the same native array of transforms. Everything runs without errors or warnings, but the job is never split across the job worker threads.

    Job's code:
    Code (CSharp):
    using Unity.Burst;
    using UnityEngine;
    using UnityEngine.Jobs;

    namespace Ships.JobSystem
    {
        // Moves each transform along the Y axis by speed * deltaTime.
        [BurstCompile]
        public struct MoveJob : IJobParallelForTransform
        {
            public float speed;
            public float deltaTime;

            public void Execute(int index, TransformAccess transform)
            {
                Vector3 position = transform.position;

                position += new Vector3(0, speed, 0) * deltaTime;

                transform.position = position;
            }
        }
    }
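
    For reference, the two-job setup described above could be scheduled roughly as follows. This is only a sketch: RespawnJob, FallingCubesScheduling, minY and spawnY are hypothetical names and values that do not appear in the original project; only MoveJob comes from the post.

```csharp
using Unity.Burst;
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical second job: moves a cube back up once it falls below minY.
[BurstCompile]
public struct RespawnJob : IJobParallelForTransform
{
    public float minY;    // assumed respawn threshold
    public float spawnY;  // assumed height to respawn at

    public void Execute(int index, TransformAccess transform)
    {
        Vector3 position = transform.position;
        if (position.y < minY)
        {
            position.y = spawnY;
            transform.position = position;
        }
    }
}

public static class FallingCubesScheduling
{
    // Schedules MoveJob (from the post above), then RespawnJob,
    // over the same TransformAccessArray.
    public static JobHandle ScheduleFrame(TransformAccessArray cubes, float speed, float deltaTime)
    {
        var move = new Ships.JobSystem.MoveJob { speed = speed, deltaTime = deltaTime };
        var respawn = new RespawnJob { minY = -10f, spawnY = 10f }; // illustrative values

        JobHandle moveHandle = move.Schedule(cubes);

        // Passing moveHandle makes the respawn job wait for the move job.
        return respawn.Schedule(cubes, moveHandle);
    }
}
```

    The caller is expected to call Complete() on the returned handle before touching the transforms on the main thread.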

    Profiler as the proof (development build, not taken from the editor):
    Unity-JobSystem-Question.png


    Stats window:
    Unity-JobSystem-Question.2.png

    I'm running

    Version 2018.2.0b8 (fed204371f5a)
    Wed, 30 May 2018 15:35:38 GMT
    Branch: 2018.2/staging

    My project manifest:

    Code (CSharp):
    {
        "dependencies": {
            "com.unity.modules.ui": "1.0.0",
            "com.unity.modules.tilemap": "1.0.0",
            "com.unity.modules.physics2d": "1.0.0",
            "com.unity.modules.assetbundle": "1.0.0",
            "com.unity.modules.unitywebrequestassetbundle": "1.0.0",
            "com.unity.modules.unityanalytics": "1.0.0",
            "com.unity.modules.umbra": "1.0.0",
            "com.unity.analytics": "2.0.16",
            "com.unity.modules.vehicles": "1.0.0",
            "com.unity.ads": "2.0.8",
            "com.unity.modules.imageconversion": "1.0.0",
            "com.unity.modules.director": "1.0.0",
            "com.unity.modules.video": "1.0.0",
            "com.unity.modules.audio": "1.0.0",
            "com.unity.modules.unitywebrequest": "1.0.0",
            "com.unity.textmeshpro": "1.2.1",
            "com.unity.modules.ai": "1.0.0",
            "com.unity.modules.unitywebrequestwww": "1.0.0",
            "com.unity.purchasing": "2.0.1",
            "com.unity.modules.particlesystem": "1.0.0",
            "com.unity.standardevents": "1.0.13",
            "com.unity.modules.imgui": "1.0.0",
            "com.unity.modules.physics": "1.0.0",
            "com.unity.modules.screencapture": "1.0.0",
            "com.unity.modules.xr": "1.0.0",
            "com.unity.modules.terrain": "1.0.0",
            "com.unity.modules.unitywebrequestaudio": "1.0.0",
            "com.unity.modules.jsonserialize": "1.0.0",
            "com.unity.modules.terrainphysics": "1.0.0",
            "com.unity.entities": "0.0.12-preview.6",
            "com.unity.modules.animation": "1.0.0",
            "com.unity.package-manager-ui": "2.0.0-preview.3",
            "com.unity.modules.cloth": "1.0.0",
            "com.unity.modules.uielements": "1.0.0",
            "com.unity.modules.vr": "1.0.0",
            "com.unity.modules.unitywebrequesttexture": "1.0.0",
            "com.unity.modules.wind": "1.0.0",
            "com.unity.incrementalcompiler": "0.0.42-preview.1"
        },
        "registry": "https://packages.unity.com",
        "testables": [
            "com.unity.collections",
            "com.unity.entities",
            "com.unity.jobs"
        ]
    }
    Any suggestions as to what I might have missed to get my job split across threads?

    Thank you!
    Jakub
     
  2. jakub-gemrot

    Joined:
    Nov 1, 2014
    Posts:
    16
    Can it be that IJobParallelForTransform does not have an extension for setting the grouping?

    I have to schedule it like this:
    Code (CSharp):
    shipTransforms.moveJobHandle = moveJob.Schedule(shipTransforms.shipsAccess);
    While the code within ShipMoveJob is small, it still takes 2 ms on a single thread, so spreading it over 7 threads could ideally bring it down to about 0.29 ms...

    Jakub
     
  3. M_R

    Joined:
    Apr 15, 2015
    Posts:
    559
    IJobParallelForTransform only splits the work per root. If all your transforms have the same parent, they will execute on the same thread.

    Try removing the parents of your cubes if they have any (or group them under different roots, with at least as many roots as there are CPU threads).
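
    A minimal sketch of the "different roots" variant, assuming the cubes are spawned from a prefab at startup. CubeSpawner, cubePrefab, cubeCount and rootCount are hypothetical names, not taken from the original project:

```csharp
using UnityEngine;
using UnityEngine.Jobs;

public class CubeSpawner : MonoBehaviour
{
    public GameObject cubePrefab;  // assumed prefab for one cube
    public int cubeCount = 15000;
    public int rootCount = 8;      // e.g. at least the number of worker threads

    TransformAccessArray cubes;

    void Start()
    {
        // Empty root objects that exist only to partition the hierarchy.
        var roots = new Transform[rootCount];
        for (int r = 0; r < rootCount; r++)
            roots[r] = new GameObject("CubeRoot" + r).transform;

        cubes = new TransformAccessArray(cubeCount);
        for (int i = 0; i < cubeCount; i++)
        {
            // Round-robin so every root gets roughly the same number of cubes.
            Transform parent = roots[i % rootCount];
            GameObject cube = Instantiate(cubePrefab, parent);
            cubes.Add(cube.transform);
        }
    }

    void OnDestroy()
    {
        // TransformAccessArray is a native container and must be disposed.
        if (cubes.isCreated)
            cubes.Dispose();
    }
}
```

    Spawning the cubes with no parent at all should have the same effect, since every cube is then its own root; the extra roots only keep the Hierarchy window tidy.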
     
  4. jakub-gemrot

    Joined:
    Nov 1, 2014
    Posts:
    16
    M_R is correct, I've split my objects among different parents and it works like a charm.
     
  5. hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Unity needs a concept of a static parent or decorative parent so we can still keep things organised.
     
  6. Nyanpas

    Joined:
    Dec 29, 2016
    Posts:
    406
    Is not ECS supposed to solve these things?
     
  7. No, ECS has nothing to do with whether or not Unity has the concept of a "decorative parent" in the Hierarchy window.
     
  8. Nyanpas

    Joined:
    Dec 29, 2016
    Posts:
    406
    Necrobump deluxe, but I was thinking of the GameObject conversion. Anyhow, problem solved.

    [edit] Just noticed that the job specified by OP is also not vectorized.
     
    Last edited: Aug 17, 2020
  9. MagicianArtemka

    Joined:
    Jan 15, 2019
    Posts:
    46
    Hi. Can you provide an example of how OP's code can be vectorized? The code is simple and I cannot find a way to optimize it any further. I want to write better code for my projects, which is why I'm asking ;)

    Also, I would really appreciate it if you could post a few links to information about vectorized code and how to write it. I think you are more experienced in DOTS than I am, so you may know good books or articles on the subject.
     
  10. Nyanpas

    Joined:
    Dec 29, 2016
    Posts:
    406
    Sorry, I cannot help. I am not an expert on the subject, I just read what the burst-inspector tells me...
     
  11. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    I'm one of the handful of people on these forums that could help you out.

    However, keep in mind that you should only try to vectorize code that is measurably expensive. Otherwise it is a waste of time, because you can only get up to 4x speedup (8x if using AVX) unless you also fix other things related to aliasing, branching, and caching. Anyways, if you have a particular job that you measured to be too expensive and you would like to performance golf, either start a new thread and tag me or PM me directly if you don't want to share it publicly.
     
  12. CaseyHofland

    Joined:
    Mar 18, 2016
    Posts:
    613
    Use a float3 instead of a Vector3. The float3 type (like everything else in the Mathematics package) is designed to be SIMD-compatible, which is essential for vectorization. I think you can still use Vector3, but as soon as you use the new keyword (which may also happen behind the scenes with Vector3) you are shooting yourself in the foot.
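
    As a rough illustration of that suggestion, OP's job could be written with float3 like this. MoveJobFloat3 is a hypothetical name, and whether Burst actually vectorizes the result is not guaranteed; the Burst Inspector is the place to check:

```csharp
using Unity.Burst;
using Unity.Mathematics;
using UnityEngine.Jobs;

[BurstCompile]
public struct MoveJobFloat3 : IJobParallelForTransform
{
    public float speed;
    public float deltaTime;

    public void Execute(int index, TransformAccess transform)
    {
        // TransformAccess.position is a Vector3; Unity.Mathematics provides
        // implicit conversions between Vector3 and float3.
        float3 position = transform.position;

        // Only the Y component changes, so no temporary vector is constructed.
        position.y += speed * deltaTime;

        transform.position = position;
    }
}
```

    This job processes one transform per Execute call, so any gain from the math types alone may be small compared to the cost of reading and writing the transform itself.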
     
  13. Flarup

    Joined:
    Jan 7, 2010
    Posts:
    164
    Regarding the original question (IJobParallelForTransform not multithreading properly), I'm running into the same problem. When using IJobParallelForTransform I also don't get any significant speedup compared to running it on the main thread.

    I'm scheduling the job like this:

    m_PositionJobHandle = m_Job.Schedule(m_TransformsAccessArray);

    I have tried to create 8 root objects, and place the spawned prefabs evenly under these transforms. But that didn't improve performance either. Any other ideas why IJobParallelForTransform doesn't properly utilize all cores?

    Thank you very much in advance for all your help.


    Kind regards,
    Uffe Flarup
     
  14. DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    1) Make sure your job is using Burst. You can check that in the profiler timeline view.
    2) Make sure the work you are doing in the job is more expensive than the work to gather inputs to schedule the job. Ideally you shouldn't have to do any gathering of inputs, but I have seen some pretty awful attempts.
    3) Show code and profiler timeline.
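
    To illustrate points 1 and 2, here is a sketch of a scheduler that builds its TransformAccessArray once and does no per-frame gathering. MoveScheduler and targets are hypothetical names; MoveJob is the Burst-compiled job from the first post:

```csharp
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Jobs;

public class MoveScheduler : MonoBehaviour
{
    public float speed = -1f;

    // Assumed to be filled once (e.g. at spawn time), not rebuilt every frame.
    public Transform[] targets;

    TransformAccessArray transforms;
    JobHandle handle;

    void Start()
    {
        // Build the native array once so scheduling stays cheap.
        transforms = new TransformAccessArray(targets);
    }

    void Update()
    {
        var job = new Ships.JobSystem.MoveJob { speed = speed, deltaTime = Time.deltaTime };
        handle = job.Schedule(transforms);
    }

    void LateUpdate()
    {
        // Complete as late as possible so the workers can overlap with main-thread work.
        handle.Complete();
    }

    void OnDestroy()
    {
        handle.Complete();
        if (transforms.isCreated)
            transforms.Dispose();
    }
}
```

    In the profiler timeline, the job should then show up on the worker threads between Update and LateUpdate.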
     
  15. Flarup

    Joined:
    Jan 7, 2010
    Posts:
    164
    Thanks a lot for the input. After inspecting some more, I could see that the code actually WAS running on all cores, but the job itself wasn't the big part of the work. Instead, it was the subsequent calls to UpdateRendererBoundingVolumes, which Unity makes automatically after the transforms have been moved, that were taking most of the time.
     