

Future of Scheduling

Discussion in 'Entity Component System' started by Timboc, Jan 30, 2021.

  1. Timboc

    Joined: Jun 22, 2015
    Posts: 234
    At the moment, with a simple system (like the rough example below) on my Ryzen 9 3900X, these are the rough timings (in the editor, but with all safety checks disabled, synchronous Burst compilation enabled, etc.):
    Run(): ~1ms on the main thread
    Schedule(): ~1.2ms on a single thread
    ScheduleParallel(): ~0.5ms on the main thread, 12ms total

    12x the total work for a 50% reduction in main-thread time is a fairly brutal trade-off at the moment.
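    For reference, the ratios behind that statement, taking Run() at ~1ms as the baseline and using only the rough profiler numbers above:

    ```latex
    \frac{T_{\text{total}}(\texttt{ScheduleParallel})}{T(\texttt{Run})} \approx \frac{12\,\text{ms}}{1\,\text{ms}} = 12,
    \qquad
    \frac{T_{\text{main}}(\texttt{ScheduleParallel})}{T(\texttt{Run})} \approx \frac{0.5\,\text{ms}}{1\,\text{ms}} = 0.5
    ```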

    I know it's been commented on before that the plan is to improve the scheduling overhead.
    My question is: from a purely "ballpark" perspective, what kind of improvement might we expect, and over what time period?
    e.g.
    Could we one day hope to see numbers like A: [0.05ms main, 1.2ms total] or B: [0.4ms main, 10ms total]?
    Could this be something we get hyped for in 2021, or closer to 2025?

    Calibration of expectations would be really valuable. Absolutely not looking for any commitment - just a vague guideline. It would be immensely helpful for forward planning.

    Many thanks.

    I'd also appreciate comments on alternative approaches, flaws in the example with respect to performance, etc.
    (In practice I would, of course, try to avoid the .Complete().)

    Code (CSharp):
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Mathematics;

    // Four identical test components, each holding a single float4.
    public struct TestICD1 : IComponentData
    {
        public float4 Value;
    }
    public struct TestICD2 : IComponentData
    {
        public float4 Value;
    }
    public struct TestICD3 : IComponentData
    {
        public float4 Value;
    }
    public struct TestICD4 : IComponentData
    {
        public float4 Value;
    }

    [BurstCompile]
    public struct TestISB : ISystemBase
    {
        EntityQuery someEntities;

        public void OnCreate(ref SystemState state)
        {
            someEntities = state.GetEntityQuery(typeof(TestICD1), typeof(TestICD2), typeof(TestICD3), typeof(TestICD4));

            // Create one prototype entity, then clone it 500,000 times for the benchmark.
            var a = state.EntityManager.CreateArchetype(typeof(TestICD1), typeof(TestICD2), typeof(TestICD3), typeof(TestICD4));
            var e = state.EntityManager.CreateEntity(a);
            NativeArray<Entity> newbies = new NativeArray<Entity>(500000, Allocator.Temp);
            state.EntityManager.Instantiate(e, newbies);
            newbies.Dispose();
        }

        public void OnDestroy(ref SystemState state) {}

        public void OnUpdate(ref SystemState state)
        {
            TestJob job1 = new TestJob()
            {
                // TestICD1/TestICD2 are only read; TestICD3/TestICD4 are written.
                TestICD1Type = state.GetComponentTypeHandle<TestICD1>(true),
                TestICD2Type = state.GetComponentTypeHandle<TestICD2>(true),
                TestICD3Type = state.GetComponentTypeHandle<TestICD3>(),
                TestICD4Type = state.GetComponentTypeHandle<TestICD4>(),
            };

            // Swap between these three lines to compare the scheduling modes.
            //job1.Run(someEntities);
            //job1.Schedule(someEntities).Complete();
            job1.ScheduleParallel(someEntities, 1, state.Dependency).Complete();
        }

        [BurstCompile]
        struct TestJob : IJobEntityBatch
        {
            [ReadOnly] internal ComponentTypeHandle<TestICD1> TestICD1Type;
            [ReadOnly] internal ComponentTypeHandle<TestICD2> TestICD2Type;
            internal ComponentTypeHandle<TestICD3> TestICD3Type;
            internal ComponentTypeHandle<TestICD4> TestICD4Type;

            // With batchesPerChunk = 1, each batch passed in here is a whole chunk.
            public void Execute(ArchetypeChunk chunk, int batchIndex)
            {
                NativeArray<TestICD1> testICD1s = chunk.GetNativeArray(TestICD1Type);
                NativeArray<TestICD2> testICD2s = chunk.GetNativeArray(TestICD2Type);
                NativeArray<TestICD3> testICD3s = chunk.GetNativeArray(TestICD3Type);
                NativeArray<TestICD4> testICD4s = chunk.GetNativeArray(TestICD4Type);

                for (int i = 0; i < chunk.Count; ++i)
                {
                    // Components are zero-initialised, so this never passes: it is just junk fake work.
                    if (math.any(testICD1s[i].Value > float4.zero) && math.all(testICD2s[i].Value > float4.zero))
                    {
                        testICD3s[i] = new TestICD3() { Value = 4 };
                        testICD4s[i] = new TestICD4() { Value = 5 };
                    }
                }
            }
        }
    }
     
  2. Timboc

    Joined: Jun 22, 2015
    Posts: 234
    It was pointed out to me that with ISBs (ISystemBase) I should also put [BurstCompile] on OnUpdate(). When I tried that, it produced errors until I swapped IJEB (IJobEntityBatch) for IJobChunk. After that, the timings were closer to ~0.45ms main thread, 10.5ms overall, i.e. closer to B above. I'm mindful that my tests aren't very scientific; I'm just averaging a few frames from the profiler.
    So now I'm wondering whether the answer to my question is (B), or whether there's still work to be done beyond ISBs?
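    Against the same ~1ms Run() baseline, those revised numbers work out as follows (again, only rough profiler averages):

    ```latex
    \frac{10.5\,\text{ms}}{1\,\text{ms}} = 10.5\times\ \text{the total work},
    \qquad
    1 - \frac{0.45\,\text{ms}}{1\,\text{ms}} = 0.55\ \ (\text{about } 55\%\ \text{less main-thread time})
    ```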
     
  3. jasons-novaleaf

    Joined: Sep 13, 2012
    Posts: 181
    I think this really just shows that ScheduleParallel() isn't a good choice for "light work". You want to use it for CPU-intensive loops over large numbers of entities. There is overhead associated with jobs and parallel jobs, and as your job's loop only writes two values, I can see it not being a great solution.

    Also, try a bigger batch size and a lower maximum parallelism; I bet you will get better results.
     
  4. Timboc

    Joined: Jun 22, 2015
    Posts: 234
    Thanks jasons-novaleaf, but that doesn't match my observations: more work doesn't lead to a particularly better ratio than in the example. I'd be interested if you had an example. That said, I'm more interested in what expectations I should have for the future, as my real-world usage is around 2-4x the example workload and I see the same thing. I appreciate you taking the time to reply.
     
  5. sngdan

    Joined: Feb 7, 2014
    Posts: 1,131
    I can't speak to your benchmark, but if your if-condition passes, it seems you interlock in the parallel case.

    Generally, though, it is good advice to schedule single-threaded jobs. As long as you have many independent single jobs, they fill up your worker threads. You can go parallel for specific jobs when you optimize at the end.
     
  6. WAYNGames

    Joined: Mar 16, 2019
    Posts: 939
    Would it be possible to have a static analyzer rate the complexity of each job at compile time, and then use that complexity at runtime to decide, based on the number of entities/chunks, whether it's better to run the job single-threaded or multithreaded?
     
  7. UsmanMemon

    Joined: Jan 24, 2020
    Posts: 87
    @Timboc I'm curious to know the timings for just .Schedule() after Burst-compiling OnUpdate().
     
  8. Timboc

    Joined: Jun 22, 2015
    Posts: 234
    I tried ScheduleSingle() and it looked to be in the same ballpark as Run() (oddly, fractionally faster?). I wouldn't take these particular numbers too seriously, though; I'd be interested if anyone wanted to do some proper benchmarking.

    Out of interest, I tried 10 jobs with 1/10th of the entities each (50k, using a tag on each group), using the scheduling approach desertGhost_ quite rightly corrected me on (many thanks, by the way). Calling state.Dependency.Complete() on these resulted in ~0.45ms on the main thread and ~4.5ms of total work.
    [Profiler screenshot: upload_2021-1-31_22-55-11.png]
    Which is interesting: roughly 4.5x the work for half the main-thread time. That's still less than ideal, but maybe it falls in line with what we've seen in general. Splitting a lot of work across a few threads can sometimes be a good trade-off, whereas splitting it across 24 (or lots of) threads is almost never worth it? As Joachim has said previously, they're not keen on the idea of static analysis or dynamic scheduling, but it looks increasingly difficult to strike a good balance on modern CPUs, and restricting some things to a single core doesn't feel great either. If I can work up the energy, I might try splitting this across e.g. 24 ScheduleSingle() calls, out of interest, to see whether the result ends up looking any different from ScheduleParallel().
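    Compared with the ~1ms Run() baseline from the first post and the ~10.5ms ScheduleParallel() total from the Burst-compiled OnUpdate() test (both configurations at about 0.45ms on the main thread; rough averages only):

    ```latex
    \frac{T_{\text{total}}(10 \times \texttt{ScheduleSingle})}{T(\texttt{Run})} \approx \frac{4.5\,\text{ms}}{1\,\text{ms}} = 4.5,
    \qquad
    \frac{T_{\text{total}}(\texttt{ScheduleParallel})}{T_{\text{total}}(10 \times \texttt{ScheduleSingle})} \approx \frac{10.5\,\text{ms}}{4.5\,\text{ms}} \approx 2.3
    ```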

    That said, I'm trying not to get into the weeds creating exemplar benchmarks. My main question is about how jobs don't seem to scale well across threads, and whether this is ever likely to change. And if so, roughly which year and by what order of magnitude?

    Or perhaps I'm the only one encountering this?
     
    Last edited: Jan 31, 2021
  9. desertGhost_

    Joined: Apr 12, 2018
    Posts: 258
    You are scheduling each job with a dependency on the previously scheduled job, which serializes them. Try something like this:
    Code (CSharp):
    // Every job depends only on the system's incoming dependency (not on the previously
    // scheduled job), so the jobs are free to run concurrently on different worker threads.
    var dependency = state.Dependency;
    JobHandle handle;

    // Fold each handle into state.Dependency so later systems wait for all of the jobs.
    handle = job1.ScheduleSingle(someEntitiesA, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    handle = job1.ScheduleSingle(someEntitiesB, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    handle = job1.ScheduleSingle(someEntitiesC, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    handle = job1.ScheduleSingle(someEntitiesD, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    handle = job1.ScheduleSingle(someEntitiesE, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    handle = job1.ScheduleSingle(someEntitiesF, dependency);
    state.Dependency = JobHandle.CombineDependencies(handle, state.Dependency);

    //...
     
  10. lclemens

    Joined: Feb 15, 2020
    Posts: 714
    Sorry to bring up this old thread, but I figured a comparison of DOTS Jan 2021 vs current DOTS 1.0.8 might be useful for anyone reading this and also I have a simple question.

    I ran a simple vertex animation system in the editor and benchmarked it:
    https://gitlab.com/lclemens/animati...ts/AnimationCooker/Runtime/AnimationSystem.cs

    Here are the results (with 1 million entities):
    • Main thread - ~0.02ms to ~0.03ms for both.
    • Schedule() - 10ms to 12ms on one worker thread
    • ScheduleParallel() - 2.21ms × 11 worker threads = 24.31ms total
    So this tells me that, for this particular ISystem, Schedule() uses about half the total thread time of ScheduleParallel().
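    The arithmetic behind the "about 2x", using the numbers above (the exact ratio depends on which end of the 10 to 12ms Schedule() range you take):

    ```latex
    2.21\,\text{ms} \times 11 = 24.31\,\text{ms},
    \qquad
    \frac{24.31\,\text{ms}}{12\,\text{ms}} \approx 2.0,
    \qquad
    \frac{24.31\,\text{ms}}{10\,\text{ms}} \approx 2.4
    ```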

    Since the main-thread time (where Schedule()/ScheduleParallel() is called) is the same for both methods, can I assume that the scheduling overhead is the same for both? Or is there another script/system that Unity maintains where I need to look to find the scheduling overhead?
     
    Last edited: May 19, 2023
  11. DreamingImLatios

    Joined: Jun 3, 2017
    Posts: 3,984
    That's probably not true. If your CPU is like mine, you probably have a 6-core CPU with 2 threads per core. The two threads share resources, and help keep the CPU busy when one thread has to wait on data. But if you aren't starving one of the threads on memory, you won't benefit from this at all, and as a consequence Unity will effectively be double-counting the core time.
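    A rough illustration of that double-counting, assuming (hypothetically) a 6-core/12-thread CPU, which would be consistent with 11 worker threads plus the main thread, and an idealised case where the two hardware threads on a core each get about half of that core while both are busy (real SMT scaling varies by workload):

    ```latex
    T_{\text{core}} \approx \frac{1}{2} \sum_{i=1}^{11} T_{\text{worker},i} \approx \frac{24.31\,\text{ms}}{2} \approx 12.2\,\text{ms}
    ```

    Under those assumptions, ScheduleParallel() would land in the same range as the 10 to 12ms measured for Schedule().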
     
  12. lclemens

    Joined: Feb 15, 2020
    Posts: 714
    Hmm.... I'm not totally sure what you're implying, but you can see the system in that link I provided - it's just a pretty boring IJobEntity. Maybe I'm counting the time incorrectly with ScheduleParallel()? What I did was run with 1 million entities and then look at the profiler.

    In the profiler, Job.Worker 0 looks like this:
    [Profiler screenshot: upload_2023-5-19_17-51-57.png]

    Then I looked at all the other worker threads...

    [Profiler screenshot: upload_2023-5-19_17-53-7.png]

    They were all very close to 2.3ms. Since there are 11 threads, adding up all the numbers came to 24.31ms.

    Are you saying that to get the correct time spent running the job, I need to divide by 2?
     
  13. DreamingImLatios

    Joined: Jun 3, 2017
    Posts: 3,984
    1) The timeline view will add that up for you.
    2) Don't forget the main thread.
    3) You probably need to divide. But it largely depends on your CPU model and UEFI settings.
     
  14. lclemens

    Joined: Feb 15, 2020
    Posts: 714
    Gotcha. Thanks!

    I didn't forget the main thread, but since both Schedule() and ScheduleParallel() showed pretty much the same numbers (between 0.01 and 0.02ms) on the main thread for the system, and they are minuscule compared to the job times on the worker threads, I have been ignoring them.

    The timeline view gives the same numbers that I was getting via adding them up in the profiler hierarchy.

    [Profiler timeline screenshot: upload_2023-5-19_22-40-40.png]

    So I think it's safe to say that, with ScheduleParallel(), you should expect the profiler to show roughly 2x the time of Schedule(). The times measured by the profiler for ScheduleParallel() need to be divided by two to be comparable with Schedule(), which does not suffer from the incorrect double-counting issue.
     
  15. DreamingImLatios

    Joined: Jun 3, 2017
    Posts: 3,984
    Schedule still can suffer from the double-counting issue too, just not in isolation.