Question: Can't understand the job scheduling strategy on the Android platform

Discussion in 'Entity Component System' started by YakShaver_dc, Jul 30, 2021.

  1. YakShaver_dc

    YakShaver_dc

    Joined:
    Mar 27, 2019
    Posts:
    29
    I used a Samsung Galaxy S21+ to run some JobSystem tests. JobsUtility.JobWorkerCount == 4, and there are 4 worker threads in the profiler timeline. The weird thing is that the JobSystem uses only one worker thread and leaves the other two or three idle. Over the last few days I have run into a couple of weird things on the Android platform (another one: https://forum.unity.com/threads/jobworkermaximumcount-is-1-on-the-android-platform.1147367/), and I have hardly any confidence left in DOTS on Android.

    upload_2021-7-30_11-12-4.png

    Code (CSharp):
    using System.Collections.Generic;
    using Unity.Burst;
    using Unity.Collections;
    using Unity.Entities;
    using Unity.Jobs;

    public class PathfindingStressTestSystem : SystemBase
    {
        ...
        private const int MAX_QUERY_PER_FRAME = 300;
        private List<JobHandle> jobHandles;

        protected override void OnCreate()
        {
            jobHandles = new List<JobHandle>(MAX_QUERY_PER_FRAME);
        }

        protected override void OnUpdate()
        {
            // Schedule one IJob per agent from the main thread.
            Entities
                .WithoutBurst()
                .ForEach((Entity e, ref DynamicBuffer<Path> path, in AgentComponent agent) =>
                {
                    var job = new SinglePathfindingJob()
                    {
                        ...
                    };

                    jobHandles.Add(job.Schedule());

                }).Run();

            // Wait for every job scheduled this frame.
            var jobArray = jobHandles.ToNativeArray<JobHandle>(Allocator.Temp);
            JobHandle.CompleteAll(jobArray);
            jobArray.Dispose();

            // Clear the list so handles from earlier frames do not accumulate.
            jobHandles.Clear();
        }

        [BurstCompile]
        private struct SinglePathfindingJob : IJob
        {
            ...
            public void Execute()
            {
                ...
            }
        }
    }
     
    apkdev likes this.
  2. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    @Yury-Habets Any idea why it behaves like that? Is that a bug or expected behavior?
     
  3. Yury-Habets

    Yury-Habets

    Unity Technologies

    Joined:
    Nov 18, 2013
    Posts:
    1,165
    It's hard to say whether this is the expected behavior without taking a systrace and trying to understand what the CPU cores are doing. It is likely that there is some thread contention; one possible reason is that some cores were taken offline because they overheated.

    Overall, if you have 8 cores on your mobile phone, it doesn't necessarily mean you can run 8 full-feature threads simultaneously for any sustained period of time. Even if you take, for example, only the 4 big cores (regardless of the actual, complicated core configuration), running all of them at full speed is likely to overheat the CPU.

    It's not a DOTS-specific issue; it's about the job system and some unfortunate constraints of the mobile phone world when you are trying to push performance to the maximum.
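
    As a first sanity check before taking a systrace, it can help to log what the job system reports on the device. The sketch below is only a minimal diagnostic; the class name JobWorkerCountLogger is hypothetical and not from this thread. It does not show which cores are online or throttled, only what Unity reports at startup.

```csharp
using Unity.Jobs.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical diagnostic component (name is an assumption, not from the thread).
// Attach it to any GameObject in the first scene and read the device log.
public class JobWorkerCountLogger : MonoBehaviour
{
    void Start()
    {
        // Processor count as reported by Unity for this device.
        Debug.Log($"SystemInfo.processorCount: {SystemInfo.processorCount}");

        // Upper bound on worker threads the job system can use on this device.
        Debug.Log($"JobsUtility.JobWorkerMaximumCount: {JobsUtility.JobWorkerMaximumCount}");

        // Number of worker threads currently available to the job system.
        Debug.Log($"JobsUtility.JobWorkerCount: {JobsUtility.JobWorkerCount}");
    }
}
```

    If the worker count looks right but only one worker is busy in the timeline, the counts alone will not explain it, and a systrace is still the way to see what the cores are actually doing.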
     
    YakShaver_dc likes this.
  4. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    @Yury-Habets I see. Last I heard, there was a plan to reduce the job system's overhead on mobile phones. What's the current state? Has the time taken to schedule a job on mobile been reduced enough that most code can be moved off the main thread?
     
    Last edited: Aug 6, 2021
    YakShaver_dc likes this.
  5. Yury-Habets

    Yury-Habets

    Unity Technologies

    Joined:
    Nov 18, 2013
    Posts:
    1,165
    @optimise as far as I know, the work on improving scheduling overhead is currently in progress. Until then, having many tiny jobs may be suboptimal on mobiles because the scheduling and context switching cost will outweigh the multithreading benefits. You have to balance between moving _everything_ out of the main thread and having to pay the costs.
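
    One way to reduce the number of tiny jobs is to fold many small work items into a single IJobParallelFor, so there is one Schedule call instead of hundreds. This is only a sketch under assumptions: PathQuery, BatchedPathfindingJob, and the placeholder arithmetic in Execute are hypothetical stand-ins, not the pathfinding code from the first post, and the batch size of 16 is an arbitrary starting point to tune with the profiler.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Hypothetical per-agent query data (an assumption for illustration).
public struct PathQuery
{
    public int Start;
    public int Goal;
}

[BurstCompile]
public struct BatchedPathfindingJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<PathQuery> Queries;
    public NativeArray<int> Results;

    // Called once per index; each worker thread processes indices in batches.
    public void Execute(int index)
    {
        // Placeholder work standing in for a real path search.
        Results[index] = Queries[index].Goal - Queries[index].Start;
    }
}

public static class BatchedPathfinding
{
    // Schedules all queries as one job and waits for it to finish.
    public static void Run(NativeArray<PathQuery> queries, NativeArray<int> results)
    {
        var job = new BatchedPathfindingJob
        {
            Queries = queries,
            Results = results
        };

        // One Schedule call for every query; 16 indices per batch (arbitrary).
        JobHandle handle = job.Schedule(queries.Length, 16);
        handle.Complete();
    }
}
```

    A larger batch size means fewer, longer work units per worker thread, which trades load balancing for lower scheduling overhead.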

    Mobiles are hard, especially Androids. They always try to preserve energy and prevent overheating, while we are trying to push them to the max. The behaviour is not deterministic, thus hard to reproduce and reliably fix. For example, I was once doing some perf measurements, and a breeze from the balcony just messed up all the numbers :)

    Anyway, as Florian replied on the other thread, the Snapdragon 888 issue will be fixed soon.
     
    YakShaver_dc and optimise like this.
  6. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    @Yury-Habets I see. One more question. I know struct-based systems will solve the slow main-thread system update issue on Android, but I still have a lot of class-based systems that need to use class-type components and MonoBehaviours. Is there any plan to improve the performance of class-based systems too, for example by code-generating faster EntityQuery code to replace the current slow EntityQuery code? I believe the current EntityQuery code is implemented in a generic way that makes it very slow, and it needs to be replaced with something faster. I have used another third-party ECS solution before, which also uses class-type systems, and it is far faster than the official Unity class-based systems.
     
  7. Yury-Habets

    Yury-Habets

    Unity Technologies

    Joined:
    Nov 18, 2013
    Posts:
    1,165
    Sorry, I can't help you with that. What do you mean by "class based system", BTW? Maybe someone who is closer to generic DOTS can help you.
     
  8. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    @Yury-Habets "class based system" means a system that inherits from SystemBase, so it is a class type.
     
  9. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    The main difference between SystemBase and ISystemBase (a struct) is that you can Burst-compile ISystemBase structs / systems.
    And that's about it.

    Right now, the main performance impact of systems comes from scheduling / combining dependencies.
    (which Unity is working on, AFAIK)

    There's no such thing as "slow EntityQuery code", because all code gen is basically the same for both SystemBase and ISystemBase.

    If in doubt, use the profiler.
     
  10. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    I believe those who are developing DOTS projects targeting low/mid-range mobile phones will know how slow it is. There was also a thread about this a long time ago. Anyway, I profiled a set of systems from one system group on a mid-range Android device. Just this simple system group already takes 0.13 ms, not to mention the other system groups. When a system doesn't find any matching entities, it seems to take 0 ms to run. To me this clearly shows how slow the EntityQuery code in SystemBase is. From what I understand, DOTS developers are expected to move to ISystemBase and Burst-compile the system to solve this EntityQuery performance issue, meaning that SystemBase will not get any EntityQuery performance improvement, which is not what I want.

     
  11. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    Can you share what's inside the system?
     
  12. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    It's just an extremely simple set of systems with only one or two EntityQueries each. If you target the desktop platform, this same system group is extremely fast, even in the Editor, but that's not the case on mobile.

    upload_2021-8-10_16-43-16.png
     
  13. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    The most important information here is the data, the dependencies, and the chosen type of execution, not the target or the EntityQuery. Without the actual code, it's nearly impossible to figure out what's wrong or what takes most of the time.

    Also, have you tried profiling with Deep Profile enabled?
     
  14. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    Those systems run fully on the main thread and should have zero job scheduling cost. Anyway, I created a fresh project to prove that it's really slow on mobile. The project contains only one system, which runs fully on the main thread and iterates over a single entity, to show the performance result.

    Code (csharp):
    using Unity.Entities;
    using Unity.Jobs;
    using Unity.Mathematics;
    using Unity.Transforms;

    // This system updates all entities in the scene with both a RotationSpeed_ForEach and Rotation component.

    // ReSharper disable once InconsistentNaming
    public partial class RotationSpeedSystem_ForEach : SystemBase
    {
        // OnUpdate runs on the main thread.
        protected override void OnUpdate()
        {
            float deltaTime = Time.DeltaTime;

            // Rotate around the up vector; Run() executes immediately on the main thread.
            Entities
                .WithName("RotationSpeedSystem_ForEach")
                .ForEach((ref Rotation rotation, in RotationSpeed_ForEach rotationSpeed) =>
                {
                    rotation.Value = math.mul(
                        math.normalize(rotation.Value),
                        quaternion.AxisAngle(math.up(), rotationSpeed.RadiansPerSecond * deltaTime));
                })
                .Run();
        }
    }

    Mobile
    upload_2021-8-10_19-6-57.png

    Desktop
    upload_2021-8-10_19-27-42.png

    Desktop (Editor)
    upload_2021-8-10_19-7-34.png
     
    Last edited: Aug 10, 2021
  15. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    This system still relies on the completion of transform-related jobs (mainly synchronizing the transform rotation to the Rotation component), which are heavy on their own (for any number of transforms, due to horrible TransformAccessArray management). It also runs right after the transform group, which causes a sync point (due to .Run).

    From the screenshots, you can see that the actual "query" iteration time is 0.01.
    Deep profile should show this as well.

    Alternatively, you can try iterating on some other component that does not have any kind of jobs running.
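
    To separate the query iteration from the rest of the system update without Deep Profile, a custom profiler marker can wrap just the ForEach call. This is a minimal sketch: CounterComponent, MarkedCounterSystem, and the marker name are hypothetical, chosen so the component has no other jobs touching it.

```csharp
using Unity.Entities;
using Unity.Profiling;

// Hypothetical component with no other systems or jobs reading or writing it.
public struct CounterComponent : IComponentData
{
    public int Value;
}

public partial class MarkedCounterSystem : SystemBase
{
    // Appears as its own sample under the system's entry in the Profiler timeline.
    static readonly ProfilerMarker s_IterationMarker =
        new ProfilerMarker("MarkedCounterSystem.Iteration");

    protected override void OnUpdate()
    {
        // Time only the iteration itself; whatever the system does before
        // OnUpdate is called falls outside this marker.
        s_IterationMarker.Begin();

        Entities
            .ForEach((ref CounterComponent counter) =>
            {
                counter.Value++;
            })
            .Run();

        s_IterationMarker.End();
    }
}
```

    Comparing the marker's time with the total time shown for the system gives a rough split between iteration cost and per-system overhead.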
     
  16. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    Alright. I changed the system to use a totally independent TestComponent whose value is incremented by 1 every frame. The result is still almost the same.

    Code (csharp):
    using Unity.Entities;

    [GenerateAuthoringComponent]
    public struct TestComponent : IComponentData
    {
        public int value;
    }

    public class TestSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            // Increment the counter on the main thread; no jobs are scheduled.
            Entities
                .WithName("TestSystem_ForEach")
                .ForEach((Entity entity, ref TestComponent test) =>
            {
                test.value++;
            }).Run();
        }
    }


    Mobile
    upload_2021-8-10_20-57-43.png

    Desktop (Editor)
    upload_2021-8-10_21-0-6.png
     
    Last edited: Aug 10, 2021
    YakShaver_dc likes this.
  17. xVergilx

    xVergilx

    Joined:
    Dec 22, 2014
    Posts:
    3,292
    Other than suggesting Deep Profile, I unfortunately don't have any other useful advice.
    Something slow is happening before the actual query iteration.

    Your best bet is to ask @Joachim_Ante or raise it in a separate thread, but I'm pretty sure they're aware of the overhead of multiple SystemBase systems.
     
  18. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    Here's the screenshot. The system's OnUpdate() is extremely expensive on mobile. I think it really needs a full rewrite.

    upload_2021-8-10_21-31-49.png
     
  19. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,029
    @Yury-Habets It looks like SystemBase spends most of its time in the ScheduleTimeInitialize method. Any idea why ScheduleTimeInitialize takes so long to execute, and why this extremely complex method, which keeps calling into a very deep hierarchy of methods, is needed? Is there any plan to simplify it and significantly improve its performance, to the point where it takes 0.01 ms or less on the main thread of a low-end mobile phone? Maybe the IL processor could be made smarter so that it generates simpler, faster code, and the method could eventually be made burstable?
     
    Last edited: Aug 13, 2021
  20. JooleanLogic

    JooleanLogic

    Joined:
    Mar 1, 2018
    Posts:
    447
    That's possibly mine, and it's a real worry if this is still a problem.
    Joachim's comments on this issue were over a year ago here and here. I hope it's progressed since then, cos I'm building with ECS for devices much lower-end than the S21.