Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Dismiss Notice

Bug (IN-49838) [1.0.11] Massive performance spike

Discussion in 'Physics for ECS' started by optimise, Aug 1, 2023.

  1. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    There is massive performance spike that caused by job has main thread dependency that can found it at SolveAndIntegrateSystem and EntitiesGraphicsSystem. At desktop with i9-11900, the spike goes up and down from 1ms+ to 8ms+. The expectation is job should not affect main thread time as long as it's not more than 16.66 ms assuming we target solid 60 fps.
     
    Last edited: Aug 3, 2023
  2. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
  3. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    Hi @optimise,

    Thanks for reporting this. I didn't find any information on how to reproduce this issue in the ticket you filed.
    Can you provide some information on how to reproduce this or provide a screenshot of the profiler showing the issue here as a first step?

    Thanks!
     
  4. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    Here's the profiler screenshot showing huge spike up and down that profiled windows player runtime build.
    upload_2023-8-9_9-29-43.png
     
  5. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    Oh yes. That looks indeed quite significant.
    Could you please provide us with a description of how to reproduce this?

    Our performance regression tests don't show any spikes.
     
  6. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    1) Use high end pc to reproduce. My pc CPU spec is i9-11900 @2.50GHz
    2) Set Quality - Vsync Count to Don't Sync
    3) At Build Settings remove all the scenes then add DynamicBenchmarkECS scene
    4) Build windows development player runtime build then profile it

    Actually at editor just enter play mode and profile it still can get same result.
     
  7. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    Ok thanks. I see that you attached a project for reproduction now. I'll have a look.
     
    optimise likes this.
  8. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    I was able to produce some spiking on my machine with your test scene.

    To confirm my understanding, I consider this here a good frame (no spike):
    upload_2023-8-9_13-21-51.png

    We can see that the EntitiesGraphicsSystem is waiting on the physics jobs to complete, as expected.
    The dashed line above marks the end of the physics jobs and then this graphics system can prepare the rendering of the results of the physics simulation. So far so good.

    When we do get a spike, we get this situation:
    upload_2023-8-9_13-27-16.png

    As before, we have the EntitiesGraphicsSystem waiting on the physics jobs to complete, which is normal.
    But at the end of the frame we get an extra time slice in the EditorLoop.
    Is that what you are also observing on your end?

    Note that this is all run in the editor at this point. I will profile on a player build next.
     
  9. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    Ya almost the same. Actually why u think EntitiesGraphicsSystem waiting on the physics jobs to complete is normal? For me it's not normal that the spike is around 0.44ms+ to 8ms+ up and down which is almost 7x performance reduction spike. My expectation is physics job should not affect main thread as long as it's within 16 ms if target 60 fps. I dun understand why physics job has main thread dependency to EntitiesGraphicsSystem. Is this the current limitation of Entities Graphics that core logic of EntitiesGraphicsSystem is not jobified as job yet that caused EntitiesGraphicsSystem needs to wait for physics job on main thread?

    Can official solve these main thread stalling since it really affect performance too much specially at limited hardware mobile platform like Android.
     
    Last edited: Aug 9, 2023
  10. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    When I run in a player build, the spikes completely disappear btw.
    See here:
    upload_2023-8-9_14-6-55.png

    Regarding the stall on the main thread in EntitiesGraphicsSystem:
    There is a data dependency between this system and the physics systems through the transformation components of the Entities which the physics is setting and the graphics system is reading for rendering purposes.

    At this point in the framework, these data dependencies are managed on the system level, meaning a system will need to wait for the dependent jobs from other systems to have completed before it can commence its work.
     
    optimise likes this.
  11. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    Any plan to address this in future release since it still has a lot of room for improvement? For short term improvement, will official further improve job system to schedule job much more evenly and at best timing? As screenshot shown below job can move into Idle and also group multiple jobs to same timing to further reduce idle time.

    One more thing. Why there is physics job like DispatchPairSequencer:CreateDispatchPairPhasesJob has main thread dependency to SolveAndIntegrateSystem? Since both the job and system are within the same ecs physics I expect the job should not have main thread dependency to SolveAndIntegrateSystem.

    upload_2023-8-10_2-25-14.png
     
  12. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    I just had another look specifically at the job dependency completion that we see within the EntitiesGraphicsSystem.
    The completion is in fact an explicit one within the OnUpdate() call from that system itself.
    So I will reach out to the wider team to see if that is something that is absolutely necessary or could potentially be changed, hopefully freeing up some time on the main thread, e.g., by scheduling jobs dependent on the data produced by the physics and letting the job system start them whenever the dependency is resolved, rather than completing the system's input dependencies manually as is the case here.
     
    Last edited: Aug 9, 2023
    optimise likes this.
  13. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    Regarding that sub-optimal parallelization you are pointing out here, we are in fact looking into that at the moment.

    The solver is currently parallelized by solving groups of independent constraints in parallel phases. We find these groups with a fast, greedy algorithm that is run sequentially (you will see it in the timeline as "CreateDispatchPairsPhasesJob") which has the limitation that it does not support more than a fixed number of such groups. All remaining constraints are then solved sequentially in a single job at the end of every solver iteration (there are 4 iterations by default), which is the part on the right you are pointing out.
    Specifically in these "object heap" cases with many stacked and piled objects, you can run out of parallel phases easily.
    Another example of this is the Planet Gravity demo in the PhysicsSamples project.

    As for the ParallelCreateContactJobs, if you scroll up you will see that the job is simply run on the main thread in this case, given that there is that "Complete()" call from the EntitiesGraphicsSystem, freeing up the main thread for work.
    So, the maximum number of worker threads is in fact used, the timeline just shows one job processing phase on the main thread in this special case. If we were to remove the Complete() call in the EntitiesGraphicsSystem mentioned above, the job processing would again show on the worker thread.
    I hope that makes sense.
     
  14. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    Finally, by using the "Edit Mode" option in the profiler (top left, by default set to "Play Mode") while in play mode I was able to see where the spikes that we observed above came from.
    These are caused by the profiler itself ;) which is infrequently updating its window.
    See here:
    upload_2023-8-9_15-43-55.png

    In the frame before the spike (marked on the left), the profiler window is not updated and there is no time spent.
    In the frame with the spike (marked on the right), we see the large processing overhead caused by the profiler window update.
     
  15. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    How about main thread job dependency issue at SolveAndIntegrateSystem? I believe SolveAndIntegrateSystem and those jobs all are owned by dots physics. Will official fix the issue that it costs nearly 2ms on main thread at desktop i9 cpu which is still a lot?

    upload_2023-8-23_22-37-51.png
     
  16. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    If you are referring to the
    Job.Complete()
    call we can see in my comment here, this one is currently necessary because the solver does not know yet how many jobs to spawn while the phasing algorithm is still operating, which decides how many parallel phases there will be.
    In a previous version, this
    Job.Complete()
    call was not present, which caused the solver to spawn as many jobs as there could be maximally required, which were way too many and overwhelmed the job system with unnecessary job schedules and executions (max "solver iteration" times "32 phases" many job executions), which in the end effectively slowed the physics pipeline down significantly. This is no longer the case.

    Ideally, we would launch a solver job that then launches the correct amount of solver phase jobs once the information comes in, freeing up the main thread. However, since in the C# job system there is currently no way to spawn jobs from within other jobs, this is not something that we can do at the moment.
    We do have an eye on that though as a general area of improvement.
     
    Last edited: Aug 23, 2023
    optimise likes this.
  17. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,623
    Optimise pointed me in this direction

    I currently have a bit of a vendetta against sync points in entities and its packages and have a fork where I've removed them all including for scheduling SolveAndIntegrateSystem where I turned the job basically into a work queue (really a double queue, 1 for worker items, 1 for phases)

    It actually runs faster because of the more optimized thread usage, though the actual cpu time is a fraction more due to contentions over my large core count (I'm pretty sure someone could optimize this a more.)

    All profiles from a build

    Before
    upload_2023-10-9_11-21-24.png

    After
    upload_2023-10-9_11-21-31.png

    But apart from that, I've removed sync points from every physics system and the entire fixed update loop (15 fps cap for 4x fixed update per frame)

    Before
    upload_2023-10-9_11-21-43.png

    After
    upload_2023-10-9_11-21-49.png

    Full source and changelogs (not all related to removing sync points)

    https://github.com/tertle/com.unity.entities
    https://github.com/tertle/com.unity.physics

    As far as I can tell from my limited testing, I've managed to keep the simulations identical.
    Video comparison - first drop is my syncless version, second is original Unity physics.



    (Sorry for poor quality, kept it low for discord upload)

    -edit-

    in case anyway was curious what I mean by work queue and wasn't sure what to look at in the code, this what I've changed with the ParallelSolverJob.
    I simply create N threads now and have each iterate every phase, grab some work continuously until the phase is done and wait for every other worker to finish the phase before moving next

    Code (CSharp):
    1.  
    2. var totalPhases = NumActivePhases[0];
    3.  
    4. for (var phaseIndex = 0; phaseIndex < totalPhases; phaseIndex++)
    5. {
    6.     var numWorkItems = Phases[phaseIndex].NumWorkItems;
    7.  
    8.     ref var counts = ref UnsafeUtility.AsRef<int>(((int*)Count.GetUnsafePtr()) + phaseIndex);
    9.     ref var workDone = ref UnsafeUtility.AsRef<int>((int*)WorkDone.GetUnsafePtr() + phaseIndex);
    10.  
    11.     int workItemIndex;
    12.  
    13.     while ((workItemIndex = Interlocked.Increment(ref counts) - 1) < numWorkItems)
    14.     {
    15.         var work = ExecuteInternal(
    16.             workItemIndex, phaseIndex, ref JacobiansReader, ref CollisionEventsWriter, ref TriggerEventsWriter, ref ImpulseEventsWriter);
    17.         Interlocked.Add(ref workDone, work);
    18.     }
    19.  
    20.     // Spin until all work has been done for this phase
    21.     while (workDone < numWorkItems)
    22.     {
    23.         Unity.Burst.Intrinsics.Common.Pause(); // is this the best idea?
    24.     }
    25. }
    The only other minor requirement that needed fixing up was a special case in physics where it should run the phase entirely on a single thread, which i basically just force onto the first work item

    Code (CSharp):
    1.  
    2. var count = 1;
    3.  
    4. // Note: If we have duplicate body indices across batches in this phase we need to process the phase
    5. // sequentially to prevent data races. In this case, we choose a large batch size (batch equal to number of work items)
    6. // to prevent any parallelization of the work.
    7. if (info.ContainsDuplicateIndices)
    8. {
    9.     if (workItemIndex != 0)
    10.     {
    11.         return 0;
    12.     }
    13.  
    14.     count = info.NumWorkItems;
    15. }
    16.  
     
    Last edited: Oct 12, 2023
  18. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,024
    @daniel-holz Have a look at this post #17. It improves the performance significantly and greatly benefits dots netcode
     
  19. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,623
    I'm pretty sure unity are already working on similar things from the netcode posts, I'm just impatient. More than that though, I just really enjoy a challenging architectural problem.

    (also if anyone is looking at the above profile, I didn't just magically make it run 4x faster it still has to do the work and is you have other sync points soon after there won't be much benefit. I just have a syncless project so it benefits me a lot and why I'm so into fixing this.)
     
    optimise likes this.
  20. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    @tertle : Many thanks for exploring this avenue.

    The approach makes sense. It's not unlike the idea I brought up above of spawning the required solver work from within a first solver job, but rather via a manually managed work queue. Given that we already have all the "work batches" calculated, your approach requires low overhead in the pipeline (all the data is already there and organized as required).

    I'll take not of all this and will attach it to our internal ticket on engine performance. There are a few spots in the pipeline which have potential for optimization. Removing the solver sync point without impacting "low load cases" is one of them.
     
    optimise likes this.
  21. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    210
    @tertle : And btw, regarding the other change you did here ("Uniform scale no longer bakes into colliders."), keep your eyes peeled for upcoming releases :). We've had that on our short list for improvements.
     
  22. tertle

    tertle

    Joined:
    Jan 25, 2011
    Posts:
    3,623
    Oh yeah I'm aware that's coming and looking forward to it (my implementation is pretty hacky) but I actually ran into a blocker around this a couple of months ago so I had to jump the gun.
     
    daniel-holz likes this.