Search Unity

  1. Megacity Metro Demo now available. Download now.
    Dismiss Notice
  2. Unity support for visionOS is now available. Learn more in our blog post.
    Dismiss Notice

Question Slow Physics: How do I examine this Performance Readout?

Discussion in 'Physics for ECS' started by Thygrrr, Jun 3, 2023.

  1. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    upload_2023-6-3_18-3-55.png

    My "Before Physics System Group" is extremely slow. Unfortunately, I cannot glimpse which system actually eats up all this time.

    (this is related to the threads stating that since pre65, physics is slow - it still is, in 1.0.10)

    So how can I examine this to find out where the actual waiting happens? The actual values drastically vary if Burst is off, so I have to profile this with Burst being enabled.
     
  2. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    upload_2023-6-3_18-9-15.png
    In the profiler, it looks like it's this.
    However, this call (or WairForJobGroupID) is nowhere in that system.

    It uses idiomatic foreach, does that create a hidden sync point after all?

    But the really weird part is: If I disable that system, NOTHING gets any better.

    upload_2023-6-3_18-13-22.png

    In fact, if I disable all my systems, the application still stays slow, which is why I think this is a problem with Unity Physics itself.

    upload_2023-6-3_18-17-4.png

    Evidence with all my systems turned off - now it's the next group, PhysicsInitializeGroup / BuildPhysicsWorld, that spends all its time waiting...

    upload_2023-6-3_18-48-39.png

    ... but waiting for what?

    Also, it almost looks as if the system is invoked twice in one frame?
     
    Last edited: Jun 3, 2023
  3. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    It feels like I'm running into this, especially with the multiple physics updates per frame, etc.
    https://forum.unity.com/threads/in-...ower-than-older-version.1429963/#post-9065278

    But my jobs that work with these singletons should be long finished, so I'm a bit confused as to how to investigate this slowdown further.

    Unfortunately, since most of my objects are mobile and dependent on reasonably accurate collision detection, skipping BuildPhysicsWorld isn't a viable approach.
     
  4. n3b

    n3b

    Joined:
    Nov 16, 2014
    Posts:
    56
    It waits for previous simulation step to complete, there is FIndOverlaps job bar is visible. The cause of the stall that forced system to spam simulation steps happened somewhere before.
    Also it's weird how solver phases are spread over the step (those multiple small dots in the Job tab). Could it be the change to the Unity Jobs?
     
    Last edited: Jun 8, 2023
  5. n3b

    n3b

    Joined:
    Nov 16, 2014
    Posts:
    56
    BTW, you can disable that catch up death loop by setting Project Settings -> Time -> Maximum Allowed Timestep equal to the Fixed Timestep. This will expose what actually causes slowdowns.
     
  6. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    You might have three steps happening in this frame. And the first "long" BuildPhysicsWorldSystem is waiting for completion of something. You can see this in the blue bar below it that reads "JobHandle.Complete()". It has to wait for completion of the systems (and the jobs schedule by these systems!) from the first step.

    Can you zoom in on the first little green bar in front of your first "long" BuildPhysicsWorldSystem?
     
    Thygrrr likes this.
  7. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    These Project Settings don't have any influence on the ECS time management, nor the time step.
    The time management you are referring to is implemented using the ECS rate manager interface.
    And the time step (by default 1/60) is set using the TimeStep property in the FixedStepSimulationSystemGroup.

    The well-known fixed stepping semantics for MonoBehaviors, that are controlled with the settings you pointed out, are implemented in ECS using different types of rate managers.
    There is the FixedRateCatchUpManager, which is the one that is used by default in the
    FixedStepSimulationSystemGroup
    :
    upload_2023-6-9_12-28-26.png
    When it determines that the simulation is running behind, it will perform multiple system updates in a single
    FixedStepSimulationSystemGroup
    to try and remain at real-time rates, which works up to a point.

    Then there is the VariableRateManager, which uses variable time steps for stepping things forward rather than multiple steps with fixed time steps.

    Finally, there is the simplest one, the FixedRateSimpleManager, which ensures that there is always one step done per update of the FixedStepSimulationSystemGroup.

    One could enforce this "single step" behavior (which I think you suggested) by setting the rate manager in the
    FixedStepSimulationSystemGroup
    to the
    FixedRateSimpleManager
    via the
    FixedStepSimulationSystemGroup.RateManager
    property.

    Note that changing the
    FixedStepSimulationSystemGroup.TimeStep
    conveniently also updates the time step used in the rate manager, as you can see here: upload_2023-6-9_12-30-43.png
     
    Last edited: Jun 9, 2023
    Thygrrr and n3b like this.
  8. optimise

    optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,113
    Any plan to expose as multiple layman's terms physics config at Project Settings in future release? Looks quite complicated to me. Anyway for now if I want to exit stalling death loop, which one should I change and how should I change to make it exits the loop quickly? CMarastoni suggests me to implement own FixedRateManager but seems like it's deprecated and rename to RateManager?
     
  9. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    To always do one step per frame you could use the
    FixedRateSimpleManager
    .
    OR you copy the code of the
    FixedRateCatchUpManager
    and expose and play with the min / max time stepping parameters to limit the number of steps per frame, or maybe even change the time step slightly if that works in your simulation (similar to the
    VariableRateManger
    ).

    Which one works best for you depends on your exact use case.
     
  10. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    That spread out ParallelSolverJobs situation should be fixed with the next version, if it's what I think it is. There was an issue which caused more jobs to be spawned than needed. This is fixed now. This could be that issue, or something else. Not sure exactly. In any case, it's worth trying to run it with the next version first to reassess.
     
    n3b likes this.
  11. n3b

    n3b

    Joined:
    Nov 16, 2014
    Posts:
    56
    Thanks for the details about rate managers, it's very useful.
    Regarding this jobs situation - are you referring to the next version of Physics, ECS, Collections, or the engine itself?
     
  12. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    Just Unity Physics.
    We noticed that there were an abundance of solver jobs scheduled when multithreading was enabled even when they were not needed at all (no workload whatsoever) or only a significantly smaller number was needed (lower than maximal workload). This issue, which caused slowdowns, specifically in cases with lower contact and joint counts is fixed now and will be released in the next available version.
     
    n3b likes this.
  13. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Thank you for taking the time to reply and help me dig through this. It is most sincerely appreciated.

    That is one of my systems, Jovian.Systems.PIDSystem, 0.60 ms.

    I'm pretty sure it has some issues because for the love of god, I can't get Burst to compile anything outside the OnUpdate method, but IF this causes a deadlock, I need to understand why (and 0.6 milliseconds doesn't sound too bad for 20k entities). I use the AggressiveInlining attribute trying to imitate whatever Unity.Mathematics does, I even salvaged some of its adjacent code from QuaternionToEuler.

    Code (CSharp):
    1. namespace Jovian.Systems
    2. {
    3.     [BurstCompile]
    4.     [UpdateInGroup(typeof(BeforePhysicsSystemGroup))]
    5.     [UpdateBefore(typeof(VesselControlIntegrator))]
    6.     public partial struct PIDSystem : ISystem
    7.     {
    8.         [BurstCompile]
    9.         public void OnCreate(ref SystemState state)
    10.         {
    11.         }
    12.  
    13.         [BurstCompile]
    14.         public void OnDestroy(ref SystemState state)
    15.         {
    16.         }
    17.  
    18.         [BurstCompile]
    19.         public void OnUpdate(ref SystemState state)
    20.         {
    21.             // Avoid updating for first frame or other very small delta time values.
    22.             var dt = SystemAPI.Time.fixedDeltaTime;
    23.             if (dt <= float.Epsilon) return;
    24.  
    25.             //Process all vessels that have a PID controller
    26.             foreach (var (transform, velocity, controls, pid) in SystemAPI.Query<LocalTransform, PhysicsVelocity, RefRW<ShipControls>, RefRW<PIDControl>>())
    27.             {
    28.                 // Move towards desired position (or stay if position is zero)
    29.                 var posError = math.lengthsq(pid.ValueRO.goalPosition) > float.Epsilon ? pid.ValueRO.goalPosition - transform.Position : default;
    30.                 controls.ValueRW.translate = ComputePID(ref pid.ValueRW.posI, posError, -velocity.Linear, dt);
    31.  
    32.                 // Rotate towards desired orientation (or stay if orientation is zero)
    33.                 var rotError = CalculateError(transform.Rotation, pid.ValueRO.goalOrientation);
    34.                 controls.ValueRW.pitch = ComputeRequiredAngularAcceleration(rotError, transform.Rotation, -velocity.Angular, dt);
    35.             }
    36.         }
    37.  
    38.         #region Computations
    39.  
    40.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    41.         private static float3 ComputePD(float3 error, float3 delta)
    42.         {
    43.             const float gainP = 1f;
    44.             const float gainD = 1.5f;
    45.  
    46.             var output = gainP * error + gainD * delta;
    47.             return output;
    48.         }
    49.  
    50.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    51.         private static float3 ComputePID(ref float3 integral, float3 error, float3 delta, float deltaTime)
    52.         {
    53.             const float gainP = 1f;
    54.             const float gainD = 2f;
    55.             const float gainI = 0f;
    56.             const float saturation = 0.2f;
    57.  
    58.             integral += error * deltaTime * gainI;
    59.             integral = math.clamp(integral, -saturation, saturation);
    60.  
    61.             var output = gainP * error + gainD * delta + gainI * integral;
    62.  
    63.             return output;
    64.         }
    65.  
    66.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    67.         private static float3 CalculateError(quaternion currentOrientation, quaternion desiredOrientation)
    68.         {
    69.             //No rotation desired (default quaternion, not identity!)
    70.             if (math.lengthsq(desiredOrientation.value) < float.Epsilon) return float3.zero;
    71.  
    72.             //World Space Quaternion rotation to reach desiredOrientation
    73.             var requiredRotation = math.lengthsq(desiredOrientation.value) > float.Epsilon ? RequiredRotation(currentOrientation, desiredOrientation) : quaternion.identity;
    74.  
    75.             //World space Euler rotation.
    76.             return QuaternionToEuler(requiredRotation, math.RotationOrder.ZXY);
    77.         }
    78.  
    79.  
    80.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    81.         private static float3 ComputeRequiredAngularAcceleration(float3 error, quaternion currentOrientation, float3 currentAngularVelocity, float deltaTime)
    82.         {
    83.             //Angular Velocity is in local space, but should be considered in world space.
    84.             var delta = math.mul(currentOrientation, currentAngularVelocity);
    85.             return ComputePD(error, delta);
    86.         }
    87.  
    88.  
    89.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    90.         private static quaternion RequiredRotation(quaternion from, quaternion to)
    91.         {
    92.             var requiredRotation = math.mul(to, math.inverse(from));
    93.  
    94.             // Flip the sign if w is negative.
    95.             // This makes sure we always rotate the shortest angle to match the desired rotation.
    96.             requiredRotation = requiredRotation.value * math.sign(requiredRotation.value.w);
    97.  
    98.             return requiredRotation;
    99.         }
    100.  
    101.         #endregion
    102.  
    103.         #region QuaternionToEuler
    104.         // Note: taken from Unity.Animation/Core/MathExtensions.cs, which will be moved to Unity.Mathematics at some point
    105.         //       after that, this should be removed and the Mathematics version should be used
    106.  
    107.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    108.         private static float3 QuaternionToEuler(quaternion q, math.RotationOrder order)
    109.         {
    110.             const float epsilon = 1e-6f;
    111.      
    112.             //REDACTED (just a boatload of math)
    113.      
    114.             return EulerReorderBack(euler, order);
    115.         }
    116.  
    117.         [MethodImpl(MethodImplOptions.AggressiveInlining)]
    118.         private static float3 EulerReorderBack(float3 euler, math.RotationOrder order)
    119.         {
    120.             switch (order)
    121.             {
    122.                 case math.RotationOrder.XZY:
    123.                     return euler.xzy;
    124.                 case math.RotationOrder.YZX:
    125.                     return euler.zxy;
    126.                 case math.RotationOrder.YXZ:
    127.                     return euler.yxz;
    128.                 case math.RotationOrder.ZXY:
    129.                     return euler.yzx;
    130.                 case math.RotationOrder.ZYX:
    131.                     return euler.zyx;
    132.                 case math.RotationOrder.XYZ:
    133.                 default:
    134.                     return euler;
    135.             }
    136.         }
    137.  
    138.         #endregion
    139.     }
    140. }

    The system right after it is Jovian.Systems.VesselControlIntegrator, an older SystemBase style system still with one of those Entities.ForEach abominations.

    I somehow have the nagging feelign that the "ScheduleParallel" in that one could be causing some of the issues (because it interacts with the physics components directly)

    Code (CSharp):
    1. using Jovian.Components;
    2. using Unity.Burst;
    3. using Unity.Entities;
    4. using Unity.Mathematics;
    5. using Unity.Physics;
    6. using Unity.Physics.Extensions;
    7. using Unity.Physics.Systems;
    8. using Unity.Transforms;
    9.  
    10. namespace Jovian.Systems
    11. {
    12.     [UpdateInGroup(typeof(BeforePhysicsSystemGroup))]
    13.     [BurstCompile]
    14.     public partial class VesselControlIntegrator : SystemBase
    15.     {
    16.         [BurstCompile]
    17.         protected override void OnUpdate()
    18.         {
    19.             var dt = World.Time.DeltaTime;
    20.             //Hey, this indents kinda nicely for once. :)
    21.             //TODO: Convert into jobified ISystem
    22.             Entities
    23.                 .ForEach(
    24.                          (
    25.                              ref PhysicsVelocity velocity,
    26.                              in PhysicsMass mass,
    27.                              in LocalTransform transform,
    28.                              in Propulsion propulsion,
    29.                              in ShipControls controls
    30.                              ) =>
    31.                          {
    32.                              //Standard thrust addition.
    33.                              var linear_accel = controls.translate;
    34.            
    35.                              //Transform to local space, apply thrust limiters, then transform back to world space.
    36.                              linear_accel = math.mul(math.inverse(transform.Rotation), linear_accel);
    37.                              linear_accel = math.clamp(linear_accel, propulsion.ThrustMin, propulsion.ThrustMax);
    38.                              linear_accel = math.mul(transform.Rotation, linear_accel);
    39.            
    40.                              velocity.ApplyLinearImpulse(mass, linear_accel * dt);
    41.            
    42.                              //Todo: Velocity dependent turning.
    43.                              var angular_accel = controls.pitch;
    44.                              angular_accel = math.clamp(angular_accel, -propulsion.TurnMax, propulsion.TurnMax);
    45.                              var world = velocity.GetAngularVelocityWorldSpace(mass, transform.Rotation);
    46.                              world += angular_accel * dt;
    47.                              velocity.SetAngularVelocityWorldSpace(mass, transform.Rotation, world);
    48.  
    49.                              /*
    50.                               TODO: Bring it back :)
    51.                              //Flow Logic - make it feel more like a ship in water.
    52.                                  var linear_magnitude = math.length(velocity.Linear);
    53.                                  if (linear_magnitude > 0)
    54.                                  {
    55.                                      var linear_direction = math.normalize(velocity.Linear);
    56.  
    57.                                      //Apply trim tensor logic allowing to preserve forward velocity through turns.
    58.                                      FIXME: Wait a sec, this isn't local right / up, this is world right / up? can't be right!
    59.                                      var lateral_flow = propulsion.TrimX * math.abs(math.dot(right, linear_direction));
    60.                                      var vertical_flow = propulsion.TrimY * math.abs(math.dot(up, linear_direction));
    61.  
    62.                                      var flow = (lateral_flow + vertical_flow) * linear_magnitude * dt;
    63.                                      var inert = linear_magnitude - flow; //this is a remainder, so no dt
    64.  
    65.                                      var linear_flow = flow * forward;
    66.                                      var linear_inert = inert * linear_direction;
    67.  
    68.                                      velocity.Linear = linear_flow + linear_inert;
    69.                                  }
    70.                                  */
    71.  
    72.                          }
    73.                         ).ScheduleParallel();
    74.         }
    75.     }
    76. }

    Zoomed in it looks like this:
    upload_2023-6-12_18-9-42.png

    However, everything is a bit faster now, I'm unsure what caused this (33 ms per frame now, rather than 50+)

    Stepping along the lines of this frame timeline, I see another interesting one in the "second" execution block (no idea why PIDsystem comes up there again?):
    upload_2023-6-12_18-11-59.png

    However, if I turn CourseCommandSystem off, nothing improves (well I don't get the double frame wrap all the time, but FPS still is ~30).
    upload_2023-6-12_18-15-10.png

    Curiously, all the waiting is now done by other systems. (which drove me crazy in profiling, because it's not good that the wait time is attributed to the job doing the waiting, it should be attributed to the one being waited for)

    Both CourseCommandSystem and MoveCommandSystem use Idiomatic foreach, look at a FixedList of pre-calculated waypoints (usually always empty in my scenario here), and write something to the PID component. (I wonder if that's causing some kind of deadlock? I could possibly move them forward in time, to do their work before PIDsystem)

    Or is idiomatic foreach really so terrible that I should jobify the systems first? They consume almost no time on their own, only if they have to wait on whatever it is they are waiting on. (the same thing CourseCommand is normally waiting on, unless it is turned off)

    Ultimately, in LateSimulationSystemGroup is another point where the Semaphore hits the Fan, because these Mouse and Pip systems are VERY simple and just do a lot of distance and matrix calculations. Some of them do lead to structural changes, which they put into ECBs provided by BeginSimulationEntityCommandBufferSystem. Again, many of these are older SystemBase derived systems, but they kind of have to be, because they read from UnityEngine.Input stuff (like the mouse position, button state, and modifier keys)

    I still believe the real problem must be something before CourseCommandSystem (or MoveCommandSystem).
     
    Last edited: Jun 12, 2023
    daniel-holz likes this.
  14. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    First off, only code within the system's main function (OnUpdate, OnCreate etc.) and code in Jobs can currently be burst compiled. Have a look at the bursted code using the Burst Inspector window and make sure you find your code there.

    If you use the right Output dropdown Display Option (see documentation link above) you will get annotation that shows your original function calls embedded inside the assembler code, which should help you identify if everything was successfully burst compiled.
    Also, to make absolutely sure that the function you want to burst compile is successfully compiled, you can add the [GenerateTestsForBurstCompatibility] attribute above the function alongside the [BurstCompile] attribute.

    Second, is it possible that your application is somehow locked to 30 FPS? Maybe your time step is 1/30 Hz and that's why you get some waiting in the player loop to ensure we reach exactly that framerate? Just a thought...
     
  15. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Not that I know of, I run this on a 120 Hz display and it used to run at over 300 fps in an earlier version of Physics. Sometimes, even this one does - usually if I clear the Library folder, in a couple of instances, it will be fast for a couple of editor starts.

    I just verified - no calls to Application.targetFrameRate.

    I think it's back to finding out what job hangs here. Unfortunately, it still happens even if I disable all my systems.
     
  16. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    Make sure to have also a look at the jobs in your profiler. If a system "hangs" it is likely just waiting for some jobs to finish which manipulate some data that is used as an input to the system, or some jobs might be reading data that this system wants to write to.

    These data dependencies are very common and are modelled via job dependencies most of the time. In other cases, specific jobs are "waited on" by a system via JobHandle.Complete() which will block the execution at this point until the job has finished.

    It might be obvious but if you see a JobHandle.Complete() call in the profiler in one of your systems, that's exactly what is happening. To see which job is waited on you can either inspect the code of the system or simply look in the profiler's timeline at which point the system will continue (when the JobHandle.Complete() bar ends). The job that finishes at this point will be the one that this JobHandle.Complete() call was waiting on.
     
  17. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    upload_2023-6-14_20-11-1.png

    I took everything out that was simulating physical entities (left in some instantiation code).
    It all boils down to the initial culprit, BuildPhysicsWorld,

    Funny enough, there should be no reason for BuildPhysicsWorld to be run 4 times, as 7ms would be fairly good (at some 30k entities).

    Maybe it can never climb out of the deathloop once it's in there? I do some pretty heavy loading earlier on, spread across a large number of frames, but still heavy.

    I don't know how to read the "Jobs" category in the profiler, but from my intuition, it doesn't look like any job is stuck or has a particularly wide bar.
    upload_2023-6-14_20-14-7.png

    upload_2023-6-14_20-14-31.png

    The only suspicious call at the start might be:
    Profiler.ParseThreadData
    1.28ms
    Current frame accumulated time:
    1.89ms for 10 instances on thread 'Worker 9'
    9.59ms for 130 instances over 25 threads


    EDIT: This call is indeed HIGHLY suspicious, the more I look at it.

    In other frames, I even see
    Profiler.ParseThreadData
    3.51ms
    Current frame accumulated time:
    4.66ms for 6 instances on thread 'Worker 6'
    17.15ms for 102 instances over 25 threads


    I'll try to compare with the standalone profiler. EDIT: nah, it's slow even without any profiler running... you got any ideas?
     
    Last edited: Jun 14, 2023
  18. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    After pausing and unpausing the game, the "deathloop" doesn't occur, but it's still quite slow.
    (and it shouldn't be this slow - I've seen pretty much the same code, more of it, actually - run much faster in earlier Entities releases)

    I've also removed the systems in LateSimulationStystemGroup, because now the "blame" was put on them: (they just spend their time waiting, though, they can run in under 1 ms easily)
    upload_2023-6-14_20-21-33.png


    Then I get:
    upload_2023-6-14_20-28-2.png

    The small stippled joblets are all (and yes, also the ones in "PresentationSystemGroup", for some weird reason - is there, like, a second Physics world scheduling jobs outside FixedStepSimulationSystemGroup?):

    Solver:parallelSolverJob (Burst)
    0.000ms
    Current frame accumulated time:
    0.004ms for 85 instances on thread 'Main Thread'
    0.062ms for 1107 instances over 24 threads

    (which is nothing, even in sum, so why do BuildPhysicsWorld and UpdateHybridChunkStructure end up waiting 100x their cumulative, parallelization-ignoring duration for them? thread switches are expensive, but clearly not THAT expensive, or?)

    The one big chunk is:
    Broadphase:DynamicVsDynamicFindOverlappingPairsJob (Burst)
    0.710ms
    Current frame accumulated time:
    34.88ms for 46 instances over 25 threads

    Hmm, that accumulated time for that one looks suspiciously like almost my entire frame time. I presume it's highly parallel; and 0.710 ms is pretty good for ~30k Entities in relatively close proximity.
     
    Last edited: Jun 14, 2023
  19. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    As I also explained in this other post, if you see a long time bar on the
    BuildPhysicsWorldSystem
    (or any system for that matter), that does not at all mean that it's this system which is "spending" the time. As you can see, in your case, the
    BuildPhysicsWorldSystem
    is doing the following on the main thread (the top timeline outside of the job category):

    JobHandle.Complete()


    That means, the system is waiting for a job to finish.

    It's the same as in the thread I linked above. From the looks of it, in your case, there are as much as 5 (!) physics steps done per frame which is caused by the FixedRateCatchupManager in an attempt to catch up with real-time simulation through a lower number of renders and a higher number of physics steps (each with the same time step of by default 1/60 seconds). If things start taking too long and real-time simulation can not be achieved by simply a single physics step per frame, that's what happens.
    BuildPhysicsWorldSystem
    is updated in each of these 5 steps, but starting with the second one, they all have to wait until the previous step has finished, which means that all the jobs scheduled by the previous steps have to finish (collision detection, solver, etc.).

    If you don't want this to occur, always want a single physics step per frame and are fine with the simulation being a little slower than real-time, you need to change the rate manager (as explained here in this thread in an earlier post).

    And the Profiler.ParseThreadData entry is nothing to worry about. It's from the profiler itself which needs to do a bit of work to produce the timeline and collect the data. You can safely ignore these calls.
     
  20. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Thank you - I think I did realize that.
    How do I find out which job that is? I really can't seem to find it.

    How would you go and find that job? I thought the Profiler had some setting to show these semantic switches, but since I can't even see the job it's waiting for here, and there is stuff in PresentationSystemGroup that still waits on physics jobs, I'm at a loss.

    To rule out some other possible causes, I am examining any MonoBehaviours that still interact with EntityManager in a way that creates sync points, but there are hardly any left (apart from a scene loader). So far, to no avail.

    I also am okay with the catchup issue (but I still get poor frame rates - <30 fps - even with just one of these updates) even if there's a single catchup.

    But for development, I can make the game run with one tick per frame / no catchup and be fine with that. I just need to be able to tell if I'm way outside my intended performance envelope. That looked really good until recently, when these new issues came up and physics got really slow.
     
  21. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Where do I set this? In a [RuntimeInitializeOnLoadMethod]?
    (asking for a good practice / best practice here, I really don't understand where in the Playerloop/Application Lifecycle default systems and worlds are even created or their updates scheduled)

    Edit: I realize this isn't a class variable, and the constructor requires a current fixed deltatime value so... I need to do this every frame? Surely not in a MonoBehaviour::FixedUpdate?

    Edit: Or is that supposed to be a constant?

    upload_2023-6-15_11-25-52.png
     
    Last edited: Jun 15, 2023
  22. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Okay, this works to keep the rollover spiral from happening. This makes it much easier to debug the ACTUAL issue, which is still a very slow physics step, even if the game now runs at somewhat above >30fps.

    I'm expecting about 5x the performance here, and I have had that before. If I look at the profiler, all things the same except for the rate settings, I still see these massively inflated job waits that move to whatever system is left. (so if I turn all of them off, it's BuildPhysicsWorld).

    Curiously, it moves between entire groups of stuff, even; from PhysicsInitialize to LateSimulation, even to PresentationSystem groups.

    My problem is this huge empty space.
    (or rather, what's not in it)

    What's happening in that space. :) Why does it happen for any two scheduled systems (it only happens twice, meaning any number of systems N>2 will have the last 2 systems waiting on something, perhaps).

    upload_2023-6-15_11-40-14.png
     

    Attached Files:

    Last edited: Jun 15, 2023
  23. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    upload_2023-6-15_12-4-31.png

    Even if I move stuff to AfterPhysicsSystemGroup, they will spend a long time waiting.

    Huge idle blocks there.

    upload_2023-6-15_12-11-50.png

    Even huger if I move a specific raycast from LateSimulationGroup to BeforePhysicsSystemGroup.

    (it fails in AfterPhysicsSystemGroup, which I guess is some sort of case where it would work on an old world, which is weird (I wouldn't expect the world to be gotten before the system starts).

    upload_2023-6-15_12-14-55.png

    But now... I wholly commented that system out.
    Now the insane wait state happens in an unrelated system.
    upload_2023-6-15_12-18-24.png

    I think I understand the ECS scheduler even less than when I started this thread.
    (change is good, though - hoping for that breakthrough, any minute now, please)


    The best thing would be a hint how to find out what is in that empty space, what it really is that is being waited on.
     
    Last edited: Jun 15, 2023
  24. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Yeah, no idea why physics jobs are even running in PresentationSystemGroup.
    Again utterly confused about the scheduler and how to interpret the profiler output. o_O

    upload_2023-6-15_12-52-38.png

    (well I kind of get why it's shown this way, but I'd expect that sync point and CompleteAllJobs before PresentationSystemGroup is even scheduled - probably before TransformSystemGroup or any of the motion smoothing stuff)

    Looks like the exact inverse in reality, though:
    upload_2023-6-15_13-14-45.png
     
    Last edited: Jun 15, 2023
  25. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    Let me go through your questions and concerns one at a time, starting with this one.

    The accumulated time is simply the sum of all the times spent by all the jobs on the workload and is not relevant for the effective time spent in the timeline of your game.

    In the case you mention,
    BroadphaseDynamicVsDynamicFindOverlappingPairsJob
    , these jobs find all the overlapping pairs of colliders in a 100% parallel manner (fully data parallel problem). For a large number of colliders and based on how close they are to each other, there will be a very large workload that has to be processed. Luckily this is done in parallel. So, while the accumulated time (the sum of all the individual jobs' times spent, which directly represents the overall workload) is very high (34.88ms in this case), the effective time, which will present as the continuous time that will pass in the game's overall timeline is very small (only 0.71ms).
    And that effective time is what matters for the performance of your game.

    If this effective time would be very high in this case (say, 10ms) then you would see that all these jobs that do this work will occupy 10ms of continuous (!) time along the timeline in the profiler, occupying effectively a slice of 10ms in your game frame of 16.66ms (assuming you want this to run at 60 Hz). That would be a huge amount.

    Now, how can one get a better understanding of the effective time occupied by the jobs in each frame?
    Let's use your example to figure this out.
    Looking under jobs, and selecting the individual green time slices, you will be able to identify the slice in the timeline during which the
    BroadphaseDynamicVsDynamicFindOverlappingPairsJob
    jobs are active.
    And you will see that at some point in time t, these jobs (there can be many!) are starting to work on multiple worker threads simultaneously, trying to get the total workload done.
    From point forward, some of the jobs will finish with one task in the workload, and then a new job might get spawned that picks up another task in the workload and so on and so on until the entire workload is processed.
    And at some time t + x, in your case x = 0.71ms, all of them will have already finished, meaning, all of these jobs took less than a ms to get the workload processed. So, that is a GOOD situation, no matter what the "accumulated time" displayed in the profiler is when you select one of them.

    If the time at which they are done would be t plus some high value, t + 10ms for example, THEN that would be very slow since 10 ms out of your total frame's budget of 16.66 ms would be occupied processing that workload. But that's not the case here, so all is good and you can keep looking around to find the real performance slowdown, which might be some other batch of jobs in another slice of time on the timeline.

    With this approach and looking at the Jobs timeline section in the profiler, you can look at your frame data and see what portion of the physics step is taking up a large "effective" slice compared to your overall frame budget.

    Let me know if that helps.
     
    Last edited: Jun 15, 2023
    Thygrrr likes this.
  26. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    Unfortunately there is no immediate way I know of to identify precisely the job that a JobHandle.Complete() call is waiting on.
    One thing you can try though is to enable "Flow Events" in the profiler, which will help you identify where the jobs on the timeline are coming from. It adds little arrows to the systems that schedule jobs and if you click on the arrow, it brings you to all the jobs schedule by the system.

    Here is how to enable these event markers:
    upload_2023-6-15_10-33-48.png

    In an example here I am clicking on the arrow added to the BroadphaseSystem, which will identify all the potentially intersecting pairs of colliders, and among others it will launch these "find overlapping pairs" jobs (
    BroadphaseDynamicVsDynamicFindOverlappingPairsJob
    ) which I highlighted on the right. As you can see, following the little arrow under the system (top left) allows you to get to the point where the jobs scheduled by this system are actually started to get processed by the worked threads, which might not be immediate (and it isn't as in this case here) because the worker threads are still busy with other jobs.
    upload_2023-6-15_10-38-15.png

    Now, to the most important part: understanding what jobs are done when a JobHandle.Complete() call is finished.

    What you can here is to simply open the Jobs section of the profiler and manually inspect it at the place where JobHandle.Complete() call is DONE. Roughly at this point, when the jobs the JobHandle.Complete() was waiting on are finished (at time t + x, as described in my post above) the JobHandle.Complete() time bar will also end.

    I illustrated this process for one of your screenshots. The x identifies that point in time when JobHandle.Complete() finishes, and the dashed arrow identifies the jobs that are done roughly at that moment, which are very likely the jobs that this JobHandle.Complete() had been waiting on.
    In this particular case, this rendering system is not even waiting for one particular job to finish, but for ALL of them (CompleteAllJobs() call). So, you can expect that all worker threads will be "blank" (no more jobs in the pipeline for any of them) at the end of the bar of CompleteAllJobs(), which is exactly what we see here.

    upload_2023-6-15_10-1-37.png

    Let me know if I explained this properly.
     
    Last edited: Jun 15, 2023
  27. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    Here you can hand in the Timestep property from the FixedStepSimulationSystemGroup to keep the time step the same.
    Alternatively you can pass in
    SystemAPI.Time.TimeData.DeltaTime
    , which will be the same value.
     
  28. daniel-holz

    daniel-holz

    Unity Technologies

    Joined:
    Sep 17, 2021
    Posts:
    277
    In the topmost screenshot you are seeing what is going on.
    All the worker threads, in the "Jobs" section of the profiler, are occupied dealing with a huge physics workload from the look so fit.
    If you click on the individual sections in the Jobs timeline you will be able to identify what takes how much time.

    This would also allow you to compare this version of your simulation with a previous simulation you say was faster and find out which parts in the physics timeline are now slow for some reason. That would give us some indication of whether there might even be a regression or not, or whether your game is now simply behaving differently in some way or presents a different work load to the system, leading to a higher time consumption.

    Here I annotated your topmost screenshot with the results of my manual inspection, which shows that the jobs scheduled by the PhysicsSystemGroup (which contains all the physics systems required for stepping the physics forward one time step) finish at t = 16ms in the continuous time of your game's frame. This is a significant time consumption, considering that you have only 16.66ms of budget for the entire frame for stable 60 FPS @ 60 Hz simulation.
    upload_2023-6-15_10-21-2.png
     
    Thygrrr likes this.
  29. Thygrrr

    Thygrrr

    Joined:
    Sep 23, 2013
    Posts:
    699
    Thank you so much for the very detailed explanation.
    It has cleared up most of it for me now, and I can focus on isolating potential scheduling or performance issues in my code now.

    I'll modernize it to ECS 1.0 to ensure there's more consistency when reasoning about the systems


    A last question about interpreting this flow events readout:

    Is there a way to see what's done instead / before the point in time at the end in WaitForJobGroupID? The Workers look predominantly idle. What are they idling for? (given that so much time passes)

    This is the brunt of the physics workload (well, some of these jobs all come together here)

    Unity_FHBsNROjJ8.png


    It turns out that the the flow event view has some overlaps, for instance here, BubbleTenancySystem and ParallelIntegrateMotionsJob have both an arrow on Worker 12. Does that mean they compete?

    Unity_7pLGsOZZ9A.png



    Interestingly the systems that are (in my perception) wrongfully delayed, in this case MouseOver and PipHighliughtHoverSystem, don't have any flow events (they're on the main thread/idiomatic foreach).

    Yet they are only slow in this particular configuration, if I put another system behind them, they are very fast (and the new system takes that really long time to complete)

    They perform (one or two) world.CastRay operations, and are run in LateSimulationSystemGroup. I assumed Physics is completely over by that time (i.e. "LateSimulationSystemGroup" waits for all Jobs of earlier groups to complete).

    Is this interpretation close to what's possibly happening?

    EDIT: It's not the raycasts. The performance of the system is corellated, but not linearly, to the number of entities (even though it's O(n)). So at 100x fewer entities, I see barely 10x better performance.

    upload_2023-6-19_11-13-14.png
     
    Last edited: Jun 19, 2023