Bug [0.51] FPS tightly couple with network condition

Discussion in 'NetCode for ECS' started by optimise, Jun 17, 2022.

  1. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    It looks like my DOTS NetCode project's runtime performance has dropped significantly after upgrading to the DOTS 0.51 release, from around 60 fps to around 30 fps. Both Unity 2020.3.35f1 and Unity 2021.3.4f1 seem to show the same significant performance drop.
     
  2. CMarastoni

    Unity Technologies

    Joined:
    Mar 18, 2020
    Posts:
    900
    Hmm. I need more info about what is slowing down in order to understand where the cause is.
    Can you capture a profile and see what has changed in your project to cause a 50% FPS impact?
    In terms of NetCode, we didn't change that much between 0.5 and 0.50.1 (the major things are bug fixes).
     
  3. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    I mean the 0.51 release, not the 0.50 or 0.50.1 release. I will test it to find out.
     
  4. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    Alright. I think I now know why it behaves like that. The current DOTS NetCode implementation is too sensitive to, and too tightly coupled with, network conditions. When the network gets a little weird or unstable, it greatly affects fps. When that happens you will see the GhostPredictionSystemGroup frame time spike massively, and the game drops from 60 fps to 30 fps. I hope this part of NetCode can be rewritten to behave like other mobile games, where network conditions don't affect fps until they get really bad. You should still get 60 fps when the network is a little weird or unstable. My case may be a bit unusual, though: other mobile MOBA games still ran fine, and only my DOTS NetCode game kept getting 30 fps. I fixed it by just disconnecting and then reconnecting on my mobile phone. I'm not really sure, but maybe it is because Unity Transport has a bug on Android? Or is it just because I didn't enable the Lag Compensation feature?

    1.png
     
  5. CMarastoni

    Unity Technologies

    Joined:
    Mar 18, 2020
    Posts:
    900
    Can you please clarify what you mean by weird or unstable network conditions?
    Is it about latency changes (jitter) or high latency? Packet drops?
    It is undeniable that the prediction group consumes a lot of CPU. If the latency is high, we are predicting many frames ahead of the server, and when we receive the snapshot the rollback cost can be quite significant.

    However, I can't guess how many ticks you are trying to predict ahead (you can get this from the net debug, but you can also just print the average) or which systems are slow (those are your implementation details).
    Furthermore, since I presume this is from the editor, how did you run the profile? With threaded jobs enabled, the Job Debugger enabled, and Burst enabled but with safety checks (which slow things down quite a bit)?
    It would be interesting to understand what exactly is taking all this time.

    We have already worked on batching multiple prediction frames together, but that will only be available in 1.0. It can save quite a lot of performance.
     
  6. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    Alright. The profile screenshot I showed was captured yesterday from an Android development IL2CPP build. I keep getting 30 fps when playing the game, i.e. it only drops from 60 fps to 30 fps once the player ghost has spawned, while I still get a solid 60 fps in other mobile MOBA games. At first I thought it was a Unity 2021 issue, so I rolled back to Unity 2020 and made another Android build to test, but I still got 30 fps. Then I rolled back to the DOTS 0.50.1 release and built for Android again, and still got the 30 fps issue. If I remember correctly, the game ping was around 70 ms+. Eventually I fixed it by just disconnecting and reconnecting my phone's Wi-Fi.

    I want to make it clear that this is not a general performance issue: the game is able to reach 60 fps, and when it does, the systems in GhostPredictionSystemGroup do not take nearly as much time as in the screenshot above. My theory is that the issue is caused by the current DOTS NetCode implementation: when the network condition is not good, for whatever reason, prediction goes wrong and the time spent by the systems in GhostPredictionSystemGroup becomes extremely high, which caps fps at 30.
     
    Last edited: Jun 18, 2022
  7. CMarastoni

    Unity Technologies

    Joined:
    Mar 18, 2020
    Posts:
    900
    Ok, so it was in the player; that means we can remove some variables.
    I'm still trying to understand more: a 70 ms ping should imply (given the 2 extra ticks of slack) that the client runs approximately 8 ticks ahead of the server.
    So every time you receive a snapshot from the server (which sends at 60 fps), and assuming (worst/best case scenario) you receive one every frame, you will do 8 prediction steps on the client every frame.
    That means, given your screenshot, every step takes about 4-5 ms at least.
    Are these numbers what you are experiencing, or are they different? (Maybe we are doing some miscalculation.)
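
To make the arithmetic above concrete, here is a rough sketch in plain C# with no Unity dependencies. The class and method names are hypothetical (they are not part of NetCode), and the ~33 ms frame time is an assumption derived from the 30 fps figure reported earlier in the thread, not a measured value.

```csharp
using System;

// Hypothetical helper, not part of NetCode: back-of-the-envelope prediction cost.
public static class PredictionBudget
{
    // Average cost of one prediction step if the whole frame were spent predicting.
    public static double MsPerStep(double frameTimeMs, int stepsPerFrame)
    {
        if (stepsPerFrame <= 0) throw new ArgumentOutOfRangeException(nameof(stepsPerFrame));
        return frameTimeMs / stepsPerFrame;
    }

    // Largest average per-step cost that still fits inside one frame at the target frame rate.
    public static double StepBudgetMs(double targetFps, int stepsPerFrame)
    {
        if (targetFps <= 0) throw new ArgumentOutOfRangeException(nameof(targetFps));
        return MsPerStep(1000.0 / targetFps, stepsPerFrame);
    }
}
```

With 8 steps per frame, `PredictionBudget.MsPerStep(1000.0 / 30, 8)` comes out at roughly 4.2 ms, in line with the 4-5 ms estimate, while `PredictionBudget.StepBudgetMs(60, 8)` shows that each step would need to average about 2.1 ms or less to hold 60 fps.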

    Other questions are:
    How many ghosts?
    How many systems are running in the prediction loop?
    Are you using physics?

    I'm trying to understand if there is something in particular that is dragging your FPS down.
     
  8. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    I have only 1 predicted ghost and a couple of interpolated ghosts, but I believe interpolated ghosts won't affect GhostPredictionSystemGroup. I have quite a lot of systems in GhostPredictionSystemGroup, but most of them don't take much frame time under normal network conditions, since most systems just idle with no matching components. Now I notice that there is 1 system that takes a lot of time, I think because it needs to keep adding/removing components. It is also the system that makes the game drop to 30 fps under bad network conditions. I believe that with the DOTS 1.0 release I can improve it significantly using the enable/disable component feature.

    I use physics for a couple of interpolated ghosts, but it doesn't take as much time as the system I mentioned above. I only use physics shapes for basic collision detection, to check whether the player hits a ghost. I also removed all physics-related components from those ghosts on the client, so they exist only on the server, since they made the game drop a lot of fps in the editor. I can't really understand why fps drops so much when both client and server have the same set of physics-related components. It keeps dropping as more and more interpolated ghosts are spawned. It seems like there is something wrong with prediction for physics too, I don't know.
     
    Last edited: Jun 18, 2022
  9. CMarastoni

    Unity Technologies

    Joined:
    Mar 18, 2020
    Posts:
    900
    Interpolated ghosts are not predicted, as such they don't have a PredictedGhostComponent.

    If you are using physics only on the server and removing all physics components from the client (which also implies PhysicsVelocity), then in practice physics prediction on the client does not run for most of the entities.
    But...
    You said you removed the physics components, but you are using physics for collision detection. So now I'm a little confused.

    Interpolated ghosts are configured as kinematic physics objects (if they have a PhysicsVelocity component).
    As soon as you have a predicted ghost, the prediction loop starts running.
    When BuildPhysicsWorld kicks in, all the entities with a PhysicsVelocity component are considered. That is also true for the StepPhysicsWorld and Export steps. As such, even though interpolated ghosts are not running "prediction", they are still needed to build and update the client physics world state used by the few predicted physics ghosts.
    As such, the interpolated kinematic ghosts count (as cost) toward the physics simulation.

    Now, there are a lot of optimisation opportunities here: you can build the PhysicsWorld just once, and then only update the AABB tree for the one ghost that is moving instead of rebuilding it every frame.
    If the number of physics objects is small, it is better to disable thread scheduling for physics (see the PhysicsStep settings) or to skip using jobs altogether (I think the physics samples show something for that).
    It is also worth considering adding to the physics systems (Build, Step, Export, etc.) a very simple check, before executing/scheduling jobs, that the query actually has entities to process.
    Something like:
    Code (csharp):
    // Skip job scheduling entirely when the query matches no entities.
    if (!query.IsEmpty)
    {
        // schedule the physics jobs here
    }
    This alone can easily remove about 5-6 ms or even more when there are no entities to process (given you are doing 8-9 steps per frame, the cost of scheduling the physics jobs is quite large). But this is probably not your case.

    The fps drops so much in the editor because of the spiral of death. The server runs at a fixed time step (let's say 60 Hz). The server always does only 1 simulation step per frame until performance drops below 60 fps, at which point it starts doing more steps per frame.
    Clients run at a variable frame rate and do multiple prediction steps per frame, which implies multiple physics steps per frame as well.
    When client and server are in the same process, things can get really bad, and pretty fast.
    The client starts consuming more CPU because it needs to run more physics. That means a larger frame time.
    The server in turn does more steps, and the packets from the server are now even further ahead (because it is doing more steps per frame). That in turn causes the client to run more prediction, which causes a larger frame time, then more steps for the server, and so on.

    In 0.51 you can configure the server to "batch" multiple steps together and limit the number of ticks the server does per frame. See
    ClientServerTickRate MaxSimulationStepsPerFrame and MaxSimulationLongStepTimeMultiplier.
    This should alleviate the performance issue in the editor a bit.
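
The feedback loop described above can be illustrated with a toy model in plain C# (no Unity dependencies). Everything in it is an assumption made for illustration: the class name, the 9 ms per-step cost, the rule that each extra server step forces one extra client prediction step, and the `maxServerSteps` parameter, which only stands in for a per-frame step limit and does not model NetCode's actual batching.

```csharp
using System;

// Toy model only: every constant and formula here is an assumption for illustration.
public static class SpiralOfDeathModel
{
    // Returns the frame time (ms) after `frames` iterations of the feedback loop.
    // maxServerSteps <= 0 means "no cap on server steps per frame".
    public static double Simulate(int frames, int maxServerSteps,
        double tickMs = 1000.0 / 60.0, double stepCostMs = 9.0, int baseClientSteps = 8)
    {
        double frameMs = tickMs; // start exactly on the 60 Hz budget
        for (int i = 0; i < frames; i++)
        {
            // The server catches up on every fixed tick that elapsed during the last frame.
            int serverSteps = Math.Max(1, (int)Math.Ceiling(frameMs / tickMs));
            if (maxServerSteps > 0) serverSteps = Math.Min(serverSteps, maxServerSteps);

            // Assumed: each extra server step forces one extra client prediction step.
            int clientSteps = baseClientSteps + (serverSteps - 1);

            // Client and server share one process, so their step costs add up.
            frameMs = (serverSteps + clientSteps) * stepCostMs;
        }
        return frameMs;
    }
}
```

With these made-up defaults, `SpiralOfDeathModel.Simulate(5, 0)` has already climbed to 423 ms and keeps growing with more frames, whereas `SpiralOfDeathModel.Simulate(5, 4)` settles at 135 ms. The capped case is still slow, but it no longer runs away, which is the kind of relief a per-frame step limit is meant to give.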

    That being said, if you only have 1 single ghost, I think you are also doing something wrong somewhere, like making tons of structural changes. Also, you may not have Burst-compiled some jobs (maybe).
    And given the low number of entities, scheduling the jobs is probably still the larger cost. I suspect it is going to be faster to just call Run on the main thread (and Burst-compile the job, of course).
     
  10. optimise

    Joined:
    Jan 22, 2014
    Posts:
    2,129
    That means the collision response logic that runs after collision detection, like destroying a bomb ghost or decreasing player health, now runs on the server only. I'm not sure whether that will look very laggy on the client when the player is playing the game.

    Do you mean modifying the BuildPhysicsWorld system in the DOTS Physics package? It seems there is no way to achieve that other than modifying the package.

    Any plan to make the client world and the server world run as separate processes in the DOTS 1.0 release?

    I think changing these settings will also affect real client and server builds, right? It seems I need to apply the settings in the editor only. Btw, what values should I set?

    It's a SystemBase that calls Run, so I believe it's on the main thread, and the Entities.ForEach doesn't have WithoutBurst(), so it's Burst-compiled. Although it's just 1 ghost, the system loops through most of the ghost's child entities to check something and add/remove components. I believe that's why it's so costly, and the cost scales up as prediction runs more and more times per frame. The call count keeps fluctuating, but I guess in the screenshot below it is being called 21 times per frame?

    upload_2022-6-19_23-41-24.png