
Question: Abysmal performance in prediction system group + ScheduleParallel

Discussion in 'NetCode for ECS' started by l33t_P4j33t, Apr 14, 2021.

  1. l33t_P4j33t

    l33t_P4j33t

    Joined:
    Jul 29, 2019
    Posts:
    232
    Whenever there's any system set to ScheduleParallel that updates in GhostPredictionSystemGroup, the performance absolutely tanks.
    And these are quite involved, multi-thousand-line jobs.
    Shouldn't the extra threads make short work of this? Is this intended behavior?
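
    For reference, the systems in question are shaped roughly like this (a minimal sketch, not my actual code: the system name and component usage are made up, assuming the NetCode 0.x API). Any parallel-scheduled system of this shape updating inside GhostPredictionSystemGroup shows the problem:

    Code (CSharp):
    using Unity.Entities;
    using Unity.NetCode;
    using Unity.Transforms;

    // Hypothetical example: a predicted-simulation system that runs inside the
    // prediction group and schedules its work across worker threads.
    [UpdateInGroup(typeof(GhostPredictionSystemGroup))]
    public class ExamplePredictedMoveSystem : SystemBase
    {
        protected override void OnUpdate()
        {
            var tick = World.GetExistingSystem<GhostPredictionSystemGroup>().PredictingTick;
            var deltaTime = Time.DeltaTime;

            Entities
                .ForEach((ref Translation translation, in PredictedGhostComponent prediction) =>
                {
                    // Only re-simulate entities that need prediction for this tick.
                    if (!GhostPredictionSystemGroup.ShouldPredict(tick, prediction))
                        return;

                    translation.Value.z += deltaTime; // placeholder for the real multi-thousand-line work
                }).ScheduleParallel();
        }
    }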



    This is with Burst on, leak detection off, safety checks off, and the jobs debugger off.
     

    Attached Files:

  2. CMarastoni

    CMarastoni

    Unity Technologies

    Joined:
    Mar 18, 2020
    Posts:
    900
    It should in general, and this is not the intended behaviour at all.

    That being said, the cost of scheduling jobs can be quite high. Scheduling a job may take up to 200 µs in some cases, depending on how many safety handles are present and a bunch of other things.
    It should be noted that in a build you will not pay these costs at all: the safety system is disabled in that case.
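
    If you want to see that scheduling cost in isolation, one option is to wrap only the schedule call in a ProfilerMarker so it shows up as its own entry in the Profiler (a minimal sketch; the system and marker names are made up):

    Code (CSharp):
    using Unity.Entities;
    using Unity.NetCode;
    using Unity.Profiling;
    using Unity.Transforms;

    [UpdateInGroup(typeof(GhostPredictionSystemGroup))]
    public class ScheduleCostProbeSystem : SystemBase
    {
        // Brackets only the main-thread schedule call, so the marker measures
        // scheduling overhead (safety handles, dependency resolution, ...) and
        // not the job's execution time on the worker threads.
        static readonly ProfilerMarker k_ScheduleMarker =
            new ProfilerMarker("ScheduleCostProbe.ScheduleParallel");

        protected override void OnUpdate()
        {
            k_ScheduleMarker.Begin();
            Entities.ForEach((ref Translation translation) =>
            {
                translation.Value.y += 0f; // trivial work; only the scheduling is being measured
            }).ScheduleParallel();
            k_ScheduleMarker.End();
        }
    }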

    Also, don't take those numbers (from the Entity Debugger) as the ground truth; always check the Profiler to understand the real CPU time and cost. The Entity Debugger only shows you the time spent on the main thread (it doesn't count the job time).

    Honestly, I don't remember having seen that large a delta, in particular with the jobs debugger switched off. Is this view for the ServerWorld or the ClientWorld? What is your current frame rate?
    I'm asking because NetCode runs the prediction loop at a fixed step, so as soon as you drop under 60 Hz you start executing it multiple times per frame. On the client in particular this is always the case in practice, because it generally rolls back to the server snapshot and re-simulates up to the current tick. With 0 latency, that still means being at least 2-3 ticks ahead pretty much.
    Given the timings I see for your jobs in the non-threaded case, which are very small, the cost of scheduling them on threads (let's say 2-3 times per frame) is probably the dominating factor.
    If you take a look at the Profiler you can also see how many prediction loops per frame the client/server is doing and understand where the cost goes.
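
    For example, a tiny diagnostic system in the prediction group can log that count outside the Profiler too (a rough sketch; the system name is made up and it assumes the NetCode 0.x API):

    Code (CSharp):
    using Unity.Entities;
    using Unity.NetCode;
    using UnityEngine;

    // Hypothetical diagnostic: OnUpdate of a system in GhostPredictionSystemGroup runs
    // once per predicted tick, so counting calls per rendered frame gives the number
    // of prediction loops executed in that frame, per world.
    [UpdateInGroup(typeof(GhostPredictionSystemGroup))]
    public class PredictionLoopCounterSystem : SystemBase
    {
        int m_LastFrame = -1;
        int m_LoopsThisFrame;

        protected override void OnUpdate()
        {
            int frame = UnityEngine.Time.frameCount;
            if (frame != m_LastFrame)
            {
                if (m_LastFrame >= 0)
                    Debug.Log($"{World.Name}: {m_LoopsThisFrame} prediction loop(s) last frame");
                m_LastFrame = frame;
                m_LoopsThisFrame = 0;
            }
            m_LoopsThisFrame++;
        }
    }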
     