Bug Bad Job Scheduling Halves Performance (IN-35504)

Discussion in 'C# Job System' started by Ponzel, Jul 22, 2021.

  1. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Hello everyone!

    I'm working on a simulation game which requires a lot of CPU time to simulate the world.
    In order to achieve a good frame rate, I tried running most of the calculations in jobs while the main thread is busy rendering.

    So I scheduled the jobs at the end of Update(). They would have time to complete until the blue BehaviourUpdate code is called in the next frame.
    This is the result:

    jobsDuringAnimator.png

    Unfortunately, Unity thinks my jobs are so important that it's ok for them to hold up the animator, delaying rendering.

    So I thought maybe single jobs take so long that the scheduler can't start an animator job. Since the scheduler can't interrupt jobs, we need to have some jobs finish and give the scheduler an opportunity to start an animator job. So I tried splitting each job into 4 - more jobs but they finish quicker. This is the result:

    jobsDuringAnimator2.png

    There are plenty of opportunities to let the animator do its thing, but Unity prefers to work on my jobs first.
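    A minimal sketch of one way to get shorter-running units of work, assuming the simulation can be expressed per element. It uses IJobParallelFor with a batch size instead of hand-splitting each job into four. The job name SimulateItemsJob, the item count and the batch size are hypothetical, and this is not necessarily how the jobs in the screenshots were split.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class SplitSimulation : MonoBehaviour
{
    // Hypothetical per-item simulation step; every index is independent.
    struct SimulateItemsJob : IJobParallelFor
    {
        public NativeArray<float> Resources;
        public float DeltaTime;

        public void Execute(int index)
        {
            Resources[index] += DeltaTime;
        }
    }

    const int ItemCount = 4096;           // assumed workload size
    const int BatchSize = ItemCount / 4;  // four batches, mirroring "split into 4"

    NativeArray<float> resources;
    JobHandle handle;

    void Awake()
    {
        resources = new NativeArray<float>(ItemCount, Allocator.Persistent);
    }

    void Update()
    {
        // Finish the work scheduled last frame before touching the array again.
        handle.Complete();

        var job = new SimulateItemsJob { Resources = resources, DeltaTime = Time.deltaTime };

        // Workers pick up one batch of BatchSize indices at a time.
        handle = job.Schedule(ItemCount, BatchSize);
        JobHandle.ScheduleBatchedJobs();
    }

    void OnDestroy()
    {
        handle.Complete();
        resources.Dispose();
    }
}
```

    Smaller batches give the scheduler more points at which it could switch to other work, but as the capture above shows, they do not change which jobs the workers prefer.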

    There is lots of unused time during the renderer and physics phase I could use to run the simulation. Ideally, I'd be able to assign a lower priority to my jobs so Unity can do what it must when necessary and my jobs could fill the unused time in between.

    Please let me know if you have any idea how to use the job system so that it uses all the available time in the frame or if that is possible at all. Thanks a lot for any input!
     
    redwyre likes this.
  2. Yoreki

    Yoreki

    Joined:
    Apr 10, 2019
    Posts:
    2,605
    I don't really work with animations / Animators, so I can't say too much about that in particular. However, as a general rule with Jobs you want to "Schedule Early, Complete Late", so it seems a bit weird that you schedule at the end of a frame to collect the data in the next frame. But of course this may work.
    The last time I worked with DOTS, which was last year, Unity did a pretty fantastic job of deciding what can be calculated when. So it may be that some part of your architecture locks the animator out of doing its thing until some job has finished.
    Since Jobs utilize most of the CPU, but the Animator likely only runs on the main thread like most things in Unity, no amount of "giving jobs a lower priority" should change anything about the behavior you see. But I may be wrong.
    On that note, it would be good to see a bit more about how you handle the data the jobs prepare for your animators.

    Also, while I'm not sure anything official exists, some people have seemingly created animation systems for DOTS, which may be worth looking into.
     
  3. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Thanks for trying to help! Let me address your suggestions one by one:

    My jobs are calculating the simulation and my code on the main thread is applying the results. So if I scheduled the jobs at the beginning of the frame, the main thread would have to immediately wait for the calculation to complete so it can apply the results. This way I first get the results from the last simulation step, apply them, kick off the next simulation step and then have time to do other stuff like rendering while the simulation for the next frame runs in the background.

    I can't apply and calculate at the same time - that would lead to race conditions.
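    The "complete last step, apply, schedule next step" order described above could look roughly like this. It is only a sketch under assumptions: SimulationStepJob, ApplyResults and the array size are made-up names and values, not the actual project code.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class PipelinedSimulation : MonoBehaviour
{
    // Hypothetical simulation step: advances every item's resource count.
    struct SimulationStepJob : IJob
    {
        public NativeArray<float> Resources;
        public float DeltaTime;

        public void Execute()
        {
            for (int i = 0; i < Resources.Length; i++)
                Resources[i] += DeltaTime;
        }
    }

    NativeArray<float> resources;
    JobHandle simulationHandle;

    void Awake()
    {
        resources = new NativeArray<float>(1024, Allocator.Persistent);
    }

    void Update()
    {
        // 1. Wait for the step scheduled last frame (no-op on the first frame).
        simulationHandle.Complete();

        // 2. Apply the results on the main thread; no job touches the array now,
        //    so applying and calculating never overlap.
        ApplyResults(resources);

        // 3. Kick off the next step so it can run during rendering and physics.
        var job = new SimulationStepJob { Resources = resources, DeltaTime = Time.deltaTime };
        simulationHandle = job.Schedule();
        JobHandle.ScheduleBatchedJobs();
    }

    void ApplyResults(NativeArray<float> results)
    {
        // Placeholder: push simulation state into GameObjects, UI, etc.
    }

    void OnDestroy()
    {
        simulationHandle.Complete();
        resources.Dispose();
    }
}
```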

    The cyan-colored bars in the picture above are animation jobs - on the main thread and otherwise. The animator is taking so long because it waits for its own jobs to finish. You can even see a blue colored bar in the main thread while the animator is waiting for its animation jobs: This means the main thread is processing one of my jobs instead of animation code.

    My jobs don't interface with the animators at all. The animators are just long-running background animations and the jobs calculate other stuff, like how many resources are in which item. This is also why the jobs can't block the animators: the animators aren't referenced anywhere in my code, in jobs or otherwise. They just start automatically when the object spawns and that's it.
     
  4. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    I just made a visualization to explain why I'm running jobs during the animation / rendering / physics phase and not during BehaviourUpdate:

    jobsVisualization.png

    Sure, Unity also runs some jobs during the animator / rendering / physics phase, but as you can see in the profiler picture above, the worker threads are idling most of the time in that phase.

    It's fine if the scheduler isn't perfect, but half of the frame is spent waiting for jobs to complete and the other half the workers are pretty much idle - their processing power goes unused.
     
  5. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Hm, maybe one could hook into callbacks like "OnPreCull" or "OnPostRender" during the frame lifecycle in order to spoon-feed the job scheduler only a couple of jobs at a time. That way it wouldn't be able to run all of my jobs before continuing with Unity's jobs.

    However, there is no way to get scheduling anywhere near perfect when I'm trying to have my own job scheduler spoon-feed Unity's scheduler.
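    For illustration only, the spoon-feeding idea might be sketched like this. It assumes the built-in render pipeline, where the static Camera.onPreCull and Camera.onPostRender callbacks fire; SliceJob, SlicesPerFrame and JobsPerCallback are hypothetical names and values.

```csharp
using System.Collections.Generic;
using Unity.Jobs;
using UnityEngine;

public class SpoonFeeder : MonoBehaviour
{
    // Hypothetical slice of simulation work.
    struct SliceJob : IJob
    {
        public int SliceIndex;
        public void Execute() { /* simulate one slice here */ }
    }

    const int SlicesPerFrame = 16;  // assumed amount of work per frame
    const int JobsPerCallback = 4;  // assumed number of jobs handed over per callback

    readonly Queue<SliceJob> pending = new Queue<SliceJob>();
    readonly List<JobHandle> inFlight = new List<JobHandle>();

    void OnEnable()
    {
        Camera.onPreCull += Feed;
        Camera.onPostRender += Feed;
    }

    void OnDisable()
    {
        Camera.onPreCull -= Feed;
        Camera.onPostRender -= Feed;
        CompleteAll();
        pending.Clear();
    }

    void Update()
    {
        // Flush anything the callbacks did not get to, then wait for last frame's work.
        while (pending.Count > 0) Feed(null);
        CompleteAll();

        for (int i = 0; i < SlicesPerFrame; i++)
            pending.Enqueue(new SliceJob { SliceIndex = i });

        Feed(null); // hand over the first few jobs right away
    }

    // Hands the scheduler at most JobsPerCallback jobs per call.
    void Feed(Camera _)
    {
        for (int i = 0; i < JobsPerCallback && pending.Count > 0; i++)
            inFlight.Add(pending.Dequeue().Schedule());
        JobHandle.ScheduleBatchedJobs();
    }

    void CompleteAll()
    {
        foreach (JobHandle handle in inFlight) handle.Complete();
        inFlight.Clear();
    }
}
```

    The fixed JobsPerCallback is exactly the weak spot: it has to be retuned whenever job durations change, and the callbacks run once per camera.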

    I'm wondering what the official way to do this might be? It can't be intended that you can't use multiple cores while the frame is rendering, right?
     
  6. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Hey,

    it's been almost 2 years, so I thought it would be worthwhile to ask again if the status quo has changed:

    Is there any way to prioritise jobs?
    It feels like we are at the mercy of random scheduling that may leave most threads idling and it's just not possible to utilize these hardware resources with Unity?

    If we are using the system wrong or there is another way, please let me know, thanks!
     
  7. metallibus

    metallibus

    Joined:
    Jun 1, 2019
    Posts:
    12
    Wish I had an answer... But figured I'd at least chip in that I'm running into a similar problem in another thread about NavMesh...

    It seems absolutely crazy to me that the Unity rendering system uses jobs internally, yet it can't prioritize itself over other jobs... And in my case, Unity's own internal NavMesh jobs can overrun the rendering jobs if submitted enough data... If they're not going to bother implementing prioritization, why doesn't rendering at least have its own queue or worker pool or something?

    It feels like some of the Jobs philosophy is just missing the mark here - it's pitched as a way to multithread code, make better use of CPU cores, huge performance gains with burst, and not be limited to the main thread, with the logical conclusion that you'd want to use it for large amounts of work... Except that so much of the system is designed around short bursts of work, which make it clunky for 80% of the types of things you'd want it for. Stuff that runs for long periods of time and/or across many workers actually causes frame freezes which is probably what you were trying to get around in the first place....

    I appreciate Unity pushing into multithreading, async work, the burst compiler, etc, but so many of the APIs just totally miss the mark and I end up having to shelve them and do things an entirely different way because these only get 90% of the way there.

    Case in point, NavMeshBuilder.UpdateNavMeshDataAsync(), which you would think would be helpful for building NavMesh data in the background... Except that you have to submit the data on the main thread, you have to read the result on the main thread, and submitting too much data freezes the main thread when the renderer needs the job system. Why do I have to jump through hoops putting my inputs together, scheduling them back to main, and then schedule other repeating work to check for completion on main, only to find a brick wall of locking the main thread in circumstances I can't measure or predict?
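    A rough sketch of the flow being described, using the runtime UnityEngine.AI.NavMeshBuilder API. The bounds, layer mask and agent type are assumptions chosen for illustration, not values from the other thread.

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.AI;

public class BackgroundNavMeshBake : MonoBehaviour
{
    // Assumed bake volume; adjust to the level.
    public Bounds bakeBounds = new Bounds(Vector3.zero, new Vector3(100f, 20f, 100f));

    NavMeshData data;
    NavMeshDataInstance instance;
    readonly List<NavMeshBuildSource> sources = new List<NavMeshBuildSource>();

    IEnumerator Start()
    {
        data = new NavMeshData();
        instance = NavMesh.AddNavMeshData(data);

        // Main thread: gather the inputs (here: all physics colliders in the bounds).
        NavMeshBuilder.CollectSources(
            bakeBounds, ~0, NavMeshCollectGeometry.PhysicsColliders, 0,
            new List<NavMeshBuildMarkup>(), sources);

        // Main thread: submit. Unity builds the mesh using its own internal jobs.
        AsyncOperation operation = NavMeshBuilder.UpdateNavMeshDataAsync(
            data, NavMesh.GetSettingsByID(0), sources, bakeBounds);

        // Main thread again: the coroutine resumes here once the build is done.
        yield return operation;
        Debug.Log("NavMesh update finished");
    }

    void OnDestroy()
    {
        instance.Remove();
    }
}
```

    Every step the caller controls runs on the main thread; the jobs that do the actual build are queued internally and compete with everything else in the job system.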

    The jobs system is 5 years old and still this is unaddressed?
     
  8. metallibus

    metallibus

    Joined:
    Jun 1, 2019
    Posts:
    12
  9. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    Is there a bug report filed for the issue?
     
  10. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    I'm glad I'm at least not the only one with this issue.

    I have not filed a bug report, since this doesn't seem like a bug, but more of a missing feature / design limitation.

    It's unfortunate, because it leaves the processor cores idling half the time, but maybe Unity developers are ok with that?
     
  11. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    Half the cores idling when there are jobs to schedule seems like a bug.
    Only thing I can think of rn is to make sure you are up to date and maybe batch jobs if you use foreach.
    I personally schedule the jobs in Update, with the script execution order set so it runs before most other scripts, then retrieve the data in LateUpdate so I don't have to deal with any info from the previous frame.
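    A minimal sketch of that setup, assuming a single job per frame. The -100 execution order, the job and the array size are hypothetical.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Negative order: this Update runs before scripts with the default order (0).
[DefaultExecutionOrder(-100)]
public class SameFrameSimulation : MonoBehaviour
{
    // Hypothetical simulation step.
    struct SimulationStepJob : IJob
    {
        public NativeArray<float> Resources;
        public float DeltaTime;

        public void Execute()
        {
            for (int i = 0; i < Resources.Length; i++)
                Resources[i] += DeltaTime;
        }
    }

    NativeArray<float> resources;
    JobHandle handle;

    void Awake()
    {
        resources = new NativeArray<float>(1024, Allocator.Persistent);
    }

    void Update()
    {
        // Schedule early: the job runs while the other scripts' Update methods execute.
        handle = new SimulationStepJob { Resources = resources, DeltaTime = Time.deltaTime }.Schedule();
        JobHandle.ScheduleBatchedJobs();
    }

    void LateUpdate()
    {
        // Complete late, but still in the same frame, so results are never a frame old.
        handle.Complete();
        // Read from 'resources' here.
    }

    void OnDestroy()
    {
        handle.Complete();
        resources.Dispose();
    }
}
```

    With this layout the jobs only overlap with the rest of Update, not with rendering or physics.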
     
  12. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    The issue is not quite that the cores are idling when there are jobs to schedule.

    The issue is:
    • low prio jobs being done first, while high prio jobs are holding up main thread from advancing
    • high prio jobs finish, now cores are idling. low prio jobs could have been done here, there is more than enough time

    Basically, it needs a float or int called "priority" to schedule with each job.

    I don't know how much priority Unity is putting on these kinds of performance issues right now. I thought maybe they're ok with a job system like that for now?

    Or I'm missing something and there is some other way to use the cores?
     
    Last edited: Apr 14, 2023
  13. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    If you can't find anything for this in the docs I suggest to just make a report. Best case it will be fixed, worst case you spent a bit of time making it.
    Issues that aren't necessarily bugs can also make it to the issue tracker.
     
  14. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    The docs say:

    upload_2023-3-16_17-14-24.png

    It does not clarify what "the appropriate time" is.


    Alright, created a bug report with simple example project, issue number IN-35504.
    Basically:
    • spawns 1700 animated cubes so the animation system and rendering systems have something to do
    • schedules some jobs that just wait for 0.5 ms (busy waiting)
    • observe that jobs are holding up animation system, worker threads are idle during rendering and physics phases
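    For anyone who wants to reproduce this without the project, the job part could be approximated as below. This is a reconstruction under assumptions, not the code attached to IN-35504: the job count is made up (the report only says "some jobs"), and the animated cubes are not included.

```csharp
using System.Diagnostics;
using Unity.Jobs;
using UnityEngine;

public class SlowJobSpawner : MonoBehaviour
{
    // Busy-waits for a fixed number of Stopwatch ticks without doing real work.
    struct SlowJob : IJob
    {
        public long SpinTicks;

        public void Execute()
        {
            long end = Stopwatch.GetTimestamp() + SpinTicks;
            while (Stopwatch.GetTimestamp() < end) { }
        }
    }

    const int JobCount = 64; // hypothetical

    readonly JobHandle[] handles = new JobHandle[JobCount];

    void Update()
    {
        // Wait for last frame's jobs (default handles complete immediately).
        for (int i = 0; i < JobCount; i++)
            handles[i].Complete();

        // 0.5 ms expressed in Stopwatch ticks.
        long spinTicks = Stopwatch.Frequency / 2000;

        // Schedule at the end of Update so the jobs could overlap with rendering.
        for (int i = 0; i < JobCount; i++)
            handles[i] = new SlowJob { SpinTicks = spinTicks }.Schedule();
        JobHandle.ScheduleBatchedJobs();
    }

    void OnDestroy()
    {
        for (int i = 0; i < JobCount; i++)
            handles[i].Complete();
    }
}
```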

    See profiler:

    upload_2023-3-16_17-56-8.png

    If anybody's interested, I can upload the project publicly.

    @DevDunk thanks for trying to help.
     
    Last edited: Mar 17, 2023
    Yoreki and DevDunk like this.
  15. metallibus

    metallibus

    Joined:
    Jun 1, 2019
    Posts:
    12
    @Ponzel Can you link to that issue in the tracker? I don't see a way to search by issue number, don't see it anywhere in search, etc.... Not sure if you submitted it in the issue tracker or as a bug report and if those end up in the same place?
     
  16. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Currently it's in the Unity bug reporting portal (I used Unity -> Help -> Report a Bug).

    My understanding is someone from Unity will take a look and put it in the public issue tracker if they think it is an issue.
    Currently the status is still "Open".
     
    DevDunk likes this.
  17. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    here is another profiler screenshot with annotations to explain the problem:

    Screenshot 2023-03-17 130602 - Kopie.png
     
  18. metallibus

    metallibus

    Joined:
    Jun 1, 2019
    Posts:
    12
    FWIW, here's a screenshot from my case I mentioned in another thread with much worse delays due to submitting async requests to Unity's own NavMesh APIs.... Just feeding it a large chunk of data can cause it to queue tons of jobs and internally block the rendering system... Other examples in that thread also show the main thread going as far as to try to run some of these jobs to try to help out instead of just running the rendering job itself.

    I'll try to put together a sample project of this and submit a bug report with it as well when I have a bit more time...


     
  19. jiraphatK

    jiraphatK

    Joined:
    Sep 29, 2018
    Posts:
    300
    Have you tried scheduling jobs at an injection point other than Update()?
    Baste-RainGames/PlayerLoopInterface (github.com)
    Also, I'm not sure if it helps or is even possible, but you could try disabling graphics jobs so the rendering doesn't use worker threads to submit draw commands.

    I'm just guessing though. I have never used jobs before and have never observed whether disabling graphics jobs would actually leave the worker threads idle....
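    Without the linked helper library, a custom injection point can be sketched directly against UnityEngine.LowLevel.PlayerLoop. The choice of PostLateUpdate, the marker type and the event name are all assumptions made for illustration.

```csharp
using System;
using UnityEngine;
using UnityEngine.LowLevel;
using UnityEngine.PlayerLoop;

public static class SimulationLoopHook
{
    // Marker type that identifies the custom step in the player loop (hypothetical).
    struct ScheduleSimulationJobs { }

    // Simulation scripts subscribe here and schedule their jobs in the handler.
    public static event Action BeforePostLateUpdate;

    [RuntimeInitializeOnLoadMethod]
    static void Install()
    {
        PlayerLoopSystem root = PlayerLoop.GetCurrentPlayerLoop();
        PlayerLoopSystem[] phases = root.subSystemList;

        for (int i = 0; i < phases.Length; i++)
        {
            if (phases[i].type != typeof(PostLateUpdate)) continue;

            // Prepend the custom step to the existing steps of this phase.
            PlayerLoopSystem[] oldSteps = phases[i].subSystemList;
            var newSteps = new PlayerLoopSystem[oldSteps.Length + 1];
            newSteps[0] = new PlayerLoopSystem
            {
                type = typeof(ScheduleSimulationJobs),
                updateDelegate = Run
            };
            Array.Copy(oldSteps, 0, newSteps, 1, oldSteps.Length);
            phases[i].subSystemList = newSteps;
        }

        root.subSystemList = phases;
        PlayerLoop.SetPlayerLoop(root);
    }

    static void Run()
    {
        BeforePostLateUpdate?.Invoke();
    }
}
```

    This only changes when the jobs are handed to the scheduler; it gives no control over the order in which workers pick them up.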
     
    Last edited: Mar 19, 2023
  20. metallibus

    metallibus

    Joined:
    Jun 1, 2019
    Posts:
    12
    I've had submissions at other points in the main loop, but not bothered with PlayerLoop type stuff... I don't see why it would matter: Where you submit jobs shouldn't impact how they get scheduled or how the scheduler works.

    This might work as a workaround to this specific problem, but it still doesn't fix the underlying issue that jobs can't be prioritized over each other... And what if some other Unity system needs jobs? This might work as a one off fix here, not sure, but doesn't really solve the underlying problem... just waits for it to rear its head somewhere else.
     
  21. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
  22. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    thanks for trying to help!

    about scheduling jobs at some other point:
    I talked about this solution a little further up the thread: #5

    Scheduling it somewhere else just moves the problem somewhere else. Ok, now the animator is fine, but instead particle effects are being held up. Or the particle effects are fine, but culling is held up. etc.
    You could try spoon-feeding your jobs bit by bit, but this is really bad for the dev process. As you develop your game and players build different things, the job execution times change. You'd need to build your own dynamic scheduler to schedule around Unity's job scheduler :D

    about disabling the graphics jobs
    I don't think you can honestly, and even if you could, that may be better, but not good, by a long shot.
     
  23. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    yes, I do and it's done that way in all the examples.
    If I don't call ScheduleBatchedJobs, the jobs don't start at all until Unity's animator is being run. I guess Unity calls ScheduleBatchedJobs in there, which schedules both the animator jobs and my custom jobs.
     
    DevDunk likes this.
  24. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    btw, the incident status is currently still at "Open", just fyi
     
  25. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    It's been 1 month since I submitted the report using Unity's bug report feature.
    I haven't heard anything so far unfortunately.
     
    MadeFromPolygons and DevDunk like this.
  26. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    If another dev blitz day pops up ask there. Sadly issues related to performance tend to take up a lot of time (2 months for my VR performance issues).
    You can also edit the title to include the report ID and add the bug tag to it so it's more visible here
     
    Ponzel likes this.
  27. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Thanks for the suggestions, just did it!
     
    DevDunk likes this.
  28. kevinmv

    kevinmv

    Unity Technologies

    Joined:
    Nov 15, 2018
    Posts:
    51
    Hey hey!

    Thanks for the project and report and apologies no one has responded in such a long time.

    You'd be surprised to know that Unity does actually prioritize jobs on the C++ side, and the animation jobs are indeed high priority. The main problem here is that in 2021 the priority is binary (high or normal), and unfortunately, the priority system is repurposed for job dependency chain execution as well. So the Animation jobs end up being preferred over other jobs, but as jobs try to be run but can't because their dependency is outstanding, the outstanding job that can run may be inserted with the high priority work which is why you can see a mix of the slowjobs and animation jobs in your sample running together.

    In 2022.2, the wait logic has been changed to prefer parallel work we are waiting on before stealing new jobs. This helps make the Animation jobs take priority when we wait on them, but still isn't perfect for complete separation since we will still run unblocked jobs on a worker thread when possible once a job completes. We are indeed aware of the lack of control for grouping jobs and providing prioritization of work, and now that we have a more flexible job system foundation in 2022.2, providing mechanisms to control how jobs run is a key focus for us currently, although I don't have an exact timeline to provide to you.

    Once that system is in place there will still be work to be done though. e.g.

    > low prio jobs being done first, while high prio jobs are holding up main thread from advancing
    > high prio jobs finish, now cores are idling. low prio jobs could have been done here, there is more than enough time

    > Basically, it needs a float or int called "priority" to schedule with each job.

    One issue here is that the Animation jobs are scheduled (right at the end of Animators.PrepareFirstPass, and then again in Animators.PrepareSecondPass) and then immediately waited on in the engine (the wait happens almost immediately in Animators.ProcessGraphJob), which is going to almost guarantee a stall regardless of whether there had been a means to prioritize C# jobs. We do give priority to those animation jobs while the wait is happening, which is why SlowJobs do end up on the right side of Animators.ProcessGraphJob in the profile; the animation jobs just aren't given exclusive priority, as mentioned above. I don't know the full details of the animation system, but moving the animation main thread work into a job with a dependency would have removed that stall, which would have reduced the necessity for explicit prioritization against non-animation jobs. I suspect it's due to other main-thread-only requirements that might exist in the engine, but I'll follow up with the animation team.
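    The engine code in question is C++ and not visible here, so the following is only a C# analogy of the two patterns being contrasted: scheduling and immediately waiting versus moving the follow-up work into a dependent job. ProduceJob, ConsumeJob and the array sizes are hypothetical.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class ScheduleThenWait : MonoBehaviour
{
    struct ProduceJob : IJob
    {
        public NativeArray<float> Data;

        public void Execute()
        {
            for (int i = 0; i < Data.Length; i++) Data[i] = i;
        }
    }

    struct ConsumeJob : IJob
    {
        [ReadOnly] public NativeArray<float> Data;
        public NativeArray<float> Sum; // single-element output

        public void Execute()
        {
            float total = 0f;
            for (int i = 0; i < Data.Length; i++) total += Data[i];
            Sum[0] = total;
        }
    }

    public bool stallOnMainThread; // toggle to compare both patterns in the profiler

    NativeArray<float> data;
    NativeArray<float> sum;
    JobHandle chain;

    void Awake()
    {
        data = new NativeArray<float>(1024, Allocator.Persistent);
        sum = new NativeArray<float>(1, Allocator.Persistent);
    }

    void Update()
    {
        chain.Complete(); // finish the chain scheduled last frame

        JobHandle produce = new ProduceJob { Data = data }.Schedule();

        if (stallOnMainThread)
        {
            // Pattern A: schedule, then wait right away. The main thread blocks
            // here until ProduceJob has run, then does the follow-up itself.
            produce.Complete();
            float total = 0f;
            for (int i = 0; i < data.Length; i++) total += data[i];
            sum[0] = total;
        }
        else
        {
            // Pattern B: express the follow-up as a job that depends on the first.
            // The main thread continues and only waits next frame.
            chain = new ConsumeJob { Data = data, Sum = sum }.Schedule(produce);
            JobHandle.ScheduleBatchedJobs();
        }
    }

    void OnDestroy()
    {
        chain.Complete();
        data.Dispose();
        sum.Dispose();
    }
}
```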

    There are other places in the engine that we will take a look at once we have better grouping and prioritization mechanisms available, as I would prefer to provide the visibility of job groups and the tailoring of how they should be run as a possible configuration mechanism so you can better coordinate your work with the engine's with reasonable defaults. To do so requires some work on the job system side and then some grooming of the engine use of the job system to make it simpler to navigate as users.

    I know this ultimately is not amazing news / very helpful, but I do want to make sure you know you've been heard and we are working on it.

    All the best,
    Kev
     
  29. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    Thanks for the insights!
     
  30. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Hey Kevin, thanks a lot for the reply!

    It was very helpful to get some insight into what Unity thinks about this internally.

    I just want to clarify one thing:
    We are only talking about the animator because it happens to be the first thing to run after Update() is done. Other systems, like frustum culling, could exhibit the same behavior. Completely job-ifying the animation system would be amazing, but I suspect a priority system will already be enough and give us most of the control we need.

    I'll check back for progress in a year or so, but if you have any news to share regarding this, please feel free to update here.
    I'll be watching this thread and take any news into account when thinking about the simulation systems for our games.

    Thanks!
     
    Last edited: Apr 25, 2023
    kevinmv likes this.
  31. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    Hello everyone.

    The issue just got closed with this answer:


     
  32. DevDunk

    DevDunk

    Joined:
    Feb 13, 2020
    Posts:
    5,060
    Glad they are aware, but I do not understand why issues like this won't be on the issue tracker (where people can read updates if there are any)
     
    MadeFromPolygons, Yoreki and Ponzel like this.
  33. Ponzel

    Ponzel

    Joined:
    Jun 17, 2017
    Posts:
    41
    I agree
     
  34. MadeFromPolygons

    MadeFromPolygons

    Joined:
    Oct 5, 2013
    Posts:
    3,983
    agree, it's stupid