Question UpdateNavMeshDataAsync Locking Main Thread

Discussion in 'Navigation' started by metallibus, Mar 12, 2023.

  1. metallibus

    Joined: Jun 1, 2019 | Posts: 14
    If I submit a batch of mesh data to NavMeshBuilder.UpdateNavMeshDataAsync(), the call itself does return asynchronously, but it queues up a lot of job work that seems to lock up the main thread, either on that frame or on a frame shortly after, while a large part of that work completes. The returned AsyncOperation isn't marked complete for a few more frames, so some of the work is happening separately, but I consistently see one unbearably long frame along the way. Is there some way to remove this bottleneck and let the main thread keep rendering without waiting on parts of the NavMesh jobs?

    The work seems to be dominated by NavMesh.TileMesh. The frame doesn't seem to start rendering until a bunch of these are finished, though some persist beyond the frame render, so I'm not sure this is really the culprit.

    [Attached screenshots: upload_2023-3-12_15-35-12.png, upload_2023-3-12_15-35-37.png]

    Basically everything else is idle at this point, so I don't think anything else is holding up the render timing:
    [Attached screenshot: upload_2023-3-12_15-39-45.png]
     
  2. metallibus

    Joined: Jun 1, 2019 | Posts: 14
    So, a little bit more info here...

    After drilling a little deeper, and closing the Scene view while profiling, it looks like the main thread is getting hung up trying to execute some rendering job:

    [Attached screenshot: upload_2023-3-16_9-44-43.png]

    It looks to me like the rendering system is using a job for some portion of its logic, seemingly related to shadows, and waiting on the result. But since the NavMesh system has submitted so many jobs, all the workers are full, so the main thread appears to switch to helping finish those jobs until its own job is finally executed.

    This seems insane. If the render system needs jobs, why does it not have its own worker pool? Why can't it increase its own priority? It seems pretty crazy that submitting enough work to a system that's meant to help unblock the main thread can itself block the main thread.

    That being said, maybe this becomes more of a jobs question and less of a NavMesh question, unless there's some way to control how many workers the NavMesh calls use? I don't see anything obvious, but since lots of these workers are stuck in mesh tiling, maybe submitting fewer meshes than there are job workers would alleviate this? That would greatly slow down NavMesh building, though, especially because I should be able to process multiple meshes per worker per frame, and I couldn't be sure I wouldn't bottleneck something by trying to. This seems a little silly to juggle, and it also means I have to be careful about what other jobs I'm submitting.
     
  3. optimise

    Joined: Jan 22, 2014 | Posts: 2,137
    @metallibus I suggest you submit a bug report for this issue.
     
  4. DwinTeimlon

    Joined: Feb 25, 2016 | Posts: 304
    It doesn't solve the overall problem, I guess, but you can actually set maxJobWorkers in NavMeshBuildSettings.
     
  5. Whipexx_DigitalSun

    Joined: Aug 8, 2017 | Posts: 18
    For a while I wondered if you were someone on my team; we've been looking at this for a couple of days.
    As DwinTeimlon said, setting maxJobWorkers to less than JobsUtility.JobWorkerCount will free some threads from the greedy NavMesh allocation.
    However, this only reduces the problem. I've seen this happen a lot when some task A on a thread creates one or more threads and has to wait for them to complete before it can finish. If A goes idle, the threading system uses that thread for other tasks, like TileMesh, so A then has to wait until TileMesh completes before it can finish its own work. Even if the NavMesh update is async and does not block the main thread, task A might not be, and that's what causes the lag spike.

    I believe something might be causing really long TileMesh executions when there shouldn't be any, but I'm still trying to blame it on my project and not NavMesh. Are you also seeing abnormally long calls to UpdateNavMeshDataAsync while others are much faster? If so, something may be wrong there.
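
    The worker cap mentioned above could look roughly like the following. This is a minimal sketch, not a confirmed fix: the class name NavMeshWorkerBudget, the helper CappedWorkers and the default of two reserved workers are all hypothetical, and how many workers to leave free is something to tune while profiling.

    ```csharp
    using System.Collections.Generic;
    using Unity.Jobs.LowLevel.Unsafe;
    using UnityEngine;
    using UnityEngine.AI;

    // Hypothetical helper class, not part of Unity.
    public static class NavMeshWorkerBudget
    {
        // Workers to hand to the NavMesh build, leaving `reservedWorkers`
        // free for other jobs. Never returns 0, because this thread
        // describes maxJobWorkers == 0 as "use every available worker".
        public static uint CappedWorkers(int totalWorkers, int reservedWorkers)
        {
            return (uint)Mathf.Max(1, totalWorkers - reservedWorkers);
        }

        // Starts an async update with the capped worker count.
        // NavMeshBuildSettings is a struct, so the caller's copy is not modified.
        public static AsyncOperation UpdateWithBudget(
            NavMeshData data,
            NavMeshBuildSettings settings,
            List<NavMeshBuildSource> sources,
            Bounds localBounds,
            int reservedWorkers = 2) // assumed default, tune per project
        {
            settings.maxJobWorkers =
                CappedWorkers(JobsUtility.JobWorkerCount, reservedWorkers);
            return NavMeshBuilder.UpdateNavMeshDataAsync(
                data, settings, sources, localBounds);
        }
    }
    ```

    As noted above, this only reduces contention: a thread that waits on its own job can still end up picking up a long TileMesh job in the meantime.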
     
  6. metallibus

    Joined: Jun 1, 2019 | Posts: 14
    Just looped back and saw this thread... I had opened a bug report and Unity didn't respond for months and then said it was "not a priority" to fix and that the issue can be "circumvented by using the recommended workflows in the Navigation package". I don't know what "recommended workflow" they mean besides just not doing this at runtime or only using small navmeshes.

    There is another comment from Unity in another thread saying their prioritization changed or something, but I cannot confirm if this actually fixes this issue or not.

    https://forum.unity.com/threads/bad-job-scheduling-halves-performance-in-35504.1145261/#post-8972248
     
  7. DEBBAJfarouk

    Joined: Jun 21, 2021 | Posts: 6
    This problem still persists. When making a procedural infinite terrain, the NavMeshSurface takes about 2 seconds to calculate, which freezes the game, even though the profiler shows it running on both a job system worker thread and the main thread.
     
  8. Sameer1472

    Joined: Apr 1, 2020 | Posts: 6
    I wanted to add something here (my code is at the bottom). I'm facing the same problem and it seems I have fixed it in my specific case. I finally figured out that the lag spikes happened when I called UpdateNavMeshDataAsync again before the previous job had finished. To fix this, I simply don't update the NavMesh if the previous async operation hasn't finished. I also set the max job workers to 1.

    My guess is that the jobs keep stacking up if you call the update function before the previous one finishes, so you run out of available workers and the main thread then has to wait until one is free. In my game it's acceptable for the NavMesh to take some time to build as long as it doesn't block the main thread, which is why, just to be safe, I set the max workers to 1.


    Here's my code below. The only important part is that if the previous NavMesh update hasn't finished, it returns early and doesn't call UpdateNavMeshDataAsync again until the previous one finishes.

    Code (CSharp):
    // Needs: using System.Collections; using UnityEngine; using UnityEngine.AI;
    // The m_* fields are declared elsewhere in the class.
    void FixedUpdate()
    {
        if (!m_initialized) return;

        // Only rebuild once the player has moved far enough from the last build position.
        float dist = (m_lastPos - m_playercar.transform.position).magnitude;
        if (dist >= m_updateDistance)
        {
            UnityEngine.Debug.Log("Already building:" + m_buildInProgress);
            // Skip this update while the previous async build is still running.
            if (m_buildInProgress) return;
            UnityEngine.Debug.Log("Starting another build");

            m_navBounds.center = m_playercar.transform.position;
            StartCoroutine(UpdateSurfaces());

            m_lastPos = m_playercar.transform.position;
        }
    }

    public IEnumerator UpdateSurfaces()
    {
        NavMeshBuildSettings set = m_surfaces[0].GetBuildSettings();
        NavMeshData data = m_surfaces[0].navMeshData;
        // Limit the build to one job worker so it cannot occupy the whole pool.
        set.maxJobWorkers = 1;
        AsyncOperation op = NavMeshBuilder.UpdateNavMeshDataAsync(data, set, m_navSources, m_navBounds);
        m_buildInProgress = true;

        float timer = Time.unscaledTime;
        // Yield once per frame until the build completes.
        while (!op.isDone)
        {
            yield return null;
        }
        m_buildInProgress = false;
        UnityEngine.Debug.Log("Nav Mesh Time: " + (Time.unscaledTime - timer));
    }
     
  9. Whipexx_DigitalSun

    Joined: Aug 8, 2017 | Posts: 18
    While not exactly the same issue, there is another issue that "randomly" causes these same side effects. I've been working on it for a while, but to sum up:

    - It's actually fixed in Unity 6000 (or I haven't managed to properly reproduce it there)
    - If you use the child collection mode on your NavMeshSurface, the issue doesn't occur either (but you take a heavy performance hit, because child mode seems to be inherently more expensive than the alternatives)
    - They won't be backporting the fix because it wasn't intentional; it was a consequence of the improvements made for 6000
    - Here's the bug report
     
  10. WBonX

    Joined: Aug 7, 2018 | Posts: 66
    I spent 10 minutes rereading without understanding the problem.

    You are making an async call that creates some jobs. These run on other threads, but since it is still an async call, some of the work could be (and actually is) on the main thread.

    It makes no sense to use so many jobs; the average CPU has 2, maybe 4, cores that are already busy doing something else. You can create a very detailed (1carve x unit) 3/5.000km (Unity units) world in 30 seconds with a single job.

    Looking at the hierarchy in the profiler, you should be able to see the internal calls made by the NavMesh builder that actually hang the main thread (these copy/allocate large lists of data/native arrays).
    Their cost is proportional to the number of sources you pass but, more importantly, to the size of the growing NavMeshData you pass in, since that is what gets edited. Tile size plays a big role too.

    Thus... it is a matter of optimizing things.

    In my opinion the main problem is not being able to see what is done under the hood.
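
    If the main-thread cost really does scale with the number of sources and the tile size, one way to experiment is to collect only the sources inside a limited volume and override the tile size. This is a sketch under that assumption: the class LocalNavMeshSources and both method names are hypothetical, the choice of PhysicsColliders and area 0 is only an example, and a good tile size has to be found by profiling.

    ```csharp
    using System.Collections.Generic;
    using UnityEngine;
    using UnityEngine.AI;

    // Hypothetical helper class, not part of Unity.
    public static class LocalNavMeshSources
    {
        // Collects only the build sources inside `worldBounds`,
        // so fewer sources are handed to the builder per update.
        public static List<NavMeshBuildSource> Collect(Bounds worldBounds, int layerMask)
        {
            var sources = new List<NavMeshBuildSource>();
            NavMeshBuilder.CollectSources(
                worldBounds,
                layerMask,
                NavMeshCollectGeometry.PhysicsColliders, // example choice
                0,                                       // default area index (assumed: Walkable)
                new List<NavMeshBuildMarkup>(),          // no markups in this sketch
                sources);
            return sources;
        }

        // Returns a copy of the settings with an explicit tile size (in voxels).
        public static NavMeshBuildSettings WithTileSize(
            NavMeshBuildSettings settings, int tileSizeVoxels)
        {
            settings.overrideTileSize = true;
            settings.tileSize = tileSizeVoxels;
            return settings;
        }
    }
    ```

    The returned list and settings can then be passed to NavMeshBuilder.UpdateNavMeshDataAsync in place of the full source list and default settings.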
     
  11. metallibus

    Joined: Jun 1, 2019 | Posts: 14
    No, the problem is that the NavMesh libraries will freely gobble all available workers, and starve out other jobs from running, including those that are critical for frame rendering. This is a scheduling problem between two core Unity libraries and I do not have access to change their scheduling behavior.

    If there is a worker count limiter, that could function as a workaround to an extent, but that means I now have to dedicate fewer CPU cores to chewing through my NavMesh, meaning I would then have to wait longer just so Unity even functions.

    I am fine with having a complex nav mesh, and that taking a few frames to update. Unity doesn't provide good ways to chunk this out without other headaches.

    I understand what's happening under the hood - it's a resource contention problem. This is what happens when you don't use an intelligent scheduler or implement any sort of priority system.
     
  12. alexVSilva

    Joined: Apr 2, 2020 | Posts: 12
    I had the same issue while using NavMeshBuilder.UpdateNavMeshDataAsync(). With the default NavMesh build settings (NavMeshBuildSettings.maxJobWorkers = 0), the NavMesh was updated using all available job workers (JobsUtility.JobWorkerCount), which starved jobs that Unity needs internally (for example, shadows?). I didn't mind the NavMesh taking several frames to update, so I set NavMeshBuildSettings.maxJobWorkers = 1. That fixed the stuttering I was experiencing.