Long frame stalls in VR when loading assets async - Oculus

Discussion in 'AR/VR (XR) Discussion' started by benjcooley, Mar 9, 2016.

  1. benjcooley

    Joined:
    May 3, 2014
    Posts:
    45
    We're experiencing very long stalls when running in VR mode with Oculus support whenever we call methods which asynchronously load asset bundles or scenes into an existing scene.

    The stalls occur in OculusWaitForGPU() and can last up to several seconds (measured in the profiler as groups of 200-700 ms per frame).

    When we disable VR support in the same app, there are no large, noticeable (multi-second) stalls in asynchronous bundle load performance.

    We think we've narrowed the stalls down to the asset download handler in the WWW and UnityWebRequest system, but they could also be caused by the actual async asset loads. We disabled the asset loads, leaving just the WWW download, and also tried UnityWebRequest with the DownloadHandlerAssetBundle, and the stalls still occurred.

    The methods we've tried to work around this issue:

    1. Legacy WWW.assetBundle - Stalls in OculusWaitForGPU()

    2. New UnityWebRequest with DownloadHandlerAssetBundle - Stalls in OculusWaitForGPU()

    3. UnityWebRequest with DownloadHandlerBuffer, then passing the byte buffer to AssetBundle.LoadFromMemoryAsync() - Stalls in OculusWaitForGPU() (same behavior)

    4. Using a queue to throttle the number of outstanding requests in both WWW and UnityWebRequest, and also the number of simultaneous calls to LoadFromMemoryAsync() - Stalls in OculusWaitForGPU()

    We tried a variety of values for the number of simultaneous web requests and simultaneous LoadFromMemoryAsync() calls, down to 1 request and 1 LoadFromMemoryAsync(), and the stalls still seem to occur.
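
The request throttling described above can be sketched roughly as follows. This is a minimal illustration, not our production code: ThrottledQueue and its members are hypothetical names, the class is plain C# with no Unity dependency, and it assumes everything runs on a single thread (such as Unity's main thread). Each queued job receives a callback that it must invoke exactly once when its download or load has finished.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Unity API): caps how many jobs are in flight at once.
// Not thread-safe; intended to be driven from a single thread.
public sealed class ThrottledQueue
{
    private readonly Queue<Action<Action>> pending = new Queue<Action<Action>>();
    private readonly int maxInFlight;
    private int inFlight;

    public ThrottledQueue(int maxInFlight)
    {
        if (maxInFlight < 1) throw new ArgumentOutOfRangeException("maxInFlight");
        this.maxInFlight = maxInFlight;
    }

    public int InFlight { get { return inFlight; } }
    public int Pending { get { return pending.Count; } }

    // 'start' begins the work and must call the supplied callback once when the work is done.
    public void Enqueue(Action<Action> start)
    {
        pending.Enqueue(start);
        Pump();
    }

    // Starts queued jobs until the in-flight limit is reached or the queue is empty.
    private void Pump()
    {
        while (inFlight < maxInFlight && pending.Count > 0)
        {
            Action<Action> start = pending.Dequeue();
            inFlight++;
            start(OnJobDone);
        }
    }

    // Called by a job when it completes; frees a slot and starts the next job, if any.
    private void OnJobDone()
    {
        inFlight--;
        Pump();
    }
}
```

In a Unity project, a job would typically start a coroutine that runs the web request and invokes the callback after the request finishes. Constructing the queue with a limit of 1 corresponds to the "1 request" setting mentioned above.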

    System Specs:

    - Late-model VR rig with an Nvidia Titan GTX 980. Latest version of Oculus drivers (0.8.0.0) and latest version of Nvidia drivers (double-checked).

    - Unity versions 5.3.2 (our current default version), 5.3.3 (latest stable Unity as of today), and 5.4.0 beta (tried each one).

    - Asset bundle behavior:

    - Some scenes with baked GI information
    - Several larger textures (2048x2048 - compressed)
    - Lots of bundle files (we maintain our own LRU system for individually packed asset bundles to reduce bandwidth to a minimum)
    - Since individual assets are packaged separately and loaded in dependency order, scenes are generally empty except for GI info.

    The stalls 'seem' to occur simultaneously with scene activation when a scene download completes. However, in our system most of the assets for a scene have been pre-loaded, and our scene bundle files usually contain only GI information; other assets are bundled separately.

    Again, no stalls occur when the Oculus is not attached or the app is built without Oculus support.

    Can post more information if needed. Will also file a bug report.

    Thx.
     
  2. benjcooley

    We suspect that the GPU driver may be crashing or resetting during async asset uploading. Any idea whether stalls in OculusWaitForGPU() can be caused by graphics driver issues or resets?

    Is there any history of nasty interactions between async asset uploading to the GPU and any particular Nvidia drivers, or of specific issues with Oculus 0.8.0.0 and Nvidia?
     
  3. benjcooley

    Diving deep into this problem, it seems that this is basically the deal:

    1. If you are using either WWW or the new UnityWebRequest in 5.2.x onward, you'll get the same basic behavior. Both seem to stall in VR mode (likely because under the hood they're probably the same code...)

    2. The stalls are visible in non-VR mode, but very small: maybe 150-300 ms max, but still there. In VR mode the stalls can be as long as several seconds. The longest stall was more than 10 seconds.

    3. The video driver resets seem to be a side effect rather than the cause. We tried tracking the video driver's error output and only got the "Video driver reset" message. Apparently Windows 10 decides that a stall of more than 2 seconds is too long and resets the driver.

    4. The stalls are unrelated to any processing done after data is received. We converted all web requests to simply download raw data and the stalls were identical. We also used the DownloadHandlerScript for the UnityWebRequest to eliminate any possible GC issues (it simply loads the data and throws it away). The stalls are not GC related.

    5. We have a system that dynamically controls the number of simultaneous web requests pending in our own queue. In earlier versions we simply set this to a large value (64), as this seemed to be optimal for using all available bandwidth (in practice, around 20 seems to give the best download performance). Reducing this value to 1 (one) simultaneous pending request eliminates the stalls for the most part, at the expense of bringing total bandwidth down to about 1/8th of what it is with 20 simultaneous requests. Two or more simultaneous requests still tend to cause stalls.

    6. When we simply cache asset bundles to disk and load them directly, there are no stalls whatsoever. This eliminates asset bundle decompression and asset creation (and texture uploading) as the source of the stalls.
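
For reference, loading an already-cached bundle straight from disk (point 6) looks roughly like the sketch below. It assumes Unity 5.3 or later, where AssetBundle.LoadFromFileAsync() is available, and that the bundle file was previously written under Application.persistentDataPath. The class name, method name, and callback are hypothetical and only for illustration; this is not our actual loader.

```csharp
using System;
using System.Collections;
using System.IO;
using UnityEngine;

// Hypothetical component for illustration only.
public class DiskCachedBundleLoader : MonoBehaviour
{
    // Loads a bundle that is assumed to be cached on disk already.
    // Passes null to onLoaded if the file is missing or the load fails.
    public IEnumerator LoadCached(string bundleName, Action<AssetBundle> onLoaded)
    {
        string path = Path.Combine(Application.persistentDataPath, bundleName);
        if (!File.Exists(path))
        {
            onLoaded(null);
            yield break;
        }

        // No WWW or UnityWebRequest involved: the bundle is read directly from the file.
        AssetBundleCreateRequest request = AssetBundle.LoadFromFileAsync(path);
        yield return request;

        // request.assetBundle is null if the bundle could not be loaded.
        onLoaded(request.assetBundle);
    }
}
```

It would be started from another MonoBehaviour with StartCoroutine(loader.LoadCached(name, callback)).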

    ---------------------------------------------

    Finally... we replaced WWW and UnityWebRequest with BestHttp (from Unity Asset Store) to compare.

    Problem Solved.

    So apparently the current WWW/UnityWebRequest system just cannot handle more than 1-2 simultaneous asset bundle requests when VR is running without stalling the rendering pipeline, and when rendering stalls, it looks like the driver can occasionally reset.

    Since BestHttp seems to have no trouble, it is almost certainly something very specific to the WWW/UnityWebRequest implementation itself and not a driver, Oculus, or system issue.

    If we have time we will try to build a simpler repro project that isolates the problem and send it in. However, since it doesn't happen with BestHttp, that may take a while as we're a bit pressed for time.