
Insane performance downgrade unless application launched via Build-and-Run

Discussion in 'Daydream' started by isidro02139, Apr 17, 2019.

  1. isidro02139


    Joined:
    Jul 31, 2012
    Posts:
    72
    We are working on an education app that teaches communication skills in virtual reality. We are currently in the process of porting the project from Windows Mixed Reality, a desktop VR solution, to Google Daydream, and our device of choice is the Lenovo Mirage Solo.

    Obviously one major part of the porting process is to optimize our scripts and geometry to be performant on a Daydream device. We are able to get it to run at a stable 75FPS (the native refresh rate of the target device) - BUT ONLY IF WE RUN THE APPLICATION USING UNITY'S "BUILD AND RUN" COMMAND. Running the application through any other means - the device app menu, or via adb - results in an extremely slow, choppy experience, characterized by extremely large spikes (up to 100ms long) that show up in the profiler as "XR.WaitForGPU" and which cannot be explained by simple VSync. Additionally, if the headset is taken off the head for at least 5 minutes (so the process is suspended, the device stops rendering, and the controller is required to wake it up), the performance spikes go away.

    This behaviour has been confirmed NOT to depend on device temperature or build settings. It occurs in development builds and production builds, on multiple different Lenovo Mirage Solo devices, with builds produced on several different computers and operating systems. The issue is present in both 2017.4.25f1 and the latest 2018.3.11f1.

    Details

    It is as if the application has two states: the normal state where everything works properly, and the state with performance spikes. Which state the app will be in depends on how it was launched. If it was launched using Unity's Build and Run, it will be in the performant state: a stable 75FPS, a smooth experience, and full functionality (profiler image).

    If launched by any other means, it starts in the state with performance spikes. It will have a low framerate (about 50FPS) and, even worse, the frame drops are staggered and irregular. This is incredibly disorienting. All the functionality is still there, but the app is completely unusable from a performance / subjective-feel standpoint (profiler image).

    When profiling the problem, the "performance spike" state is characterized by multiple XR.WaitForGPU spikes, one spike every 2-10 frames. I have attached screenshots of the exact same scene with and without spikes.

    Now, I am no stranger to XR.WaitForGPU and I am familiar with its purpose. The theoretical time to render a frame on this device is 13.333ms, and I would expect that if we fail to meet our target framerate we will get dropped frames, and have to wait for an entire refresh cycle. Sad, but something I'd be able to live with and optimize. However, even waiting for a full refresh cycle would only make us wait around 13ms. You will notice that the spikes in question are not your regular waiting for the next frame - they are HUGE, routinely 50-80ms per spike. I've seen some that were over 100ms. There are NO corresponding CPU spikes in the preceding frames; as far as the profiler is concerned, the rest of the application is working perfectly and executing everything within the allotted time budget.

    In other news, the profiler tells me that our app doesn't seem to be doing anything to lock up the device's renderer for such long periods (100ms!). Furthermore, the exact same binary is perfectly capable of running smoothly and without spikes - provided it is started with "build and run" (or put to sleep for 5min and woken).

    Once the app is rid of the spikes it will remain in its performant mode until the process is restarted, even after multiple scene transitions the spikes do not return.

    Note that not only can two LAUNCHES of the same binary exhibit different performance (one with spikes, one without), but a SINGLE launch can start with spikes and then, after the trick (losing focus for 5 minutes), remain spike-free for the rest of its run.

    So while both of these tricks are great for obtaining a smooth framerate, they are obviously not acceptable for a commercial product from a customer's perspective. To make matters worse, it is not the other way around (Build and Run causing the spikes, normal on-device launches not); if it were, we could shrug off this problem, as it would not affect the customer.

    Everything the profiler tells me corroborates my belief that the scene is sufficiently optimized to run on this Android device at 75FPS.

    Additional info and takeaways

    (1) I know that with the above description, you may be tempted to blame this on thermal throttling of some sort. It was the first thing to occur to me as well, which is why I ran several thermal tests, and I have completely eliminated temperature and thermal throttling as a possible cause. I repeat, I HAVE COMPLETELY ELIMINATED DEVICE TEMPERATURE AS A POSSIBLE CAUSE OF THESE PERFORMANCE SPIKES. I have had the device render the scene for two hours without pause and get quite hot, and still saw no spikes. I have also had a cold device start in the morning and immediately enter the state with spikes (which I was able to convert to the no-spike state by taking the headset off).

    (2) I've spent a lot of time in VR, including in apps that did not meet their FPS requirement, so I definitely know what it looks like when you fail to render a frame in the required time. This is not just your regular judder, where you get some double images and mild discomfort.

    (3) The Mirage Solo is a 6DoF device, which means asynchronous reprojection helps less, especially with lateral head movements. 6DoF on Android has not been around for all that long. This issue may have been present before but not as noticeable on a 3DoF device because reprojection would pick up the slack.

    (4) Both taking the device off the head (and having it pause the app / stop rendering as a consequence) AND waiting for some period of time (~5 minutes) are necessary prerequisites to eliminate the performance spikes. Doing either of the two on its own will not eliminate the spikes.

    Conclusions

    - Performance spikes seem to be caused by something external to our scripts and geometry
    - The GPU doesn't seem to be overburdened.
    - The problem doesn't seem to be in our Unity scene setup. We closely monitor CPU expenditure as well as GPU load, and there doesn't appear to be anything that would cause such massive spikes / delays. Indeed, it is perfectly possible - by using tricks that have nothing to do with our code and scene setup - to get our scene to behave perfectly fine on the device.
    - The problem doesn't depend on the binary itself. The exact same binary is capable of running with both performance spikes and without performance spikes.
    - The problem is not a property of the launched process, either. A single process can go from having performance spikes to not having performance spikes during its lifetime.

    Any ideas are much, much appreciated:)

    A&G
     
  2. thep3000


    Unity Technologies

    Joined:
    Aug 9, 2013
    Posts:
    400
    I haven't heard of anything like this. I don't have answers, but I can provide some details around what Build and Run does:

    Code (bash):
    adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -f 0x10200000 -S -n {package}/{activity}
    These options are documented in the adb shell am start reference and in the Android Intent documentation.

    The -f part is a combination of FLAG_ACTIVITY_RESET_TASK_IF_NEEDED (0x00200000) and FLAG_ACTIVITY_NEW_TASK (0x10000000), which OR together to the 0x10200000 passed above.

    I don't think there's anything special about this, but maybe try running your app like this and seeing if you see the better behavior? Might help us eliminate a variable.

    The other thing to note is that when your application is launched via the VR launcher, an intent extra, android.intent.extra.VR_LAUNCH, is passed along. You can simulate this by adding --ez android.intent.extra.VR_LAUNCH true to the above command. Unity looks for this; if found, it will force VR mode and apply the following to the Android activity's window decor:

    Code (java):
    currentActivity.getWindow().getDecorView().setSystemUiVisibility(View.SYSTEM_UI_FLAG_HIDE_NAVIGATION | View.SYSTEM_UI_FLAG_LOW_PROFILE | View.SYSTEM_UI_FLAG_FULLSCREEN);
    I'd suggest trying different combinations of the above ways of launching to see if you can narrow it down further.
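
    If it helps to confirm which launch path your app is actually seeing, here is a minimal sketch of how you could log whether the VR_LAUNCH extra reached the activity from the C# side (just standard AndroidJavaClass plumbing you could drop into any script, not something the engine provides):

    Code (CSharp):
    using UnityEngine;

    public class VrLaunchExtraLogger : MonoBehaviour
    {
        void Start()
        {
        #if UNITY_ANDROID && !UNITY_EDITOR
            // Grab the current Android activity through the standard UnityPlayer bridge.
            using (var unityPlayer = new AndroidJavaClass("com.unity3d.player.UnityPlayer"))
            using (var activity = unityPlayer.GetStatic<AndroidJavaObject>("currentActivity"))
            using (var intent = activity.Call<AndroidJavaObject>("getIntent"))
            {
                // True when the VR launcher (or adb with --ez) passed the extra along.
                bool vrLaunch = intent.Call<bool>("getBooleanExtra", "android.intent.extra.VR_LAUNCH", false);
                Debug.Log("android.intent.extra.VR_LAUNCH = " + vrLaunch);
            }
        #endif
        }
    }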
     
    isidro02139 likes this.
  3. isidro02139


    Joined:
    Jul 31, 2012
    Posts:
    72
    Dear @thep3000 – thanks for your thoughtful reply. I have tried to launch the apk with:

    adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -f 0x10200000 -S -n com.package.name/com.unity3d.player.UnityPlayerActivity


    and

    adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -f 0x10200000 -S -n com.package.name/com.unity3d.player.UnityPlayerActivity --ez android.intent.extra.VR_LAUNCH true 


    ...but it regularly starts in the spiky state. The only way to recover "smooth" performance is to run via Unity's Build and Run or to use the trick, i.e. deep-sleeping the device until the controller is required to wake it. Are you aware of anything in particular that happens around scheduling or thread prioritization when waking from this 'deep' sleep?

    Thanks again for your detailed prior comments.
     
  4. joejo


    Unity Technologies

    Joined:
    May 26, 2016
    Posts:
    958
    If you create a brand new, clean, empty project (DO NOT add any external assets or SDKs) and enable Daydream, do you see the same thing happening?
     
    ROBYER1 likes this.
  5. isidro02139


    Joined:
    Jul 31, 2012
    Posts:
    72
    @joejo @thep3000

    After running some more tests, we have more or less managed to establish that the "spikes" problem is present in scenes with certain skinned mesh renderers. When these skinned mesh renderers and their animators are removed from the scene, the spikes do not appear. We are working on providing a minimal reproduction scene / build with detailed repro instructions.

    We have tried a variety of settings, including GPU skinning and graphics jobs, in various combinations.

    The animated meshes in question are nothing too fancy - humanoid characters with 6k vertices and 9k triangles, and the skeleton is not overly complex. We render up to 4 such characters at any one time. We were willing to entertain the idea that the skinning was a performance bottleneck that somehow causes all these skipped frames, in some way that is masked from the profiler. However, if the app is in the "no spikes" mode through the use of one of the tricks described above, the device is perfectly comfortable performing the skinning of these same meshes without any performance impact.
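
    In case it helps with reproducing or isolating this, here is a minimal sketch of the kind of runtime toggle one could use to switch the skinned characters on and off while watching the profiler (the key binding is purely illustrative; map it to whatever input your project uses):

    Code (CSharp):
    using UnityEngine;

    // Debug-only helper: toggles every SkinnedMeshRenderer and Animator in the scene
    // so the spike behaviour can be A/B-tested in a running build without editing the scene.
    public class SkinnedMeshDebugToggle : MonoBehaviour
    {
        bool skinnedMeshesEnabled = true;

        void Update()
        {
            if (Input.GetKeyDown(KeyCode.T)) // illustrative binding only
            {
                skinnedMeshesEnabled = !skinnedMeshesEnabled;

                foreach (var anim in FindObjectsOfType<Animator>())
                    anim.enabled = skinnedMeshesEnabled;

                foreach (var smr in FindObjectsOfType<SkinnedMeshRenderer>())
                    smr.enabled = skinnedMeshesEnabled;
            }
        }
    }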

    @joejo re: the empty scene test, we do not see the same thing happening. Hoping to share the repro project asap.