
High CPU consumption with no explainable cause

Discussion in 'Unity 5 Pre-order Beta' started by olivier.gosseaume.maison, Jan 26, 2015.

  1. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    First of all, sorry for my English, which is not my native language.

    I would like to report a very strange performance problem I am getting with Unity 5 (tested with beta 19 & beta 20) on Windows 7 64-bit (editor + player). I don't have Unity 4 to compare against.

    I'm evaluating Unity 5 for my company, not for building a game but for building a sort of visualization application for "vehicles" (think cars, but not only) on a 3D map (well, at least "2.5D"). We are trying to see if Unity could replace our current implementation, which is built with Silverlight and is 2D only.

    The map is based on a tiling system I've written in C# that works exactly the same way as Deep Zoom, Google Earth or Cesium JS.

    The map itself is 2D, and we add static 3D buildings (meshes) on it, just like Google Earth. This part is not done yet.

    We can colourize some roads using transparent textures displayed just above them (done).

    There can be hundreds of vehicles at a time, moving in real time (or faster than real time), and each one is a 3D mesh with a sticker (a billboard with various content) floating above it. My current test prototype has 2000 objects, plenty of which move around the 3D world.

    Currently we don't use any lighting (not tested yet).

    We have very strict performance constraints for this application, since:
    - it should work on Windows laptops or tablets without draining the battery too much
    - some instances of it could be deployed in vehicles, on embedded PCs which are not very powerful (Core i3 + HD 4000) and must not heat up too much (passive cooling).

    I'm a performance nut, so I've optimized every bit of the C# scripts we use for the tiling system, the moving vehicles and the billboarding using the built-in Unity Pro profiler. Nearly everything in my prototype is procedural. I know that my scripts run fast and don't do anything when not needed. Really.

    Currently we have no problem at all getting a very good framerate with this prototype (150 to 300 FPS depending on hardware, with 2000 objects, if we unlock vsync) and we are very pleased with that.

    But at the same time, we also get very high CPU usage (15% to 30%, which means up to one core out of four) for no apparent reason, and NOT ALWAYS. I've found (by accident) a way to get less than 1% CPU with the same object count and a high framerate by doing a very simple thing I will explain, but it only works in the editor, not in a standalone Windows player...

    I can reproduce this behaviour on 2 completely different PCs with very different hardware:
    - Intel Xeon E5540 (4 cores) + AMD ATI FirePro 4800 (my dev PC, don't laugh...)
    - Core i7 notebook (4 cores) + NVIDIA K3000M

    Here we go. If I run the application with vsync on (QualitySettings.vSyncCount = 1) *in the editor*, I get a solid 60 FPS (520 draw calls with a lot of dynamic batching, 3.0 ms render thread, <1 ms scripts), but also 20% CPU (nearly one full core).

    Now here comes the real strangeness... If I simply insert one System.Threading.Thread.Sleep(1) (a 1 ms sleep) in a GameObject's LateUpdate, then (tada) the CPU drops to less than 1% (0.7% to be precise)... I'm not crazy, my CPU measurements are correct (Process Explorer, Task Manager and an in-app measurement using Mono all agree).
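
    For reference, the whole "trick" is nothing more than a component like the following (a minimal sketch of what I just described; the class name is mine, attach it to any active GameObject):

    Code (CSharp):
    using System.Threading;
    using UnityEngine;

    // Minimal version of the "1 ms sleep" trick described above.
    public class FrameSleep : MonoBehaviour
    {
        // Sleep duration in milliseconds (1 ms is what makes the editor CPU drop for me).
        public int sleepMilliseconds = 1;

        void LateUpdate()
        {
            if (sleepMilliseconds > 0)
                Thread.Sleep(sleepMilliseconds);
        }
    }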

    Let's continue with QualitySettings.vSyncCount = 0 and Application.targetFrameRate = 0 (no sync, full speed), still in the editor:
    - I get 150 FPS on my slow dev PC (520 draw calls with a lot of dynamic batching, 3.0 ms render thread, <1 ms scripts) and 40-50% CPU. We are running at full speed, so this is understandable.
    - Now if I rerun with the 1 ms sleep, I get 100 FPS and 0.7% CPU with exactly the same scene!!! The app is perfectly smooth, objects animate at 100 FPS, everything flies...

    Now even stranger: I CAN'T reproduce this in the standalone player. Worse, the player always consumes at least 20% CPU while getting the same framerate as the editor. If I insert sleeps in the code, it lowers the framerate (depending on vsync activation), and the longer the wait the lower the framerate, but in NO WAY can I get 1% CPU. Never.

    My understanding of all of this is that there must be some sort of "spinwait" between the main thread and the render thread inside Unity, *or* between Unity and the DX layer, which causes this high CPU consumption.

    So I tried to understand with the profiler what the difference is between the application executed in the editor and in the standalone player. Here is what I saw:

    When running in the editor (in timeline mode) on my crappy dev PC:
    - on the main thread, the only measurable script is my "sleep", and the camera render lasts 3.4 ms
    - on the render thread, I see *only* two Gfx.ProcessCommands for a total time of 2.4 ms, with some Gfx.DrawDynamicBatch inside
    - the render thread always finishes its work before the start of the next frame on the main thread

    When running in the standalone app (in timeline mode):
    - on the main thread, the only measurable script is my "sleep", and the camera render lasts 3.3 ms
    - on the render thread, I see two Gfx.ProcessCommands; the first is a short one of 1.2 ms, as in the editor, but the second costs 7.5 ms because it calls Gfx.PresentFrame, which lasts 5.3 ms. What does Gfx.PresentFrame do? Could the spinwait be here?
    - the render thread finishes its work *after* the start of the next frame on the main thread. This is strange, and another difference from the editor

    I also tried a native Windows profiler called "Very Sleepy" but didn't find anything conclusive. I mostly see two threads working.

    To add a little more strangeness, I also have to explain that when I start the editor, I can't get <1% CPU right away just by running my app with the 1 ms sleep. I first have to try many sleep values during the run until suddenly the CPU drops to 1% (I added a keyboard control to make the sleep adjustable in-app for this). As soon as the CPU is <1%, I can run the app as many times as I want for the rest of the session with the 1 ms wait and it will always "work"...

    What I have also tested and discovered:
    - Quality settings play no role (Fantastic, Good, Simple, etc.): same behaviour
    - DX9 instead of DX11: same behaviour
    - Same behaviour without dynamic batching, except that as soon as the number of objects drawn in the viewport exceeds some limit (let's say 1000), the CPU rises sharply
    - The ability to get <1% CPU seems to depend on driver settings related to vsync in the Windows control panel, at least with NVIDIA hardware. On the NVIDIA K3000M I could not get less than 1% without forcing vsync off in the NVIDIA control panel.

    I would be very grateful if someone from Unity could shed some light on this, and I will be happy to provide any input needed for that.

    Beta 21 just appeared; I will try it.
     
  2. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    I can reproduce the same issue in an empty project with a scene containing 500 rotating cubes: <1% CPU in the editor (after some fiddling with delays), 15% in the player. I've uploaded the project in this post.

    To test:
    - open the project and launch it; you will see the cubes rotating at a constant speed
    - use keypad "+" to increase the delay to 1 ms. You will see quite high CPU usage (15% for example)
    - next, use keypad "+" again to increase the delay until you see the CPU counter near 0. Then go back to a 1 ms delay with keypad "-"
    - use keypad "/" to reset the CPU consumption sampling and see the CPU staying very low (you can also check Task Manager or any tool you want)
    - use keypad "*" to reset the delay to 0

    To reproduce, you may need to try multiple times to increase and reduce the delay until you see the CPU dropping, and/or relaunch multiple times (assuming you can reproduce it on your system). Please also try with vsync disabled in your GPU control panel.

    You can also try to build a player. It will never run at low cpu usage.
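
    For readers who don't want to download the attachment, here is a stripped-down sketch of the kind of script the repro contains (this is not the attached script itself; the class name and cube layout are mine, and the in-app CPU display is omitted):

    Code (CSharp):
    using System.Threading;
    using UnityEngine;

    // Illustrative sketch of the repro: 500 spinning cubes plus a keypad-adjustable
    // per-frame sleep. The CPU-usage display of the real project is omitted.
    public class SpinningCubesRepro : MonoBehaviour
    {
        public int cubeCount = 500;
        public float degreesPerSecond = 90f;

        int delayMs = 0;          // current per-frame sleep in milliseconds
        Transform[] cubes;

        void Start()
        {
            cubes = new Transform[cubeCount];
            for (int i = 0; i < cubeCount; i++)
            {
                GameObject cube = GameObject.CreatePrimitive(PrimitiveType.Cube);
                cube.transform.position = new Vector3((i % 25) * 2f, 0f, (i / 25) * 2f);
                cubes[i] = cube.transform;
            }
        }

        void Update()
        {
            // Keypad "+" / "-" adjust the delay, keypad "*" resets it to 0.
            if (Input.GetKeyDown(KeyCode.KeypadPlus)) delayMs++;
            if (Input.GetKeyDown(KeyCode.KeypadMinus) && delayMs > 0) delayMs--;
            if (Input.GetKeyDown(KeyCode.KeypadMultiply)) delayMs = 0;

            // Rotate all cubes at a constant speed.
            float angle = degreesPerSecond * Time.deltaTime;
            for (int i = 0; i < cubes.Length; i++)
                cubes[i].Rotate(0f, angle, 0f);
        }

        void LateUpdate()
        {
            if (delayMs > 0)
                Thread.Sleep(delayMs);
        }
    }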
     

    Attached Files:

  3. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    After a lot of time spent on this, I've found the cause and a workaround :)

    Will post in a few hours to explain (with a new example)
     
    yorong likes this.
  4. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    As promised, here is what I have found.

    The problem probably only impacts Windows and is caused by the way Unity reconfigures the Windows "high performance timers".

    Those timers dictate a lot of things in Windows related to threading, like context switches, wait granularity, etc...

    Normally they run at a 15.6 ms resolution. You can check this with a Microsoft tool from the Sysinternals website called "clockres".

    When the Unity editor launches, it reconfigures the timers to 1 ms with a call to the Windows API NtSetTimerResolution(). Please note that changing this affects all currently running processes.

    When the timer resolution is set to 1 ms, the CPU consumption bug which affects my application appears, probably because something inside Unity makes "sleep calls" which are too short; somehow those calls become spins and then eat a lot of CPU cycles for nothing. I can only suppose this, but to me it is clearly a bug.

    Approximately 90 seconds later (I don't know why), the editor reverts back to 10 ms. At that point the application behaves properly, except that to completely eliminate the CPU problem it is still necessary to insert a tiny 1 ms sleep in code for each Update() call. I don't know why, and I don't really care: what interests me is the player.

    Now for the standalone player (this is the interesting part). When it starts, the player also reconfigures the timers to 1 ms, but they stay like this forever, eating a lot of CPU for nothing.

    I made a new example (enclosed in this post) that demonstrates this and that includes a workaround for the standalone player. The fix is very simple: I revert the timers to 5 ms and everything is fine (0% CPU for the spinning cubes). I made this sample to demonstrate that there is, in my opinion, a serious bug in Unity that needs the Unity team's attention.

    Here is how to use the sample:
    - open it in Unity (any beta up to 21 if you want)
    - compile it for standalone Windows
    - quit all Unity editor instances, and all players. This is very important; not doing this will prevent the timer reconfiguration from working (look at the Microsoft documentation for an explanation why)
    - now launch the freshly compiled player (choose Windowed, Fantastic, and whatever resolution you want)
    - you will see 500 spinning cubes at 100 FPS and high CPU usage (18% on my PC)
    - you should see "1ms timer res.", meaning the player is running with a 1 ms timer resolution
    - now hit the backspace key. Doing so will revert the timers to 5 ms (you should see "5ms timer res."). Look at the script to understand how I do that; a simplified sketch is also included below
    - now look at the CPU. You should see something near zero (0.2% on my PC). Nice, that's 100x less CPU consumed :)
    - if you hit backspace again, it will go back to 1 ms timers and the CPU problem will come back in force.
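
    For those who don't want to open the attachment, here is a simplified sketch of what such a workaround script can look like (this is not the attached script verbatim; it assumes the multimedia timer API in winmm.dll and NtQueryTimerResolution in ntdll.dll, and omits error handling):

    Code (CSharp):
    using System.Runtime.InteropServices;
    using UnityEngine;

    // Simplified sketch of the timer-resolution workaround (Windows only).
    public class TimerResolutionWorkaround : MonoBehaviour
    {
        // winmm.dll multimedia timer API: timeBeginPeriod(n) requests an n ms
        // system timer resolution, timeEndPeriod(n) cancels a matching request.
        [DllImport("winmm.dll")]
        static extern uint timeBeginPeriod(uint periodMs);

        [DllImport("winmm.dll")]
        static extern uint timeEndPeriod(uint periodMs);

        // ntdll.dll, used only to display the current resolution (units of 100 ns).
        [DllImport("ntdll.dll")]
        static extern int NtQueryTimerResolution(out uint min, out uint max, out uint current);

        bool coarse;

        void Update()
        {
            if (Input.GetKeyDown(KeyCode.Backspace))
            {
                if (!coarse)
                {
                    // Assumption: the player requested 1 ms at startup via timeBeginPeriod(1),
                    // so a matching timeEndPeriod(1) from the same process cancels it,
                    // and we then ask for a coarser ~5 ms resolution.
                    timeEndPeriod(1);
                    timeBeginPeriod(5);
                }
                else
                {
                    // Go back to the 1 ms resolution the player started with.
                    timeEndPeriod(5);
                    timeBeginPeriod(1);
                }
                coarse = !coarse;
            }
        }

        void OnGUI()
        {
            uint min, max, current;
            NtQueryTimerResolution(out min, out max, out current);
            GUILayout.Label((current / 10000f).ToString("0.#") + "ms timer res.");
        }
    }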

    I've spent a lot of time demonstrating this problem and I truly believe it is a serious bug in Unity on Windows. I would be very grateful if a Unity team member could look at this.

    Thanks
    Olivier
     

    Attached Files:

  5. lilymontoute


    Joined:
    Feb 8, 2011
    Posts:
    1,181
    My guess:

    This might be due to Unity 5's new job system and hyper-threading (both of your tests were on Intel CPUs, correct?).

    From what I understand from some of their posts, it's a lockless, non-blocking job system (probably something like a lockless work-stealing queue built on CAS primitives). Because jobs in the engine can last fairly long, the problem is that the spinning forces the processor to set up and assign resources constantly, even when the thread is doing no work. The solution would be to replace the spinning with OS-level blocking calls (at least on Intel CPUs).
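
    Purely to illustrate the kind of change I mean (this is not Unity's actual code, just the general pattern), a spin-then-block wait looks something like this:

    Code (CSharp):
    using System.Threading;

    // Illustration of "spin briefly, then fall back to an OS-level blocking wait"
    // instead of spinning forever.
    static class HybridWait
    {
        // Wait until 'signal' is set: poll with short busy-spins for a bounded
        // number of rounds, then block in the kernel so the core can go idle.
        public static void WaitFor(ManualResetEvent signal, int maxSpins)
        {
            for (int i = 0; i < maxSpins; i++)
            {
                if (signal.WaitOne(0))   // non-blocking check
                    return;
                Thread.SpinWait(50);     // burn a few dozen cycles
            }
            signal.WaitOne();            // OS-level blocking wait
        }
    }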

    I have the feeling that the timer resolution change is compounding the problem even more.

    I think there's already a bug filed on this (the spinlocking), but I still recommend filing a bug or adding this info to an existing one if you haven't already, just to make sure.
     
    Last edited: Jan 27, 2015
  6. DanSuperGP


    Joined:
    Apr 7, 2013
    Posts:
    408
    This is great stuff, Olivier; it seems like you've gone to great lengths to track down exactly what's wrong.

    Have you submitted this as an actual bug through the Unity bug reporter? Your spinning-cube scene, along with very specific instructions and all the findings you have discovered, will be much more helpful for getting this fixed if reported that way rather than just posted here in the forums, since it will get the problem into the FogBugz system the Unity team uses.
     
  7. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    Thanks for your insights. I've submitted a bug. We'll see :)

    Were you able to reproduce the problem on your PC?

    Thinksquirrel, I'm not sure they use something so sophisticated. With a profiler attached, I see a lot of SleepEx calls and WaitForSingleObject calls (waits on handles). I'm thinking more of calls like Sleep(x/y) where x/y < 1 and x or y depends on the timer resolution (just a guess really). Yes, I'm on Intel CPUs, and disabling hyper-threading does not change anything.
     
    leewang2502 likes this.
  8. lilymontoute


    Joined:
    Feb 8, 2011
    Posts:
    1,181
    Gotcha, yea this sounds like a different issue then, that may or may not be related. Glad to hear it's been reported though.
     
  9. DanSuperGP


    Joined:
    Apr 7, 2013
    Posts:
    408
    I'm on OS X, and it wouldn't be appropriate to test it here at work, unfortunately.

    I can give it a test on my MacBook Pro at home though, if you like. I have my own copy of Unity 5 there...

    I could do so in Boot Camp on Windows 8 as well...
     
  10. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    There is a tool called "API Monitor" that allows you to trace every syscall made by an application.

    In our case the culprit seems to be something called "AnimationCurveTpl<float>::AllowTransferOptimization" in Unity, which loops continuously around a call to SleepEx(1) (a 1 ms wait). When the timer resolution is reset to 5 ms or 10 ms, the Sleep(1) lasts longer, which slows the loop down and reduces the CPU consumption.

    It is this very same thread that calls NtSetTimerResolution() via a call to timeBeginPeriod() to change the resolution to 1 ms.
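
    You can see the mechanism from managed code too. This tiny snippet (just an illustration, a plain console program) measures how long Thread.Sleep(1) really lasts: with a 1 ms timer resolution each sleep returns after roughly 1-2 ms, while at the coarser default resolution it takes several times longer, so a loop built around it wakes up far less often and consumes far less CPU.

    Code (CSharp):
    using System;
    using System.Diagnostics;
    using System.Threading;

    // Measures the real duration of Thread.Sleep(1), which depends on the
    // current Windows timer resolution.
    class SleepGranularity
    {
        static void Main()
        {
            const int iterations = 200;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                Thread.Sleep(1);
            watch.Stop();

            Console.WriteLine("Average Sleep(1): {0:0.00} ms",
                              watch.Elapsed.TotalMilliseconds / iterations);
        }
    }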
     
  11. olivier.gosseaume.maison


    Joined:
    Dec 5, 2014
    Posts:
    9
    Good news: I've opened a case (667409) and the QA team has reproduced the bug and sent it for resolution.

    I hope it will be fixed in the final version of Unity 5; it is not yet corrected in RC1.

    :)