"Critical Service Error" blue screen on startup on multiple Windows 10 computers running Unity apps

Discussion in 'Windows' started by Trivium_Dev, Aug 22, 2019.

  1. Trivium_Dev

    Trivium_Dev

    Joined:
    Aug 1, 2017
    Posts:
    78
    Hi all, I’m currently dealing with a massive hardware issue that I have been unable to fix. At this point, I have had over a dozen Windows 10 PCs with the issue that I’m going to describe, all within the last year. On some, the issue has occurred more than once on the same computer.

    Some background:

    I work at a media development company making applications (which we call “interactives”) mostly for museums and other similar organizations. I am the lead developer at this small company. We develop our applications using Unity. These interactives run full screen on the computers, and there are rarely any other applications running. The interactives start automatically when Windows starts (using a scheduled task), the computers normally turn off at night (using a scheduled task that calls shutdown.exe with arguments /s /t 00), and they turn back on using BIOS settings - occasionally we use wake-on-LAN, but it depends on the exhibit/installation.

    Here is the issue:

    Occasionally the computer will not be able to start - it blue screens before it reaches the Windows login with a “Critical Service Failed” error. Once this happens, the computer is unable to boot to Windows at all: Windows cannot be repaired, cannot be restored, cannot be started in safe mode, and cannot be started by disabling driver signature enforcement (with two exceptions - on two computers, disabling it let us get into Windows; more on that later). This issue is happening on three different computer types - Elo All-in-One computers, HP computers, and Dell computers. These computers are specifically set up to run the interactives and prevent users from doing anything on them except use the interactive.

    Here is a picture of the error we get when turning on the computer: link

    Here is another picture of the error we get when trying to restore/repair: link

    And here is a picture of what happens if we attempt to boot in safe mode: link

    Computer Setup - What We Do

    We have a long list of options we set to prevent the computer from showing popups, going to sleep, blocking our app from running, or closing, or allowing users to access things outside of the interactive itself (such as the start menu). We also disable Windows Update (by disabling the Windows Update service). We have a batch script that we run that does this automatically, but we’ve also done it manually on some of the computers. The batch script can be found in this zip: link. Some things that we specifically disable include:
    • User Account Control to NEVER notify
    • Use netplwiz to allow computer to auto-login without having to enter password
    • Disable Cortana, Windows Defender, Action Center, edge screen swiping, SmartScreen Filter, and the Open File warning dialog (most of these are done by editing the registry, some are done through Windows Settings or Control Panel).
    Some things we have tried to fix it (after it happens):
    • We’ve used the advanced startup options to try to do system repairs and startup repairs - doesn’t work
    • We’ve tried restoring Windows using a restore USB - doesn’t work
    • Fully re-install Windows from USB - works, but of course we lose all our data
    Some things we thought might be the issue:
    • LogMeIn is installed on all of the computers - but we have other computers (employee computers for example) with LogMeIn that haven’t had issues
    • Issues with drivers
    • Issues with our computer setup - however we’ve had plenty of computers from before the last year that had these setups and have worked fine
    • A virus of some sort within our LAN (though some of the computers that have crashed, crashed while outside our network).
    • Issues with Windows maybe?
    • Issues caused by Unity saving its “PlayerPrefs” data into the registry (more below)
    • Something the interactives (or Unity) are doing is causing the issue
    Unity Builds

    If Unity was the issue (as in, something Unity does internally on standalone builds), I feel there would be more posts about it on the web. However, recently, in order to try to track down the issue, we put two interactives onto two of our prototype/showcasing computers - which have been running some older interactives for months without issue, with LogMeIn installed, and our computer setup on them. A week after we did that, both computers did the blue screen. This seems to suggest that it's something that the applications are doing (either something we are doing or something that is built into Unity's standalone builds).

    We are using a wide range of Unity versions - but mostly stick to the “long term support versions”, and for the computers that are having this issue, that includes 2017.4.x and 2018.4.x. However we have a project with multiple interactives built with 2017.4.x that have been working fine for many months, while we’ve had plenty of issues for another project that also used 2017.4.x.

    We use the “PlayerPrefs” class in Unity a lot to save out settings that might need to be tweaked on site - think volume levels or lighting levels or gameplay length. Unity saves these values to the Windows Registry (see Unity’s manual for player prefs). I had a thought that it could be an issue with this (though I feel the issue would be more widespread on forums and such if it was). I also thought maybe it was the key names I was using - I use very specific key names to prevent collisions, so the key names are “[Company].[Interactive].[Class].[Variable]”. Most of the keys aren’t more than about 30 characters, and the registry documentation says they can be much longer. The periods shouldn’t be an issue either (Unity generates a few keys that are both longer and include things like spaces and periods, and there’s nothing I can do to stop Unity from generating those anyway). In order to rule this out, I created my own PlayerPrefs class that basically just stores data in a Dictionary and then saves it out as JSON on exit and loads it back when the interactive starts (hard to say whether this has helped or not).
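
    A minimal sketch of what such a file-backed replacement could look like. This is not the actual class; the name JsonPrefs and its methods are hypothetical, and System.Text.Json is assumed only to keep the example self-contained (inside a Unity project, JsonUtility with a serializable wrapper or Json.NET would be the more likely choice).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical stand-in for PlayerPrefs: values are kept in a Dictionary
// and persisted to a JSON file instead of the Windows Registry.
public sealed class JsonPrefs
{
    private readonly string _path;
    private Dictionary<string, string> _values = new Dictionary<string, string>();

    public JsonPrefs(string path) { _path = path; }

    // Call once when the interactive starts.
    public void Load()
    {
        if (!File.Exists(_path)) return;
        var loaded = JsonSerializer.Deserialize<Dictionary<string, string>>(File.ReadAllText(_path));
        if (loaded != null) _values = loaded;
    }

    // Call when the interactive exits.
    public void Save()
    {
        File.WriteAllText(_path, JsonSerializer.Serialize(_values));
    }

    public void SetFloat(string key, float value)
    {
        // Invariant culture keeps the file readable regardless of OS locale.
        _values[key] = value.ToString("R", CultureInfo.InvariantCulture);
    }

    public float GetFloat(string key, float defaultValue = 0f)
    {
        return _values.TryGetValue(key, out var raw)
            && float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out float parsed)
            ? parsed
            : defaultValue;
    }
}
```

    Keys in the “[Company].[Interactive].[Class].[Variable]” style work unchanged here, since they are just dictionary keys.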

    Additional Information

    I have a few different batch scripts that run on the computers at different times.

    One batch script, for example, will update the interactive when the interactive determines there’s an available update (we use Unity Cloud Build to build the interactives, and then an update manager within the interactives monitors Unity Cloud Build for updates, downloads the updated build, and installs it by calling the batch script). You can see that batch script here: link

    Note that I removed the app name and normal location to protect our clients’ privacy. The batch file is generated by the interactive using File.WriteAllText() so that the batch script has the correct paths and executable names.
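
    As a rough illustration of that generation step (not the real script, which is only available through the link above), the batch text can be assembled from the paths and then written with File.WriteAllText(). The class name, the three parameters, and the exact batch commands below are assumptions made for the example.

```csharp
using System.IO;

// Hypothetical generator for an update batch script.
public static class UpdateScriptWriter
{
    // exeName, installDir and downloadDir are placeholders for the removed app name and locations.
    public static string Build(string exeName, string installDir, string downloadDir)
    {
        string[] lines =
        {
            "@echo off",
            // Make sure the old build is no longer running before files are replaced.
            $"taskkill /IM \"{exeName}\" /F >nul 2>&1",
            // Copy the downloaded build over the installed one.
            $"xcopy \"{downloadDir}\" \"{installDir}\" /E /Y /I",
            // Relaunch the updated interactive.
            $"start \"\" \"{installDir}\\{exeName}\""
        };
        return string.Join("\r\n", lines) + "\r\n";
    }

    public static void Write(string scriptPath, string exeName, string installDir, string downloadDir)
    {
        File.WriteAllText(scriptPath, Build(exeName, installDir, downloadDir));
    }
}
```

    Keeping Build() separate from Write() makes it easy to log or inspect the generated text before anything is executed.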

    Another batch script monitors the interactive process to make sure that it is responding and that it is still running, and then will restart the application if needed. This is a relatively new addition and some of the computers that have crashed have not used this batch script. It can be seen here: link.
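
    The same check can be sketched in C# for comparison. This is an assumption-laden illustration, not the batch script linked above: the class and method names are hypothetical, and it relies on Process.Responding, which only reflects whether the process's main window is answering the system.

```csharp
using System.Diagnostics;

// Hypothetical watchdog check: decides whether the interactive should be (re)started.
public static class InteractiveWatchdog
{
    // processName is the executable name without the ".exe" extension.
    public static bool NeedsRestart(string processName)
    {
        Process[] running = Process.GetProcessesByName(processName);
        try
        {
            // Not running at all: it has crashed or been closed.
            if (running.Length == 0) return true;

            // Running, but the main window has stopped responding.
            foreach (Process p in running)
            {
                if (!p.Responding) return true;
            }
            return false;
        }
        finally
        {
            foreach (Process p in running) p.Dispose();
        }
    }
}
```

    A caller would poll this every few seconds and, when it returns true, kill any leftover instance and launch the executable again.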

    Finally we have a batch script that will attempt to force focus the interactive when the interactive starts up. This is done because sometimes Windows doesn’t like to give focus to the interactive when it is started using the task scheduler, and the Windows Taskbar will be visible (or the interactive will be completely minimized). It can be seen here: link

    One other thing we have started doing with some of the newer interactives is kill explorer.exe when the interactive starts and then restart it when the interactive is quit. This was done because Windows seems hell-bent on making sure popups, edge swipes, and other things continue to work and show up on top of the interactive, or worse, allow a user to quit/leave the interactive. This is done in C# code using the following:

    Code (CSharp):
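    // Requires: using System.Diagnostics; using System.IO;
    // Force-kill the Windows shell (explorer.exe) when the interactive starts.
    Process p = new Process();
    ProcessStartInfo pStartInfo = new ProcessStartInfo("taskkill", "/IM \"explorer.exe\" /F");
    // "taskkill" is a relative path here, so this resolves to the current working directory.
    pStartInfo.WorkingDirectory = Directory.GetParent("taskkill").FullName;
    pStartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    p.StartInfo = pStartInfo;
    p.EnableRaisingEvents = true;
    p.Start(); // run taskkill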
    It is restarted with:

    Code (CSharp):
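    // Requires: using System.Diagnostics; using System.IO;
    // Bring the Windows shell back when the interactive quits.
    Process p = new Process();
    ProcessStartInfo pStartInfo = new ProcessStartInfo("explorer.exe");
    // "explorer.exe" is a relative path here, so this resolves to the current working directory.
    pStartInfo.WorkingDirectory = Directory.GetParent("explorer.exe").FullName;
    pStartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    p.StartInfo = pStartInfo;
    p.EnableRaisingEvents = true;
    p.Start(); // relaunch explorer.exe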
    Note that about half of the computers didn’t have this functionality (and we didn’t quit explorer.exe any other way) and still blue screened.

    On two Elo All-in-One computers we were able to disable driver signature enforcement to get to the Windows desktop. However, we couldn’t do much because User Account Control was blocking basically everything (for example, we couldn’t open Task Scheduler or Device Manager). Here is a picture of what is happening: link. However, while we were testing on a prototype computer we started getting this error BEFORE the blue screen occurred and while driver signature enforcement was still active. When the computer was restarted, the blue screen occurred.

    On a few different computers, we have noticed issues with the computer prior to restarting it and getting the blue screen. The biggest indication that the computer will blue screen on next startup is that all USB devices stop working. We have USB cameras (generally Logitech) attached to a few of the computers and they will stop working, we will restart the computer to “fix it” (as restarting the computer is generally our first problem solving step), and the computer will blue screen. On another computer, we had a Phidget board attached via USB and it stopped working - we restarted, boom, blue screen. On another, the touch screen stopped working (again, connected via USB), restart - blue screen. On another computer, the computer itself seemed to be lagging or having graphical issues, so we restarted and it blue screened. At this point, whenever a computer seems to be having issues we generally know it will blue screen on next startup, but we have no way to prevent it or fix it.

    Here are some log files, not sure how much help they will be: CBS.log dism.log

    Here’s another log file that we accessed using the CMD prompt from the Advanced Startup options: link. It shows that there was an error with “checking for installed LCU”. Google says that LCU stands for “Latest Cumulative Update” but I couldn’t find anything that said why it would say that or how to fix it. This log file was generated when we tried to do an automatic repair link.

    Computer Specs

    We’ve had this happen on HP, Dell, and ELO All-In-One computers. Here are specs for 3 of them.

    Dell XPS 8920 specs: link link

    Dell OptiPlex 7060 specs: link link

    Elo ECMG3-Q170 All-in-One specs: link link

    Final Thoughts

    I’m hoping someone can point me in the right direction. I’ve Googled numerous different issues related to the “Critical Service Error” on startup and while there are a lot of results out there for it, none seem to be specifically about my case. Mostly I’m trying to figure out why it’s happening so that I can prevent it from happening. If this had been a one (or even two) time occurrence I would have just re-installed Windows and moved on, but we’re up to about 15 computers on which this has happened and we now have a few on which it’s happened twice (or three times...). I’ve never dealt with a hardware issue that was both this difficult to track down and this widespread. We have had four people working on this basically non-stop for the last week: myself, our QA lead, our hardware lead, and our technical specialist. We also contacted Elo support as we have a business account with them - we sent them one of the interactives that seems to cause it so they could see if they could replicate it.

    One last thought - while I'm hesitant to say that it's an issue with the Unity applications themselves, it seems like a really weird coincidence that the prototype computers would crash a few days after we put the interactives that seem to be having this issue onto them. I do also find it super strange that none of our developers have had this issue with their computers - since they are constantly running the interactives in the editor, you would think that the issue might manifest on their computers if it was the interactives causing the problem. The developers will sometimes also run full builds on their computers. The whole thing is just very frustrating.

    If anyone needs additional information, I might be able to provide it. I currently have two computers that blue screened this morning when I turned them on. So I can probably pull something off of them if someone wants to check a log file or something. I appreciate any help that could be provided. Thanks!
     
  2. Tautvydas-Zilys

    Tautvydas-Zilys

    Unity Technologies

    Joined:
    Jul 25, 2013
    Posts:
    10,680
    That all sounds very weird and honestly, I haven't heard of anyone having these issues before.

    That said, that screenshot you posted where mmc.exe is blocked from being run: if you look carefully, you'll see that it says "Publisher: Unknown". On all my machines, mmc.exe is signed by Microsoft, and apps having digital signatures are waived from Windows SmartScreen protection (that's what that popup is about). This would point to your Windows installation being infected with malware.

    Are these computers connected to the internet? Are all copies of Windows being installed from the same USB key? Perhaps the image on the USB key is bad?
     
  3. Trivium_Dev

    They are all connected to the internet - we've run some antivirus and anti-malware checks on them and nothing came up (we've run it on all our machines in-house). We've also been leaving Windows Defender on and letting it run (whereas in the past we have disabled it), and Windows Firewall is still active (and we've never disabled that).

    I can't say we've been using the same USB for all re-installs of Windows, since we have a few floating around. Just today (since we had 2 more computers go blue screen) I created a new USB installer using Windows Media Creation Tool (https://www.microsoft.com/en-us/software-download/windows10). We've also re-installed Windows on the Elo computers using a new ISO file that Elo had sent us, so we'll see if that helps at all.

    This issue hasn't happened to a computer that isn't just running our interactives - none of our developers have had this issue on their computers, nor our project managers or graphics department computers - if it was some sort of malware or virus I feel we would have seen it at least once on some other computer. The timing is also super random - the computers will sometimes run for a few weeks without issue, then go blue screen. We've had some that ran normally for a few months and then break.
     
  4. Tautvydas-Zilys

    Are you by chance running your experiences as "Administrator"?
     
  5. Trivium_Dev

    In the task that we create to start the interactives we check "Run with highest privileges", which results in the application showing with "Elevated" as true in the Task Manager, so I'm going to go with yes, we do. If I start the application normally using the EXE it runs with "Elevated" as false. We do not have "Run this program as an administrator" checked in the EXE properties.
     
  6. Tautvydas-Zilys

    That sounds like a recipe for disaster...

    I recommend creating a non-administrator account and running the task without "Run with highest privileges". If there are any bugs in the program, running stuff as administrator can literally brick your machine. It's a security risk.
     
  7. Trivium_Dev

    Hmmm, I'll give that a try, though we've done it like this in the past without issue. It's only been in the last year that I've started seeing this issue, and only within the last month or so that it's gotten so bad that the computers are just constantly breaking.