Hi all,

I’m currently dealing with a massive hardware issue that I have been unable to fix. Over the last year, more than a dozen of our Windows 10 PCs have had the issue I’m about to describe, and on some of them it has occurred more than once.

Some background: I am the lead developer at a small media development company that makes applications (which we call “interactives”), mostly for museums and similar organizations. We develop our applications using Unity. The interactives run full screen, and there are rarely any other applications running on the computers. The interactives start automatically when Windows starts (using a scheduled task), the computers normally turn off at night (using a scheduled task that calls shutdown.exe with arguments /s /t 00), and they turn back on using a BIOS power-on setting. Occasionally we use wake-on-LAN instead, depending on the exhibit/installation.

Here is the issue: Occasionally a computer will not be able to start - it blue screens before it reaches the Windows login with a “Critical Service Failed” error. Once this happens, the computer cannot boot to Windows at all: Windows cannot be repaired, cannot be restored, cannot be started in safe mode, and in most cases cannot be started by disabling driver signature enforcement (on two computers disabling it did get us into Windows, more on that later). This is happening on three different computer types - Elo All-in-One computers, HP computers, and Dell computers. These computers are specifically set up to run the interactives and prevent users from doing anything on them except use the interactive.
Here is a picture of the error we get when turning on the computer: link

Here is another picture of the error we get when trying to restore/repair: link

And here is a picture of what happens if we attempt to boot in safe mode: link

Computer Setup - What We Do

We have a long list of options we set to prevent the computer from showing popups, going to sleep, blocking our app from running, closing our app, or allowing users to access things outside of the interactive itself (such as the start menu). We also disable Windows Update (by disabling the Windows Update service). We have a batch script that does this automatically, but we’ve also done it manually on some of the computers. The batch script can be found in this zip: link.

Some things that we specifically change include:
- Set User Account Control to never notify
- Use netplwiz to allow the computer to auto-login without having to enter a password
- Disable Cortana, Windows Defender, Action Center, Edge screen swiping, Smart Screen Filter, and the Open File Warning dialog (most of these are done by editing the registry, some are done through Windows Settings or Control Panel)

Some things we have tried to fix it (after it happens):
- We’ve used the advanced startup options to try to do system repairs and startup repairs - doesn’t work
- We’ve tried restoring Windows using a restore USB - doesn’t work
- Fully re-installing Windows from USB - works, but of course we lose all our data

Some things we thought might be the issue:
- LogMeIn is installed on all of the computers - however we have other computers (employee computers for example) with LogMeIn that have not had issues
- Issues with drivers
- Issues with our computer setup - however we’ve had plenty of computers from before the last year that had this setup and have worked fine
- A virus of some sort within our LAN (though some of the computers that have crashed did so while outside our network)
- Issues with Windows maybe?
- Issues caused by Unity saving its “PlayerPrefs” data into the registry (more below)
- Something the interactives (or Unity) are doing is causing the issue

Unity Builds

I feel that if Unity was the issue (as in, something Unity does internally on standalone builds), there would be more posts about it on the web. However, recently, in order to try to track down the issue, we put two interactives onto two of our prototype/showcasing computers. Those computers had been running some older interactives for months without issue, with LogMeIn installed and our computer setup applied. A week after we did that, both computers blue screened. This seems to point to something that the applications are doing (either something we are doing or something that is built into Unity's standalone builds).

We are using a wide range of Unity versions, but mostly stick to the “long term support” versions, and for the computers that are having this issue, that includes 2017.4.x and 2018.4.x. However, we have a project with multiple interactives built with 2017.4.x that has been working fine for many months, while we’ve had plenty of issues with another project that also used 2017.4.x.

We use the “PlayerPrefs” class in Unity a lot to save out settings that might need to be tweaked on site - think volume levels, lighting levels, or gameplay length. Unity saves these values to the Windows Registry (see Unity’s manual for PlayerPrefs). I had a thought that this could be the issue (though I feel the problem would be more widespread on forums and such if it was). I also thought maybe it was the key names I was using - I use very specific key names to prevent collisions, so the key names are “[Company].[Interactive].[Class].[Variable]”. Most of the keys aren’t more than about 30 characters, and the registry documentation says they can be much longer.
Also the periods shouldn’t be an issue either (Unity generates a few keys of its own that are both longer and include things like spaces and periods, and there’s nothing I can do to stop Unity from generating those anyway). In order to rule this out, I created my own PlayerPrefs class that basically just stores data in a Dictionary, saves it out as JSON on exit, and loads it back when the interactive starts (hard to say whether this has helped or not).

Additional Information

I have a few different batch scripts that run on the computers at different times.

One batch script updates the interactive when the interactive determines there’s an available update. We use Unity Cloud Build to build the interactives; an update manager within the interactive monitors Unity Cloud Build for updates, downloads the updated build, and then installs it by calling the batch script. You can see that batch script here: link. Note that I removed the app name and normal location to protect our clients’ privacy. The batch file is generated by the interactive using File.WriteAllText() so that the batch script has the correct paths and executable names.

Another batch script monitors the interactive process to make sure that it is still running and responding, and restarts the application if needed. This is a relatively new addition, and some of the computers that have crashed have not used this batch script. It can be seen here: link.

Finally, we have a batch script that attempts to force focus onto the interactive when the interactive starts up. This is done because sometimes Windows doesn’t like to give focus to the interactive when it is started using the task scheduler, and the Windows Taskbar will be visible (or the interactive will be completely minimized).
It can be seen here: link

One other thing we have started doing with some of the newer interactives is killing explorer.exe when the interactive starts and then restarting it when the interactive quits. This was done because Windows seems hell-bent on making sure popups, edge swipes, and other things continue to work and show up on top of the interactive, or worse, allow a user to quit/leave the interactive. This is done in C# code using the following:

Code (CSharp):
// Requires: using System.Diagnostics; using System.IO;
Process p = new Process();
ProcessStartInfo pStartInfo = new ProcessStartInfo("taskkill", "/IM \"explorer.exe\" /F");
pStartInfo.WorkingDirectory = Directory.GetParent("taskkill").FullName;
pStartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.StartInfo = pStartInfo;
p.EnableRaisingEvents = true;
p.Start(); // launch taskkill to force-close explorer.exe

It is restarted with:

Code (CSharp):
Process p = new Process();
ProcessStartInfo pStartInfo = new ProcessStartInfo("explorer.exe");
pStartInfo.WorkingDirectory = Directory.GetParent("explorer.exe").FullName;
pStartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.StartInfo = pStartInfo;
p.EnableRaisingEvents = true;
p.Start(); // relaunch the Windows shell

Note that about half of the computers didn’t have this functionality (and we didn’t quit explorer.exe any other way) and still blue screened.

On two Elo All-in-One computers we were able to disable driver signature enforcement to get to the Windows desktop. However, we couldn’t do much because User Account Control was blocking basically everything (for example we couldn’t open Task Scheduler or Device Manager). Here is a picture of what is happening: link. However, while we were testing on a prototype computer, we started getting this same error BEFORE the blue screen occurred, while driver signature enforcement was still active. When the computer was restarted, the blue screen occurred.

On a few different computers, we have noticed issues with the computer prior to restarting it and getting the blue screen.
The biggest indication that the computer will blue screen on next startup is that all USB devices stop working. We have USB cameras (generally Logitech) attached to a few of the computers; they will stop working, we will restart the computer to “fix it” (as restarting the computer is generally our first problem-solving step), and the computer will blue screen. On another computer, we had a Phidget board attached via USB and it stopped working - we restarted, and boom, blue screen. On another, the touch screen stopped working (again, connected via USB), restart - blue screen. On another, the computer itself seemed to be lagging or having graphical issues, so we restarted and it blue screened. At this point, whenever a computer seems to be having issues we generally know it will blue screen on next startup, but we have no way to prevent it or fix it.

Here are some log files, not sure how much help they will be:
- CBS.log
- dism.log

Here’s another log file that we accessed using the CMD prompt from the Advanced Startup options: link. It shows that there was an error with “checking for installed LCU”. Google says that LCU stands for “Latest Cumulative Update”, but I couldn’t find anything that explained why the log would say that or how to fix it.

This log file was generated when we tried to do an automatic repair: link.

Computer Specs

We’ve had this happen on HP, Dell, and Elo All-in-One computers. Here are specs for 3 of them.
- Dell XPS 8920 specs: link link
- Dell OptiPlex 7060 specs: link link
- Elo ECMG3-Q170 All-in-One specs: link link

Final Thoughts

I’m hoping someone can point me in the right direction. I’ve Googled numerous different issues related to the “Critical Service Failed” error on startup, and while there are a lot of results out there for it, none seem to be specifically about my case. Mostly I’m trying to figure out why it’s happening so that I can prevent it from happening.
If this had been a one (or even two) time occurrence I would have just re-installed Windows and moved on, but we’re up to about 15 computers on which this has happened, and we now have a few on which it’s happened twice (or three times...). I’ve never dealt with a hardware issue whose root cause was this difficult to track down and that was also this widespread. We have had four people working on this basically non-stop for the last week: myself, our QA lead, our hardware lead, and our technical specialist. We also contacted Elo support, as we have a business account with them - we sent them one of the interactives that seems to cause it to see if they could replicate it.

One last thought - while I'm hesitant to say that it's an issue with the Unity applications themselves, it seems like a really weird coincidence that the prototype computers would crash a few days after we put the interactives that seem to be having this issue onto them. I do also find it super strange that none of our developers have had this issue with their computers - since they are constantly running the interactives in the editor (and sometimes run full builds too), you would think that the issue might manifest on their computers if it was the interactives causing the problem.

The whole thing is just very frustrating. If anyone needs additional information, I might be able to provide it. I currently have two computers that blue screened this morning when I turned them on, so I can probably pull something off of them if someone wants to check a log file or something.

I appreciate any help that could be provided. Thanks!
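
P.S. For anyone wondering what a registry-free replacement for PlayerPrefs (a Dictionary saved out as JSON, as described above) might look like, here is a minimal sketch of the general approach. It is not the production class: the names JsonPrefs, GetFloat/SetFloat, and the file name are made up for illustration, it only handles floats, and it assumes plain .NET with System.Text.Json. Inside Unity you would likely need a different serializer, since JsonUtility does not serialize dictionaries directly.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical stand-in for PlayerPrefs: values live in memory and are
// persisted to a JSON file instead of the Windows registry.
public sealed class JsonPrefs
{
    private readonly string path;
    private Dictionary<string, string> values = new Dictionary<string, string>();

    public JsonPrefs(string path)
    {
        this.path = path;
    }

    // Call once at startup. A missing file simply means no settings were saved yet.
    public void Load()
    {
        if (!File.Exists(path)) return;
        string json = File.ReadAllText(path);
        values = JsonSerializer.Deserialize<Dictionary<string, string>>(json)
                 ?? new Dictionary<string, string>();
    }

    // Call on exit to write every key/value pair out as one JSON object.
    public void Save()
    {
        File.WriteAllText(path, JsonSerializer.Serialize(values));
    }

    // Store the float as culture-invariant text so it parses the same on any machine.
    public void SetFloat(string key, float value)
    {
        values[key] = value.ToString("R", CultureInfo.InvariantCulture);
    }

    // Return the stored value, or the fallback if the key is missing or unparsable.
    public float GetFloat(string key, float fallback)
    {
        if (values.TryGetValue(key, out string raw) &&
            float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out float parsed))
        {
            return parsed;
        }
        return fallback;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Illustrative file location and key name only.
        string file = Path.Combine(Path.GetTempPath(), "interactive-settings.json");

        var prefs = new JsonPrefs(file);
        prefs.Load();
        prefs.SetFloat("Company.Interactive.Audio.Volume", 0.75f);
        prefs.Save();

        var reloaded = new JsonPrefs(file);
        reloaded.Load();
        Console.WriteLine(reloaded.GetFloat("Company.Interactive.Audio.Volume", 1.0f));
    }
}
```

The key format mirrors the “[Company].[Interactive].[Class].[Variable]” naming from the post, so swapping it in would not require renaming any settings.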