A few weeks ago, I was hacking around with SafeDisc copy protection on my Windows 7 laptop. After a few hours, I shut it down for the night and went to bed; the next morning I faced a constant BSOD on startup. “Oh, so Secdrv.sys decided to screw me over.” I assumed that code in the SafeDisc driver was segfaulting because, by running the game in this unusual way while trying to patch nanomites in order to unpack it, I had probably gotten it to screw up some configuration somewhere. Fair enough, as long as I could fix it.
The details of the BSOD were this: “Check your hard drive to make sure it is properly configured and terminated. Run CHKDSK /F to check for hard drive corruption, and then restart your computer. *** STOP: 0x0000007B (0xFFFFF880009A97E8, 0xFFFFFFFFC0000034, 0x0000000000000000, 0x0000000000000000)”.
I received this each time I tried to start up Windows. The driver’s name itself did not appear on the BSOD. Just a generic useless error message if you did not happen to know the exact cause of the issue. I tried to boot into Safe Mode, but F8 would not open the Advanced Boot Options menu because Windows wanted me to either “launch startup repair (recommended)” or “start Windows normally”, but nothing else.
So, sure. I was curious to see if Windows and its startup repair tool were smart enough to fix the issue automatically, since that would hopefully be the safest route. They weren’t. After a long, long time on “attempting repairs” (about 30 minutes), I received: “Startup Repair cannot repair this computer automatically” with “NoRootCause”. The BSOD was not fixed.
So I just booted into Linux, verified that all my files appeared to be safe, backed up some valuable stuff onto my phone, deleted Secdrv.sys from the Windows partition, and rebooted. Except that didn’t fix the BSOD either. I ended up putting that file back.
So, because I happen to keep the Windows 7 ISO on my computer, I put it on my flash drive (separate from my backed up data), booted from it, and tried its repair tool. This time I got: “Checking for disk errors. This might take more than an hour to complete.” Like before, it took about 30 minutes, and it asked me to reboot. Sure, maybe SafeDisc damaged some file and Windows considers “disk errors” to be in the same class as corrupt OS data. It makes sense. When I rebooted, the BSOD was still there.
Okay, so this time, I booted back on to the flash drive, went into the repair tool again, and after it gave me NoRootCause, I tried to perform “sfc /scannow”. This time, I got: “There is a system repair pending which requires a reboot to complete. Restart Windows and try again.” What? But it’s not doing anything… After googling the issue, it appeared you could delete C:\Windows\winsxs\pending.xml to force sfc /scannow to go through… except that the command line tool could not locate the directory. So I rebooted into Windows, let it give the BSOD again, rebooted so it would ask me if I want to enter the repair tool from the hard drive again, let it try and fail to repair, and then opened the command line, and did “del X:\Windows\winsxs\pending.xml” followed by “sfc /scannow”. Guess what? I got: “Windows Resource Protection did not find any integrity violations.”
I repeated the “enter the repair tool and let it fail and restart” procedure so that F8 would give me the Advanced Boot Options menu and tried entering Safe Mode to see if at least that would work. Except it didn’t. The last driver to load before crashing was classpnp.sys. This was a Microsoft driver. The general solution on the internet appeared to be to delete the file and run the repair tool, and if that did not work, reinstall. Tried it, and it didn’t work, and I put the file back. (I did not take the advice to reinstall.) Last Known Good Configuration did not work. Kernel debugging mode did not work (or give me any new messages). Boot logging, which is supposed to log the “real” last loaded kernel module to ntbtlog.txt, also did not create the file when I went back into Linux.
So I was thinking that SafeDisc probably screwed up a single registry entry somewhere that is required from very early on in the Windows boot process, just because it could (because I was the one trying to break it anyway). If it were a corrupt or missing driver, you would think the repair tool would have fixed that, but hell, the repair tool was not saying anything useful about what it was doing. I had been googling the error codes like “0xFFFFF880009A97E8, 0xFFFFFFFFC0000034” during all of the waiting it had me do: this just returned forum posts in which people resorted to reinstalling Windows. I was seriously not wanting to reinstall Windows because *one* configuration setting somewhere screwed up. “Repair Windows registry from Linux” did not bring up anything useful because no automated tools for repairing Windows (better than Microsoft’s repair tool can) actually exist. The advice is always just to back up and reinstall, since that’s guaranteed to work even if it wastes your time.
I read that 0x0000007B corresponded to “INACCESSIBLE_BOOT_DEVICE”, and I confirmed this was true from the bug check code reference on Microsoft’s website. They also provide a description: “This bug check indicates that the Microsoft Windows operating system has lost access to the system partition during startup.”
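For illustration, here is a minimal Python sketch that splits a STOP line like the one above into its parts. The lookup tables only contain the two values that matter here. Treating the second parameter as an NTSTATUS that has been sign-extended to 64 bits is an assumption on my part (Microsoft’s reference for this bug check does not spell it out), but 0xC0000034 is the well-known value of STATUS_OBJECT_NAME_NOT_FOUND, so the reading is at least plausible. The function name decode_stop is made up for this example.

```python
import re

# Tiny lookup tables: only the values that appear in this post.
BUG_CHECKS = {0x7B: "INACCESSIBLE_BOOT_DEVICE"}
NTSTATUS_NAMES = {0xC0000034: "STATUS_OBJECT_NAME_NOT_FOUND"}

STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def decode_stop(line):
    """Split a '*** STOP: code (p1, p2, p3, p4)' line into named parts."""
    m = STOP_RE.search(line)
    if m is None:
        raise ValueError("no STOP code found")
    code = int(m.group(1), 16)
    params = [int(p, 16) for p in m.group(2).split(",")]
    # Assumption: parameter 2 is an NTSTATUS sign-extended to 64 bits,
    # so masking to the low 32 bits recovers the original status value.
    status = params[1] & 0xFFFFFFFF if len(params) > 1 else None
    return {
        "code": code,
        "name": BUG_CHECKS.get(code, "unknown"),
        "params": params,
        "status": status,
        "status_name": NTSTATUS_NAMES.get(status, "unknown"),
    }
```

Feeding it the STOP line from my BSOD gives the bug check name INACCESSIBLE_BOOT_DEVICE and a status of STATUS_OBJECT_NAME_NOT_FOUND, which is a slightly better hint than “run CHKDSK”.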
Then I remembered that I disabled my DVD drive in the device manager earlier when I had the virtual drive of the game disc working (using WinCDEmu). I realized (since it told me I would need to reboot before I could see the changes) that I most likely saw “disk drive” and disabled my hard drive instead by accident. Oops. But this is something that you can definitely fix with a single registry tweak from Linux, right? At this point, I was definitely sure the problem was simply that I disabled my hard drive in the device manager.
So I started off by assuming that the device manager stores “enable/disable this device” settings in the registry. But everything you would think of googling to find this information came up with nothing helpful at all. I spent 20 minutes on this. Unlike with Linux where you can ask the community, and essentially the developers themselves, where information about how your distribution enables and disables its kernel modules is stored, and you’ll eventually find a helpful response, there was not actually any way to just ask Microsoft for this information. They’re a corporation. They offer “paid support” if you sign a business contract with them, but absolutely nothing to the regular people that use it other than: “reinstall everything or go to a repair shop to have them do it”. That’s just insane. We’re talking about a single configuration setting.
Right from the start (or rather, right when I backed up my data), I knew in the end I could resort to that, and it’s really not a big deal or a huge time-waster; but if I did need to do this, I wanted to understand WHY.
So I installed Windows 7 in a virtual machine, using virt-manager with KVM/QEMU on Linux, giving it 32GB of hard drive space. I disabled all unnecessary services (especially the search indexer and task scheduler) and background processes to speed things up. I tried to run the 32-bit version of the device manager in OllyDbg (which does not offer debugging support for 64-bit processes yet, although it’s getting close to completion), except that the 32-bit version (which you can only get to run by deleting the 64-bit version of mmc.exe) does not actually disable or enable hardware when you tell it to.
Okay then. That just meant another fun coding opportunity. I wanted a tool to track the device manager’s calls to kernel32.dll, advapi32.dll, and shlwapi.dll. Since my tool could not itself create the device manager process (mmc.exe) without giving it the right arguments, and I didn’t feel like enabling the search indexer service and rebooting the VM to put the device manager shortcut back on the search results of the start menu to get the arguments, the tool would attach to an existing process rather than create a new one. My tool could either hook all functions provided by these dlls in the process’s address space or use the Microsoft debugger API, though if I still wanted the tool to attach to an existing process, it would have to hook the functions by installing hotpatches and performing 64-bit relative jumps. The debugger API would be easier, and I believe I’ve already had enough practice with hooking in my SafeDisc work.
I finished the debugger in about 2 hours (I did all my coding and API googling inside the VM), and it worked; it showed various registry functions being called when I refreshed the list of devices or went into a device’s properties, but oddly, not when I simply enabled or disabled the DVD drive (not the hard drive, since I didn’t want to break the VM installation too), which was the only thing that mattered. Instead, CreateFile was being called, first on “\\.\PIPE\srvsvc”, and then on “C:\Windows\INF\setupapi.app.log”. Sadly, if the device manager performed its configuration changes by performing system calls, I would have no practical way of capturing that. In the worst case, if the configuration was somehow not stored in the registry, I could still look at the raw data on the hard drive and look at the bytes that changed, discover which files (if any) were changing, and dump them by mounting the image. But look, a log file. I peeked inside and discovered at the bottom: “DIF_PROPERTYCHANGE – IDE\CDROMQEMU_QEMU_DVD-ROM_______________________1.2.____\5&3A2A5854&0&1.0.0”. I opened regedit and saw keys under CurrentControlSet, ControlSet001, and ControlSet002: “Enum\IDE\CDROMQEMU_QEMU_DVD-ROM_______________________1.2.____\5&3A2A5854&0&1.0.0\Properties”. Unfortunately, regedit would not even let me view it: “An error is preventing this key from being opened. Details: Access is denied.” Trying to give the “Administrators” group permission failed: “You do not have permission to view the current permission settings for Properties, but you can make permission changes.” / “Unable to set new owner on Properties. Access is denied.” / “Registry Editor could not set security in the key currently selected, or some of its subkeys.”
No problem. I set the cache for the hard drive in virt-manager to “None” and mounted the file system using $ mount -o loop,ro,sync,offset=105906176 "/var/lib/libvirt/images/Windows.img" /mnt/vm, where “loop” is for loopback from a hard drive image, “ro” is for read-only access, “sync” is for disabling cache (at least partly), and “offset” is the offset in bytes from the beginning of the file to the NTFS partition where Windows 7 was located in the hard drive image (obtained using parted). Then, I could read the registry from the command line on Linux by loading /mnt/vm/Windows/System32/config/SYSTEM in chntpw.
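The offset is just the partition’s starting sector multiplied by the sector size. I got the number from parted, but as a rough sketch, the same value can be read straight out of the image’s MBR partition table with a few lines of Python. This assumes an MBR (not GPT) image and 512-byte sectors; the function names are made up for the example.

```python
import struct

SECTOR_SIZE = 512  # assumption: the image uses 512-byte logical sectors


def mbr_partition_offsets(mbr):
    """Return (type, byte_offset, byte_length) for each used MBR slot."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    parts = []
    for i in range(4):
        # The four 16-byte partition entries start at byte 446.
        entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]  # partition type byte (0x07 is NTFS/exFAT)
        # Bytes 8..15: starting LBA and sector count, little-endian.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            parts.append((ptype, start_lba * SECTOR_SIZE,
                          num_sectors * SECTOR_SIZE))
    return parts


def image_partition_offsets(path):
    """Read the first sector of a raw disk image and list its partitions."""
    with open(path, "rb") as f:
        return mbr_partition_offsets(f.read(512))
```

For reference, 105906176 is exactly 206848 × 512, so under the 512-byte assumption the NTFS partition in my image starts at sector 206848.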
As it turns out, “CurrentControlSet” is not a registry key of its own, but rather a pointer to either ControlSet001 or ControlSet002, so that eliminated one out of three places to search (assuming that the enabled/disabled flag was really located here). The Properties key under both versions of the IDE\CDROMQEMU_QEMU_DVD-ROM… key had about a dozen strings listing certain information about the device (such as its full name, “QEMU QEMU DVD-ROM ATA Device”, and its driver, “cdrom.inf:cdrom_device.NTamd64:cdrom_install:6.17601.17514:gencdrom”) and a dozen pieces of small binary data about 150 bytes each. There was nothing simply standing out as the enable/disable property. It didn’t matter though. I deleted both keys from chntpw, and when I restarted the VM, I got a “found new hardware” tooltip, and when the device was added, I went to the device manager and the DVD drive was re-enabled. Phew. But I was curious to see if I could figure out exactly which subkey is responsible.
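As far as I know, which control set CurrentControlSet points at is recorded in the Select key at the root of the SYSTEM hive, which holds four DWORD values named Current, Default, Failed, and LastKnownGood. Each one is just a number that maps onto a ControlSetNNN key name. Here is a small Python sketch of that mapping; the numbers in the usage note are hypothetical, not read from my hive.

```python
def control_set_name(number):
    """Turn a Select value such as 1 into the key name 'ControlSet001'."""
    return "ControlSet%03d" % number


def resolve_select(select):
    """Map each Select value name to the control set key it refers to.

    A value of 0 means "no such control set" (commonly the case for
    Failed), so it is mapped to None instead of a key name.
    """
    return {name: (control_set_name(n) if n else None)
            for name, n in select.items()}
```

With hypothetical values {"Current": 1, "Default": 1, "Failed": 0, "LastKnownGood": 2}, Current resolves to ControlSet001 and LastKnownGood to ControlSet002.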
I took a snapshot of SYSTEM while the DVD drive was enabled, and then disabled the drive, waited a few seconds, and closed the device manager, and took another snapshot. Sadly, the SHA-256 sums of the two files were the same, so the file had not changed. Maybe it had to do with caching, or maybe Windows stores this information somewhere else, or maybe Windows only stores it in memory and you have to go to Start -> Shutdown before it’s saved. I figured that at any rate, it was probably worth being absolutely sure that the data I would end up looking at really included the change, even if it meant weeding through a lot more data in the process; so I took a snapshot of the entire C:\Windows directory (I assumed it wouldn’t make any sense for it to be located anywhere else, and yes, I checked C:\ as well) after booting the VM with the device enabled, and again after disabling the device and shutting down the VM with Start -> Shutdown, and then I ran “git init” on the first version of the folder (which took about 45 minutes), moved the .git folder to the second folder, and ran “git status > git_status.txt” (which took about 20 minutes). There were *31,946* new files in C:\Windows\winsxs, and a grand total of 4 deleted files in C:\Windows\ServiceProfiles\LocalService\AppData\Local and C:\Windows\System32\LogFiles\WMI\RtBackup\, from simply starting Windows, disabling the DVD drive, and shutting down. And I had Windows Backup, System Restore, Remote Assistance, Windows Defender, Windows Firewall, and all of this stuff disabled. Does this happen all the time or does Windows prune old stuff in that folder? I’ll assume they prune.
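I used git for the comparison because it was already installed, but the idea is simple enough to sketch directly: hash every file in both copies of the folder and compare the two maps. The following Python is an illustrative alternative, not what I actually ran, and the function names are my own.

```python
import hashlib
import os


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            try:
                with open(full, "rb") as f:
                    # Read in 1 MiB chunks so large files don't fill memory.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file: leave it out of the snapshot
            result[os.path.relpath(full, root)] = digest.hexdigest()
    return result


def compare(before, after):
    """Return sorted lists of added, deleted, and modified relative paths."""
    added = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after)
                      if before[p] != after[p])
    return added, deleted, modified
```

Calling compare(snapshot(old_dir), snapshot(new_dir)) on the two copies would give the same three categories that “git status” reports: new, deleted, and modified files.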
There were also 36 files modified, four of which were DEFAULT, SECURITY, SOFTWARE, and SYSTEM, so I decided I’d look inside SYSTEM. First, I ran VBinDiff on the old and new versions of SYSTEM. There were over 200 binary changes, and it was a little time consuming to locate the QEMU keys. chntpw lets you dump keys into a plaintext format, so I went to that. diff showed that there were 38 differences between chntpw’s dumps of the QEMU keys under ControlSet001 and ControlSet002 in the “enabled” state, exactly one difference between ControlSet001 of the enabled state and ControlSet001 of the disabled state, and zero differences between ControlSet002 of the enabled state and ControlSet002 of the disabled state.
The key that changed in ControlSet001 was this: < “ConfigFlags”=dword:00000000, > “ConfigFlags”=dword:00000001.
0 for enabled, 1 for disabled. And it was NOT under Properties; it was under 5&3a2a5854&0&1.0.0 itself. How misleading. I backed up C:\Windows\System32\config\SYSTEM on my own Windows partition, loaded it up in chntpw, and yes, my hard drive (IDE\DiskTOSHIBA_MQ01ABD075______________________AX002M__) had its ConfigFlags set to 1 in ControlSet001. I believe Last Known Good Configuration is supposed to swap the two control sets back and forth, but maybe the time I ran it was a fluke, or maybe for whatever reason it thinks ControlSet001 was in use for the last good startup and it always reverts to that one even though it’s incorrect.
After changing it back to 0, the BSOD was resolved.
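In hindsight, this is easy to check for mechanically. ConfigFlags is a bit field, and bit 0 is the one the Windows SDK header regstr.h calls CONFIGFLAG_DISABLED, so it is safer to test that bit than to compare the whole value against 1. Below is a small Python sketch that scans a plaintext registry dump for devices with that bit set. It assumes a .reg-style layout in which each key path appears on its own line in square brackets, followed by value lines like the ConfigFlags one above; the exact layout of a chntpw dump may differ, so treat the parsing as hypothetical.

```python
import re

# Bit 0 of ConfigFlags; named CONFIGFLAG_DISABLED in regstr.h.
CONFIGFLAG_DISABLED = 0x00000001

CONFIGFLAGS_RE = re.compile(r'"ConfigFlags"=dword:([0-9A-Fa-f]{8})$')


def disabled_devices(dump_text):
    """Return the key paths whose ConfigFlags value has the disabled bit set.

    Assumes a .reg-style dump: '[key path]' header lines followed by
    '"Name"=dword:XXXXXXXX' value lines.
    """
    current_key = None
    found = []
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]  # remember which key we are inside
            continue
        m = CONFIGFLAGS_RE.match(line)
        if m and int(m.group(1), 16) & CONFIGFLAG_DISABLED:
            found.append(current_key)
    return found
```

Run over a dump of the Enum subtree of each control set, something like this would have pointed straight at my hard drive’s key.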
All of this was caused by disabling my hard drive in the device manager. Useless error message after useless error message, in combination with grossly insufficient documentation, made the whole process of figuring out what was wrong and what needed to be changed quite involved. And this is something that can *easily* happen to anyone.