Friday, July 8, 2016

Early Boot Windows Debugging - Part 1 - Basics

I have a Windows Server 2012 VM that will not boot past the Windows splash screen but throws a BSOD with the error "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (NETIO.SYS). It's been a long while since working on troubleshooting Windows (I primarily use CentOS) but here's what I've found. I don't have the solution yet but I'm recording some tidbits that I found so I will have them later.

First a bit of preamble:

1. Advanced Boot Options - When you select "Enable Boot Logging" this is supposed to write a log file named ntbtlog.txt. However, in this particular case that never happens. This is presumably because it is before the appropriate driver is loaded to write log files. However with 2012 this is conjecture since the latest Microsoft documentation that I can find applies to Server 2000. Regardless of reason, it isn't captured in this instance.
2. This VM was originally running on ESXi but I have exported and OVF to my local VMware Workstation for my troubleshooting.
3. In the below operations I will be referencing "d:\" which is actually the c:\ of the server. It is referenced from the rescue command prompt as d:\ on my system.

Step 1: Boot to the command prompt from the troubleshooting menu in the Automatic Repair wizard
Step 2: Run a chkdsk to verify the filesystem is in working order. My scan came back with required repairs which it corrected. Subsequent scans come back clean.

Command: chkdsk d: /f
Step 2: Run sfc to verify that Windows is ok. This returns that everything is ok

Command: sfc /offbootdir=d:\ /offwindir=d:\Windows /scannow
Step 3 - Just for grins I also ran DISM (Deployment Image Servicing and Management) to check the integrity. It will throw a warning if you don't give it a scratch directory so I just created a temporary one on my drive. This also returns no found corruption.

Command: dism /image=d:\ /cleanup-image /scan-health /scratchdir=d:\temp
So far, so good... except it still won't boot up. I have an existing "twin" of this machine that should match it in most regards so just to be super certain I also run a manual hashing check on netio.sys and sacdrv.sys (more on that file later). The syntax for that is:

certutil.exe -hashfile drivers\netio.sys md5 (or sha1)

The number 1 cause of netio.sys BSOD are driver conflicts according to googling so I start down that road next. An export of all the drivers between the 2 systems shows that they are absolutely identical. Because that doesn't help me I start yanking out drivers to see if it will make a difference.

To get a list of non-Microsoft drivers I again use DISM and find that there are fortunately only 8 to worry about.

Command: dism /image:d:\ /scratchdir:d:\temp /get-drivers

I'm going to start removing drivers to see if that makes any difference. Again, using DISM I start by removing the vmxnet3 driver since it makes the most sense considering a netio.sys error.

Command: dism /image:d:\ /scratchdir:d:\tetmp /remove-driver:oem4.inf

After a reboot, no change. In 1 of my tests I also then proceed to remove the 7 remaining drivers, that also did nothing. Time to get more information.... Queue next post.... 

No comments:

Post a Comment