Windows 8.1 x64 DPC Watchdog Violation Traced to LogMeIn Hamachi
(and why I now consider LogMeIn's engineering effort so grossly poor that I have decided to stop using any of their products, both personally and professionally)
Background/Problem
In mid November, 2013, I decided to build a new PC and install Windows 8.1 64 bit professional on it. Doing multiple updates at once, in this case new hardware and new OS, always poses a challenge as you never quite know what is to blame when something goes wrong. As Murphy's Law would have it, something went wrong in a very big way.
Within 48 hrs (typically within 24 hours), my machine would bug check (a/k/a blue screen of death) with a DPC Watchdog Violation. This started a 3-4 week hunt to identify the culprit(s).
The Bugcheck
A typical minidump file is provided here.
Here is the WhoCrashed explanation:
Crash dump file: C:\Windows\Minidump\112913-15875-01.dmp
Date/time: 11/29/2013 5:32:54 AM GMT
Uptime: 1 day, 15:45:51
Machine: WIN8
Bug check name: DPC_WATCHDOG_VIOLATION
Bug check code: 0x133
Bug check parm 1: 0x0
Bug check parm 2: 0x501
Bug check parm 3: 0x500
Bug check parm 4: 0x0
Probably caused by: hal.dll
Driver description: Hardware Abstraction Layer DLL
Driver product: Microsoft® Windows® Operating System
Driver company: Microsoft Corporation
OS build: Built by: 9600.16422.amd64fre.winblue_gdr.131006-1505
Architecture: x64 (64 bit)
CPU count: 8
Page size: 4096
Bug check description: The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.
with stack trace:
nt!KeBugCheckEx+0x0
nt! ?? ::FNODOBFM::`string'+0x13DDC
nt!KiUpdateRunTime+0x57
nt!KiUpdateTime+0x63C
nt!KeClockInterruptNotify+0x5C
hal!HalpTimerClockInterrupt+0x4F
nt!KiCallInterruptServiceRoutine+0xA3
nt!KiInterruptSubDispatchNoLockNoEtw+0xEA
nt!KiInterruptDispatchLBControl+0x11F
nt!KxWaitForLockOwnerShip+0x30
nt!IoAcquireCancelSpinLock+0x56
nt!IoStartPacket+0x47
USBSTOR!USBSTOR_Scsi+0x2A3
CLASSPNP!ClasspSendMediaStateIrp+0x110
CLASSPNP!ClasspTimerTick+0x65
nt!KiProcessExpiredTimerList+0x1D8
nt!KiExpireTimerTable+0x218
nt!KiTimerExpiration+0x148
nt!KiRetireDpcList+0x19C
nt!KiIdleLoop+0x5A
I know that WhoCrashed isn't the most reliable -- but it's good enough to give at least the params. I am not a Windows developer, so I don't have a full debugger installed. If you look at the stacktrace in this particular example, it shows USBSTOR. A smoking gun? Not quite. Here is another minidump example where the stack trace is more innocuous:
nt!KeBugCheckEx+0x0
nt! ?? ::FNODOBFM::`string'+0x13DDC
nt!KiUpdateRunTime+0x57
nt!KiUpdateTime+0x63C
nt!KeClockInterruptNotify+0x5C
hal!HalpTimerClockInterrupt+0x4F
nt!KiCallInterruptServiceRoutine+0xA3
nt!KiInterruptSubDispatchNoLockNoEtw+0xEA
nt!KiInterruptDispatchLBControl+0x11F
nt!KxWaitForSpinLockAndAcquire+0x20
nt!KeAcquireSpinLockRaiseToDpc+0x32
netbt!TimerExpiry+0x1A
nt!KiProcessExpiredTimerList+0x1D8
nt!KiExpireTimerTable+0x218
nt!KiTimerExpiration+0x148
nt!KiRetireDpcList+0x19C
nt!KiIdleLoop+0x5A
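One way to compare the two traces without a debugger is to reduce each one to the modules it passes through, and to check whether the interrupted code was waiting on a spin lock. The following is a minimal Python sketch, not something WhoCrashed provides. The frame lists are abbreviated copies of the two traces above, and the helper names (`module_of`, `non_core_modules`, `was_spinning`) are made up for illustration.

```python
import re

# Frames copied (abbreviated) from the two DPC_WATCHDOG_VIOLATION traces above.
DUMP_USBSTOR = [
    "nt!KeBugCheckEx+0x0",
    "nt! ?? ::FNODOBFM::`string'+0x13DDC",
    "hal!HalpTimerClockInterrupt+0x4F",
    "nt!KxWaitForLockOwnerShip+0x30",
    "nt!IoAcquireCancelSpinLock+0x56",
    "nt!IoStartPacket+0x47",
    "USBSTOR!USBSTOR_Scsi+0x2A3",
    "CLASSPNP!ClasspSendMediaStateIrp+0x110",
    "CLASSPNP!ClasspTimerTick+0x65",
    "nt!KiRetireDpcList+0x19C",
]
DUMP_NETBT = [
    "nt!KeBugCheckEx+0x0",
    "nt! ?? ::FNODOBFM::`string'+0x13DDC",
    "hal!HalpTimerClockInterrupt+0x4F",
    "nt!KxWaitForSpinLockAndAcquire+0x20",
    "nt!KeAcquireSpinLockRaiseToDpc+0x32",
    "netbt!TimerExpiry+0x1A",
    "nt!KiRetireDpcList+0x19C",
]

# The kernel and HAL appear in every trace, so they say little on their own.
CORE_MODULES = {"nt", "hal"}


def module_of(frame):
    """Return the module name: the text before the first '!' or '+'."""
    return re.match(r"[^!+]+", frame).group(0).strip()


def non_core_modules(frames):
    """List modules other than nt/hal, in order of first appearance."""
    seen = []
    for frame in frames:
        module = module_of(frame)
        if module.lower() not in CORE_MODULES and module not in seen:
            seen.append(module)
    return seen


def was_spinning(frames):
    """True if a frame matches the spin-lock wait routines seen in these traces."""
    return any("KxWaitFor" in frame for frame in frames)


for name, frames in (("dump 1", DUMP_USBSTOR), ("dump 2", DUMP_NETBT)):
    print(name, non_core_modules(frames), "spinning on a lock:", was_spinning(frames))
```

The two traces name different drivers (USBSTOR and CLASSPNP in one, netbt in the other), yet both were caught inside a spin-lock wait. One plausible reading, which the dumps alone do not prove, is that these drivers were waiting for a lock held elsewhere rather than misbehaving themselves.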
So what exactly is a DPC Watchdog Violation? If you search Google, you will see that a LOT of people have run into it with Windows 8. The technical explanation is given by Microsoft here. In summary, it looks like some driver or drivers are taking too much time in processing DPCs. It also seems (though I have not been able to definitively confirm this) that the DPC Watchdog did not bugcheck prior to Windows 8 -- it would silently log a debug message in Windows 7. This would explain why the ultimate culprit (which as the title indicates, is LogMeIn Hamachi) may not have caused an issue under my prior Windows 7 installation.
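The four bug check parameters in the WhoCrashed output carry most of the detail. According to Microsoft's reference for bug check 0x133, parameter 1 selects the kind of violation, and for kind 0 parameters 2 and 3 are the DPC time count and the time allotment, both in clock ticks. A minimal Python sketch of that decoding follows. The function names are hypothetical, and the 15.625 ms tick length is an assumption (a common default clock interval), not something recorded in the dump.

```python
# Assumption: one clock tick is 15.625 ms. The dump does not record this.
ASSUMED_TICK_MS = 15.625


def ticks_to_seconds(ticks, tick_ms=ASSUMED_TICK_MS):
    """Convert clock ticks to seconds under the assumed tick length."""
    return ticks * tick_ms / 1000.0


def describe_dpc_watchdog(parm1, parm2, parm3):
    """Summarize bug check 0x133 parameters in plain words."""
    if parm1 == 0x0:
        return ("single DPC or ISR exceeded its allotment: "
                f"ran {parm2} ticks, allowed {parm3} ticks")
    if parm1 == 0x1:
        return ("cumulative time at DISPATCH_LEVEL or above exceeded "
                f"the watchdog period of {parm2} ticks")
    return f"unrecognized parameter 1 value {parm1:#x}"


# Values from the WhoCrashed report: parm 1 = 0x0, parm 2 = 0x501, parm 3 = 0x500.
print(describe_dpc_watchdog(0x0, 0x501, 0x500))
print(f"allotment is roughly {ticks_to_seconds(0x500):.0f} s under the assumed tick length")
```

For this dump, a single DPC or ISR ran 1281 ticks against an allotment of 1280, so it tripped the limit by one tick. If the tick-length assumption holds, that is about 20 seconds spent at elevated IRQL.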
Diagnostic Process
I spent the next three weeks testing various combinations of drivers, hardware installed/removed, reinstalls of the OS, etc. This was a time consuming task of seeing what combinations can survive a 72 hr uptime acceptance criterion. After much troubleshooting, I identified two possible culprits: OpenHardwareMonitor (it has a kernel driver to read data) and LogMeIn Hamachi. The fishy thing was, neither software by itself would exhibit the DPC Watchdog Violation, but once both were added and used, the bugcheck would occur.
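The elimination logic here amounts to finding the smallest set of suspects that fails the uptime test. As a rough Python sketch: the `SURVIVED` table restates the outcome described above (each program alone was fine, the two together were not), the entry for a system with neither installed is an assumption, and `minimal_failing_sets` is a hypothetical helper.

```python
from itertools import combinations

SUSPECTS = ["OpenHardwareMonitor", "LogMeIn Hamachi"]

# True means the configuration survived the 72 hr uptime criterion.
SURVIVED = {
    frozenset(): True,  # assumption: a system with neither installed is stable
    frozenset(["OpenHardwareMonitor"]): True,
    frozenset(["LogMeIn Hamachi"]): True,
    frozenset(["OpenHardwareMonitor", "LogMeIn Hamachi"]): False,
}


def minimal_failing_sets(suspects, survived):
    """Return failing combinations that contain no smaller failing combination."""
    failing = []
    # Walk from the smallest combinations up, so subsets are seen first.
    for size in range(len(suspects) + 1):
        for combo in combinations(suspects, size):
            candidate = frozenset(combo)
            if survived.get(candidate) is not False:
                continue  # passed, or was never tested
            if not any(smaller <= candidate for smaller in failing):
                failing.append(candidate)
    return failing


for failing_set in minimal_failing_sets(SUSPECTS, SURVIVED):
    print("fails only with:", ", ".join(sorted(failing_set)))
```

With these results the only minimal failing set is the pair, which is why neither program could be blamed from uptime testing alone.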
The Great Microsoft Driver Verifier
During my research, I had read about Microsoft's Driver Verifier. It will apply a series of selectable stress tests to drivers and will also check for various error conditions.
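Driver Verifier ships with Windows as `verifier.exe` and can be configured from an elevated command prompt, with the settings taking effect on the next boot. The sketch below only builds the command lines rather than running them. Targeting just `hamdrv.sys` is an assumption made for illustration (the test described here covered a normal working set of software), and the helper names are hypothetical.

```python
def verifier_enable_command(drivers):
    """Build the command that turns on the standard checks for the given drivers."""
    if not drivers:
        raise ValueError("name at least one driver, e.g. hamdrv.sys")
    return ["verifier", "/standard", "/driver", *drivers]


def verifier_query_command():
    """Build the command that shows which drivers and checks are configured."""
    return ["verifier", "/querysettings"]


def verifier_reset_command():
    """Build the command that clears all Driver Verifier settings."""
    return ["verifier", "/reset"]


# Print the commands instead of executing them; they need an elevated prompt.
for command in (
    verifier_enable_command(["hamdrv.sys"]),
    verifier_query_command(),
    verifier_reset_command(),
):
    print(" ".join(command))
```

If a verified driver makes the machine bug check on every boot, running `verifier /reset` from Safe Mode is the usual way to turn verification off again.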
So, I activated Driver Verifier with my normal operating set of software, including both Hamachi and OpenHardwareMonitor. On reboot -- immediate bugcheck with DRIVER_VERIFIER_IOMANAGER_VIOLATION. Here is the corresponding minidump file. And the WhoCrashed description is:
Crash dump file: C:\Windows\Minidump\121813-18703-01.dmp
Date/time: 12/18/2013 11:37:51 AM GMT
Uptime: 00:00:32
Machine: WIN8
Bug check name: DRIVER_VERIFIER_IOMANAGER_VIOLATION
Bug check code: 0xC9
Bug check parm 1: 0x7
Bug check parm 2: 0xFFFFF8000221888C
Bug check parm 3: 0xFFFFCF802B720EA0
Bug check parm 4: 0x0
Probably caused by: hamdrv.sys
Driver description:
Driver product:
Driver company:
OS build: Built by: 9600.16452.amd64fre.winblue_gdr.131030-1505
Architecture: x64 (64 bit)
CPU count: 8
Page size: 4096
Bug check description: This is the bug check code for all Driver Verifier
And what does the stacktrace indicate (actually the WhoCrashed description already identified it)?
nt!KeBugCheckEx+0x0
nt!IovCompleteRequest+0x73
Hamdrv+0x1E0B
Hamdrv is the Hamachi driver! Bug check param 1 indicates that: "The driver called IoCompleteRequest while its cancel routine was still set."
Is It Definitively Hamachi Then?
This is a hard question for me to answer definitively. OpenHardwareMonitor was not flagged by Driver Verifier and it is open source. I am not a Windows developer, but from my reading of the code for the ring 0 (kernel mode) driver for OpenHardwareMonitor, it seemed quite innocuous. Here is the other evidence I've gathered:
As to what version of Hamachi -- I believe it was 188.8.131.52. That is the version I get if I were to do a fresh install with the installer I used, but Hamachi does update itself, so I cannot be 100% certain.
Why I Consider LogMeIn's Engineering Efforts Extremely Lacking
Even if Hamachi were ultimately exonerated (unlikely), I still consider LogMeIn's engineering efforts to be so grossly negligent that I have discontinued my annual auto-renewal and am in the process of removing Hamachi from use. The reason is that Driver Verifier is a fundamental test sequence that every driver should pass before it is released as a production product. Microsoft requires it as part of WHQL certification -- of course, Hamachi, not being tied to hardware, does not undergo that certification. To release a commercial production driver that clearly fails Driver Verifier is just plain inexcusable.
Full Burn-In Criteria
Here are my current burn-in criteria for deeming a new system stable (each must complete without failure and without exceeding CPU thermal ratings):
On USB Audio
Why is USB Audio on my burn-in list? Because USB Audio is very sensitive to DPC latency issues. I've had major struggles with it in the past as described here. In retrospect, now I wonder if my underlying problem under Windows 7 was really due to Hamachi! (As Hamachi is implicated in DPC issues -- I don't think they are related to Windows 8 per se, but rather, Windows 8 flags them whereas Windows 7 does not.) Alas, I no longer have a Windows 7 machine to retest...
One more note -- Running LatencyMon does not give a clean run for me. It typically is clean for minutes, but if left for hours, it will flag issues, typically in the nVidia Windows Kernel Mode driver, the Microsoft Storage Port driver, netbt, or the Network Driver Interface Specification (NDIS) driver. I have the latest Z87 chipset, SSD and nVidia production drivers installed and despite these flags from LatencyMon, my USB audio is perfectly fine during hours of continuous listening. I guess I am concluding that whereas a lot of spikes indicate trouble, a spike every few hrs even to 2000 microseconds is not of concern if the audio empirically works fine.
Last Updated: January, 2014
Contact: email@example.com