Raymond Lau
 

Windows 8.1 x64 DPC Watchdog Violation Traced to LogMeIn Hamachi

(and why I now consider LogMeIn's engineering so grossly poor that I have decided to stop using all of their products, both personally and professionally)

Background/Problem

In mid-November 2013, I decided to build a new PC and install 64-bit Windows 8.1 Professional on it. Changing several things at once (in this case, new hardware and a new OS) always poses a challenge, because you never quite know what is to blame when something goes wrong. As Murphy's Law would have it, something went wrong in a very big way.

Within 48 hours (typically within 24), my machine would bug check (a.k.a. blue screen of death) with a DPC Watchdog Violation. This started a three- to four-week hunt to identify the culprit(s).

The Bugcheck

A typical minidump file is provided here.

Here is the WhoCrashed explanation:

Crash dump file:        C:\Windows\Minidump\112913-15875-01.dmp
Date/time:              11/29/2013 5:32:54 AM GMT
Uptime:                 1 day, 15:45:51
Machine:                WIN8
Bug check name:         DPC_WATCHDOG_VIOLATION
Bug check code:         0x133
Bug check parm 1:       0x0
Bug check parm 2:       0x501
Bug check parm 3:       0x500
Bug check parm 4:       0x0
Probably caused by:     hal.dll
Driver description:     Hardware Abstraction Layer DLL
Driver product:         Microsoft® Windows® Operating System
Driver company:         Microsoft Corporation
OS build:               Built by: 9600.16422.amd64fre.winblue_gdr.131006-1505
Architecture:           x64 (64 bit)
CPU count:              8
Page size:              4096

Bug check description: 
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.

with stack trace:

nt!KeBugCheckEx+0x0
nt! ?? ::FNODOBFM::`string'+0x13DDC
nt!KiUpdateRunTime+0x57
nt!KiUpdateTime+0x63C
nt!KeClockInterruptNotify+0x5C
hal!HalpTimerClockInterrupt+0x4F
nt!KiCallInterruptServiceRoutine+0xA3
nt!KiInterruptSubDispatchNoLockNoEtw+0xEA
nt!KiInterruptDispatchLBControl+0x11F
nt!KxWaitForLockOwnerShip+0x30
nt!IoAcquireCancelSpinLock+0x56
nt!IoStartPacket+0x47
USBSTOR!USBSTOR_Scsi+0x2A3
CLASSPNP!ClasspSendMediaStateIrp+0x110
CLASSPNP!ClasspTimerTick+0x65
nt!KiProcessExpiredTimerList+0x1D8
nt!KiExpireTimerTable+0x218
nt!KiTimerExpiration+0x148
nt!KiRetireDpcList+0x19C
nt!KiIdleLoop+0x5A

I know that WhoCrashed isn't the most reliable tool, but it's good enough to give at least the parameters. I am not a Windows developer, so I don't have a full debugger installed. The stack trace in this particular example shows USBSTOR. A smoking gun? Not quite. Here is another minidump where the stack trace is more innocuous:

nt!KeBugCheckEx+0x0
nt! ?? ::FNODOBFM::`string'+0x13DDC
nt!KiUpdateRunTime+0x57
nt!KiUpdateTime+0x63C
nt!KeClockInterruptNotify+0x5C
hal!HalpTimerClockInterrupt+0x4F
nt!KiCallInterruptServiceRoutine+0xA3
nt!KiInterruptSubDispatchNoLockNoEtw+0xEA
nt!KiInterruptDispatchLBControl+0x11F
nt!KxWaitForSpinLockAndAcquire+0x20
nt!KeAcquireSpinLockRaiseToDpc+0x32
netbt!TimerExpiry+0x1A
nt!KiProcessExpiredTimerList+0x1D8
nt!KiExpireTimerTable+0x218
nt!KiTimerExpiration+0x148
nt!KiRetireDpcList+0x19C
nt!KiIdleLoop+0x5A
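
Reading the two traces bottom to top (oldest caller first) makes them easier to compare. Each frame has the form `module!function+offset`, or just `module+offset` when no symbols are available. The short Python sketch below assumes frames are pasted one per line as WhoCrashed prints them, and lists the modules in call order so the non-core components stand out. The `CORE_MODULES` filter is a hypothetical choice for illustration, not an authoritative list of innocent modules.

```python
import re

# A frame is "module!symbol+0xOFFSET", or "module+0xOFFSET" when symbols are missing.
FRAME_RE = re.compile(
    r"^(?P<module>[A-Za-z0-9_]+)(?:!(?P<symbol>.*?))?\+0x(?P<offset>[0-9A-Fa-f]+)$"
)

# Hypothetical filter: core OS modules that show up in almost every trace.
CORE_MODULES = {"nt", "hal"}


def modules_in_call_order(trace: str) -> list[str]:
    """Distinct module names, oldest caller first (the trace is read bottom to top)."""
    modules: list[str] = []
    for line in reversed(trace.strip().splitlines()):
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip blank or garbled lines that are not complete frames
        module = match.group("module")
        if module not in modules:
            modules.append(module)
    return modules


def non_core_modules(trace: str) -> list[str]:
    """Modules outside CORE_MODULES, oldest caller first."""
    return [m for m in modules_in_call_order(trace) if m not in CORE_MODULES]


second_trace = """\
nt!KeAcquireSpinLockRaiseToDpc+0x32
netbt!TimerExpiry+0x1A
nt!KiProcessExpiredTimerList+0x1D8
nt!KiIdleLoop+0x5A"""
print(non_core_modules(second_trace))  # ['netbt']
```

For the first trace this yields CLASSPNP and USBSTOR; for the second, only netbt. All three are Microsoft drivers, and in both traces the kernel frames called from them end in a spin lock wait (`KxWaitForLockOwnerShip`, `KxWaitForSpinLockAndAcquire`). That suggests the DPC on the stack was stuck waiting for a lock held elsewhere rather than being the root cause itself.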

So what exactly is a DPC Watchdog Violation? If you search Google, you will see that a LOT of people have run into it with Windows 8. The technical explanation is given by Microsoft here. In summary, it looks like one or more drivers are taking too much time processing DPCs (deferred procedure calls), which run at DISPATCH_LEVEL. It also seems (though I have not been able to definitively confirm this) that the DPC watchdog did not bug check prior to Windows 8; under Windows 7, it would silently log a debug message instead. That would explain why the ultimate culprit (which, as the title indicates, is LogMeIn Hamachi) may not have caused a crash under my prior Windows 7 installation.
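
The parameters in the WhoCrashed output above can be read against Microsoft's documentation for bug check 0x133. Parameter 1 = 0 means a single DPC or ISR exceeded its time allotment, with parameter 2 the DPC time count and parameter 3 the allotment, both in ticks. Parameter 1 = 1 means the system cumulatively spent too long at DISPATCH_LEVEL or above. So 0x501 (1281) against 0x500 (1280) means one DPC ran just past its limit. The Python sketch below decodes these values. Converting ticks to seconds with the default 15.625 ms clock interval is an assumption, so treat the seconds as approximate.

```python
# Assumption: one tick is the default 15.625 ms clock interval. The interval can be
# changed at run time, so the seconds printed here are only approximate.
DEFAULT_TICK_MS = 15.625


def decode_dpc_watchdog(p1: int, p2: int, p3: int, tick_ms: float = DEFAULT_TICK_MS) -> str:
    """Summarize DPC_WATCHDOG_VIOLATION (0x133) parameters 1-3."""
    if p1 == 0x0:
        # A single DPC or ISR overran: p2 = time count, p3 = time allotment (ticks).
        return (
            f"single DPC/ISR ran {p2} ticks (~{p2 * tick_ms / 1000:.1f} s) "
            f"against an allotment of {p3} ticks (~{p3 * tick_ms / 1000:.1f} s)"
        )
    if p1 == 0x1:
        # Cumulative time at DISPATCH_LEVEL or above exceeded the watchdog period (p2).
        return f"cumulative DISPATCH_LEVEL time exceeded the watchdog period {p2:#x}"
    return f"unrecognized parameter 1 value {p1:#x}"


# Parameters from the WhoCrashed output above.
print(decode_dpc_watchdog(0x0, 0x501, 0x500))
```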

Diagnostic Process

I spent the next three weeks testing various combinations of drivers, installed/removed hardware, OS reinstalls, etc. It was a time-consuming process of finding out which combinations could survive my 72-hour uptime acceptance criterion. After much troubleshooting, I identified two possible culprits: OpenHardwareMonitor (which uses a kernel driver to read hardware data) and LogMeIn Hamachi. The odd thing was that neither program by itself triggered the DPC Watchdog Violation, but once both were installed and in use, the bug check would occur.
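
The acceptance criterion can be checked against the "Uptime" field that WhoCrashed reports for each crash. The Python sketch below assumes uptimes look like the ones in this post ("1 day, 15:45:51" or "00:00:32"), and that longer ones are printed as "N days, HH:MM:SS"; that plural form is an assumption. A configuration that never crashed passes by definition, so this only scores the ones that did.

```python
import re
from datetime import timedelta

# WhoCrashed-style uptime, e.g. "1 day, 15:45:51" or "00:00:32".
# Assumption: multi-day uptimes are printed as "N days, HH:MM:SS".
UPTIME_RE = re.compile(
    r"^(?:(?P<days>\d+) days?, )?(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})$"
)

ACCEPTANCE = timedelta(hours=72)  # the 72-hour uptime acceptance criterion


def parse_uptime(text: str) -> timedelta:
    """Convert an uptime string into a timedelta."""
    match = UPTIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime: {text!r}")
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
        seconds=int(match.group("seconds")),
    )


def met_criterion(uptime_at_crash: str, required: timedelta = ACCEPTANCE) -> bool:
    """True if the configuration ran at least `required` before it crashed."""
    return parse_uptime(uptime_at_crash) >= required


print(met_criterion("1 day, 15:45:51"))  # False: about 40 hours, well short of 72
```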

The Great Microsoft Driver Verifier

During my research, I had read about Microsoft's Driver Verifier. It applies a selectable series of stress tests to drivers and also checks for various error conditions.
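
Driver Verifier is controlled with `verifier.exe` from an elevated command prompt, and new settings take effect after a reboot. The Python sketch below only assembles the command lines; it executes them only when `dry_run=False` on Windows. Which drivers and checks were enabled in the run described here is not recorded, so `/standard` with `hamdrv.sys` (the driver named in the dump below) is an assumption for illustration.

```python
import subprocess
import sys

# Real verifier.exe switches; run them from an elevated prompt, then reboot.
VERIFIER_QUERY_CMD = ["verifier", "/querysettings"]  # show current verification settings
VERIFIER_RESET_CMD = ["verifier", "/reset"]          # turn Driver Verifier off again


def verifier_enable_cmd(drivers: list[str]) -> list[str]:
    """Enable the standard set of Driver Verifier checks for the named drivers only."""
    if not drivers:
        raise ValueError("name at least one driver, e.g. 'hamdrv.sys'")
    return ["verifier", "/standard", "/driver", *drivers]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the command line (dry run, or not on Windows) or run it and return its output."""
    if dry_run or sys.platform != "win32":
        return " ".join(cmd)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.stdout


print(run(verifier_enable_cmd(["hamdrv.sys"])))  # verifier /standard /driver hamdrv.sys
print(run(VERIFIER_RESET_CMD))                   # verifier /reset
```

If the machine bug checks at every boot under Driver Verifier, booting into Safe Mode and running `verifier /reset` is the commonly recommended way to turn it off again.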

So I activated Driver Verifier with my normal set of software, including both Hamachi and OpenHardwareMonitor. On reboot: an immediate bug check with DRIVER_VERIFIER_IOMANAGER_VIOLATION. Here is the corresponding minidump file, and the WhoCrashed description is:

Crash dump file:        C:\Windows\Minidump\121813-18703-01.dmp
Date/time:              12/18/2013 11:37:51 AM GMT
Uptime:                 00:00:32
Machine:                WIN8
Bug check name:         DRIVER_VERIFIER_IOMANAGER_VIOLATION 
Bug check code:         0xC9
Bug check parm 1:       0x7
Bug check parm 2:       0xFFFFF8000221888C
Bug check parm 3:       0xFFFFCF802B720EA0
Bug check parm 4:       0x0
Probably caused by:     hamdrv.sys
Driver description:     
Driver product:         
Driver company:         
OS build:               Built by: 9600.16452.amd64fre.winblue_gdr.131030-1505
Architecture:           x64 (64 bit)
CPU count:              8
Page size:              4096

Bug check description: 
This is the bug check code for all Driver Verifier 

And what does the stack trace indicate? (WhoCrashed's description had already identified it.)

nt!KeBugCheckEx+0x0
nt!IovCompleteRequest+0x73
Hamdrv+0x1E0B

Hamdrv is the Hamachi driver! Bug check parameter 1 (0x7) indicates: "The driver called IoCompleteRequest while its cancel routine was still set."
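
To make parameter 1 more concrete: in the Windows driver model, a driver that has set a cancel routine on a pending IRP must clear it with `IoSetCancelRoutine(Irp, NULL)` before calling `IoCompleteRequest`. If that call returns NULL, the cancel routine has already claimed the IRP and the driver must not complete it. The Python toy model below is not kernel code; `Irp`, `set_cancel_routine`, `complete_request` and `VerifierViolation` are hypothetical stand-ins named after the real routines, meant only to show the rule Driver Verifier enforced. It is not a reconstruction of Hamachi's driver, whose source is not public.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CancelRoutine = Optional[Callable[["Irp"], None]]


class VerifierViolation(Exception):
    """Stands in for bug check 0xC9 (DRIVER_VERIFIER_IOMANAGER_VIOLATION)."""

    def __init__(self, param1: int, message: str) -> None:
        super().__init__(f"0xC9, parameter 1 = {param1:#x}: {message}")
        self.param1 = param1


@dataclass
class Irp:
    """Toy I/O request packet: only the cancel routine and completion are modeled."""
    cancel_routine: CancelRoutine = None
    completed: bool = False


def set_cancel_routine(irp: Irp, routine: CancelRoutine) -> CancelRoutine:
    """Like IoSetCancelRoutine: install `routine` and return the previous one."""
    previous, irp.cancel_routine = irp.cancel_routine, routine
    return previous


def complete_request(irp: Irp) -> None:
    """Like IoCompleteRequest with Driver Verifier on: reject a still-set cancel routine."""
    if irp.cancel_routine is not None:
        raise VerifierViolation(0x7, "IoCompleteRequest called while cancel routine still set")
    irp.completed = True


def buggy_complete(irp: Irp) -> None:
    """The pattern parameter 1 = 0x7 describes: completing without clearing first."""
    complete_request(irp)


def correct_complete(irp: Irp) -> None:
    """Clear the cancel routine first; a None result means cancellation already owns the IRP."""
    if set_cancel_routine(irp, None) is not None:
        complete_request(irp)


pending = Irp()
set_cancel_routine(pending, lambda irp: None)
correct_complete(pending)
print(pending.completed)  # True
```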

Is It Definitively Hamachi Then?

This is a hard question for me to answer definitively. OpenHardwareMonitor's driver was not flagged by Driver Verifier, and it is open source. I am not a Windows developer, but from my reading of the code for OpenHardwareMonitor's ring 0 (kernel-mode) driver, it seemed quite innocuous. Here is the other evidence I've gathered:
  • Without Hamachi, my machine has now been up for 5+ days, whereas before it never survived 2 days, and typically not even 1. The system has also passed my full burn-in criteria, given below.
  • On this thread, someone with more Windows development skills attributed another user's DPC Watchdog bug check to Hamachi. His comments were:

    So at the very bottom (you read bottom to top) we can see two Hamdrv.sys calls, which is in regards to the LogMeIn Hamachi Virtual Network Adapter. At this specific time of crash, were you doing anything remotely? If not, it may be on startup likely and it just made a call.

    I would actually, for temporary troubleshooting purposes, recommend going ahead and removing the LogMeIn software. You can use Teamviewer or something as a replacement in the meantime if you truly need remote software at this very moment.

    (The original poster's issue was resolved after removing Hamachi. I also switched to TeamViewer.)

  • (Just in, Jan 3, 2014) Before Christmas, I had posted a briefer version of this page to the Hamachi and Windows 8.1 thread in the LogMeIn forums. Today, someone else reported the same problem, and that removing Hamachi fixed it. With this independent replication, the evidence now overwhelmingly supports Hamachi being the culprit.

As for which version of Hamachi: I believe it was 2.2.0.109. That is the version I get from a fresh install with the installer I used, but Hamachi updates itself, so I cannot be 100% certain.

Why I Consider LogMeIn's Engineering Efforts Extremely Lacking

Even if Hamachi were ultimately exonerated (unlikely), I would still consider LogMeIn's engineering so grossly negligent that I have discontinued my annual auto-renewal and am in the process of removing Hamachi from use. The reason: Driver Verifier is a fundamental test that every driver should pass before it is released as a production product. Microsoft requires it as part of WHQL certification; of course, Hamachi, not being tied to hardware, does not undergo that certification. Releasing a commercial production driver that clearly fails Driver Verifier is just plain inexcusable.

Full Burn-In Criteria

Here are my current burn-in criteria for deeming a new system stable (every step must complete without failure and without exceeding CPU thermal ratings):
  1. One full pass of MemTest86 (run outside Windows) to confirm that memory passes and to check memory throughput
  2. 10 minutes running Prime95 small FFTs
  3. 4 iterations of IntelBurnTest at Very High stress level
  4. 1 iteration of IntelBurnTest at Maximum stress level
  5. 5 minutes of AIDA64 FPU stress test
  6. 8 hours of AIDA64 stress test (all memory and CPU tests, but no I/O tests), using 2 GB per thread if you have 16 GB+ of memory, or otherwise whatever maximum your memory allows
  7. At least one cycle of Prime95 custom blend with 5 minutes per test (for this i7-4770K box at 42x, it takes about 6 hours 45 minutes)
  8. Cinebench benchmark run (both CPU and graphics)
  9. PCMark 8 benchmark run
  10. Furmark 15 minutes run at 960x540
  11. 3DMark benchmark run
  12. AIDA64 Extreme Memory benchmark to confirm memory throughput
  13. Heaven benchmark at max size
  14. Valley benchmark at max size
  15. Folding@Home one project on both CPU and GPU
  16. Listen to USB audio (if you have a USB audio system) for 1 hour to ensure there are no dropouts, pops, etc.
  17. 96 hours of Windows uptime under normal usage

On USB Audio

Why is USB audio on my burn-in list? Because USB audio is very sensitive to DPC latency issues. I've had major struggles with it in the past, as described here. In retrospect, I now wonder whether my underlying problem under Windows 7 was really due to Hamachi! (Since Hamachi is implicated in DPC issues, I don't think those issues are specific to Windows 8; rather, Windows 8 flags them whereas Windows 7 does not.) Alas, I no longer have a Windows 7 machine to retest...

One more note: LatencyMon does not give me a clean run. It is typically clean for minutes, but if left for hours, it will flag issues, typically in the nVidia Windows Kernel Mode driver, the Microsoft Storage Port driver, netbt, or the Network Driver Interface Specification (NDIS) driver. I have the latest Z87 chipset, SSD and nVidia production drivers installed, and despite these flags from LatencyMon, my USB audio is perfectly fine during hours of continuous listening. My tentative conclusion is that while frequent spikes indicate trouble, a single spike every few hours, even one as high as 2000 microseconds, is not a concern if the audio empirically works fine.

Raymond Lau
January 1, 2014