DRIVER_POWER_STATE_FAILURE蓝屏问题分析(一)

DRIVER_POWER_STATE_FAILURE蓝屏问题分析

最近有网友新组装的电脑比较稳定的出现DRIVER_POWER_STATE_FAILURE蓝屏,尝试了网上许多的办法都没有解决,请求帮忙分析一下转储文件,本文记录一下整体的分析过程。

1. 问题背景

电脑比较有规律的出现蓝屏问题,如下:

4: kd> !targetinfo
Windows 10 Built by: 22000.1.amd64fre.co_release.210604-1628, 
Kernel debug session
Crash Time:               Mon Jan 24 18:55:57 2022
System Uptime:             0 days 0:06:25 
Version:                   10.0
Processors:                12 proc (F/M/S Vendor)
                            0 6/167/1 GenuineIntel
                            1 6/167/1 GenuineIntel
                            2 6/167/1 GenuineIntel
                            3 6/167/1 GenuineIntel
                           >4 6/167/1 GenuineIntel
                            5 6/167/1 GenuineIntel
                            6 6/167/1 GenuineIntel
                            7 6/167/1 GenuineIntel
                            8 6/167/1 GenuineIntel
                            9 6/167/1 GenuineIntel
                            10 6/167/1 GenuineIntel
                            11 6/167/1 GenuineIntel

这里我们可以发现,系统运行6分钟25秒的时候出现崩溃(据反馈,开机大致7分钟出现蓝屏),和现象基本一致。蓝屏代码如下:

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: ffff808f2bd19360, Physical Device Object of the stack
Arg3: ffffd501e185f090, nt!TRIAGE_9F_POWER on Win7 and higher, otherwise the Functional Device Object of the stack
Arg4: ffff808f2bc13970, The blocked IRP

DRIVER_POWER_STATE_FAILURE表示是电源状态错误(在指定时间内没有处理完成电源IRP请求出现蓝屏),蓝屏堆栈如下:

 # Child-SP          RetAddr           Call Site
00 ffffd501`e185f058 fffff801`4b9c25e7 nt!KeBugCheckEx
01 ffffd501`e185f060 fffff801`4b9c2501 nt!PopIrpWatchdogBugcheck+0xdf
02 ffffd501`e185f0d0 fffff801`4b6cc394 nt!PopIrpWatchdog+0x31
03 ffffd501`e185f120 fffff801`4b6ca984 nt!KiProcessExpiredTimerList+0x204
04 ffffd501`e185f250 fffff801`4b819a4e nt!KiRetireDpcList+0x714
05 ffffd501`e185f500 00000000`00000000 nt!KiIdleLoop+0x9e

2. 技术概要

通常,为了给一个设备发送电源相关的IRP,都是通过PoRequestPowerIrp来完成的,这个函数的声明如下:

NTSTATUS PoRequestPowerIrp(
  PDEVICE_OBJECT          DeviceObject,
  UCHAR                   MinorFunction,
  POWER_STATE             PowerState,
  PREQUEST_POWER_COMPLETE CompletionFunction,
  __drv_aliasesMem PVOID  Context,
  PIRP                    *Irp
);

在Win10系统下面,PoRequestPowerIrp这个函数的实现大致如下:

  1. 调用PopAllocateIrp来创建一个IRP对象,并将完成历程设置为PopRequestCompletion
  2. 调用PopQueueQuerySetIrp来完成一个查询或者设置电源请求的IRP。

对于PopQueueQuerySetIrp的实现可以总结为如下:

  1. 调用PopEnableIrpWatchdog给IRP设置一个看门狗定时器。
  2. 调用IofCallDriver来完成IRP。

2.1 PopEnableIrpWatchdog

PopEnableIrpWatchdog这个函数的流程大致可以总结为如下:

  1. 通过PopComputeWatchdogTimeout计算出看门狗的超时时间。
  2. 然后初始化一个DPC,并设置好定时器。
  3. DPC的回调例程是PopIrpWatchdog.
2.1.1 PopComputeWatchdogTimeout

PopComputeWatchdogTimeout是获取的如下两个值中的一个:

  1. PopWatchdogSleepTimeout
  2. PopWatchdogResumeTimeout

这两个值的结果如下:

4: kd> dd nt!PopWatchdogSleepTimeout L1
fffff801`4c105078  0000012c
4: kd> dd nt!PopWatchdogResumeTimeout L1
fffff801`4c105150  00000078

这里PopWatchdogSleepTimeout是300秒,跟我们开机7分钟之后蓝屏比较接近(也就是说,这个IRP大致是开机后2分钟之内发起的)。

2.1.2 PopIrpWatchdog

PopIrpWatchdog这个是看门狗定时器的回调例程,这个函数大致实现如下:

void __thiscall __noreturn PopIrpWatchdogBugcheck(_DWORD *this, int a2)
{
	//...
	TriagePower.IrpList = (_LIST_ENTRY *)&PopIrpList;
	TriagePower.Signature = 0x8000u;
	TriagePower.Revision = 2;
	TriagePower.DelayedWorkQueue = (_TRIAGE_EX_WORK_QUEUE *)ExWorkerQueue;
	TriagePower.DelayedIoWorkQueue = (_TRIAGE_EX_WORK_QUEUE *)IoWorkerQueue;
	TriagePower.ThreadList = (_LIST_ENTRY *)&PopIrpThreadList;
	KeBugCheckEx(0x9Fu, 3u, DeviceObject, &TriagePower, Irp);
}

在这个超时回调函数中,回调用KeBugCheckEx(0x9Fu, 3u, DeviceObject, &TriagePower, Irp);来使得系统强制蓝屏。

2.2 IofCallDriver

在电源IRP中,IofCallDriver函数里面使用PoHandleIrp来处理IRP的调用,PoHandleIrp这个函数使用PopDispatchQuerySetIrp来分发函数。

PopDispatchQuerySetIrp实现两个东西:

  1. 将IRP挂入PopIrpWorkerList队列中。
  2. 使用KeReleaseSemaphore(&PopIrpWorkerSemaphore, 0, 1, 0)通知PopIrpWorker来处理。

2.3 PopIrpWorker

PopIrpWorker这个worker是用来处理电源请求的线程,这个线程的线程栈如下:

4: kd> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr           Call Site
00 ffffd501`e1817060 fffff801`4b7324a7 nt!KiSwapContext+0x76
01 ffffd501`e18171a0 fffff801`4b734359 nt!KiSwapThread+0x3a7
02 ffffd501`e1817280 fffff801`4b72e274 nt!KiCommitThreadWait+0x159
03 ffffd501`e1817320 fffff801`4b7ab612 nt!KeWaitForSingleObject+0x234
04 ffffd501`e1817410 fffff801`4b6478f5 nt!PopIrpWorker+0x102
05 ffffd501`e18174b0 fffff801`4b819bb4 nt!PspSystemThreadStartup+0x55
06 ffffd501`e1817500 00000000`00000000 nt!KiStartSystemThread+0x34

在这个线程中等待PopIrpWorkerSemaphore信号,然后从PopIrpWorkerList去取出IRP执行,我们可以看一下整个Worker的响应线程调用栈,如下:

kd> !thread
THREAD 8a153040  Cid 0004.16d4  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 0
Not impersonating
DeviceMap                 8ac04330
Owning Process            8a1837c0       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      4667           Ticks: 2 (0:00:00:00.031)
Context Switch Count      9              IdealProcessor: 0             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address nt!PopIrpWorker (0x81bb604e)
Stack Init c1503ca0 Current c15037dc Base c1504000 Limit c1501000 Call 00000000
Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr  Args to Child              
80e49ce0 8287f85f 00000000 00000000 8287fac0 ACPI!AsyncEvalObject (FPO: [Non-Fpo])
80e49d58 82879fab a0041802 8ead9a24 a763ac20 ACPI!ACPIGet+0x31f (FPO: [Non-Fpo])
80e49d94 8287c304 90dfe818 80e12300 80e49f18 ACPI!ACPIDevicePowerProcessPhase0DeviceSubPhase1+0x51 (FPO: [Non-Fpo])
80e49de0 81ab2898 828d0da0 00000000 00000000 ACPI!ACPIDevicePowerDpc+0x394 (FPO: [Non-Fpo])
80e49eb8 81ab2083 80e49f18 00000000 90dfe818 nt!KiExecuteAllDpcs+0x2b8 (FPO: [Non-Fpo])
80e49ff4 81bed85e c1503904 00000000 00000000 nt!KiRetireDpcList+0x123 (FPO: [Non-Fpo])
c1503928 81b7db53 00000008 828d0dc4 c150395c nt!KiDispatchInterrupt+0x2e (FPO: [Uses EBP] [0,0,1])
c1503938 8287a51a 910cca70 8eae4c30 00000000 nt!KfReleaseSpinLock+0x23 (FPO: [Non-Fpo])
c150395c 82897c9c 00000004 910cca70 00000006 ACPI!ACPIDeviceInitializePowerRequest+0x114 (FPO: [Non-Fpo])
c1503994 81aac394 00000004 828990b0 82896e34 ACPI!ACPIDeviceIrpDeviceFilterRequest+0xea (FPO: [Non-Fpo])
c15039e8 81aac14a c1503a18 82986c8d 910ccb44 nt!IopfCompleteRequest+0x244 (FPO: [Non-Fpo])
c15039f0 82986c8d 910ccb44 910cca70 910cca16 nt!IofCompleteRequest+0x1a (FPO: [Non-Fpo])
c1503a18 81bb636f 00963030 910cca70 90963030 pci!PciDispatchPnpPower+0xdd (FPO: [Non-Fpo])
c1503a30 81aaf96c 910ccb68 910cca70 c1503a48 nt!IopPoHandleIrp+0x27 (FPO: [Non-Fpo])
c1503a40 81bbb78a c1503a78 82895c65 90963030 nt!IofCallDriver+0x5c (FPO: [Non-Fpo])
c1503a48 82895c65 90963030 910cca70 82895bbe nt!IoCallDriver+0x10 (FPO: [Non-Fpo])
c1503a78 82872395 909338d0 910cca70 910ccb68 ACPI!ACPIFilterIrpSetPower+0xa7 (FPO: [Non-Fpo])
c1503ab0 81bb636f 909338d0 910cca70 909338d0 ACPI!ACPIDispatchIrp+0x345 (FPO: [Non-Fpo])
c1503ac8 81aaf96c 910cca70 909b50e0 c1503ae0 nt!IopPoHandleIrp+0x27 (FPO: [Non-Fpo])
c1503ad8 81bbb78a c1503b00 82c0f5be 909338d0 nt!IofCallDriver+0x5c (FPO: [Non-Fpo])
c1503ae0 82c0f5be 909338d0 910cca70 910cca70 nt!IoCallDriver+0x10 (FPO: [Non-Fpo])
c1503b00 82c26f6c 909b5028 909b5320 82c57001 storport!RaidAdapterDevicePowerstopAdapter+0x98 (FPO: [Non-Fpo])
c1503b14 82c0ed97 909b5028 910cca70 00000000 storport!RaidAdapterStopOnPowerdown+0x181cc
c1503b44 82c0ec95 910cca70 909b6cd8 909b6cd8 storport!GatewayRegisterForEmptyNotification+0x4d (FPO: [Non-Fpo])
c1503b60 82c0e551 909b50e0 00000004 910cca70 storport!RaidAdapterPowerDownDevice+0x4f (FPO: [0,0,4])
c1503b90 82c0e262 910cca70 00000002 909b50e0 storport!RaidAdapterSetDevicePowerIrp+0x89 (FPO: [Non-Fpo])
c1503bac 82c0e0d2 910f94c0 910cca70 909b5028 storport!RaidAdapterSetPowerIrp+0x48 (FPO: [Non-Fpo])
c1503bc4 82c0e049 909b5028 910f94c0 910cca70 storport!RaidAdapterPowerIrp+0x52 (FPO: [Non-Fpo])
c1503bd8 81bb61d4 909b5028 910cca70 00000000 storport!RaDriverPowerIrp+0x39 (FPO: [Non-Fpo])
c1503c38 81b120f0 a358d590 7c778104 00000000 nt!PopIrpWorker+0x186 (FPO: [Non-Fpo])
c1503c70 81bee18d 81bb604e a358d590 00000000 nt!PspSystemThreadStartup+0x4a (FPO: [Non-Fpo])
c1503c7c 00000000 00000000 0401ff00 00000023 nt!KiThreadStartup+0x15

2.4 总结

综上,我们总结一下PoRequestPowerIrp发起电源IRP的整个过程。

  1. 创建一个IRP。
  2. 设置好IRP对应的DPC定时器看门狗,如果看门狗超时,那么就会直接蓝屏。
  3. 将IRP放入到PopIrpWorkerList队列中,提交给PopIrpWorker来处理。
  4. PopIrpWorker从队列中取出IRP,然后发送IRP到指定的设备栈。

这里我们可以推测的两种情况:

  1. IRP正常完成,那么取消DPC的看门狗。
  2. 如果IRP超时,那么DPC定时器相应,然后蓝屏。

3. 问题分析

通过上面分析,我们可以看一下整个分析过程,我们可以初步得出的信息有:

  1. IRP发往的设备对象为ffff808f2bd19360.
  2. 蓝屏时候的TRIAGE_9F_POWER结构地址为ffffd501e185f090.
  3. 超时的IRP地址为ffff808f2bc13970.

在Windbg中提供了!poaction来解析上述的两个链表(PopIrpListPopIrpThreadList), 如下:

4: kd> !poaction
PopAction: fffff8014c022640
  State..........: 0 - Idle
  Updates........: 0 
  Action.........: None
  Lightest State.: Unspecified
  Flags..........: 10000003 QueryApps|UIAllowed
  Irp minor......: ??
  System State...: Unspecified
  Hiber Context..: 0000000000000000

Allocated power irps (PopIrpList - fffff8014c022e20)
  IRP: ffff808f33a045a0 (wait-wake/S0), PDO: ffff808f338dddc0
  IRP: ffff808f33a389a0 (wait-wake/S0), PDO: ffff808f33910d40
  IRP: ffff808f3395b010 (wait-wake/S4), PDO: ffff808f33a570a0
  IRP: ffff808f3395c010 (wait-wake/S4), PDO: ffff808f33a95060
  IRP: ffff808f33957010 (wait-wake/S4), PDO: ffff808f33a86120
  IRP: ffff808f33959010 (wait-wake/S4), PDO: ffff808f33ab7060
  IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50

Irp worker threads (PopIrpThreadList - fffff8014c01f970)
  THREAD: ffff808f2a745040 (static)
  THREAD: ffff808f2a744040 (static)

Broadcast in progress: FALSE
Is Directed DRIPS Transition: FALSE

No Device State present

针对IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50,我们看看每个设备的电源状态是否设置正常,如下:

4: kd> !podev ffff808f2bd19360
Device object is for:
  DriverObject 2bbd4c90
Current Irp 00000000 RefCount 0 Type 00000004 AttachedDev ffff808f2bcc5d50 DevFlags 00001040
Device queue is not busy.
Device Object Extension: ffff808f2bd19c68:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:

4: kd> !podev ffff808f2bcc5d50
Device object is for:
  DriverObject 2a9cf060
Current Irp 00000000 RefCount 0 Type 00000032 AttachedDev ffff808f2bc0f050 DevFlags 00000000
Device queue is not busy.
Device Object Extension: ffff808f2bcc5ea0:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:

4: kd> !podev ffff808f2bc0f050
Device object is for:
  DriverObject 2bc6e830
Current Irp 00000000 RefCount 0 Type 00000004 DevFlags 00000050
Device queue is not busy.
Device Object Extension: ffff808f2bc138e0:
PowerFlags: 00000040 =>SystemState=0 DeviceState=4
Dope: 00000000:

可以发现每个设备的电源状态为DeviceState=4,看起来似乎是正常的,已经设置成功了,针对这个IRP状态如下:

4: kd> !irp ffff808f2bc13970
Irp is active with 6 stacks 4 is current (= 0xffff808f2bc13b18)
 No Mdl: No System Buffer: Thread 00000000:  Irp stack trace.  
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [IRP_MJ_POWER(16), IRP_MN_WAIT_WAKE(0)]
            0  0 ffff808f2bd19360 00000000 fffff8014ddee1d0-fffff8014ddee5b0    
	       \Driver\pci	ACPI!ACPIDeviceIrpDeviceFilterRequest
			Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0 e1 ffff808f2bcc5d50 00000000 fffff8014e321b60-00000000 Success Error Cancel pending
	       \Driver\ACPI	storport!RaidAdapterPowerDownDeviceCompletion
			Args: 00000000 00000001 00000004 00000000
 [IRP_MJ_POWER(16), IRP_MN_SET_POWER(2)]
            0 e1 ffff808f2bc0f050 00000000 fffff8014b79f830-ffff808f2a6e1208 Success Error Cancel pending
	       \Driver\storahci	nt!PopRequestCompletion
			Args: 00000000 00000001 00000004 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-ffff808f2a6e1208    

			Args: 00000000 00000000 00000000 00000000

我们得知IRP_MN_SET_POWERIRP没有完成导致超时蓝屏,这里有两个可疑点:

  1. IRP: ffff808f2bc13970 (set/D3,), PDO: ffff808f2bd19360, CURRENT: ffff808f2bcc5d50:说明这个IRP是将硬件设备设置为D3状态,对于PowerDeviceD3表明是将设备进行关闭的调用,这对于一个存储设备来说是比较可疑的。
  2. IRP_MN_SET_POWERIRP发往哪里进行处理了。

我们先从比较简单的情况来看,看一下IRP在什么地方进行处理了,从!irp ffff808f2bc13970命令大致可以得到,PCI总线已经响应了电源请求处理(也就是说调用了IoCompleteRequest),但是在完成例程中再次对IRP处理的时候失去了响应。

我们对电源处理的Worker和PCI处理的Worker线程进行查看,发现如下:

4: kd> !process 0xffff808f2a6e7040 7
        THREAD ffff808f2a744040  Cid 0004.0010  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
            fffff8014c022600  Semaphore Limit 0x7fffffff
        Not impersonating
        DeviceMap                 ffffae88c1438c3f
        Owning Process            ffff808f2a6e7040       Image:         System
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      9730           Ticks: 14919 (0:00:03:53.109)
        Context Switch Count      63             IdealProcessor: 0             
        UserTime                  00:00:00.000
        KernelTime                00:00:00.000
        Win32 Start Address nt!PopIrpWorker (0xfffff8014b7ab510)
        Stack Init ffffd501e1817530 Current ffffd501e1817020
        Base ffffd501e1818000 Limit ffffd501e1811000 Call 0000000000000000
        Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           : Args to Child                                                           : Call Site
        ffffd501`e1817060 fffff801`4b7324a7 : fffff801`47b24180 fffff801`ffffffff ffff808f`00000000 00000000`00000001 : nt!KiSwapContext+0x76
        ffffd501`e18171a0 fffff801`4b734359 : ffff808f`00000000 fffff801`0000000a ffffd501`e1817380 00000000`00000000 : nt!KiSwapThread+0x3a7
        ffffd501`e1817280 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159
        ffffd501`e1817320 fffff801`4b7ab612 : fffff801`4c022600 ffff808f`00000000 00000000`00000000 00000000`00000000 : nt!KeWaitForSingleObject+0x234
        ffffd501`e1817410 fffff801`4b6478f5 : 00000000`00000000 fffff801`4b7ab510 00000000`00000000 00000000`000000d4 : nt!PopIrpWorker+0x102
        ffffd501`e18174b0 fffff801`4b819bb4 : ffffbf01`2b372180 ffff808f`2a744040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
        ffffd501`e1817500 00000000`00000000 : ffffd501`e1818000 ffffd501`e1811000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34

        THREAD ffff808f2a745040  Cid 0004.0014  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
            fffff8014c022600  Semaphore Limit 0x7fffffff
        Not impersonating
        DeviceMap                 ffffae88c1438c3f
        Owning Process            ffff808f2a6e7040       Image:         System
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      9413           Ticks: 15236 (0:00:03:58.062)
        Context Switch Count      108            IdealProcessor: 0             
        UserTime                  00:00:00.000
        KernelTime                00:00:00.000
        Win32 Start Address nt!PopIrpWorker (0xfffff8014b7ab510)
        Stack Init ffffd501e181f530 Current ffffd501e181f020
        Base ffffd501e1820000 Limit ffffd501e1819000 Call 0000000000000000
        Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           : Args to Child                                                           : Call Site
        ffffd501`e181f060 fffff801`4b7324a7 : fffff801`47b24180 fffff801`47b24180 ffff808f`00000000 00000000`00000003 : nt!KiSwapContext+0x76
        ffffd501`e181f1a0 fffff801`4b734359 : ffff808f`00000000 fffff801`00000003 ffffd501`e181f380 00007f70`00000000 : nt!KiSwapThread+0x3a7
        ffffd501`e181f280 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159
        ffffd501`e181f320 fffff801`4b7ab612 : fffff801`4c022600 ffff808f`00000000 00000000`00000000 00000000`00000000 : nt!KeWaitForSingleObject+0x234
        ffffd501`e181f410 fffff801`4b6478f5 : 00000000`00000000 fffff801`4b7ab510 00000000`00000000 00000000`000000da : nt!PopIrpWorker+0x102
        ffffd501`e181f4b0 fffff801`4b819bb4 : ffffbf01`2b1ce180 ffff808f`2a745040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
        ffffd501`e181f500 00000000`00000000 : ffffd501`e1820000 ffffd501`e1819000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34

        THREAD ffff808f2a6d9040  Cid 0004.014c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
            fffff8014de40a60  NotificationEvent
            fffff8014de40a20  NotificationEvent
        Not impersonating
        DeviceMap                 ffffae88c1438c3f
        Owning Process            ffff808f2a6e7040       Image:         System
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      5451           Ticks: 19198 (0:00:04:59.968)
        Context Switch Count      845            IdealProcessor: 4             
        UserTime                  00:00:00.000
        KernelTime                00:00:00.062
        Win32 Start Address ACPI!ACPIWorkerThread (0xfffff8014dde5550)
        Stack Init ffffd501e1b78530 Current ffffd501e1b78040
        Base ffffd501e1b79000 Limit ffffd501e1b72000 Call 0000000000000000
        Priority 16 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           : Args to Child                                                           : Call Site
        ffffd501`e1b78080 fffff801`4b7324a7 : ffffbf01`2b1ce180 00000000`ffffffff ffff808f`2eb32158 fffff801`4de0208a : nt!KiSwapContext+0x76
        ffffd501`e1b781c0 fffff801`4b734359 : 00180001`00000004 00000000`00000003 ffffd501`e1b783d8 ffff808f`00000000 : nt!KiSwapThread+0x3a7
        ffffd501`e1b782a0 fffff801`4b68ecd1 : ffff808f`00000000 fffff801`00000000 ffff808f`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159
        ffffd501`e1b78340 fffff801`4dde55ea : ffff808f`00000000 fffff801`4de40a00 00000000`00000000 ffffbf01`2b1ce180 : nt!KeWaitForMultipleObjects+0x2b1
        ffffd501`e1b78440 fffff801`4b6478f5 : ffff808f`2a6d9040 fffff801`4dde5550 00000000`00000000 ffff808f`2a6e7040 : ACPI!ACPIWorkerThread+0x9a
        ffffd501`e1b784b0 fffff801`4b819bb4 : ffffbf01`2b1ce180 ffff808f`2a6d9040 fffff801`4b6478a0 00000000`00000000 : nt!PspSystemThreadStartup+0x55
        ffffd501`e1b78500 00000000`00000000 : ffffd501`e1b79000 ffffd501`e1b72000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34

从上面来看,暂时没有发现相关的异常,因此从IRP来跟踪,难度已经比较大了。不过这里有个比较特殊的点就是Wait Start TickCount 5451 Ticks: 19198 (0:00:04:59.968),似乎ACPIWorkerThread大致已经等了5分钟,这和我们发起的IRP_MN_SET_POWER超时时间基本一致,但是这也并不能说明什么。

接下来我们看一下IRP对应设备栈的状态,大致如下:

4: kd> !devstack ffff808f2bc0f050
  !DevObj           !DrvObj            !DevExt           ObjectName
> ffff808f2bc0f050  \Driver\storahci   ffff808f2bc0f1a0  RaidPort0
  ffff808f2bcc5d50  \Driver\ACPI       ffff808f2bafa420  
  ffff808f2bd19360  \Driver\pci        ffff808f2bd194b0  NTPNP_PCI0006
!DevNode ffff808f2bbdbc40 :
  DeviceInst is "PCI\VEN_8086&DEV_43D2&SUBSYS_86941043&REV_11\3&11583659&0&B8"
  ServiceName is "storahci"
4: kd> !DevNode ffff808f2bbdbc40
DevNode 0xffff808f2bbdbc40 for PDO 0xffff808f2bd19360
  Parent 0xffff808f2a715c40   Sibling 0xffff808f2bbe9c40   Child 0000000000   
  InstancePath is "PCI\VEN_8086&DEV_43D2&SUBSYS_86941043&REV_11\3&11583659&0&B8"
  ServiceName is "storahci"
  State = DeviceNodeStopped (0x30a)
  Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
  StateHistory[09] = DeviceNodeAwaitingQueuedRemoval (0x30f)
  StateHistory[08] = DeviceNodeAwaitingQueuedDeletion (0x30e)
  StateHistory[07] = DeviceNodeStopped (0x30a)
  StateHistory[06] = DeviceNodeQueryStopped (0x309)
  StateHistory[05] = DeviceNodeStarted (0x308)
  StateHistory[04] = DeviceNodeStartPostWork (0x307)
  StateHistory[03] = DeviceNodeStartCompletion (0x306)
  StateHistory[02] = DeviceNodeStartPending (0x305)
  StateHistory[01] = DeviceNodeResourcesAssigned (0x304)
  StateHistory[00] = DeviceNodeUninitialized (0x301)
  StateHistory[19] = Unknown State (0x0)
  StateHistory[18] = Unknown State (0x0)
  StateHistory[17] = Unknown State (0x0)
  StateHistory[16] = Unknown State (0x0)
  StateHistory[15] = Unknown State (0x0)
  StateHistory[14] = Unknown State (0x0)
  StateHistory[13] = Unknown State (0x0)
  StateHistory[12] = Unknown State (0x0)
  StateHistory[11] = Unknown State (0x0)
  StateHistory[10] = Unknown State (0x0)
  Flags (0x6c000030)  DNF_ENUMERATED, DNF_IDS_QUERIED, 
                      DNF_NO_LOWER_DEVICE_FILTERS, DNF_NO_LOWER_CLASS_FILTERS, 
                      DNF_NO_UPPER_DEVICE_FILTERS, DNF_NO_UPPER_CLASS_FILTERS

这里有一个非常奇怪的地方就是State = DeviceNodeStopped (0x30a),设备的状态已经处于Stop状态了,看起来这个状态是异常的,因为一个设备处于工作状态应该是State = DeviceNodeStarted (0x308),但是缺少信息继续深入下去了。

我们看一下整个系统有没有处于异常情况的操作状态,结果发现如下:

4: kd> !thread ffff808f2a6d4040
THREAD ffff808f2a6d4040  Cid 0004.016c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
    ffffd501e1bb7a00  SynchronizationEvent
IRP List:
    ffff808f40b82010: (0006,01f0) Flags: 00000000  Mdl: 00000000
Not impersonating
DeviceMap                 ffffae88c1438c3f
Owning Process            ffff808f2a6e7040       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      9546           Ticks: 15103 (0:00:03:55.984)
Context Switch Count      23469          IdealProcessor: 6             
UserTime                  00:00:00.000
KernelTime                00:00:01.015
Win32 Start Address nt!ExpWorkerThread (0xfffff8014b71cf50)
Stack Init ffffd501e1bb8530 Current ffffd501e1bb75c0
Base ffffd501e1bb9000 Limit ffffd501e1bb2000 Call 0000000000000000
Priority 13 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd501`e1bb7600 fffff801`4b7324a7 : ffffbf01`2b372180 fffff801`ffffffff 00000000`00000001 ffffd501`00000000 : nt!KiSwapContext+0x76
ffffd501`e1bb7740 fffff801`4b734359 : ffff808f`00000006 00000000`00000004 ffffd501`e1bb7920 00000000`00000000 : nt!KiSwapThread+0x3a7
ffffd501`e1bb7820 fffff801`4b72e274 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x159
ffffd501`e1bb78c0 fffff801`4de6f0d5 : ffffd501`e1bb7a00 ffff808f`00000000 ffff808f`2cf8f300 fffff801`4de2fb00 : nt!KeWaitForSingleObject+0x234
ffffd501`e1bb79b0 fffff801`4ddecdf0 : 00000000`00000000 ffff808f`2a6b0aa0 ffff808f`40b82010 ffff808f`2a6b0aa8 : ACPI!ACPIInitStopDevice+0xd9
ffffd501`e1bb7a30 fffff801`4ddc13d9 : ffff808f`2a6b0aa0 ffff808f`40b82002 ffff808f`40b82010 ffff808f`40b82170 : ACPI!ACPIFilterIrpRemoveDevice+0x250
ffffd501`e1bb7ae0 fffff801`4b703025 : 00000000`00000007 ffff808f`3141c180 ffff808f`40b82010 00000000`74727044 : ACPI!ACPIDispatchIrp+0x3c9
ffffd501`e1bb7b60 fffff801`63adbad3 : ffff808f`3141c180 ffff808f`3141c180 ffff808f`40b82010 ffff808f`3141c030 : nt!IofCallDriver+0x55
ffffd501`e1bb7ba0 fffff801`6396ed55 : ffff808f`3141c102 ffff808f`3141c030 00000000`00000000 ffff808f`40b82010 : dxgkrnl!DpiFdoHandleRemoveDevice+0x3b3
ffffd501`e1bb7be0 fffff801`6392c7ea : ffff808f`3141c030 ffff808f`40b82010 ffff808f`40b82010 00000000`00000000 : dxgkrnl!DpiFdoDispatchPnp+0xd5
ffffd501`e1bb7c80 fffff801`676856db : 00000000`c0000002 ffffd501`e1bb7e19 ffff808f`31587000 ffff808f`3141c030 : dxgkrnl!DpiDispatchPnp+0xea
ffffd501`e1bb7da0 fffff801`67685104 : ffff808f`3141c030 00000000`c0000002 ffff808f`31587000 ffffae88`d2ae2e70 : nvlddmkm!nvDumpConfig+0x4e423b
ffffd501`e1bb7e80 fffff801`4b703025 : ffff808f`3141c030 00000000`00000000 ffffd501`e1bb7fa0 00000000`69706e04 : nvlddmkm!nvDumpConfig+0x4e3c64
ffffd501`e1bb7eb0 fffff801`4badf364 : 00000000`c00000bb ffff808f`3141c030 00000000`00000000 ffff808f`2bd04b08 : nt!IofCallDriver+0x55
ffffd501`e1bb7ef0 fffff801`4bbcab04 : 00000000`00000002 ffff808f`2bd04120 ffff808f`2bd04ae0 ffff808f`2bd04120 : nt!IopSynchronousCall+0xf8
ffffd501`e1bb7f60 fffff801`4b76b961 : ffffae88`d2e84f00 ffff808f`2bd04ae0 00000000`00000200 00000000`0000000a : nt!IopRemoveDevice+0x108
ffffd501`e1bb8010 fffff801`4bbcba9e : ffff808f`2bd04ae0 ffffae88`00000015 00000000`00000000 cb3a4008`00200001 : nt!PnpRemoveLockedDeviceNode+0x1a9
ffffd501`e1bb8070 fffff801`4bbcb803 : ffff808f`2bd04ae0 ffffd501`e1bb80f0 ffff808f`2bd04120 00000000`00000000 : nt!PnpDeleteLockedDeviceNode+0x52
ffffd501`e1bb80b0 fffff801`4bbc9b4c : ffff808f`2bd04120 00000000`00000002 ffff808f`2bd04120 00000000`00000000 : nt!PnpDeleteLockedDeviceNodes+0xcf
ffffd501`e1bb8130 fffff801`4bbc6d99 : ffffd501`e1bb8270 ffff808f`2bd04a00 ffff808f`414e9d00 ffffae88`00000002 : nt!PnpProcessQueryRemoveAndEject+0x2fc
ffffd501`e1bb8210 fffff801`4bba6605 : ffffae88`d2e84f00 ffffae88`d150fe10 ffff808f`2a6e1b00 00000000`00000000 : nt!PnpProcessTargetDeviceEvent+0x109
ffffd501`e1bb8240 fffff801`4b71d09f : ffff808f`2a6e1bd0 ffff808f`2a6d4040 ffff808f`414e9d90 fffff801`00000000 : nt!PnpDeviceEventWorker+0x2c5
ffffd501`e1bb82c0 fffff801`4b6478f5 : ffff808f`2a6d4040 ffffbf01`2b6b0000 ffff808f`2a6d4040 005fa4ef`b59bbfff : nt!ExpWorkerThread+0x14f
ffffd501`e1bb84b0 fffff801`4b819bb4 : ffffbf01`2b6a1180 ffff808f`2a6d4040 fffff801`4b6478a0 e0458b48`217503fa : nt!PspSystemThreadStartup+0x55
ffffd501`e1bb8500 00000000`00000000 : ffffd501`e1bb9000 ffffd501`e1bb2000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x34

这个操作看起来也比较奇怪(dxgkrnl!DpiFdoHandleRemoveDevice),因为显卡设备一般来说都不会拔除掉(并且这个IRP似乎也没有被底层设备完成了),我们看下显卡设备的状态如下:

4: kd> !DevNode ffff808f2bd04ae0
DevNode 0xffff808f2bd04ae0 for PDO 0xffff808f2bd04120
  Parent 0xffff808f2bbd6c40   Sibling 0xffff808f2bd06ae0   Child 0xffff808f36386cb0   
  InstancePath is "PCI\VEN_10DE&DEV_2489&SUBSYS_150A7377&REV_A1\4&39fddc03&0&0008"
  ServiceName is "nvlddmkm"
  State = DeviceNodeRemoved (0x312)
  Previous State = DeviceNodeStopped (0x30a)
  StateHistory[13] = DeviceNodeStopped (0x30a)
  StateHistory[12] = DeviceNodeAwaitingQueuedRemoval (0x30f)
  StateHistory[11] = DeviceNodeAwaitingQueuedDeletion (0x30e)
  StateHistory[10] = DeviceNodeStopped (0x30a)
  StateHistory[09] = DeviceNodeAwaitingQueuedRemoval (0x30f)
  StateHistory[08] = DeviceNodeAwaitingQueuedDeletion (0x30e)
  StateHistory[07] = DeviceNodeStopped (0x30a)
  StateHistory[06] = DeviceNodeQueryStopped (0x309)
  StateHistory[05] = DeviceNodeStarted (0x308)
  StateHistory[04] = DeviceNodeStartPostWork (0x307)
  StateHistory[03] = DeviceNodeStartCompletion (0x306)
  StateHistory[02] = DeviceNodeStartPending (0x305)
  StateHistory[01] = DeviceNodeResourcesAssigned (0x304)
  StateHistory[00] = DeviceNodeUninitialized (0x301)
  StateHistory[19] = Unknown State (0x0)
  StateHistory[18] = Unknown State (0x0)
  StateHistory[17] = Unknown State (0x0)
  StateHistory[16] = Unknown State (0x0)
  StateHistory[15] = Unknown State (0x0)
  StateHistory[14] = Unknown State (0x0)
  Flags (0x6e0000f0)  DNF_ENUMERATED, DNF_IDS_QUERIED, 
                      DNF_HAS_BOOT_CONFIG, DNF_BOOT_CONFIG_RESERVED, 
                      DNF_UNINSTALLED, DNF_NO_LOWER_DEVICE_FILTERS, 
                      DNF_NO_LOWER_CLASS_FILTERS, DNF_NO_UPPER_DEVICE_FILTERS, 
                      DNF_NO_UPPER_CLASS_FILTERS
  UserFlags (0x00000300)  
                          Unknown flags 0x00000300
  CapabilityFlags (0x00600400)  WakeFromD0
                                Unknown flags 0x00600000

处于DeviceNodeRemoved状态,显然这个也是有点异常的。

接下来我们看一下所有硬件设备树的状态信息,如下:

4: kd> !devnode 0 1
Dumping IopRootDeviceNode (= 0xffff808f2a7dbaa0)
DevNode 0xffff808f2a7dbaa0 for PDO 0xffff808f2a8cbd50
  InstancePath is "HTREE\ROOT\0"
  State = DeviceNodeStopped (0x30a)
  Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
  DevNode 0xffff808f2a8c6aa0 for PDO 0xffff808f2a8ced70
    InstancePath is "ROOT\volmgr\0000"
    ServiceName is "volmgr"
    State = DeviceNodeStopped (0x30a)
    Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
    DevNode 0xffff808f2e510ca0 for PDO 0xffff808f2e50cc10
      InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#0000000000200000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffae88c18b0990  b 0xffffae88c4069c90
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff808f2e514ca0 for PDO 0xffff808f2e50d8f0
      InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#0000000012E00000"
      ServiceName is "volume"
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff808f2e515ca0 for PDO 0xffff808f2e5108f0
      InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#000000001AE00000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffae88c18b0910  b 0xffffae88c4069090
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff808f2e5478a0 for PDO 0xffff808f2e5148f0
      InstancePath is "STORAGE\Volume\{29d2189a-7903-11ec-b987-806e6f6e6963}#00000013EEF00000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffae88c1859690  b 0xffffae88c4069c10
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)

竟然发现整个设备树的设备都是DeviceNodeStopped状态,是什么情况导致的呢?还未可知。

分析进行到这一步似乎在继续下去已经比较困难了,是否这里可以做一个相关猜想:由于硬件的异常,导致设备都处于Stopped状态了,这样硬件设备对于默认IRP的处理就出现问题,例如IRP_MN_SET_POWER,导致IRP处理超时蓝屏。但这仅仅是一个猜想,真实原因是否这样也不可知。

当然这个问题后续的方法可以通过动态调试,从开机开始观察整个系统和硬件设备的状态,来更进一步确认原因;由于现场环境的缺失(并且网友后面通过多次重装系统解决了),并且本人对于硬件驱动的了解也不是特别深入(上述设备的状态是否异常,已经是否有跟进一步的信息确认),本次分析就到这里了。

4. 总结

我们先来看一个问题,据反馈系统大致运行7分钟之后出现蓝屏,为什么会是这个时间呢?因为下面这个超时时间:

4: kd> dd nt!PopWatchdogSleepTimeout L1
fffff801`4c105078  0000012c
4: kd> ? 0000012c
Evaluate expression: 300 = 00000000`0000012c

300秒刚好是5分钟,估计这个IRP应该是在开机大致2分钟上下的时间产生的,因此也就导致系统在7分钟的时间出现蓝屏。

综上,上述问题的产生似乎并非软件引起,那么是否是硬件上面的不兼容呢(组装硬件的兼容性问题)?看起来比较大概率是的,但是没有最终的结论。

很遗憾,这个问题花了我周末一天的时间,依旧没有找到具体的根因?如果有网友看到本文章,有思路和建议欢迎留言给出。

5. 附录

这个问题网友后面通过重装了多次操作系统然后解决了,让其帮忙取一下正常情况的转储文件,发现设备状态如下:

0: kd> !devnode 0 1
Dumping IopRootDeviceNode (= 0xffff8d87e27f1aa0)
DevNode 0xffff8d87e27f1aa0 for PDO 0xffff8d87e27f0d70
  InstancePath is "HTREE\ROOT\0"
  State = DeviceNodeStopped (0x30a)
  Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
  DevNode 0xffff8d87e27f2aa0 for PDO 0xffff8d87e28e7d70
    InstancePath is "ROOT\volmgr\0000"
    ServiceName is "volmgr"
    State = DeviceNodeStopped (0x30a)
    Previous State = DeviceNodeAwaitingQueuedRemoval (0x30f)
    DevNode 0xffff8d87e6527ca0 for PDO 0xffff8d87e65218f0
      InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#0000000000200000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffe5889e602f10  b 0xffffe5889e3b7b90
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff8d87e6528ca0 for PDO 0xffff8d87e65278f0
      InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#0000000012E00000"
      ServiceName is "volume"
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff8d87e6550ca0 for PDO 0xffff8d87e65288f0
      InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#000000001AE00000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffe5889e603a90  b 0xffffe5889e3b7610
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff8d87e6555ca0 for PDO 0xffff8d87e65508f0
      InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#00000013EEF00000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffe5889e603290  b 0xffffe5889e3b7790
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)
    DevNode 0xffff8d87e65568a0 for PDO 0xffff8d87e65558f0
      InstancePath is "STORAGE\Volume\{97e02de8-85b9-11ec-baee-806e6f6e6963}#000000141B000000"
      ServiceName is "volume"
      TargetDeviceNotify List - f 0xffffe5889e60f790  b 0xffffe5889e3b7f90
      State = DeviceNodeStopped (0x30a)
      Previous State = DeviceNodeQueryStopped (0x309)

这个设备状态依旧是State = DeviceNodeStopped (0x30a),但是系统仍然可以正常使用;看起来似乎挺奇怪的,但也先到此为止吧!

  • 23
    点赞
  • 23
    收藏
    觉得还不错? 一键收藏
  • 9
    评论
评论 9
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值