Unkillable Processes

Reposted 2007-09-14 03:27:00
Have you ever terminated an application only to see in your favorite task manager (Process Explorer, of course) that the process still exists? Or have you tried logging out or shutting down only to have the logoff or shutdown stall indefinitely for no apparent reason? These scenarios are usually the result of buggy device drivers that don’t properly handle the cancellation of outstanding I/O requests.

Over the last few years I’ve developed a tool called Notmyfault that demonstrates a number of common device driver bugs, including accessing freed memory, overrunning buffers, and leaking memory. The crashes generated by Notmyfault are featured in the crash analysis chapter of the Windows Internals book I coauthored with Dave Solomon. I’ve recently added a new error selection, Hang Irp, to show the effects of drivers that don’t cancel I/O requests.

When you run Notmyfault and select the Hang Irp bug, Notmyfault sends an I/O request to its helper driver, Myfault.sys, that Myfault.sys never completes. The names of the executable and driver reinforce the fact that user-mode code can never directly cause a Windows crash: Notmyfault relies on the Myfault driver to do the dirty work. The Notmyfault thread that issues the request never continues executing because it ends up stuck in the kernel waiting for the I/O request to complete. However, because Notmyfault issues the request from a second thread, the UI remains responsive and you can trigger other bugs, issue more hanging IRPs, or try to terminate the process.
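The threading arrangement can be sketched in user-mode terms. The Python model below is only an analogy, not Notmyfault’s actual code: `hung_request` and `issue_hung_request` are hypothetical names, and a `threading.Event` that nobody ever sets stands in for the completion that Myfault.sys never delivers.

```python
import threading

# Hypothetical stand-in for the completion signal the driver never sends.
never_completed = threading.Event()

def hung_request():
    """Model of the worker thread: block until the request 'completes'."""
    never_completed.wait()  # never returns, because nothing sets the event

def issue_hung_request():
    """Start the blocking call on a second thread and return immediately."""
    worker = threading.Thread(target=hung_request, daemon=True)
    worker.start()
    return worker

if __name__ == "__main__":
    worker = issue_hung_request()
    worker.join(timeout=0.2)  # give the worker a moment; it will not finish
    print("worker still blocked:", worker.is_alive())
    print("main thread is free to keep handling the UI")
```

The analogy breaks down at exit: a Python daemon thread disappears with the interpreter, whereas the real thread is blocked in kernel mode and keeps its process alive.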

Terminating Notmyfault reveals the effect of a hung IRP. Even after you close the Notmyfault window, the Notmyfault process still shows in Process Explorer’s process list. Logging off and back in, even into a different account, does not cause the zombied process to exit. So what’s going on under the hood? If you’ve configured Process Explorer to take advantage of Microsoft’s symbol support (steps for doing so are documented in Process Explorer’s help file), you can view the stack of the hung thread by double-clicking on the Notmyfault process, navigating to the Threads tab of the resulting Process Properties dialog, and double-clicking on the thread.

A stack reflects a history of subroutine invocation and reads top to bottom from most to least recent. The stack of the hung thread indicates that Notmyfault called DeviceIoControl, which called ZwDeviceIoControlFile. ZwDeviceIoControlFile transitioned into kernel mode (the frames that are prefixed with “ntkrnlpa.exe”), where the kernel’s system call dispatcher executed NtDeviceIoControlFile. Because the I/O request was synchronous, the I/O manager waits for the driver at which the I/O is targeted to complete the request.

When a process terminates, the Process Manager performs process rundown, which includes terminating all the threads in the process, closing handles to opened system resources (e.g. files and registry keys), and tearing down the address space of the process. When the Process Manager sees that a terminating thread has outstanding I/O requests, it informs the drivers processing the requests that the requests should be cancelled. You can see that in the stack as the call to IopCancelAlertedRequest. Because the completion of an I/O request requires access to the address space of the owning thread’s process, the system can’t finish tearing down a process until all its I/O requests have completed or been cancelled. The I/O Manager has no choice but to wait indefinitely, which you can see in the stack as the call to KeWaitForSingleObject.
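The rundown rule described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how the kernel is implemented: the `Irp`, `Driver`, and `Process` classes and the `run_down` function are hypothetical, and where the real system would block in a wait, the model simply reports the outcome.

```python
from dataclasses import dataclass, field

@dataclass
class Irp:
    """Toy model of an I/O request packet."""
    owner: str                      # name of the driver holding the request
    pending: bool = True
    cancel_requested: bool = False

@dataclass
class Driver:
    name: str
    honors_cancel: bool             # False models a driver with no working cancel path

    def cancel(self, irp: Irp) -> None:
        irp.cancel_requested = True
        if self.honors_cancel:
            irp.pending = False     # a well-behaved driver completes the IRP as cancelled

@dataclass
class Process:
    name: str
    irps: list = field(default_factory=list)

def run_down(process: Process, drivers: dict) -> str:
    """Ask owning drivers to cancel, then report whether teardown can finish."""
    for irp in process.irps:
        if irp.pending:
            drivers[irp.owner].cancel(irp)
    stuck = [irp for irp in process.irps if irp.pending]
    # The real system would wait here indefinitely; the model just reports.
    return "zombie" if stuck else "exited"

if __name__ == "__main__":
    drivers = {
        "good": Driver("good", honors_cancel=True),
        "myfault": Driver("myfault", honors_cancel=False),
    }
    print(run_down(Process("app", [Irp("good")]), drivers))
    print(run_down(Process("notmyfault", [Irp("myfault")]), drivers))
```

A single uncancellable request is enough to leave the whole process in the zombie state, no matter how many other requests were cancelled cleanly.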

If you run across this type of problem in the real world, you’ll need to run a kernel debugger to look at the outstanding I/O requests of any hung threads and determine the driver that owns them. If the system is hung, you need to debug it from a second computer running a kernel debugger. Since the system as a whole isn’t hung when you create a hung thread with Notmyfault, you can use local kernel debugging with LiveKd or, if you’re running Windows XP or higher, the local kernel debugging built into the Debugging Tools for Windows. If you’ve never used a kernel debugger, the easiest approach is to download the Debugging Tools for Windows and then run LiveKd from the directory in which you install the tools.

The first kernel debugger command to execute is one that shows the hung process and its threads. Look at the IRP List area, which is a list of outstanding I/O requests, of any threads that are listed. Here’s the command to dump the hung process, along with partial output that includes the IRP list for the Notmyfault thread:

kd> !process 0 7 notmyfault.exe
PROCESS 8183ad18 SessionId: 0 Cid: 02dc Peb: 7ffdf000 ParentCid: 04e4
DirBase: 08b40280 ObjectTable: e107cd10 HandleCount: 23.
Image: NotMyfault.exe
VadRoot 817d8d68 Vads 44 Clone 0 Private 98. Modified 1. Locked 0.

THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
81821d0c NotificationEvent
IRP List:
82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000

The next step is to look at the IRP (I/O Request Packet) or IRPs you find:

kd> !irp 82370f68
Irp is active with 1 stacks 1 is current (= 0x82370fd8)
No Mdl Thread 81810560: Irp stack trace.
cmd flg cl Device File Completion-Context
>[ e, 0] 5 0 8172daa8 81821cb0 00000000-00000000
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
Args: 00000000 00000000 83360020 00000000

The output reports that \Driver\Myfault, the internal name of the Myfault driver, owns the IRP and is therefore the driver that’s guilty of not completing the I/O and not responding to the system’s cancellation request. The error regarding missing symbols for myfault.sys is expected, since Microsoft only stores symbols for its own drivers and components.
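When a process has many threads, going from the `!process` output to the follow-up `!irp` commands can be scripted. The sketch below is an illustration only: it assumes the debugger output has been saved as text and that each IRP List entry starts with a hexadecimal address followed by a colon and a parenthesized pair. `irp_commands` and `SAMPLE` are hypothetical names, and the sample address is the one passed to `!irp` above.

```python
import re

# Assumed entry shape: "<hex address>: (xxxx,xxxx) Flags: ..."
IRP_ENTRY = re.compile(
    r"^\s*([0-9a-fA-F]{8,16}): \([0-9a-fA-F]{4},[0-9a-fA-F]{4}\) Flags:"
)

def irp_commands(process_output: str) -> list:
    """Return one '!irp <address>' command per IRP List entry found."""
    commands = []
    for line in process_output.splitlines():
        match = IRP_ENTRY.match(line)
        if match:
            commands.append("!irp " + match.group(1))
    return commands

# Sample text modeled on the thread portion of the output shown earlier.
SAMPLE = """\
THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
81821d0c NotificationEvent
IRP List:
82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000
"""

if __name__ == "__main__":
    for command in irp_commands(SAMPLE):
        print(command)
```

The pattern deliberately requires the colon and the parenthesized pair, so that other lines beginning with an address, such as the wait object line, are not mistaken for IRPs.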

The reason that the Notmyfault bug does not result in logoff or shutdown hangs is that the system doesn’t care whether user applications really terminate during either of those activities. As long as the TerminateProcess API returns success, which it does for such zombie processes, the system is happy. However, if Explorer or one of the core system processes gets into a zombie state, the system will be effectively hung.

