Unkillable Processes

Reposted September 14, 2007 03:27:00
Have you ever terminated an application only to see in your favorite task manager (Process Explorer, of course) that the process still exists? Or have you tried logging out or shutting down only to have the logoff or shutdown stall indefinitely for no apparent reason? These scenarios are usually the result of buggy device drivers that don’t properly handle the cancellation of outstanding I/O requests.

Over the last few years I’ve developed a tool called Notmyfault that demonstrates a number of common device driver bugs, including accessing freed memory, overrunning buffers, and leaking memory. The crashes generated by Notmyfault are featured in the crash analysis chapter of the Windows Internals book I coauthored with Dave Solomon. I’ve recently added a new error selection, Hang Irp, to show the effects of drivers that don’t cancel I/O requests.



When you run Notmyfault and select the Hang Irp bug, Notmyfault sends an I/O request to its helper driver, Myfault.sys, that Myfault.sys never completes. The names of the executable and driver reinforce the fact that user-mode code can never directly cause a Windows crash: Notmyfault relies on the Myfault driver to do the dirty work. The Notmyfault thread that issues the request never continues executing, because it ends up stuck in the kernel waiting for the I/O request to complete. However, because Notmyfault issues the request from a second thread, the UI remains responsive and you can trigger other bugs, issue more hanging IRPs, or try to terminate the process.
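To make the threading point concrete, here is a small Python sketch. It is a toy model, not Notmyfault’s actual code: the never-completed I/O request is stood in for by a threading.Event that nothing ever sets, and the names request_completed, issue_hanging_request and main are hypothetical. The second thread blocks for good, while the main thread, playing the role of the UI, keeps working.

```python
import threading

# Toy stand-in (an assumption, not Notmyfault's code): the I/O request that
# Myfault.sys never completes is modeled as an event nothing ever sets.
request_completed = threading.Event()


def issue_hanging_request() -> None:
    """Runs on the second thread and blocks forever, like the stuck thread."""
    request_completed.wait()  # no timeout, and nobody ever calls set()


def main() -> tuple[bool, list[str]]:
    # daemon=True only lets this script exit; see the note below.
    worker = threading.Thread(target=issue_hanging_request, daemon=True)
    worker.start()

    worker.join(timeout=0.2)  # give up waiting: the request never completes
    still_stuck = worker.is_alive()

    # Meanwhile the main ("UI") thread is free to keep handling work.
    handled = [f"handled UI event {i}" for i in range(3)]
    return still_stuck, handled


if __name__ == "__main__":
    stuck, handled = main()
    print("second thread still stuck:", stuck)
    print(handled)
```

Marking the worker as a daemon thread is purely a convenience so the script can exit. A real thread stuck in the kernel can’t be abandoned like that, which is exactly why the process lingers.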

Terminating Notmyfault reveals the effect of a hung IRP. Even after you close the Notmyfault window, the Notmyfault process still shows in Process Explorer’s process list. Logging off and back in, even into a different account, does not cause the zombie process to exit. So what’s going on under the hood? If you’ve configured Process Explorer to use Microsoft’s symbol support (the steps are documented in Process Explorer’s help file), you can view the stack of the hung thread by double-clicking the Notmyfault process, switching to the Threads tab of the resulting Process Properties dialog, and double-clicking the thread:

[Screenshot: Process Explorer showing the stack of the hung Notmyfault thread]

A stack reflects a history of subroutine invocation and reads from top to bottom, most recent to least recent. The stack above indicates that Notmyfault called DeviceIoControl, which called ZwDeviceIoControlFile. ZwDeviceIoControlFile transitioned into kernel mode (the frames prefixed with “ntkrnlpa.exe”), where the kernel’s system call dispatcher executed NtDeviceIoControlFile. Since the I/O request was synchronous, the I/O Manager waits for the driver the request is targeted at to complete it.

When a process terminates, the Process Manager performs process rundown, which includes terminating all the threads in the process, closing handles to opened system resources (e.g. files and registry keys), and tearing down the address space of the process. When the Process Manager sees that a terminating thread has outstanding I/O requests, it informs the drivers processing the requests that the requests should be cancelled. You can see that in the stack as the call to IopCancelAlertedRequest. Because completing an I/O request requires access to the address space of the owning thread’s process, the system can’t finish tearing down a process until all its I/O requests have completed or been cancelled. If the driver never completes or cancels a request, the I/O Manager has no choice but to wait indefinitely, which you can see in the stack as the call to KeWaitForSingleObject.
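The rundown logic described above can be sketched as a toy Python model. This is a simplification under stated assumptions, not the actual Process Manager or I/O Manager code: each request is a hypothetical Irp object whose cancel() either completes it (a well-behaved driver) or does nothing (a driver like Myfault.sys in Hang Irp mode). The model waits with a timeout only so it can report what would stay outstanding; as described above, the real I/O Manager waits without one.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Irp:
    """Hypothetical toy I/O request; honors_cancel says whether its driver cooperates."""
    name: str
    honors_cancel: bool
    done: threading.Event = field(default_factory=threading.Event)

    def cancel(self) -> None:
        # A well-behaved driver completes the request as cancelled;
        # a buggy driver (Myfault.sys in Hang Irp mode) simply ignores it.
        if self.honors_cancel:
            self.done.set()


def run_down(irps: list[Irp], timeout: float) -> list[str]:
    """Ask for every outstanding request to be cancelled, then wait for each.

    Returns the names of requests still outstanding after waiting up to
    `timeout` seconds apiece. The real I/O Manager has no timeout, so any
    request left in this list would keep the process from ever exiting.
    """
    for irp in irps:
        irp.cancel()
    return [irp.name for irp in irps if not irp.done.wait(timeout)]


if __name__ == "__main__":
    irps = [Irp("request to a well-behaved driver", True),
            Irp("Myfault Hang Irp request", False)]
    print(run_down(irps, timeout=0.1))
```

Only the request whose driver ignores cancellation is left over, and in the real system that single request is enough to leave the whole process stuck in rundown.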

If you run across this type of problem in the real world, you’ll need to use a kernel debugger to look at the outstanding I/O requests of any hung threads and determine the driver that owns them. If the whole system is hung, you need to debug it from a second computer running a kernel debugger. Since the system as a whole isn’t hung when you create a hung thread with Notmyfault, you can use local kernel debugging, either with LiveKd or, if you’re running Windows XP or higher, with the local kernel debugging built into Debugging Tools for Windows. If you’ve never used a kernel debugger, the easiest approach is to download Debugging Tools for Windows and then run LiveKd from the directory in which you installed the tools.

The first kernel debugger command to execute is one that shows the hung process and its threads. For each thread listed, look at its IRP List area, which lists the thread’s outstanding I/O requests. Here’s the command to dump the hung process, and partial output that includes the IRP list for the Notmyfault thread:

kd> !process 0 7 notmyfault.exe
PROCESS 8183ad18 SessionId: 0 Cid: 02dc Peb: 7ffdf000 ParentCid: 04e4
    DirBase: 08b40280 ObjectTable: e107cd10 HandleCount: 23.
    Image: NotMyfault.exe
    VadRoot 817d8d68 Vads 44 Clone 0 Private 98. Modified 1. Locked 0.

        THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
            81821d0c NotificationEvent
        IRP List:
            82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000
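If you end up looking at many threads, it can help to pull the IRP addresses out of saved debugger output mechanically. The following Python helper, irps_by_thread, is hypothetical (it is not part of Debugging Tools for Windows) and assumes the layout shown above: a THREAD line, then an “IRP List:” header, then one “address: (...)” line per outstanding request.

```python
import re
from typing import Optional

# Hypothetical helper, not part of Debugging Tools for Windows. It assumes the
# `!process 0 7` layout shown above: "THREAD <addr> ...", then "IRP List:",
# then one "<addr>: (...)" line per outstanding request.
THREAD_RE = re.compile(r"^\s*THREAD\s+([0-9a-fA-F]+)")
IRP_RE = re.compile(r"^\s*([0-9a-fA-F]{8,16}):\s*\(")


def irps_by_thread(output: str) -> dict[str, list[str]]:
    """Map each thread address to the IRP addresses listed under it."""
    result: dict[str, list[str]] = {}
    thread: Optional[str] = None
    in_irp_list = False
    for line in output.splitlines():
        if m := THREAD_RE.match(line):
            thread, in_irp_list = m.group(1), False
            result[thread] = []
        elif line.strip() == "IRP List:":
            in_irp_list = True
        elif in_irp_list and thread and (m := IRP_RE.match(line)):
            result[thread].append(m.group(1))
        elif in_irp_list and line.strip():
            in_irp_list = False  # any other non-blank line ends the IRP list
    return result


SAMPLE = """\
THREAD 81810560 Cid 02dc.02e4 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
81821d0c NotificationEvent
IRP List:
82370f68: (0006,0094) Flags: 40000000 Mdl: 00000000
"""

if __name__ == "__main__":
    print(irps_by_thread(SAMPLE))  # {'81810560': ['82370f68']}
```

Each address it returns is a candidate for the !irp command described next.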


The next step is to look at the IRP (I/O Request Packet) or IRPs you find:

kd> !irp 82370f68
Irp is active with 1 stacks 1 is current (= 0x82370fd8)
No Mdl Thread 81810560: Irp stack trace.
cmd flg cl Device File Completion-Context
>[ e, 0] 5 0 8172daa8 81821cb0 00000000-00000000
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
            \Driver\MYFAULT
                Args: 00000000 00000000 83360020 00000000


The output reports that \Driver\Myfault, the internal name of the Myfault driver, owns the IRP and is therefore the driver that’s guilty of not completing the I/O and not responding to the system’s cancellation request. The error about missing symbols for myfault.sys is expected, since Microsoft’s symbol server only has symbols for Microsoft’s own drivers and components.
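The same idea works for !irp output. This hypothetical Python helper, current_owner, assumes that the current stack location is the line marked with “>” and that its owner appears on a following line as \Driver\<name>; file system drivers live under \FileSystem\ instead, so the pattern accepts both.

```python
import re
from typing import Optional

# Hypothetical helper: assumes the current I/O stack location is the line
# marked ">" and that its owner follows as \Driver\<name> or \FileSystem\<name>.
DRIVER_RE = re.compile(r"\\(?:Driver|FileSystem)\\\S+")


def current_owner(irp_output: str) -> Optional[str]:
    """Return the object name of the driver owning the current stack location."""
    seen_current = False
    for line in irp_output.splitlines():
        if line.lstrip().startswith(">"):
            seen_current = True  # reached the current stack location
        elif seen_current and (m := DRIVER_RE.search(line)):
            return m.group(0)  # first driver name after it owns the IRP
    return None


SAMPLE = r"""
     cmd flg cl Device File Completion-Context
>[ e, 0] 5 0 8172daa8 81821cb0 00000000-00000000
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
\Driver\MYFAULT
Args: 00000000 00000000 83360020 00000000
"""

if __name__ == "__main__":
    print(current_owner(SAMPLE))  # \Driver\MYFAULT
```

The symbol-load error line is simply skipped, since it contains no object name.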

The reason the Notmyfault bug doesn’t cause logoff or shutdown hangs is that the system doesn’t care whether user applications really terminate during either of those activities. As long as the TerminateProcess API returns success, which it does for such zombie processes, the system is happy. However, if Explorer or one of the core system processes gets into a zombie state, the system will be effectively hung.
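The logoff reasoning can be captured in a toy Python model. Everything here is hypothetical and simplified (ToyProcess, log_off and the process names are illustrative): terminate() mimics TerminateProcess by reporting success once termination is under way, while a process with a hung IRP never actually finishes exiting. Because the toy logoff routine only checks the return value, it completes and the zombie lingers.

```python
from dataclasses import dataclass


@dataclass
class ToyProcess:
    """Hypothetical toy process; has_hung_irp models a thread stuck like Notmyfault's."""
    name: str
    has_hung_irp: bool = False
    exited: bool = False

    def terminate(self) -> bool:
        # Mimics TerminateProcess succeeding once termination is under way:
        # it returns True either way, but rundown of a process with a hung
        # IRP never finishes, so that process never actually exits.
        if not self.has_hung_irp:
            self.exited = True
        return True


def log_off(user_apps: list[ToyProcess]) -> bool:
    """Toy logoff: checks only that each termination succeeded, not that apps exited."""
    # A list (not a generator) so every app is terminated before all() runs.
    return all([app.terminate() for app in user_apps])


if __name__ == "__main__":
    apps = [ToyProcess("notepad.exe"), ToyProcess("notmyfault.exe", has_hung_irp=True)]
    print(log_off(apps))  # True: logoff carries on
    print([a.name for a in apps if not a.exited])  # ['notmyfault.exe'] lingers
```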
