Using the Windows Debugger: Some Handy/Fun/Clever Debugger Commands

iiprogram

Christmas time is here, by golly. (Disapproval would be folly; apologies to Tom Lehrer.) I was planning to do a tutorial on assembly language and call stacks this time, but it seems like the wrong time of year to ask anyone to absorb such a thing. So instead I'm going to fill up some space... er, fill up your stockings with a much easier read: a roundup of some very handy but perhaps non-obvious debugger commands and techniques.

The Windows Debugger is an incredibly rich environment, and is constantly becoming more so, as the tools group keeps adding features. The reference section alone of the debugger help, if printed, would take over 600 pages of paper! (I determined this by "printing" it to a PDF and then examining the PDF with the viewer; I didn't waste the trees! Plenty of that going on this time of year without my help.) That's a lot of functionality. Most developers don't have a fraction of the time it would take to learn the debugger thoroughly; most of us started out by looking for commands that would do things similar to what we were used to in debuggers on previous systems. And with our first and with each new debugging problem we generally learn just about enough to solve the problem at hand. So we all tend to develop skills that include different subsets of the debugger's capabilities. It's uncommon to look over another developer's shoulder during a debugging session and not learn at least a little something new. Accordingly, here's a look over my shoulder. Chances are at least one thing here will be something valuable for you to take back to work with you next year.

Debugger help not considered harmful

Speaking of those 600+ pages... this isn't intended to be a replacement for them. With each command I talk about, I strongly encourage you to go to the debugger help and find out what else it can do. You'd be amazed how many options and variations and subtle behaviors there are on even the d (display memory) command. Did you know, for example, that it's no longer necessary to use !dd (an extension command) to display memory by physical address? dd /p will do it now. That's "p" for "physical," of course. I'd been using !dd for years before I happened to notice dd /p in the help. Too bad we can't 'diff' the various versions of debugger help against each other, the way we can with DDK header files.

Help me whitewash this fence!

By the way, if you have a favorite command or trick you'd like to share, feel free to email me! I'm going to try to include at least one of these tidbits as a part of each article in this series from now on. And of course I'll give full credit to the contributor.

Oops

Before we get started, here's a correction to the previous article. I was talking about the debugger's display of the stack when looking at a memory dump, specifically this one:

kd> kv
ChildEBP RetAddr Args to Child
f645e62c fcc494f6 80465402 00000030 00000030 nt!KiTrap0E+0x27c (FPO: [0,0] TrapFrame @ f645e62c)
f645e69c f7095181 004c0407 f645e74c f645e748 NDIS!NdisQueryBufferOffset+0x8 (FPO: [3,0,0])
f645e770 f70950c7 81381008 813090b8 813b0800 el90xbc5!SendPacket+0x61 (FPO: [Non-Fpo])
f645e794 fcc604d9 81381008 f645e7b4 00000001 el90xbc5!NICSendPackets+0xa5 (FPO: [Non-Fpo])
f645e808 fcc536a0 813090b8 813b0890 813090e0 NDIS!ndisMStartSendPackets+0x1fb (FPO: [Non-Fpo])
f645e82c fcc60643 812d1e48 813090b8 00000000 NDIS!ndisMProcessDeferred+0x37 (FPO: [Non-Fpo])
[...]

and in trying to explain an apparent anomaly I wrote:

The stack doesn't show a call to KeBugCheckEx, by the way, because the stack only shows the current instruction pointer value and calls from routines, not calls into routines... and in this stack, KeBugCheckEx hasn't called anything.

The puzzle here is that it's KeBugCheckEx (or inner routines called therefrom) that actually writes the memory dump, so why doesn't the stack show that our current location is within KeBugCheckEx? Well, the above is not the answer. It is true that the stack shows the current instruction pointer (the symbolized value on the top line, nt!KiTrap0E+0x27c here) and calls from routines, that is, the places to which those calls will return; and that's a bit of an answer that my brain pulled out of the file and dropped into this spot. It is not, though, the answer to this puzzle.

The location KiTrap0E+0x27c actually is a "call site," that is, the instruction immediately following the call to KeBugCheckEx:

kd> r
eax=ffdff13c ebx=00000000 ecx=00000000 edx=40000000 esi=fcc494f6 edi=004c041f
eip=80464b1f esp=f645e618 ebp=f645e62c iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
nt!KiTrap0E+0x27c:
80464b1f f7457000000200 test dword ptr [ebp+0x70],0x20000 ss:0010:f645e69c=00010202

Now look at the disassembly: the instruction at 80464b1f is AFTER the call to KeBugCheckEx:

kd> u nt!KiTrap0E+0x27c-e
nt!KiTrap0E+26f:
80464b12 7568             jnz nt!KiTrap0E+0x2d9 (80464b7c)
80464b14 56               push esi
80464b15 51               push ecx
80464b16 50               push eax
80464b17 57               push edi
80464b18 6a0a             push 0xa
80464b1a e8f573fcff       call nt!KeBugCheckEx (8042bf14)
80464b1f f7457000000200   test dword ptr [ebp+0x70],0x20000
kd>

So it looks like we just returned from KeBugCheckEx when the current state of the machine was recorded in the dump file. In other words, the debugger doesn't show KeBugCheckEx on the stack because we're not in KeBugCheckEx at the time things were recorded!
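The call-site arithmetic in the disassembly above can be checked by hand. The following is a minimal Python sketch, not a debugger feature: it decodes the five bytes e8f573fcff as an x86 near call (opcode E8 followed by a signed 32-bit displacement, relative to the next instruction). The function name decode_near_call is made up for this illustration.

```python
import struct


def decode_near_call(address: int, code: bytes):
    """Decode an x86 'call rel32' (opcode E8) located at `address`.

    Returns (target, return_address), both wrapped to 32 bits.
    """
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("not a 5-byte E8 call instruction")
    # The displacement is a signed little-endian 32-bit value.
    (rel32,) = struct.unpack("<i", code[1:5])
    # The return address is the instruction right after the call.
    return_address = (address + 5) & 0xFFFFFFFF
    # The displacement is relative to that next instruction.
    target = (return_address + rel32) & 0xFFFFFFFF
    return target, return_address


# Address and bytes taken from the "u" output above.
target, ret = decode_near_call(0x80464B1A, bytes.fromhex("e8f573fcff"))
print(f"target={target:08x} return={ret:08x}")
```

The return address comes out as 80464b1f, which is exactly the eip recorded in the dump, and the target matches the address the debugger shows for nt!KeBugCheckEx.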

But that can't be correct! After all, it's KeBugCheckEx (or something it calls) that writes the dump file! How can we not be in KeBugCheckEx according to the information in the dump file?

The answer, as very kindly pointed out to me by Jake Oshins of Microsoft, is simply that there's a problem in the symbol information for KiTrap0E, and possibly in the code that writes the dump file. Symbol information and in particular FPO (frame pointer optimization) information is difficult to get right in routines that use the stack in nonstandard ways, which KiTrap0E and KeBugCheckEx and its friends certainly do.

So there you have it. My apologies for the mistake. Presumably this will be fixed in a later version of the symbol files. On to our debugger commands!

dds

No, I'm not talking about your dentist! Our first command means "display words and symbols." If you want something a bit more mnemonic, you can think of it as "display DWORDs and symbols," or perhaps "display data and symbols." Note that there are also dqs (display quadwords and symbols) and dps (display pointers and symbols); the former is of course useful in 64-bit environments, the latter picks the correct size for a pointer in the target machine's architecture.

Probably the most common use of dds is as an aid to finding a "lost stack." The simplest cause of a lost stack is that the code being examined has used the base pointer register, ebp, as something other than a "frame pointer."

If you read the first column in this series you might recall an example of this. When we first opened the memory dump, the stack looked like this:

kd> kv
ChildEBP RetAddr  Args to Child 
f08535f4 8045261e f085361c 8045cd0b f0853624 nt!PspUnhandledExceptionInSystemThread+0x18 (FPO: [1,0,0])
f0853ddc 80467122 80416bf2 00000001 00000000 nt!PspSystemThreadStartup+0x5e (FPO: [Non-Fpo])
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

kd>

We did our "setting the register context" procedure:

kd> dd f085361c L 2
f085361c f0853aa4 f08536fc

kd> .exr f0853aa4
ExceptionAddress: f07a973b (vicamusb+0x0000173b)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 0000000c
Attempt to read from address 0000000c

kd> .cxr f08536fc
eax=00000000 ebx=802ac6c8 ecx=802ac610 edx=00000000 esi=802ac6c8 edi=802ac610
eip=f07a973b esp=f0853b6c ebp=00000000 iopl=0         nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000             efl=00210246
vicamusb+173b:
f07a973b 39450c           cmp     [ebp+0xc],eax     ss:0010:0000000c=????????

kd>

Then we tried to display the stack, which we hope would show the point of the exception and the sequence of calls leading up to it:

0: kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child 
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 vicamusb+0x173b

Oops. What happened? Well, the debugger happens to use the ebp register to begin walking the stack, and as we see after we do the .cxr command, the ebp register contains zeroes! This is not necessarily a bug in the code -- it simply means that the code being debugged has used this register for something other than as the "base pointer" or "frame pointer."

The debugger's display stack command (k) can take arguments that tell it to use a particular ebp value. But what ebp value to use? Fortunately we do have what looks like a reasonable stack pointer value in esp, and so we can look in memory "around" that address and see what looks like call frames and thereby the address of the topmost call frame on the stack. The debugger can take it from there.

This is made about a thousand times easier by the dds command. Here we ask it to display memory as DWORDs, starting at the address in esp, for length 60 (hex) DWORDs:

kd> dds esp L 60
f0853b6c 802ac610
f0853b70 802ac6c8
f0853b74 f0853bc4
f0853b78 843b3488
f0853b7c 00000000
f0853b80 00000000
f0853b84 f07a87e4 vicamusb+0x7e4
f0853b88 802ac610
f0853b8c 802ac610
f0853b90 843b3488
f0853b94 843b3488
f0853b98 00000000
f0853b9c f0853dcc
f0853ba0 00000190
f0853ba4 802ac610
f0853ba8 843b3488
f0853bac 843b3488
f0853bb0 f0853c04
f0853bb4 00000000
f0853bb8 802ac6c8
f0853bbc 00000000
f0853bc0 8139faf0
f0853bc4 f0853c04
f0853bc8 8041d637 nt!IopfCallDriver+0x35
f0853bcc 802ac610
f0853bd0 843b3564
f0853bd4 802ac610
f0853bd8 f0853c48
f0853bdc 804bb505 nt!IopSynchronousCall+0xca
f0853be0 8139faf0
f0853be4 8139faf0
f0853be8 00000002
f0853bec 00040001
f0853bf0 00000000
f0853bf4 f0853bf4
f0853bf8 f0853bf4
f0853bfc c00000bb
f0853c00 00000000
f0853c04 f0853c4c
f0853c08 804bb72e nt!IopRemoveDevice+0x86
f0853c0c 802ac610
f0853c10 f0853c24
f0853c14 f0853c54
f0853c18 8139faf0
f0853c1c e244d86c
f0853c20 8139fcc8
f0853c24 0000021b
f0853c28 00000000
f0853c2c 00000000
f0853c30 00000000
f0853c34 00000000
f0853c38 00000000
f0853c3c 00000000
f0853c40 00000000
f0853c44 00000000
f0853c48 00853c74
f0853c4c f0853c74
f0853c50 804bf10c nt!IopDeleteLockedDeviceNode+0x24c
f0853c54 8139faf0
f0853c58 00000002
f0853c5c e239c408
f0853c60 8139fcc8
f0853c64 00000000
f0853c68 00000001
f0853c6c 8139faf0
f0853c70 01853cb0
f0853c74 f0853cb0
f0853c78 804bf316 nt!IopDeleteLockedDeviceNodes+0xb0
f0853c7c 802ac5f8
f0853c80 e214f9e8
f0853c84 e239c408
f0853c88 e244d868
f0853c8c 0000000a
f0853c90 8139fcc8
f0853c94 e2380848
f0853c98 00000000
f0853c9c 00000000
f0853ca0 80000000
f0853ca4 00000002
f0853ca8 8139faf0
f0853cac 02380848
f0853cb0 f0853d3c
f0853cb4 80508fb0 nt!PiProcessQueryRemoveAndEject+0x7a4
f0853cb8 8139faf0
[...]

Now, a call frame consists of (at minimum) a saved ebp value followed by a saved eip value -- the latter being the point to which the called procedure will return. The saved eip values in the above are obvious: they're the ones with procedure names and offsets attached to them. The saved ebp values are in the DWORDs immediately preceding the saved eip values. The ebp values on the stack furthermore form a pointer chain: each saved ebp value is the address where the previous saved ebp is stored, and so on. In the listing above, for example, the DWORD at f0853bc4 holds f0853c04, the DWORD at f0853c04 holds f0853c4c, and the chain continues through f0853c74 and f0853cb0.
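To make the pointer chain concrete, here is a small Python sketch that follows saved-ebp links through text in dds format. It is only an illustration of the idea, not how the debugger itself unwinds; the helper names parse_dds and walk_frames are invented here, and the input is an excerpt of the dds output above (only the frame DWORDs and their return addresses).

```python
# Excerpt of the "dds esp L 60" output above: each saved ebp and the
# return address stored right after it.
DDS_EXCERPT = """\
f0853bc4 f0853c04
f0853bc8 8041d637 nt!IopfCallDriver+0x35
f0853c04 f0853c4c
f0853c08 804bb72e nt!IopRemoveDevice+0x86
f0853c4c f0853c74
f0853c50 804bf10c nt!IopDeleteLockedDeviceNode+0x24c
f0853c74 f0853cb0
f0853c78 804bf316 nt!IopDeleteLockedDeviceNodes+0xb0
f0853cb0 f0853d3c
f0853cb4 80508fb0 nt!PiProcessQueryRemoveAndEject+0x7a4
"""


def parse_dds(text):
    """Map each address to (value, symbol-or-None) from dds-style lines."""
    memory = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        address, value = int(parts[0], 16), int(parts[1], 16)
        symbol = parts[2].strip() if len(parts) == 3 else None
        memory[address] = (value, symbol)
    return memory


def walk_frames(memory, frame_pointer):
    """Follow saved-ebp links, collecting (frame address, return symbol)."""
    frames = []
    while frame_pointer in memory and (frame_pointer + 4) in memory:
        saved_ebp, _ = memory[frame_pointer]
        _, return_symbol = memory[frame_pointer + 4]
        if return_symbol is None:
            break  # no symbolized return address: probably not a frame
        frames.append((frame_pointer, return_symbol))
        if saved_ebp <= frame_pointer:
            break  # outer frames must live at higher addresses
        frame_pointer = saved_ebp
    return frames


frames = walk_frames(parse_dds(DDS_EXCERPT), 0xF0853BC4)
for address, symbol in frames:
    print(f"{address:08x} {symbol}")
```

Starting from f0853bc4, the walk visits five frames and stops when the next saved ebp (f0853d3c) falls outside the excerpt.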

If that went a bit over your head, don't worry about it... we'll cover stacks and base pointers and all that in a later article. The point for now is that dds is a very handy tool for finding your way around a chunk of memory that includes pointers to routines, or any other locations that are near to symbolized locations. The stack is a prime example.

But it doesn't have to be a stack!

Here's some debugger interaction from another crash. We're about to do our "set the register context" ritual, having already found the trap frame address on the stack as it's first displayed:

kd> .trap f143bb2c
ErrCode = 00000000
eax=fd261930 ebx=fd261968 ecx=fd71e4f8 edx=fd75c708 esi=00000098 edi=fd2619a0
eip=00000000 esp=f143bba0 ebp=f143bbc0 iopl=0         nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000             efl=00010282
00000000 ??               ???

kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
f143bb9c f64dbea0 fd261930 f143bcf0 fd76a008 0x0
f143bbc0 f64dbff3 fd75c708 fd71e4f8 00000014 NDIS!ndisMTransferData+0x79 (FPO: [Non-Fpo])
f143bbe0 f12fa8eb f143bc10 fd77b8e8 fd71e4f8 NDIS!NdisTransferData+0x1b (FPO: [Non-Fpo])
WARNING: Stack unwind information not available. Following frames may be wrong.
f143bc04 f64dbea0 fd261930 f143bcf0 fd758000 cpqteam+0x28eb
f143bc28 f6250a0a fd6bf608 fd71e4f8 00000014 NDIS!ndisMTransferData+0x79 (FPO: [Non-Fpo])
f143bc48 f62421f9 fd6bf708 fd71e4f8 00000000 tcpip!ARPXferData+0x24 (FPO: [Non-Fpo])
f143bcf8 f6221049 fd7db688 fe588232 fd6b5008 tcpip!IPRcvPacket+0x39f (FPO: [Non-Fpo])
[...]

kd>

The problem here seems to be that eip, the instruction pointer, contains zero. Where'd the zero come from? This crash seems to involve the NDIS environment, and we know that NDIS lives on structures containing pointers to routines. Can we find the structure and see if there's a zero in it? Let's look at the code for the next entry on the stack:

kd> u ndisMTransferData+79
NDIS!ndisMTransferData+79:
f64dbea0 5e               pop     esi                         ; this is what the saved EIP on the stack points to
f64dbea1 5d               pop     ebp
f64dbea2 c21800           ret     0x18
f64dbea5 b8010000c0       mov     eax,0xc0000001
f64dbeaa e9b4e4ffff       jmp     NDIS!ndisMWanSend+0x134 (f64da363)
f64dbeaf 8d4e30           lea     ecx,[esi+0x30]
f64dbeb2 ff1580664cf6     call    dword ptr [NDIS!_imp_ (f64c6680)]
f64dbeb8 8845ff           mov     [ebp-0x1],al

Well, ndisMTransferData+79 is merely where the outstanding call will return to. We really want to look a little earlier in memory than that, at what led up to that call. This sometimes involves a bit of “fishing:”

kd> u ndisMTransferData+70
NDIS!ndisMTransferData+70:
f64dbe97 7210             jb      NDIS!ndisMWanSend+0x19 (f64dbea9) ; bogus! (see below)
f64dbe99 ff751c           push    dword ptr [ebp+0x1c]
f64dbe9c 50               push    eax
f64dbe9d ff5234           call    dword ptr [edx+0x34]
f64dbea0 5e               pop     esi                         ; this is what the saved eip on the stack points to
f64dbea1 5d               pop     ebp
f64dbea2 c21800           ret     0x18
f64dbea5 b8010000c0       mov     eax,0xc0000001

That’s better – we can see the call now. The problem is that the call at f64dbe9d is apparently trying to call address zero. From the disassembly we see that it is actually finding the address to be called from an area in memory – the edx register points to the beginning of the area, and we are looking for the pointer to the routine to be called 0x34 bytes from the beginning of that area. And, since the call didn’t actually do anything (other than crash the machine), the value the debugger sees in the edx register – obtained from the trap frame when we issue the .trap command – must be the same value that edx had at the time of the attempted call. So we should be able to look in memory, exactly as the call did, and see what’s at that location:

kd> dd edx+34
fd75c73c 00000000 f63d44ac f12f8be8 f12f8d3c
fd75c74c f12f89b4 f12f8b06 f12f8b55 f12f8c38
fd75c75c f12f8d22 f12f8be4 f12f8a16 00000002
fd75c76c 00000000 00000000 00000000 00000000
fd75c77c 00000000 08018004 69436d4d e14b3008
fd75c78c 00000000 00000000 00000000 00000004
fd75c79c 00000001 00000006 00000001 090000a0
fd75c7ac fd1cf108 00000000 00000000 00000080

Sure enough, there’s a zero there. The NDIS port driver finds the routines in its miniport driver via pointers to routines. Apparently one of those pointers was never initialized when the miniport driver was loaded (or, perhaps, this particular pointer in the table was corrupted later).

To confirm that we are looking at the right area of memory, we can use the dds command to format this area and interpret its contents in symbolic terms:

kd> dds edx+34
fd75c73c 00000000
fd75c740 f63d44ac n100nt5!MiniportSendPackets
fd75c744 f12f8be8 cpqteam+0xbe8
fd75c748 f12f8d3c cpqteam+0xd3c
fd75c74c f12f89b4 cpqteam+0x9b4
fd75c750 f12f8b06 cpqteam+0xb06
fd75c754 f12f8b55 cpqteam+0xb55
[...]

Sure enough, this appears to be a table of pointers to routines. But the one at edx+34 hasn't been filled in.
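The same lookup can be written out as arithmetic. This Python sketch recomputes the address that call dword ptr [edx+0x34] reads from, using the edx value from the .trap output, and then scans the first few table entries from the dd output for empty slots. The names effective_address and find_null_slots are illustrative only.

```python
def effective_address(base: int, displacement: int) -> int:
    """Address used by an operand such as [edx+0x34], wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


def find_null_slots(table: dict) -> list:
    """Return the addresses of table entries whose pointer value is zero."""
    return [address for address, value in sorted(table.items()) if value == 0]


# edx as shown by the .trap output above.
EDX = 0xFD75C708

# First few DWORDs from the "dd edx+34" output above.
DISPATCH_TABLE = {
    0xFD75C73C: 0x00000000,
    0xFD75C740: 0xF63D44AC,
    0xFD75C744: 0xF12F8BE8,
    0xFD75C748: 0xF12F8D3C,
}

call_slot = effective_address(EDX, 0x34)
null_slots = find_null_slots(DISPATCH_TABLE)
print(f"call reads its target from {call_slot:08x}")
print("null slots:", [f"{address:08x}" for address in null_slots])
```

The computed slot, fd75c73c, is the first address in the dd and dds listings, and it is the only zero among the entries copied here.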

It's worth pointing out that simply seeing a zero in eip does not necessarily mean that the next outer routine tried to call address zero. It might have performed a perfectly legitimate call and then the called routine might have branched to zero. The stack would look the same, and we'd have found a valid pointer to a routine at edx+0x34. Then we could have looked at that routine to see how its execution reached a jump to zero, or simply put a breakpoint on that routine the next time we tested it.

dt

"Display Type." You might know about this already, but perhaps you don't know everything it can do. Much of what it can do results from symbol file changes in XP and later.

Unlike dd and its many close relatives, dt uses type information from the symbol files known to the debugger to display memory with appropriate type formatting. The simplest way to show this is by example. Suppose we have the address of a device object -- we can use our old friend, the !devobj extension, to display it:

kd> !devobj 89f92040
Device object (89f92040) is for:
0000005d /Driver/xyzspud DriverObject 8a164cc8
Current Irp 00000000 RefCount 0 Type 0000001d Flags 00003050
Dacl e1698b94 DevExt 89f920f8 DevObjExt 89f92110 DevNode 89f9c008
ExtensionFlags (0000000000)
AttachedDevice (Upper) 8a096040 /Driver/gameenum
Device queue is not busy.

kd>

...and that's fine, but we know there's more to a device object than that. It happens that the type name for device object structures is _DEVICE_OBJECT. (The leading underscore is important; type names created with typedef won't work with dt.) Try this:

kd> dt _DEVICE_OBJECT 89f92040
   +0x000 Type             : 3
   +0x002 Size             : 0xcc
   +0x004 ReferenceCount   : 0
   +0x008 DriverObject     : 0x8a164cc8
   +0x00c NextDevice       : 0x89fa0040
   +0x010 AttachedDevice   : 0x8a096040
   +0x014 CurrentIrp       : (null)
   +0x018 Timer            : (null)
   +0x01c Flags            : 0x3050
   +0x020 Characteristics : 0x80
   +0x024 Vpb              : (null)
   +0x028 DeviceExtension : 0x89f920f8
   +0x02c DeviceType       : 0x1d
   +0x030 StackSize        : 1 ''
   +0x034 Queue            : __unnamed
   +0x05c AlignmentRequirement : 0
   +0x060 DeviceQueue      : _KDEVICE_QUEUE
   +0x074 Dpc              : _KDPC
   +0x094 ActiveThreadCount : 0
   +0x098 SecurityDescriptor : 0xe1698b80
   +0x09c DeviceLock       : _KEVENT
   +0x0ac SectorSize       : 0
   +0x0ae Spare1           : 1
   +0x0b0 DeviceObjectExtension : 0x89f92110
   +0x0b4 Reserved         : (null)

kd>

That's every field in a device object. If you want to really go crazy you can add the -r option; this causes the command to recursively dump all subtypes, following all pointers and expanding the things they point to:

kd> dt -r _DEVICE_OBJECT 89f92040
   +0x000 Type             : 3
   +0x002 Size             : 0xcc
   +0x004 ReferenceCount   : 0
   +0x008 DriverObject     : 0x8a164cc8
      +0x000 Type             : 4
      +0x002 Size             : 168
      +0x004 DeviceObject     : 0x89f92040
         +0x000 Type             : 3
         +0x002 Size             : 0xcc
         +0x004 ReferenceCount   : 0
         +0x008 DriverObject     : 0x8a164cc8
         +0x00c NextDevice       : 0x89fa0040
         +0x010 AttachedDevice   : 0x8a096040
         +0x014 CurrentIrp       : (null)
         +0x018 Timer            : (null)
         +0x01c Flags            : 0x3050
[...]

How does it know? For example, the DriverObject member of a device object is defined as being a pointer to _DRIVER_OBJECT -- and of course the debugger has the type information for a _DRIVER_OBJECT as well. It's also clever (as you can see above) about not endlessly following loops, as, for example, when the driver object points back to the original device object. This also does the right thing for doubly-linked lists; an empty list is simply shown as the list head, whereas a non-empty list will be "walked" and its members all displayed, once.

If you omit the address, dt simply shows the structure definition:

kd> dt _IRP
   +0x000 Type             : Int2B
   +0x002 Size             : Uint2B
   +0x004 MdlAddress       : Ptr32 _MDL
   +0x008 Flags            : Uint4B
   +0x00c AssociatedIrp    : __unnamed
   +0x010 ThreadListEntry : _LIST_ENTRY
   +0x018 IoStatus         : _IO_STATUS_BLOCK
   +0x020 RequestorMode    : Char
   +0x021 PendingReturned : UChar
   +0x022 StackCount       : Char
   +0x023 CurrentLocation : Char
   +0x024 Cancel           : UChar
   +0x025 CancelIrql       : UChar
   +0x026 ApcEnvironment   : Char
   +0x027 AllocationFlags : UChar
   +0x028 UserIosb         : Ptr32 _IO_STATUS_BLOCK
   +0x02c UserEvent        : Ptr32 _KEVENT
   +0x030 Overlay          : __unnamed
   +0x038 CancelRoutine    : Ptr32
   +0x03c UserBuffer       : Ptr32 Void
   +0x040 Tail             : __unnamed

kd>

And of course, all this fun stuff works fine on type names you defined for your own structures (_YOUR_DEVICE_EXTENSION, for example), as long as the debugger can see symbol files for your driver.

What makes dt particularly fun is that the public symbol files for Windows XP and later include type information for many common system data structures. Including many that are not defined in DDK header files! For example, our familiar friend the !process command shows what the debugger wants to tell us about a process:

1: kd> !process -1 1
PROCESS 88ccd3c0 SessionId: 0 Cid: 07a0    Peb: 7ffdf000 ParentCid: 02b8
    DirBase: 3ad3f000 ObjectTable: e2f944f8 HandleCount: 76.
    Image: rundll32.exe
  VadRoot 88c8e9c8 Vads 88 Clone 0 Private 428. Modified 11. Locked 0.
    DeviceMap e2816808
    Token                             e263fbc0
    ElapsedTime                       2:22:11.0780
    UserTime                          0:00:00.0578
    KernelTime                        0:00:02.0968
    QuotaPoolUsage[PagedPool]         32468
    QuotaPoolUsage[NonPagedPool]      3520
    Working Set Sizes (now,min,max) (1299, 50, 345) (5196KB, 200KB, 1380KB)
    PeakWorkingSetSize                1313
    VirtualSize                       35 Mb
    PeakVirtualSize                   35 Mb
    PageFaultCount                    1558
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      876

But dt can show us more. All we have to know is that the "process address" is the address of two data structures that start at the same place: the _EPROCESS, and the _KPROCESS embedded at its very beginning:

1: kd> dt _KPROCESS 88ccd3c0
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 ProfileListHead : _LIST_ENTRY [ 0x88ccd3d0 - 0x88ccd3d0 ]
   +0x018 DirectoryTableBase : [2] 0x3ad3f000
   +0x020 LdtDescriptor    : _KGDTENTRY
   +0x028 Int21Descriptor : _KIDTENTRY
   +0x030 IopmOffset       : 0x20ac
   +0x032 Iopl             : 0 ''
   +0x033 Unused           : 0 ''
   +0x034 ActiveProcessors : 2
   +0x038 KernelTime       : 0xbe
   +0x03c UserTime         : 0x25
   +0x040 ReadyListHead    : _LIST_ENTRY [ 0x88ccd400 - 0x88ccd400 ]
   +0x048 SwapListEntry    : _SINGLE_LIST_ENTRY
   +0x04c VdmTrapcHandler : (null)
   +0x050 ThreadListHead   : _LIST_ENTRY [ 0x88cad1e0 - 0x888be508 ]
   +0x058 ProcessLock      : 0
   +0x05c Affinity         : 3
   +0x060 StackCount       : 3
   +0x062 BasePriority     : 8 ''
   +0x063 ThreadQuantum    : 6 ''
   +0x064 AutoAlignment    : 0 ''
   +0x065 State            : 0 ''
   +0x066 ThreadSeed       : 0x1 ''
   +0x067 DisableBoost     : 0 ''
   +0x068 PowerState       : 0 ''
   +0x069 DisableQuantum   : 0 ''
   +0x06a IdealNode        : 0 ''
   +0x06b Spare            : 0 ''

The _EPROCESS is much larger; I'll leave it to your imagination.

IRPs and MDLs have public type information too: Just use the type names _IRP and _MDL. And if you want to know what type names of this format the debugger's current symbol files know about from ntoskrnl.exe, try dt nt!_*.

!timer

!timer shows the system's timer tree in a nicely formatted display. It shows all "live" timer objects in the system (that is, those actually on the tree), when they will come due or "fire," and what they will do at that time.

I've learned to use !timer whenever I see any timer-related routines on the stack, other than KiTimerExpiration during live debugging. That's an exception because whenever you "break in" to the system in a kernel mode debugging session, you end up in a routine called from KiTimerExpiration. You can use the !timer command if you want to there, but it's unlikely to show you anything useful.

In the case I'll use for an example, though, I was looking at a memory dump, a case of bugcheck A, IRQL_NOT_LESS_OR_EQUAL. Setting the register context resulted in the following:

0: kd> .trap f241ff44
ErrCode = 00000002
eax=80482a80 ebx=e14c9618 ecx=8136afe0 edx=80482a80 esi=8136afe0 edi=00000010
eip=80431bc1 esp=f241ffb8 ebp=f241ffdc iopl=0         nv up ei ng nz ac po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000             efl=00010297
nt!KiTimerExpiration+91:
80431bc1 894304           mov     [ebx+0x4],eax     ds:0023:e14c961c=8136afe0

0: kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
f241ffdc 80464a18 80482e60 00000000 0052498f nt!KiTimerExpiration+0x91 (FPO: [Non-Fpo])
f241fff4 80469e3b ba2e4d44 00000000 00000000 nt!KiRetireDpcList+0x47 (FPO: [0,1,0])

It appears that the instruction at address 80431bc1 tried to reference address e14c961c, and this reference failed, resulting in a page fault or access violation.

Here we see KiTimerExpiration on the stack, and we didn't just break into the system in a kernel debugging session; so it's reasonable to look at the timer tree for anything unusual:

0: kd> !timer
Dump system timers

Interrupt time: 303748ec 000000c4 [ 1/ 7/2002 18:39:52.383 (Pacific Daylight Time)]

List Timer    Interrupt Low/High     Fire Time             DPC/thread
2 ba1b9ae0   33a96f00 000000c4 [ 1/ 7/2002 18:39:58.164] srv!ScavengerTimerRoutine
6 81299628   31509e68 000000c4 [ 1/ 7/2002 18:39:54.227] thread 81299540
7 812a3c00   6fb19e69 7e3e68d6 [ 9/13/30828 18:48:04.249] Msfs!MsReadTimeoutHandler
9 83478d08   e2f6f676 000000c4 [ 1/ 7/2002 18:44:52.274] thread 83478c20
    81ea05d0   3d925176 000000c9 [ 1/ 7/2002 19:16:02.274] netbt!TimerExpiry
13 804832c0   44741ede 000000c4 [ 1/ 7/2002 18:40:26.336] nt!MiModifiedPageWriterTimerDispatch
14 ffb85828   d1213f38 000000c4 [ 1/ 7/2002 18:44:22.352] thread ffb85740
    fe2c4d88   d1213f38 000000c4 [ 1/ 7/2002 18:44:22.352] thread fe2c4ca0
15 ffbc9108   5195d292 000000c4 [ 1/ 7/2002 18:40:48.367] thread ffbc9020
    810e14a8   e3054492 000000c4 [ 1/ 7/2002 18:44:52.367] thread 810e13c0
16 e14c9600 P 3037c2fe 000000c4 [ 1/ 7/2002 18:39:52.386] xyzwdm+12e00
   Timer at e14c9600 has wrong Blink! (Blink 8136afe0, should be 80482a80)
    bd9687e0   3038864e 000000c4 [ 1/ 7/2002 18:39:52.391] rdbss!RxTimerDispatch
    826dbe88   4218ebec 000000c4 [ 1/ 7/2002 18:40:22.383] thread 826dbda0
    8115e9c8   d12603ec 000000c4 [ 1/ 7/2002 18:44:22.383] thread 8115e8e0
    811feba8   d12603ec 000000c4 [ 1/ 7/2002 18:44:22.383] thread 811feac0

    ff2f3108   d12603ec 000000c4 [ 1/ 7/2002 18:44:22.383] thread ff2f3020
17 81213608   30397a72 000000c4 [ 1/ 7/2002 18:39:52.397] thread 81213520
    fe19e5c8   33cd3246 000000c4 [ 1/ 7/2002 18:39:58.399] thread fe19e4e0
19 81122a08   303e6ffa 000000c4 [ 1/ 7/2002 18:39:52.430] thread 81122920
    bda80c70 P 303f641e 000000c4 [ 1/ 7/2002 18:39:52.436] tcpip!TCBTimeoutdpc
    bda80cc8 P 303f641e 000000c4 [ 1/ 7/2002 18:39:52.436] tcpip!TCBTimeoutdpc
[...]

!timer does report something unusual, a "wrong Blink" in the tree structure. It turns out that this is not the cause of the crash; it's simply a result of the fact that KiTimerExpiration was working on that particular _KTIMER object at the time of the crash, and was in the process of removing it from the tree. So of course the backward links don't match up. We have some confidence that KiTimerExpiration was looking at this structure because the address being referenced at the time of the failure looks like it's probably inside that timer object.

Come to think of it, we can even use dt to make sure:

0: kd> dt _KTIMER e14c9600
   +0x000 Header           : _DISPATCHER_HEADER
   +0x010 DueTime          : _ULARGE_INTEGER 0xc4`3037c2fe
   +0x018 TimerListEntry   : _LIST_ENTRY [ 0xbd9687f8 - 0x8136afe0 ]
   +0x020 Dpc              : 0xe14c958c
   +0x024 Period           : 32

Sure enough, the code was referencing address e14c961c, and that's in the _KTIMER -- in the backward link field, in fact.
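Two quick checks on this _KTIMER can be done with plain arithmetic, sketched here in Python. The offsets come from the dt output and the register values from the .trap output. The sketch assumes the usual 32-bit _LIST_ENTRY layout (a 4-byte Flink followed by a 4-byte Blink) and that the DueTime and interrupt time values are in 100-nanosecond units.

```python
TIMER_BASE = 0xE14C9600          # address passed to dt _KTIMER
TIMER_LIST_ENTRY_OFFSET = 0x18   # offset of TimerListEntry in the dt output
BLINK_OFFSET = 0x4               # assumption: Blink follows the 4-byte Flink

# Address of the backward link inside this timer object.
blink_address = TIMER_BASE + TIMER_LIST_ENTRY_OFFSET + BLINK_OFFSET

# The faulting instruction was "mov [ebx+0x4],eax" with ebx=e14c9618.
EBX = 0xE14C9618
faulting_address = (EBX + 0x4) & 0xFFFFFFFF

# High and low halves combined, as shown by !timer and dt.
INTERRUPT_TIME = 0xC4303748EC    # "Interrupt time: 303748ec 000000c4"
DUE_TIME = 0xC43037C2FE          # DueTime 0xc4`3037c2fe

# Assumption: both values count 100 ns ticks, so 10,000 ticks per millisecond.
ms_until_due = (DUE_TIME - INTERRUPT_TIME) / 10_000

print(f"Blink lives at {blink_address:08x}, fault was at {faulting_address:08x}")
print(f"timer was due in {ms_until_due} ms")
```

Both addresses come out as e14c961c, and the timer was due about 3 ms after the recorded interrupt time, which agrees with the 18:39:52.383 and 18:39:52.386 timestamps in the !timer listing.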

What is important here is the fact that this _KTIMER is pointing to a DPC routine defined by a particular driver: the xyzwdm driver. (Names were changed to protect the guilty.) Whatever is wrong with this _KTIMER that is causing KiTimerExpiration to throw up its hands and die, xyzwdm.sys is a strong candidate.

!pool

It turns out that _KTIMER objects are most often allocated from nonpaged pool, so let's use !pool to see what's happening there. You want to use !pool any time you are dealing with an address that might be part of the pool. Unlike !timer, though, this one takes an argument:

0: kd> !pool e14c961c
e14c9000 size:   60 previous size:    0 (Allocated) Sect (Protected)
e14c9060 size:   20 previous size:   60 (Free)       ....
e14c9080 size:   20 previous size:   20 (Allocated) ObDi
e14c90a0 size:   60 previous size:   20 (Allocated) Ntfc
e14c9100 size: 100 previous size:   60 (Allocated) ObNm
e14c9200 size:   60 previous size: 100 (Allocated) CMkb (Protected)
e14c9260 size:   60 previous size:   60 (Allocated) ObNm
e14c92c0 size: 100 previous size:   60 (Allocated) Ppio
e14c93c0 size:   60 previous size: 100 (Allocated) SeSd
e14c9420 size:   80 previous size:   60 (Allocated) NtfC
e14c94a0 size:   20 previous size:   80 (Free)       Ppre
e14c94c0 size:   20 previous size:   20 (Allocated) ObNm
e14c94e0 size:   20 previous size:   20 (Allocated) ObDi
*e14c9500 size: 180 previous size:   20 (Allocated) *PcNw
e14c9680 size:   60 previous size: 180 (Allocated) CMDa
e14c96e0 size:   60 previous size:   60 (Allocated) CMDa
[...]

The asterisk in the left margin indicates which extent of pool includes the address we asked about. The cryptic four-character identifiers in the right column are "pool tag values." You can look them up in the text file /Program files/Debugging Tools for Windows/triage/pooltag.txt. For example, for "PcNw" it says "WDM audio stuff." This confirms what we thought before, since we happen to know that the xyzwdm.sys driver is an audio driver.
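What the asterisk encodes is a simple range check, sketched below in Python against the last few entries of the listing (sizes are hexadecimal byte counts, as printed). This is only an illustration of how to read the output; it says nothing about how !pool is implemented, and the helper names are made up.

```python
# (start address, size in bytes, pool tag) copied from the !pool output above.
POOL_BLOCKS = [
    (0xE14C94C0, 0x020, "ObNm"),
    (0xE14C94E0, 0x020, "ObDi"),
    (0xE14C9500, 0x180, "PcNw"),
    (0xE14C9680, 0x060, "CMDa"),
    (0xE14C96E0, 0x060, "CMDa"),
]


def find_block(blocks, address):
    """Return the (start, size, tag) entry that contains `address`, if any."""
    for start, size, tag in blocks:
        if start <= address < start + size:
            return start, size, tag
    return None


def is_contiguous(blocks):
    """Check that each block ends exactly where the next one begins."""
    return all(a[0] + a[1] == b[0] for a, b in zip(blocks, blocks[1:]))


block = find_block(POOL_BLOCKS, 0xE14C961C)
print(f"{block[0]:08x} size {block[1]:x} tag {block[2]}")
print("contiguous:", is_contiguous(POOL_BLOCKS))
```

The address e14c961c lands in the block that starts at e14c9500 and runs up to (but not including) e14c9680, which is the entry marked with the asterisk.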

On Windows XP the !pool command actually looks up the tag value and displays the corresponding information from pooltag.txt. The display also indicates whether the region is paged or nonpaged pool. We'll see that that's an important bit of information.

!pte

That's PTE for "page table entry." This command takes a virtual address and looks up, formats, and displays the page table entries corresponding to the address. We showed an example of its use last time, to determine why a particular memory reference had generated a crash when it looked like a reasonable address. In that case, though, we already had a sign that the address was somehow inaccessible, because the debugger couldn't display the memory around the suspect address. It showed all question marks. Here we're going to show you a similar, but much more interesting, case where !pte saved the day. Or at least made it a lot more pleasant.

We're continuing to look at the same memory dump as in the previous section. Bugcheck A, IRQL_NOT_LESS_OR_EQUAL. Here again are the results of setting the register context:

Once again, it appears that the instruction at address 80431bc1 tried to reference address e14c961c, and this reference failed, resulting in a page fault or access violation.

Now the puzzling thing is that the debugger seems to have no problems accessing that address -- the debugger says its contents are 8136afe0. In fact, it can access more:

0: kd> dd e14c9610
e14c9610 3037c2fe 000000c4 bd9687f8 8136afe0
e14c9620 e14c958c 00000032 81f30508 00011510
e14c9630 00000000 00000101 00000001 00000000
e14c9640 00000000 0000bb80 00000000 00000000
[...]

So why couldn't the code access it, if the debugger can? Usually, if we try to reference a bad address, the debugger displays question marks for the contents, as for example in:

0: kd> dd 0
00000000 ???????? ???????? ???????? ????????
00000010 ???????? ???????? ???????? ????????
00000020 ???????? ???????? ???????? ????????
00000030 ???????? ???????? ???????? ????????
[...]

As you might expect, the !pte command will tell us:

0: kd> !pte e14c961c
E14C961C - PDE at C0300E14        PTE at C0385324
          contains 02BE2963      contains 0414A882
        pfn 2be2 G-DA--KWV       not valid
                               Transition:   414a
                               Protect: 4

We were perhaps expecting to see something more subtle, namely the absence of the "W" (writeable) bit in either the PDE (page directory entry) or the PTE (page table entry). Instead, this is telling us that the virtual page containing the address in question is not valid in address translation terms: The valid bit in either the PTE or the PDE (in this case the PTE) is not set. That doesn't mean that it's a completely wrong address as, for example, address 0 would be; it simply means that any reference to any address within the page will result in a page fault. Incidentally, the !analyze -v output for this memory dump indicated that we were at IRQL 0x1C, or 28 decimal, at the time of the crash. Page faults are of course not allowed at IRQL 2 (known as IRQL DISPATCH_LEVEL) or above. The xyzwdm.sys driver probably is using paged pool for its timer objects, rather than nonpaged as it should.
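The numbers in that !pte output can be reproduced by hand, which is a useful sanity check when reading one. The sketch below rests on assumptions the article does not state: 32-bit x86 without PAE, 4 KB pages, page tables mapped at C0000000 with the page directory at C0300000, and an invalid-PTE layout with the valid bit in bit 0, protection in bits 5 through 9, a prototype bit in bit 10, a transition bit in bit 11, and the page frame number in the top 20 bits. The function names are hypothetical. Under those assumptions the arithmetic matches every value shown above.

```python
PTE_BASE = 0xC0000000  # assumed base of the mapped page tables (non-PAE x86)
PDE_BASE = 0xC0300000  # assumed base of the mapped page directory


def pte_address(va):
    """Address of the PTE for va: one 4-byte entry per 4 KB page."""
    return PTE_BASE + (va >> 12) * 4


def pde_address(va):
    """Address of the PDE for va: one 4-byte entry per 4 MB region."""
    return PDE_BASE + (va >> 22) * 4


def decode_pte(value):
    """Split a 32-bit PTE value into the fields !pte reported (assumed layout)."""
    valid = bool(value & 0x001)
    prototype = bool(value & 0x400)
    return {
        "valid": valid,
        # The transition bit only means "transition" when the PTE is invalid
        # and is not a prototype PTE.
        "transition": (not valid) and (not prototype) and bool(value & 0x800),
        "protect": (value >> 5) & 0x1F,
        "pfn": value >> 12,
    }


if __name__ == "__main__":
    va = 0xE14C961C
    print(f"PDE at {pde_address(va):08X}  PTE at {pte_address(va):08X}")
    fields = decode_pte(0x0414A882)
    print(f"valid={fields['valid']} transition={fields['transition']} "
          f"protect={fields['protect']} pfn={fields['pfn']:x}")
```

The "protect" and "pfn" values are meaningful here only because the PTE is in the transition state; for a valid PTE the low bits are the hardware flags that !pte prints as letters.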

But now we have to ask another question: If this page is "paged out," how can the debugger display its contents? (See the output of the dd e14c9610 command.) The answer has to do with the more complex aspects of virtual memory: "Paged out" doesn't necessarily mean "out of RAM." It simply means "out of the process working set." In this case the contents of the page were being held, not in the paging file or other disk file, but in physical page number 414a. Note that the PTE also says "Transition." This is a state of a page, indicating that it's on either the standby or modified page list.

Had the address e14c961c been referenced at IRQL 1 or 0, the pager would have simply resolved the page fault from the standby or modified list, putting the referenced physical page back into the current working set. Unfortunately this reference happened at IRQL 2 or above; much higher, in fact. The pager can't resolve page faults at such a time, due to possible serialization errors.
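The rule in that paragraph is small enough to write down as a decision function. This is a toy model for illustration only: reference_outcome and its return strings are hypothetical, it is not kernel code, and it covers only the cases discussed here (it does not model an address that is invalid in every sense, such as address 0).

```python
DISPATCH_LEVEL = 2  # IRQL at or above which page faults are not allowed


def reference_outcome(pte_valid, irql):
    """Describe what happens when code touches a pageable address at a given IRQL."""
    if pte_valid:
        # The translation is valid, so no page fault occurs at any IRQL.
        return "access succeeds"
    if irql < DISPATCH_LEVEL:
        # The pager is allowed to run and can bring the page back.
        return "page fault resolved by the pager"
    # The pager cannot run; the system bugchecks instead.
    return "bugcheck: page fault at elevated IRQL"


if __name__ == "__main__":
    # The crash in this dump: PTE not valid, IRQL 0x1C.
    print(reference_outcome(pte_valid=False, irql=0x1C))
    # The same reference at IRQL 0 would have been harmless.
    print(reference_outcome(pte_valid=False, irql=0))
```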

But the debugger, when looking at a memory dump, isn't executing the operating system and doesn't have to worry about serialization. It isn't really running at any particular IRQL, either. So if you ask it to display a virtual address, and that virtual address's contents are in a physical page that's "in transition," the debugger will happily mimic the pager and show you the contents derived from the page on the standby or modified list. Very handy for some environments, but misleading in others. The !pte command will bring you back to reality, telling you that a page fault would have to be resolved to access that page.

A lot of us do wish the debugger would flag, in some way, the output of a d command that was "successful, but only because the debugger resolved a page fault."

One other point here: This little debugger trick of displaying "paged out" memory just as if it were paged in can only happen when we're looking at a full memory dump. A kernel-only dump doesn't include the RAM that's on the standby or modified lists. Or the free or zeroed page lists, for that matter.

Ok, one more: _KTIMER objects are usually allocated from nonpaged pool. It appears that the xyzwdm driver allocated this structure from paged pool instead. This did not normally cause a problem because the object usually is referenced often enough, and the RAM on modern systems is plentiful enough, that the _KTIMER stays in the system's working set. One day, though, I happened to be running a memory-intensive app, and the system crashed with the symptoms described above. On a normal system, this bug would only occur at times of extreme memory pressure. Had the driver's developers used Driver Verifier with the "force IRQL checking" option, this latent bug would have been found before the driver was ever shipped.

Conclusion

Thank you for your attention. Enjoy the holidays!

About the author:

Jamie Hanrahan has been writing device drivers for Windows NT and its successors since the 1992 Windows NT Driver Developer's Conference. His consulting practice operates under the name Kernel Mode Systems. He is also a partner, with Brian Catlin, in Azius Developer Training, developing and presenting seminars in kernel mode development, debugging, and troubleshooting.