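I installed a volume block-level filter driver on a Windows Server 2003 SP2 x64 machine. After it had been running for about half a month, a large data copy caused the machine to blue-screen. This was puzzling: the same driver runs fine on other machines and very rarely causes problems.
Opening the minidump file in WinDbg: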
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 19, {20, fffffade6cf5d6e0, fffffade6cf5d750, 5070007}
GetUlongPtrFromAddress: unable to read from fffff800011db3c0
GetUlongFromAddress: unable to read from fffff800011db3e0
Probably caused by : ntkrnlmp.exe ( nt!ExFreePoolWithTag+45e )
Followup: MachineOwner
---------
13: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
BAD_POOL_HEADER (19)
The pool is already corrupt at the time of the current request.
This may or may not be due to the caller.
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.
Arguments:
Arg1: 0000000000000020, a pool block header size is corrupt.
Arg2: fffffade6cf5d6e0, The pool entry we were looking for within the page.
Arg3: fffffade6cf5d750, The next pool entry.
Arg4: 0000000005070007, (reserved)
Debugging Details:
------------------
BUGCHECK_STR: 0x19_20
POOL_ADDRESS: fffffade6cf5d6e0
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: 0
ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre
LAST_CONTROL_TRANSFER: from fffff800011ab36c to fffff8000102e890
STACK_TEXT:
fffffade`5bf3cb28 fffff800`011ab36c : 00000000`00000019 00000000`00000020 fffffade`6cf5d6e0 fffffade`6cf5d750 : nt!KeBugCheckEx
fffffade`5bf3cb30 fffff800`0131e6f4 : fffffa80`023a1740 fffffa80`023a1740 fffffa80`023a1740 fffffade`6cf5d6f0 : nt!ExFreePoolWithTag+0x45e
fffffade`5bf3cbf0 fffff800`010375ca : 00000000`00000000 fffff800`011dcee0 fffff800`0131e2f0 fffffade`70eed040 : nt!IopErrorLogThread+0x3fc
fffffade`5bf3cd00 fffff800`0124a972 : fffffade`70eed040 00000000`00000080 fffffade`70eed040 fffffade`5bae3680 : nt!ExpWorkerThread+0x13b
fffffade`5bf3cd70 fffff800`01020226 : fffffade`5badb180 fffffade`70eed040 fffffade`5bae3680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
fffffade`5bf3cdd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!ExFreePoolWithTag+45e
fffff800`011ab36c cc int 3
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: nt!ExFreePoolWithTag+45e
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
IMAGE_NAME: ntkrnlmp.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4cbdb81d
IMAGE_VERSION: 5.2.3790.4789
FAILURE_BUCKET_ID: X64_0x19_20_nt!ExFreePoolWithTag+45e
BUCKET_ID: X64_0x19_20_nt!ExFreePoolWithTag+45e
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:x64_0x19_20_nt!exfreepoolwithtag+45e
FAILURE_ID_HASH: {3400dfd1-2d81-3492-fd57-0d3e33001b86}
Followup: MachineOwner
---------
Clearly, a pool block header size is corrupt. Dump the header of the pool entry named in Arg2:
13: kd> dt nt!_pool_header fffffade6cf5d6e0
+0x000 PreviousSize : 0y00000111 (0x7)
+0x000 PoolIndex : 0y00000000 (0)
+0x000 BlockSize : 0y00000111 (0x7)
+0x000 PoolType : 0y00000101 (0x5)
+0x000 Ulong1 : 0x5070007
+0x004 PoolTag : 0x72456f49
+0x008 ProcessBilled : 0xfffffade`6ec79b00 _EPROCESS
+0x008 AllocatorBackTraceIndex : 0x9b00
+0x00a PoolTagHash : 0x6ec7
13: kd> !pool fffffade6cf5d6e0
Pool page fffffade6cf5d6e0 region is Unknown
fffffade6cf5d000 size: 640 previous size: 0 (Allocated) TCPB
fffffade6cf5d640 size: 30 previous size: 640 (Allocated) ReEv
fffffade6cf5d670 size: 70 previous size: 30 (Allocated) CMpa Process: fffffade6d9a88b0
*fffffade6cf5d6e0 size: 70 previous size: 70 (Allocated) *IoEr
Pooltag IoEr : Io error log packets, Binary : nt!io
fffffade6cf5d750 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...
GetUlongFromAddress: unable to read from fffff800011cf218
Unable to get pool big page table. Check for valid symbols.
fffffade6cf5d750 is not valid pool. Checking for freed (or corrupt) pool
Bad previous allocation size @fffffade6cf5d750, last size was 7
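The numbers in the dt and !pool output above fit together as follows. This is a small user-mode C sketch, not kernel code. It assumes the layout shown by dt nt!_pool_header (four 8-bit fields packed into Ulong1) and a 16-byte pool block granularity on x64, which is what makes BlockSize 0x7 match the "size: 70" reported by !pool. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pool block sizes are counted in 16-byte units on x64. */
#define POOL_GRANULARITY 16u

/* Hypothetical mirror of the four bit fields printed by dt nt!_pool_header. */
struct pool_header_fields {
    uint8_t previous_size; /* bits 0..7   */
    uint8_t pool_index;    /* bits 8..15  */
    uint8_t block_size;    /* bits 16..23 */
    uint8_t pool_type;     /* bits 24..31 */
};

/* Split the 32-bit Ulong1 value into its four byte-wide fields. */
struct pool_header_fields decode_ulong1(uint32_t ulong1)
{
    struct pool_header_fields f;
    f.previous_size = (uint8_t)(ulong1 & 0xffu);
    f.pool_index    = (uint8_t)((ulong1 >> 8) & 0xffu);
    f.block_size    = (uint8_t)((ulong1 >> 16) & 0xffu);
    f.pool_type     = (uint8_t)((ulong1 >> 24) & 0xffu);
    return f;
}

/* Address where the pool manager expects the next block's header. */
uint64_t next_header_address(uint64_t header, uint32_t ulong1)
{
    struct pool_header_fields f = decode_ulong1(ulong1);
    return header + (uint64_t)f.block_size * POOL_GRANULARITY;
}

/* Self-check against the values from the bugcheck arguments. */
void check_against_dump(void)
{
    /* Arg2 + BlockSize * 16 should equal Arg3. */
    assert(next_header_address(0xfffffade6cf5d6e0ULL, 0x05070007u)
           == 0xfffffade6cf5d750ULL);
}
```

With Arg4 (0x5070007) as Ulong1 and Arg2 as the header address, the result is Arg3, so the pool manager was looking for a valid header at exactly the address that !pool reports as bad.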
The _pool_header at address 0xfffffade6cf5d750 has been corrupted. Look at the contents of the previous pool entry, 0xfffffade6cf5d6e0:
13: kd> db fffffade6cf5d6e0 L80
fffffade`6cf5d6e0 07 00 07 05 49 6f 45 72-00 9b c7 6e de fa ff ff ....IoEr...n....
fffffade`6cf5d6f0 0b 00 60 00 00 00 00 00-f0 d6 1d 01 00 f8 ff ff ..`.............
fffffade`6cf5d700 f0 d6 1d 01 00 f8 ff ff-40 50 15 6f de fa ff ff ........@P.o....
fffffade`6cf5d710 80 52 2e 70 de fa ff ff-a2 0c e8 ec a3 e6 cf 01 .R.p............
fffffade`6cf5d720 00 00 00 00 01 00 28 00-00 00 00 00 07 00 05 80 ......(.........
fffffade`6cf5d730 e2 03 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
fffffade`6cf5d740 00 00 00 00 00 00 00 00-4d 00 65 00 6d 00 6f 00 ........M.e.m.o.
fffffade`6cf5d750 72 00 79 00 20 00 4f 00-76 00 65 00 72 00 20 00 r.y. .O.v.e.r. .
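The memory at 0xfffffade`6cf5d750, which should hold the next block's _pool_header, has been overwritten, and the data that overwrote it is meaningful text. It is hard to make out from this much, so dump a bit more: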
13: kd> db fffffade6cf5d6e0 L112
fffffade`6cf5d6e0 07 00 07 05 49 6f 45 72-00 9b c7 6e de fa ff ff ....IoEr...n....
fffffade`6cf5d6f0 0b 00 60 00 00 00 00 00-f0 d6 1d 01 00 f8 ff ff ..`.............
fffffade`6cf5d700 f0 d6 1d 01 00 f8 ff ff-40 50 15 6f de fa ff ff ........@P.o....
fffffade`6cf5d710 80 52 2e 70 de fa ff ff-a2 0c e8 ec a3 e6 cf 01 .R.p............
fffffade`6cf5d720 00 00 00 00 01 00 28 00-00 00 00 00 07 00 05 80 ......(.........
fffffade`6cf5d730 e2 03 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
fffffade`6cf5d740 00 00 00 00 00 00 00 00-4d 00 65 00 6d 00 6f 00 ........M.e.m.o.
fffffade`6cf5d750 72 00 79 00 20 00 4f 00-76 00 65 00 72 00 20 00 r.y. .O.v.e.r. .
fffffade`6cf5d760 4c 00 69 00 6d 00 69 00-74 00 2c 00 20 00 4c 00 L.i.m.i.t.,. .L.
fffffade`6cf5d770 69 00 6d 00 69 00 74 00-20 00 69 00 73 00 20 00 i.m.i.t. .i.s. .
fffffade`6cf5d780 3a 00 20 00 28 00 31 00-30 00 37 00 33 00 37 00 :. .(.1.0.7.3.7.
fffffade`6cf5d790 34 00 31 00 38 00 32 00-34 00 29 00 2c 00 20 00 4.1.8.2.4.).,. .
fffffade`6cf5d7a0 43 00 75 00 72 00 72 00-65 00 6e 00 74 00 20 00 C.u.r.r.e.n.t. .
fffffade`6cf5d7b0 55 00 73 00 61 00 67 00-65 00 3a 00 20 00 28 00 U.s.a.g.e.:. .(.
fffffade`6cf5d7c0 38 00 35 00 38 00 39 00-39 00 36 00 32 00 32 00 8.5.8.9.9.6.2.2.
fffffade`6cf5d7d0 34 00 29 00 00 00 00 00-00 00 00 00 00 00 00 00 4.).............
fffffade`6cf5d7e0 40 2d 4a 70 de fa ff ff-00 10 00 40 63 96 2a df @-Jp.......@c.*.
fffffade`6cf5d7f0 60 0f
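Hmm. It looks like a buffer overrun while writing an error log entry: the message string runs past the end of the IoEr block and overwrites whatever follows it.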
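The size of the overrun can be read straight off the dump. The sketch below is plain user-mode C with made-up helper names. It takes the string start (0x...d748, where "Memo" begins), the end of the IoEr block (0x...d750) and the message text as it appears in the ASCII column, and works out how many bytes landed outside the block. It assumes the message is ASCII-only text stored as NUL-terminated UTF-16, which is what the alternating 00 bytes suggest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Message text as read from the ASCII column of the db output. */
const char DUMP_MESSAGE[] =
    "Memory Over Limit, Limit is : (1073741824), Current Usage: (858996224)";

/* Bytes needed to store an ASCII-only string as NUL-terminated UTF-16. */
size_t utf16_bytes(const char *ascii)
{
    return (strlen(ascii) + 1) * 2;
}

/* Number of bytes of [start, start + len) that lie at or beyond limit. */
uint64_t bytes_past(uint64_t start, uint64_t len, uint64_t limit)
{
    uint64_t end = start + len;
    if (end <= limit) {
        return 0;
    }
    return (start >= limit) ? len : end - limit;
}

/* Self-check: the string must end where the zero bytes begin in the dump. */
void check_string_end(void)
{
    assert(0xfffffade6cf5d748ULL + utf16_bytes(DUMP_MESSAGE)
           == 0xfffffade6cf5d7d6ULL);
}
```

For the values in this dump, the string occupies 142 bytes and only the first 8 fit inside the block, so 134 bytes were written over the following pool header and whatever came after it.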
Find the code in the driver source that writes the log entry:
len1 = ErrorString ? (wcslen(ErrorString) + 1)*sizeof(WCHAR) : 0;
len = len1 + FIELD_OFFSET(IO_ERROR_LOG_PACKET, DumpData);
len = max(len, sizeof(IO_ERROR_LOG_PACKET));
if ( len < ERROR_LOG_MAXIMUM_SIZE ) {
    pIoErrorLogPacket = (PIO_ERROR_LOG_PACKET) IoAllocateErrorLogEntry((PVOID)pDeviceObject, len);
    if ( NULL != pIoErrorLogPacket ) {
        RtlZeroMemory(pIoErrorLogPacket, sizeof(IO_ERROR_LOG_PACKET));
        pIoErrorLogPacket->RetryCount = 0; //tried once
        pIoErrorLogPacket->ErrorCode = ErrorCode;
        pIoErrorLogPacket->UniqueErrorValue = UniqueErrorValue;
        pIoErrorLogPacket->FinalStatus = FinalStatus;
        pIoErrorLogPacket->DumpDataSize = 0;
        if ( len1 > 0 ) {
            pIoErrorLogPacket->NumberOfStrings = 1;
            pIoErrorLogPacket->StringOffset = FIELD_OFFSET(IO_ERROR_LOG_PACKET, DumpData);
            RtlCopyMemory((PWSTR)((PCHAR)pIoErrorLogPacket + pIoErrorLogPacket->StringOffset), ErrorString, len1);
        }
        //queues a given error log packet to the system error logging thread
        IoWriteErrorLogEntry(pIoErrorLogPacket);
    }
}
At first glance nothing looks wrong. Wait... here is what the documentation for IoAllocateErrorLogEntry says:
PVOID IoAllocateErrorLogEntry(
_In_ PVOID IoObject, _In_ UCHAR EntrySize );
EntrySize [in]
Specifies the size, in bytes, of the error log entry to be allocated. This value cannot exceed ERROR_LOG_MAXIMUM_SIZE.
Warning EntrySize is a UCHAR value. If you specify a larger value, the compiler will silently truncate that value to a (wrong) UCHAR.
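The truncation that the warning describes is easy to reproduce outside the kernel. The following is a minimal user-mode C sketch: size_seen_by_callee is a hypothetical stand-in for a function that, like IoAllocateErrorLogEntry, takes its size as a UCHAR, and the lengths used with it are illustrative values, not numbers taken from the crash.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the WDK typedef; UCHAR is an unsigned char there as well. */
typedef unsigned char UCHAR;

/* Hypothetical callee: it only ever sees the UCHAR value it was handed. */
unsigned int size_seen_by_callee(UCHAR entry_size)
{
    return entry_size;
}

/*
 * Pass a wider length the way the driver code does. The cast is written
 * out here, but an implicit conversion at the call site behaves the same:
 * the value is reduced modulo UCHAR_MAX + 1 (256 when CHAR_BIT is 8).
 */
unsigned int pass_length(unsigned long len)
{
    return size_seen_by_callee((UCHAR)len);
}

/* Self-check: anything up to UCHAR_MAX passes through unchanged. */
void check_no_truncation_in_range(void)
{
    assert(pass_length(UCHAR_MAX) == UCHAR_MAX);
}
```

A length of 300 arrives at the callee as 44, and 256 arrives as 0. The caller still believes it owns the full length and copies that many bytes into the smaller buffer.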
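This is most likely the problem: if len does not fit in a UCHAR, the entry that gets allocated is smaller than the len1 bytes that RtlCopyMemory later writes into it. Add more checks to the condition: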
// thoroughly check the value of len to prevent buffer underflows/overflows
if ((len < ERROR_LOG_MAXIMUM_SIZE) && (len >= len1) && (len > 0) && (len <= 255)) {
Problem solved.
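One way to keep the size computation and the range checks in a single place is a small helper that returns the size already narrowed to UCHAR, or reports failure. This is only a sketch under stated assumptions, not the driver's actual code: the DEMO_* constants are user-mode placeholders for sizeof(IO_ERROR_LOG_PACKET), FIELD_OFFSET(IO_ERROR_LOG_PACKET, DumpData) and ERROR_LOG_MAXIMUM_SIZE (only the 40-byte offset is confirmed by the StringOffset value 0x28 in the dump), and the function name is hypothetical. In a real driver the WDK definitions would be used instead.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned char UCHAR;

/* Placeholder values for a user-mode demo; see the assumptions above. */
#define DEMO_PACKET_SIZE   48u   /* stand-in for sizeof(IO_ERROR_LOG_PACKET) */
#define DEMO_STRING_OFFSET 40u   /* stand-in for FIELD_OFFSET(..., DumpData) */
#define DEMO_MAXIMUM_SIZE  240u  /* stand-in for ERROR_LOG_MAXIMUM_SIZE      */

/*
 * Compute the entry size for a string of string_bytes bytes (terminator
 * included). Returns 1 and stores the size in *out when it is within the
 * maximum and fits in a UCHAR; returns 0 and leaves *out untouched otherwise.
 */
int error_log_entry_size(size_t string_bytes, UCHAR *out)
{
    size_t len;

    /* Reject oversized strings first so the addition below cannot wrap. */
    if (string_bytes > DEMO_MAXIMUM_SIZE) {
        return 0;
    }
    len = string_bytes + DEMO_STRING_OFFSET;
    if (len < DEMO_PACKET_SIZE) {
        len = DEMO_PACKET_SIZE;
    }
    if (len > DEMO_MAXIMUM_SIZE || len > UCHAR_MAX) {
        return 0;
    }
    *out = (UCHAR)len; /* safe: len was checked against UCHAR_MAX */
    return 1;
}

/* Self-check: an empty string still needs room for the fixed packet. */
void check_minimum_size(void)
{
    UCHAR size = 0;
    assert(error_log_entry_size(0, &size) == 1 && size == DEMO_PACKET_SIZE);
}
```

The caller then allocates and copies only when the helper succeeds, and passes the returned UCHAR on unchanged, so the size used for the allocation and the size used for the copy cannot drift apart.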