On Cancelling IRPs in the NT Kernel

iiprogram

Category: Windows low-level core programming. Tags: ddk, extension, null, documentation, thread, io

Article link: https://blog.csdn.net/iiprogram/article/details/2599894

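  Cancelling IRPs in the NT kernel is a complicated problem, and mistakes here easily crash the system. The DDK documentation actually covers this area in detail, but it needs careful reading. The OSR site also published two articles on the subject in The NT Insider. This post pulls that material together: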
  
  
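1. Why cancel an IRP?
The DDK documentation is clear on this: "Any driver in which IRPs can be held in a pending state for an indefinite interval must have one or more Cancel routines. For example, a keyboard driver might wait indefinitely for a user to press a key." In other words, if your driver may have to hold some IRPs pending for a long time before it can complete them, you need a Cancel routine that cancels those pending IRPs at the appropriate moment. The most common case: the thread that issued the IRPs terminates before the IRPs have completed, and the outstanding IRPs then have to be cancelled.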
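2. How does the NT kernel cancel an IRP?
According to The NT Insider, the NT I/O manager cancels IRPs in three steps. When a thread is about to terminate, the kernel calls the native API NtTerminateThread() to handle the termination. NtTerminateThread() examines the IrpList field in the thread's ETHREAD structure, a list that links every outstanding IRP issued by that thread. It walks this list and calls IoCancelIrp() on each outstanding IRP. IoCancelIrp() sets the IRP's Cancel flag, checks whether a Cancel routine is registered, and runs it if there is one. That completes the first step.
The second step is a wait: the kernel waits for the thread's IrpList to become empty, which means every outstanding IRP has been dealt with by its Cancel routine (cancelled and removed from the IrpList). If some driver's Cancel routine is faulty and cannot cancel its IRP, the thread's IrpList might never become empty. To avoid waiting forever, the NT kernel imposes a limit: if the IrpList is still not empty after 5 minutes, it assumes some driver has misbehaved and moves on to the third step.
In the third step the kernel forcibly removes the outstanding IRPs from the thread's IrpList, releases as many of the thread's resources as it can, and finally lets the thread terminate. The outstanding IRPs themselves are not freed, though. They stay in memory, so the system permanently loses that memory: the memory occupied by these IRPs is never released.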
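The three steps can be pictured with a small user-mode model. This is only a sketch under stated assumptions: `ModelIrp`, `ModelThread`, `terminate_thread_io` and the `max_polls` budget (standing in for the five-minute limit) are hypothetical names invented for illustration, not kernel APIs, and the real I/O manager is certainly more involved.

```c
/* User-mode model only: none of these names are real kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IRPS 8

typedef struct ModelIrp {
    bool cancel;                                /* models Irp->Cancel */
    void (*cancel_routine)(struct ModelIrp *);  /* models the registered Cancel routine */
    bool completed;                             /* set once the driver completes the IRP */
} ModelIrp;

typedef struct ModelThread {
    ModelIrp *irp_list[MAX_IRPS];  /* models the per-thread list of outstanding IRPs */
    size_t    count;
    size_t    leaked;              /* IRPs forcibly detached in step 3 */
} ModelThread;

/* A well-behaved cancel routine: completes the IRP immediately. */
void good_cancel(ModelIrp *irp) { irp->completed = true; }

/* Step 1: flag every outstanding IRP and call its cancel routine, if any. */
static void step1_cancel_all(ModelThread *t) {
    for (size_t i = 0; i < t->count; i++) {
        ModelIrp *irp = t->irp_list[i];
        void (*routine)(ModelIrp *) = irp->cancel_routine;
        irp->cancel = true;
        irp->cancel_routine = NULL;  /* the routine is cleared before it is called */
        if (routine != NULL) routine(irp);
    }
}

/* Drop completed IRPs from the thread's list; returns how many remain. */
static size_t reap_completed(ModelThread *t) {
    size_t kept = 0;
    for (size_t i = 0; i < t->count; i++)
        if (!t->irp_list[i]->completed) t->irp_list[kept++] = t->irp_list[i];
    t->count = kept;
    return kept;
}

/* Steps 2 and 3: poll until the list is empty or the budget is spent,
   then detach whatever is left. Returns the number of leaked IRPs. */
size_t terminate_thread_io(ModelThread *t, unsigned max_polls) {
    assert(t->count <= MAX_IRPS);
    step1_cancel_all(t);
    for (unsigned poll = 0; poll < max_polls && reap_completed(t) > 0; poll++) {
        /* a real kernel would sleep between checks instead of spinning */
    }
    reap_completed(t);
    t->leaked = t->count;  /* step 3: these IRPs are detached but never freed */
    t->count = 0;
    return t->leaked;
}
```

In this model an IRP whose driver registered no cancel routine (or one that never completes the IRP) is exactly what ends up counted in `leaked`.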
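3. When do you need to write a Cancel routine?
(1) The DDK documentation is clear: "if a driver will never queue more IRPs than it can complete in five minutes, it probably does not need a Cancel routine." That is, if a driver can guarantee that its IRPs always complete promptly, say within a few seconds, it does not need a Cancel routine at all.
(2) Even when a Cancel routine is needed, it does not call for a sophisticated algorithm or high efficiency; a simple Cancel routine is enough. The reason: having to cancel an IRP is rare, and it is not worth putting great effort into an efficient Cancel routine for something that happens only occasionally.
The DDK documentation gives this guideline: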
"The highest-level driver in a chain of layered drivers must have at least one Cancel routine if it queues IRPs or otherwise holds IRPs in a cancelable state. It can have more than one Cancel routine, if necessary.

Lower-level drivers in which IRPs can be held in a cancelable state for relatively long intervals also should have one or more Cancel routines.

If a driver manages its own internal queues of IRPs, it should have a separate Cancel routine for each of its queues."
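4. Principles for writing a Cancel routine
The DDK documentation is clear that you must follow the order "pending --- IoSetCancelRoutine --- queue". First mark the IRP pending in your dispatch routine, because only a pending IRP can need cancelling; if you always complete IRPs inside the dispatch routine, there is nothing to cancel. How to pend an IRP is covered in "关于NT内核irp pending的注意事项" ("Notes on pending IRPs in the NT kernel"): you must call IoMarkIrpPending() and then return STATUS_PENDING.
After marking the IRP pending, call IoSetCancelRoutine to register a Cancel routine, then insert the IRP into a queue (the system-wide queue or your own private queue). All of this must be done while holding a spin lock (either the global cancel spin lock or your own spin lock protecting the private IRP queue); otherwise there is a race that can crash the system.
Example: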
NTSTATUS
CancelReadWrite(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp
    )

{
    PDEVICE_EXTENSION devExtension = (PDEVICE_EXTENSION) DeviceObject->DeviceExtension;
    KIRQL irql;

    //
    // mark the irp pending NOW before we queue this IRP
    //

    IoMarkIrpPending(Irp);

    //
    // serialize all driver activity for this device object
    //

    KeAcquireSpinLock(&devExtension->lock, &irql); 

    //
    // set the cancel routine
    // 

    IoSetCancelRoutine(Irp, CancelCancel);

    //
    // queue the IRP
    //

    InsertTailList(&devExtension->irpList, &Irp->Tail.Overlay.ListEntry);

    //
    // release the spinlock
    //

    KeReleaseSpinLock(&devExtension->lock, irql);

    //
    // always return status pending -
    // note that we might very well have already completed,
    // or cancelled this IRP before we get here.
    //

    return STATUS_PENDING;

}
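One case the dispatch routine above does not cover: the IRP may already have been cancelled before IoSetCancelRoutine is called. IoCancelIrp then found no Cancel routine, so nothing was invoked, and the IRP enters the queue with its Cancel flag already set. A common remedy is to re-check the flag after registering the routine and use the return value of IoSetCancelRoutine(Irp, NULL), which is the previously registered routine, to decide who owns the IRP. The following is a single-threaded user-mode model of that decision only; `ModelIrp`, `set_cancel_routine` and `queue_with_cancel_check` are hypothetical names, and the real call performs the swap atomically.

```c
/* User-mode model only: names are hypothetical, not WDK APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelIrp ModelIrp;
typedef void (*CancelFn)(ModelIrp *);

struct ModelIrp {
    bool     cancel;          /* models Irp->Cancel */
    CancelFn cancel_routine;  /* models Irp->CancelRoutine */
    bool     queued;          /* true while the IRP sits in the driver's queue */
    int      status;          /* 0 until the IRP is completed */
};

enum { MODEL_CANCELLED = 1 };

/* Models IoSetCancelRoutine: install fn and return the previous routine. */
static CancelFn set_cancel_routine(ModelIrp *irp, CancelFn fn) {
    CancelFn old = irp->cancel_routine;
    irp->cancel_routine = fn;
    return old;
}

/* Stand-in for the driver's cancel routine. */
void model_cancel(ModelIrp *irp) {
    irp->queued = false;
    irp->status = MODEL_CANCELLED;
}

/* Register the routine, queue the IRP, then re-check for an early cancel.
   The caller is assumed to hold the queue lock.
   Returns true if the IRP is still queued afterwards. */
bool queue_with_cancel_check(ModelIrp *irp) {
    assert(irp != NULL);
    set_cancel_routine(irp, model_cancel);
    irp->queued = true;

    if (irp->cancel && set_cancel_routine(irp, NULL) != NULL) {
        /* Cancelled before the routine was registered, and we got the
           routine back, so nobody else will run it: we own the IRP. */
        irp->queued = false;
        irp->status = MODEL_CANCELLED;
        return false;
    }
    return true;
}
```

In a real dispatch routine the IRP has already been marked pending at this point, so the routine would still return STATUS_PENDING even when it completes the IRP as cancelled.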
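The Cancel routine does the actual cancelling: it removes the IRP from the queue and completes it with IoCompleteRequest(), setting the IRP's completion status to STATUS_CANCELLED (the routine itself returns VOID). Example: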
VOID CancelCancel ( 
    IN PDEVICE_OBJECT DeviceObject, 
    IN PIRP Irp 
    )

{
    PDEVICE_EXTENSION devExtension = DeviceObject->DeviceExtension;
    PLIST_ENTRY nextEl = NULL;
    PIRP cancelIrp = NULL;
    KIRQL irql;
    KIRQL cancelIrql = Irp->CancelIrql;

    //
    // release the cancel spinlock now
    //

    IoReleaseCancelSpinLock(cancelIrql);

    //
    // A thread has terminated and we should find a
    // cancelled Irp in our queue and complete it
    // 

    KeAcquireSpinLock(&devExtension->lock, &irql);

    // 
    // search our queue for an Irp to cancel
    //

    for (nextEl = devExtension->irpList.Flink; 
        nextEl != &devExtension->irpList; ) 

    {

        cancelIrp = CONTAINING_RECORD(nextEl, IRP, Tail.Overlay.ListEntry);
        nextEl = nextEl->Flink;
        if (cancelIrp->Cancel) {

            //
            // dequeue THIS irp
            //

            RemoveEntryList(&cancelIrp->Tail.Overlay.ListEntry);

            //
            // and stop right here
            //
            break;

        }
        cancelIrp = NULL;

    } 
    KeReleaseSpinLock(&devExtension->lock, irql);

    //
    // now if we found an irp to cancel, cancel it
    //

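    if (cancelIrp) {

        //
        // this is our IRP to cancel; it may not be the IRP that was
        // passed in, so clear any cancel routine still registered
        // on it before completing it
        //

        (void) IoSetCancelRoutine(cancelIrp, NULL);
        cancelIrp->IoStatus.Status = STATUS_CANCELLED;
        cancelIrp->IoStatus.Information = 0;
        IoCompleteRequest(cancelIrp, IO_NO_INCREMENT);

    } 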

    //
    // we are done.
    //

}

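One last point: if an IRP you pended ends up completing successfully, it no longer needs cancelling, so when you complete it you must call IoSetCancelRoutine(Irp, NULL); to clear the Cancel routine you registered. Example: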
//
// always remove cancel routine 
// or we bugcheck in free build, assert in checked
//

(void) IoSetCancelRoutine(Irp, NULL);

//
// indicate we are finished
//

Irp->IoStatus.Status = STATUS_SUCCESS;

//
// no actual data transfer.
//

Irp->IoStatus.Information = 0;

//
// complete the request
//

IoCompleteRequest(Irp, IO_NO_INCREMENT);

This is normally done in whichever driver routine completes the IRP.
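The snippet above discards the return value of IoSetCancelRoutine. That return value is the previously registered routine, and it tells the completing code whether a cancel is already in flight: NULL means the cancel side has taken the routine and will complete the IRP itself. The sketch below is a single-threaded user-mode model of that hand-off, assuming every queued IRP had a cancel routine registered; `ModelIrp`, `take_cancel_routine`, `model_cancel_irp` and `complete_if_owned` are hypothetical names, not WDK APIs.

```c
/* User-mode model only: names are hypothetical, not WDK APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelIrp ModelIrp;
typedef void (*CancelFn)(ModelIrp *);

struct ModelIrp {
    CancelFn cancel_routine;  /* models Irp->CancelRoutine */
    int      completions;     /* how many times the IRP was completed */
    int      status;          /* last completion status */
};

enum { MODEL_SUCCESS = 0, MODEL_CANCELLED = 1 };

/* Models IoSetCancelRoutine(Irp, NULL): clear the routine and return
   the old one. The real call performs this swap atomically. */
static CancelFn take_cancel_routine(ModelIrp *irp) {
    CancelFn old = irp->cancel_routine;
    irp->cancel_routine = NULL;
    return old;
}

/* Stand-in for the driver's cancel routine. */
void model_cancel(ModelIrp *irp) {
    irp->status = MODEL_CANCELLED;
    irp->completions++;
}

/* Models the cancel side: whoever takes a non-NULL routine runs it.
   Returns true if a cancel routine was called. */
bool model_cancel_irp(ModelIrp *irp) {
    CancelFn fn = take_cancel_routine(irp);
    if (fn == NULL) return false;
    fn(irp);
    return true;
}

/* Normal completion path: complete only if we won ownership.
   Returns false when the cancel side already owns the IRP. */
bool complete_if_owned(ModelIrp *irp) {
    assert(irp != NULL);
    if (take_cancel_routine(irp) == NULL) return false;
    irp->status = MODEL_SUCCESS;
    irp->completions++;
    return true;
}
```

Whichever side swaps the routine out first completes the IRP; the other side sees NULL and backs off, so in this model the IRP is completed exactly once.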
5. When the system cancel spin lock is used, there are two cases. To prevent a race condition, the spin lock must be held while the Cancel routine is set. The DDK documentation says: "If a device driver has a StartIo routine, its dispatch routines can register a Cancel routine by supplying its address as input to IoStartPacket.

If a driver does not have a StartIo routine, its dispatch routines must do the following before queuing an IRP for further processing by other driver routines:

Call IoAcquireCancelSpinLock.
Call IoSetCancelRoutine with the input IRP and the entry point for a driver-supplied Cancel routine.
Call IoReleaseCancelSpinLock."
That says it all.
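The call order for a driver without a StartIo routine can be modelled in user mode as follows. This is a sketch, not driver code: a boolean stands in for the system cancel spin lock, and `acquire_cancel_lock`, `release_cancel_lock`, `set_cancel_routine_locked` and `dispatch_without_startio` are hypothetical names. The sketch also queues the IRP before releasing the lock, following the rule in section 4 that registration and queuing happen under one lock; that placement is this sketch's choice rather than something the quoted text spells out.

```c
/* User-mode model only: names are hypothetical, not WDK APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelIrp ModelIrp;
typedef void (*CancelFn)(ModelIrp *);

struct ModelIrp {
    CancelFn cancel_routine;  /* models Irp->CancelRoutine */
    bool     queued;          /* true while the IRP sits in the queue */
};

/* Stand-in for the system cancel spin lock (single-threaded model). */
static bool g_cancel_lock_held = false;

static void acquire_cancel_lock(void) {   /* models IoAcquireCancelSpinLock */
    assert(!g_cancel_lock_held);
    g_cancel_lock_held = true;
}

static void release_cancel_lock(void) {   /* models IoReleaseCancelSpinLock */
    assert(g_cancel_lock_held);
    g_cancel_lock_held = false;
}

/* Models IoSetCancelRoutine; the assert enforces "only under the lock". */
static void set_cancel_routine_locked(ModelIrp *irp, CancelFn fn) {
    assert(g_cancel_lock_held);
    irp->cancel_routine = fn;
}

/* Stand-in for the driver's cancel routine. */
void model_cancel(ModelIrp *irp) { irp->queued = false; }

/* Dispatch-side sequence: acquire, register, queue, release. */
void dispatch_without_startio(ModelIrp *irp) {
    acquire_cancel_lock();
    set_cancel_routine_locked(irp, model_cancel);
    irp->queued = true;       /* queued while the lock is still held */
    release_cancel_lock();
}

/* Lets callers check that the lock was released again. */
bool cancel_lock_is_held(void) { return g_cancel_lock_held; }
```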
