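Of course, the while loop may also end because of the two checks at line 1453. The first is that there is no req left. The other depends on the return value of scsi_dev_queue_ready(): if it returns 0, the break is executed as well, and the loop ends.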
1270 /*
1271 * scsi_dev_queue_ready: if we can send requests to sdev, return 1 else
1272 * return 0.
1273 *
1274 * Called with the queue_lock held.
1275 */
1276 static inline int scsi_dev_queue_ready(struct request_queue *q,
1277 struct scsi_device *sdev)
1278 {
1279 if (sdev->device_busy >= sdev->queue_depth)
1280 return 0;
1281 if (sdev->device_busy == 0 && sdev->device_blocked) {
1282 /*
1283 * unblock after device_blocked iterates to zero
1284 */
1285 if (--sdev->device_blocked == 0) {
1286 SCSI_LOG_MLQUEUE(3,
1287 sdev_printk(KERN_INFO, sdev,
1288 "unblocking device at zero depth\n"));
1289 } else {
1290 blk_plug_device(q);
1291 return 0;
1292 }
1293 }
1294 if (sdev->device_blocked)
1295 return 0;
1296
1297 return 1;
1298 }
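The first thing checked here is device_busy. If this flag is set, a command is being executed, or in other words a command has already been passed down to the low-level driver. (Strictly speaking it is a counter rather than a flag, which is why it is compared against queue_depth.) That is why we increment device_busy before calling scsi_dispatch_cmd, at line 1469.
The other flag is device_blocked. It tells the world that this device cannot accept new commands right now, most likely because it is busy processing commands. Normally its value is 0, unless scsi_queue_insert() has been called. A friendly reminder: a SCSI device exposes this flag through sysfs, so we can look at its value there. Below are the values for two SCSI devices. Both are 0, which is fair to call the normal state.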
[root@localhost ~]# ls /sys/bus/scsi/devices/
0:0:8:0/ 0:2:0:0/ 1:0:0:0/ 2:0:0:0/
[root@localhost ~]# ls /sys/bus/scsi/devices/2\:0\:0\:0/
block:sdb/ iocounterbits modalias rev subsystem/ bus/ iodone_cnt model scsi_device:2:0:0:0/ timeout delete ioerr_cnt queue_depth scsi_disk:2:0:0:0/ type device_blocked iorequest_cnt queue_type scsi_level uevent driver/ max_sectors rescan state vendor
[root@localhost ~]# cat /sys/bus/scsi/devices/2\:0\:0\:0/device_blocked
0
[root@localhost ~]# cat /sys/bus/scsi/devices/0\:0\:8\:0/device_blocked
0
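So under normal conditions scsi_dev_queue_ready() returns 1, just as its comment says. But this so-called normal state means executing a single command. If several commands are executed, that is, if we submit several requests, device_busy is incremented again and again at line 1469 and may reach queue_depth. Then scsi_dev_queue_ready() returns 0 and scsi_request_fn() may finish. After that __generic_unplug_device returns, then blk_execute_rq_nowait() returns, and we are back in blk_execute_rq(), which calls wait_for_completion() and goes to sleep, waiting. By the rules of the game, we should be able to find a complete() call that wakes it up. So where is that call? The answer is blk_end_sync_rq.
The netizen "宁失身不失眠" was very curious how I knew this. It is a long story. Remember the scsi_done we talked about back in usb-storage? When a command finishes executing, scsi_done gets called. scsi_done comes from drivers/scsi/scsi.c, and this function is clearly our way in. Finding it is like the Kuomintang finding Fu Zhigao, like Wang Jiazhi finding Mr. Yi: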
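To make the interplay of device_busy, queue_depth and device_blocked concrete, here is a minimal userspace sketch of the same decision logic. It is not kernel code: struct sdev_model, model_queue_ready() and model_dispatch() are hypothetical names invented for this illustration, and the logging and blk_plug_device() calls of the real function are left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct scsi_device fields used here. */
struct sdev_model {
    unsigned int device_busy;    /* commands already handed to the low-level driver */
    unsigned int queue_depth;    /* how many commands the device takes at once */
    unsigned int device_blocked; /* countdown; nonzero means "do not dispatch" */
};

/* Same decisions as scsi_dev_queue_ready(): 1 = may dispatch, 0 = stop. */
static int model_queue_ready(struct sdev_model *sdev)
{
    if (sdev->device_busy >= sdev->queue_depth)
        return 0;
    if (sdev->device_busy == 0 && sdev->device_blocked) {
        /* Idle but blocked: count down, unblock only on reaching zero. */
        if (--sdev->device_blocked != 0)
            return 0; /* the kernel also plugs the queue at this point */
    }
    if (sdev->device_blocked)
        return 0;
    return 1;
}

/*
 * Dispatch up to `pending` requests the way the scsi_request_fn() loop
 * would, bumping device_busy before each one. Returns how many were sent.
 */
static unsigned int model_dispatch(struct sdev_model *sdev, unsigned int pending)
{
    unsigned int sent = 0;

    assert(sdev->queue_depth > 0);
    while (pending > 0 && model_queue_ready(sdev)) {
        sdev->device_busy++; /* the increment the text places at line 1469 */
        pending--;
        sent++;
    }
    return sent;
}
```

With a queue_depth of 3 and five pending requests, model_dispatch() stops after three, which is the point where the real loop would break out and the caller would end up sleeping in wait_for_completion().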
608 /**
609 * scsi_done - Enqueue the finished SCSI command into the done queue.
610 * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
611 * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
612 *
613 * This function is the mid-level's (SCSI Core) interrupt routine, which
614 * regains ownership of the SCSI command (de facto) from a LLDD, and enqueues
615 * the command to the done queue for further processing.
616 *
617 * This is the producer of the done queue who enqueues at the tail.
618 *
619 * This function is interrupt context safe.
620 */
621 static void scsi_done(struct scsi_cmnd *cmd)
622 {
623 /*
624 * We don't have to worry about this one timing out any more.
625 * If we are unable to remove the timer, then the command
626 * has already timed out. In which case, we have no choice but to
627 * let the timeout function run, as we have no idea where in fact
628 * that function could really be. It might be on another processor,
629 * etc, etc.
630 */
631 if (!scsi_delete_timer(cmd))
632 return;
633 __scsi_done(cmd);
634 }
Lurking behind it is __scsi_done, from the same file:
636 /* Private entry to scsi_done() to complete a command when the timer
637 * isn't running --- used by scsi_times_out */
638 void __scsi_done(struct scsi_cmnd *cmd)
639 {
640 struct request *rq = cmd->request;
641
642 /*
643 * Set the serial numbers back to zero
644 */
645 cmd->serial_number = 0;
646
647 atomic_inc(&cmd->device->iodone_cnt);
648 if (cmd->result)
649 atomic_inc(&cmd->device->ioerr_cnt);
650
651 BUG_ON(!rq);
652
653 /*
654 * The uptodate/nbytes values don't matter, as we allow partial
655 * completes and thus will check this in the softirq callback
656 */
657 rq->completion_data = cmd;
658 blk_complete_request(rq);
659 }
We do not care about anything else here, only the final call, blk_complete_request().
3588 /**
3589 * blk_complete_request - end I/O on a request
3590 * @req: the request being processed
3591 *
3592 * Description:
3593 * Ends all I/O on a request. It does not handle partial completions,
3594 * unless the driver actually implements this in its completion callback
3595 * through requeueing. Theh actual completion happens out-of-order,
3596 * through a softirq handler. The user must have registered a completion
3597 * callback through blk_queue_softirq_done().
3598 **/
3599
3600 void blk_complete_request(struct request *req)
3601 {
3602 struct list_head *cpu_list;
3603 unsigned long flags;
3604
3605 BUG_ON(!req->q->softirq_done_fn);
3606
3607 local_irq_save(flags);
3608
3609 cpu_list = &__get_cpu_var(blk_cpu_done);
3610 list_add_tail(&req->donelist, cpu_list);
3611 raise_softirq_irqoff(BLOCK_SOFTIRQ);
3612
3613 local_irq_restore(flags);
3614 }
Never mind the rest; let us look only at raise_softirq_irqoff(). A long, long time ago there was a function named blk_dev_init(). It is where our story began. In that function we once saw this line:
3720 open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);
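Back then we said that what it does is initialize a softirq, namely BLOCK_SOFTIRQ, and bind the softirq handler blk_done_softirq to it. We also said that to trigger this softirq you only need to call raise_softirq_irqoff(). That is exactly what happens now, which means blk_done_softirq will be called.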
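The register-then-raise pattern can be sketched in a few lines of userspace C. This is only a toy model under stated assumptions: every model_* name is hypothetical, there is a single pending bitmask instead of per-CPU state, and the handler is run by an explicit call to model_do_softirq() rather than by the kernel on its way out of an interrupt.

```c
#include <assert.h>
#include <stddef.h>

enum { MODEL_BLOCK_SOFTIRQ = 0, MODEL_NR_SOFTIRQS = 4 };

typedef void (*model_softirq_fn)(void);

static model_softirq_fn model_vec[MODEL_NR_SOFTIRQS]; /* registered handlers */
static unsigned int model_pending;                    /* one bit per softirq */

/* Plays the role of open_softirq(): bind a handler to a softirq number. */
static void model_open_softirq(int nr, model_softirq_fn fn)
{
    assert(nr >= 0 && nr < MODEL_NR_SOFTIRQS);
    model_vec[nr] = fn;
}

/* Plays the role of raise_softirq_irqoff(): only mark the softirq pending. */
static void model_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < MODEL_NR_SOFTIRQS);
    model_pending |= 1u << nr;
}

/* Run each pending handler once; returns how many handlers ran. */
static int model_do_softirq(void)
{
    unsigned int pending = model_pending;
    int ran = 0;

    model_pending = 0;
    for (int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && model_vec[nr] != NULL) {
            model_vec[nr]();
            ran++;
        }
    }
    return ran;
}

/* Toy handler standing in for blk_done_softirq: it only counts its calls. */
static int model_done_calls;

static void model_blk_done_softirq(void)
{
    model_done_calls++;
}
```

Raising the same softirq twice before it runs still invokes the handler only once, which is why a handler has to drain everything that has accumulated. The real handler, blk_done_softirq, does rather more than bump a counter.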
3542 /*
3543 * splice the completion data to a local structure and hand off to
3544 * process_completion_queue() to complete the requests
3545 */
3546 static void blk_done_softirq(struct softirq_action *h)
3547 {
3548 struct list_head *cpu_list, local_list;
3549
3550 local_irq_disable();
3551 cpu_list = &__get_cpu_var(blk_cpu_done);
3552 list_replace_init(cpu_list, &local_list);
3553 local_irq_enable();
3554
3555 while (!list_empty(&local_list)) {
3556 struct request *rq = list_entry(local_list.next, struct request, donelist);
3557
3558 list_del_init(&rq->donelist);
3559 rq->q->softirq_done_fn(rq);
3560 }
3561 }
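And what is this softirq_done_fn? Do not say you don't know; we have in fact covered it. But it does not matter if you have forgotten: a person's greatest trouble is too good a memory, and forgetful people find happiness more easily. In scsi_alloc_queue we called blk_queue_softirq_done to assign scsi_softirq_done to q->softirq_done_fn, so what gets called here is scsi_softirq_done.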
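The hand-off from blk_complete_request() through blk_done_softirq() to the queue's callback can be modeled as follows. This is a simplified sketch, not the kernel implementation: the model_* names are hypothetical, a singly linked FIFO replaces list_head, and there is one global list instead of a per-CPU one with interrupts disabled around it.

```c
#include <assert.h>
#include <stddef.h>

struct model_request;
typedef void (*model_done_fn)(struct model_request *rq);

struct model_queue {
    model_done_fn softirq_done_fn; /* set once when the queue is created */
};

struct model_request {
    struct model_queue *q;
    struct model_request *next;
    int id;
};

/* Stand-in for the per-CPU blk_cpu_done list. */
static struct { struct model_request *head, *tail; } model_cpu_done;

/* Plays the role of blk_queue_softirq_done(). */
static void model_queue_softirq_done(struct model_queue *q, model_done_fn fn)
{
    q->softirq_done_fn = fn;
}

/* Plays the role of blk_complete_request(): enqueue at the tail. */
static void model_complete_request(struct model_request *rq)
{
    assert(rq->q->softirq_done_fn != NULL); /* mirrors the BUG_ON() */
    rq->next = NULL;
    if (model_cpu_done.tail)
        model_cpu_done.tail->next = rq;
    else
        model_cpu_done.head = rq;
    model_cpu_done.tail = rq;
}

/* Plays the role of blk_done_softirq(): detach the list, then walk the copy. */
static int model_done_softirq(void)
{
    struct model_request *rq = model_cpu_done.head;
    int handled = 0;

    model_cpu_done.head = model_cpu_done.tail = NULL;
    while (rq != NULL) {
        struct model_request *next = rq->next;

        rq->next = NULL;
        rq->q->softirq_done_fn(rq); /* scsi_softirq_done in the real path */
        handled++;
        rq = next;
    }
    return handled;
}

/* Toy callback standing in for scsi_softirq_done: records completion order. */
static int model_order[8];
static int model_order_len;

static void model_scsi_softirq_done(struct model_request *rq)
{
    if (model_order_len < 8)
        model_order[model_order_len++] = rq->id;
}
```

Because the block layer only knows the function pointer stored in the queue, the same softirq handler can serve any driver; SCSI simply plugs in its own callback when it allocates the queue.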
1376 static void scsi_softirq_done(struct request *rq)
1377 {
1378 struct scsi_cmnd *cmd = rq->completion_data;
1379 unsigned long wait_for = (cmd->allowed + 1) * cmd->timeout_per_command;
1380 int disposition;
1381
1382 INIT_LIST_HEAD(&cmd->eh_entry);
1383
1384 disposition = scsi_decide_disposition(cmd);
1385 if (disposition != SUCCESS &&
1386 time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
1387 sdev_printk(KERN_ERR, cmd->device,
1388 "timing out command, waited %lus\n",
1389 wait_for/HZ);
1390 disposition = SUCCESS;
1391 }
1392
1393 scsi_log_completion(cmd, disposition);
1394
1395 switch (disposition) {
1396 case SUCCESS:
1397 scsi_finish_command(cmd);
1398 break;
1399 case NEEDS_RETRY:
1400 scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
1401 break;
1402 case ADD_TO_MLQUEUE:
1403 scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
1404 break;
1405 default:
1406 if (!scsi_eh_scmd_add(cmd, 0))
1407 scsi_finish_command(cmd);
1408 }
1409 }
Needless to say, you already know that scsi_softirq_done calls scsi_finish_command, from drivers/scsi/scsi.c:
661 /*
662 * Function: scsi_finish_command
663 *
664 * Purpose: Pass command off to upper layer for finishing of I/O
665 * request, waking processes that are waiting on results,
666 * etc.
667 */
668 void scsi_finish_command(struct scsi_cmnd *cmd)
669 {
670 struct scsi_device *sdev = cmd->device;
671 struct Scsi_Host *shost = sdev->host;
672
673 scsi_device_unbusy(sdev);
674
675 /*
676 * Clear the flags which say that the device/host is no longer
677 * capable of accepting new commands. These are set in scsi_queue.c
678 * for both the queue full condition on a device, and for a
679 * host full condition on the host.
680 *
681 * XXX(hch): What about locking?
682 */
683 shost->host_blocked = 0;
684 sdev->device_blocked = 0;
685
686 /*
687 * If we have valid sense information, then some kind of recovery
688 * must have taken place. Make a note of this.
689 */
690 if (SCSI_SENSE_VALID(cmd))
691 cmd->result |= (DRIVER_SENSE << 24);
692
693 SCSI_LOG_MLCOMPLETE(4, sdev_printk(KERN_INFO, sdev,
694 "Notifying upper driver of completion "
695 "(result %x)\n", cmd->result));
696
697 cmd->done(cmd);
698 }
In other words, cmd->done gets called, and so the real worker behind the scenes, scsi_blk_pc_done, gets called. That is because scsi_setup_blk_pc_cmnd() contained this line:
1135 cmd->done = scsi_blk_pc_done;
And scsi_blk_pc_done comes from drivers/scsi/scsi_lib.c:
1078 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
1079 {
1080 BUG_ON(!blk_pc_request(cmd->request));
1081 /*
1082 * This will complete the whole command with uptodate=1 so
1083 * as far as the block layer is concerned the command completed
1084 * successfully. Since this is a REQ_BLOCK_PC command the
1085 * caller should check the request's errors value
1086 */
1087 scsi_io_completion(cmd, cmd->request_bufflen);
1088 }
scsi_io_completion() likewise comes from drivers/scsi/scsi_lib.c:
789 /*
790 * Function: scsi_io_completion()
791 *
792 * Purpose: Completion processing for block device I/O requests.
793 *
794 * Arguments: cmd - command that is finished.
795 *
796 * Lock status: Assumed that no lock is held upon entry.
797 *
798 * Returns: Nothing
799 *
800 * Notes: This function is matched in terms of capabilities to
801 * the function that created the scatter-gather list.
802 * In other words, if there are no bounce buffers
803 * (the normal case for most drivers), we don't need
804 * the logic to deal with cleaning up afterwards.
805 *
806 * We must do one of several things here:
807 *
808 * a) Call scsi_end_request. This will finish off the
809 * specified number of sectors. If we are done, the
810 * command block will be released, and the queue
811 * function will be goosed. If we are not done, then
812 * scsi_end_request will directly goose the queue.
813 *
814 * b) We can just use scsi_requeue_command() here. This would
815 * be used if we just wanted to retry, for example.
816 */
817 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
818 {
819 int result = cmd->result;
820 int this_count = cmd->request_bufflen;
821 request_queue_t *q = cmd->device->request_queue;
822 struct request *req = cmd->request;
823 int clear_errors = 1;
824 struct scsi_sense_hdr sshdr;
825 int sense_valid = 0;
826 int sense_deferred = 0;
827
828 scsi_release_buffers(cmd);
829
830 if (result) {
831 sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
832 if (sense_valid)
833 sense_deferred = scsi_sense_is_deferred(&sshdr);
834 }
835
836 if (blk_pc_request(req)) { /* SG_IO ioctl from block level */
837 req->errors = result;
838 if (result) {
839 clear_errors = 0;
840 if (sense_valid && req->sense) {
841 /*
842 * SG_IO wants current and deferred errors
843 */
844 int len = 8 + cmd->sense_buffer[7];
845
846 if (len > SCSI_SENSE_BUFFERSIZE)
847 len = SCSI_SENSE_BUFFERSIZE;
848 memcpy(req->sense, cmd->sense_buffer, len);
849 req->sense_len = len;
850 }
851 }
852 req->data_len = cmd->resid;
853 }
854
855 /*
856 * Next deal with any sectors which we were able to correctly
857 * handle.
858 */
859 SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
860 "%d bytes done.\n",
861 req->nr_sectors, good_bytes));
862 SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
863
864 if (clear_errors)
865 req->errors = 0;
866
867 /* A number of bytes were successfully read. If there
868 * are leftovers and there is some kind of error
869 * (result != 0), retry the rest.
870 */
871 if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
872 return;
873
874 /* good_bytes = 0, or (inclusive) there were leftovers and
875 * result = 0, so scsi_end_request couldn't retry.
876 */
877 if (sense_valid && !sense_deferred) {
878 switch (sshdr.sense_key) {
879 case UNIT_ATTENTION:
880 if (cmd->device->removable) {
881 /* Detected disc change. Set a bit
882 * and quietly refuse further access.
883 */
884 cmd->device->changed = 1;
885 scsi_end_request(cmd, 0, this_count, 1);
886 return;
887 } else {
888 /* Must have been a power glitch, or a
889 * bus reset. Could not have been a
890 * media change, so we just retry the
891 * request and see what happens.
892 */
893 scsi_requeue_command(q, cmd);
894 return;
895 }
896 break;
897 case ILLEGAL_REQUEST:
898 /* If we had an ILLEGAL REQUEST returned, then
899 * we may have performed an unsupported
900 * command. The only thing this should be
901 * would be a ten byte read where only a six
902 * byte read was supported. Also, on a system
903 * where READ CAPACITY failed, we may have
904 * read past the end of the disk.
905 */
906 if ((cmd->device->use_10_for_rw &&
907 sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
908 (cmd->cmnd[0] == READ_10 ||
909 cmd->cmnd[0] == WRITE_10)) {
910 cmd->device->use_10_for_rw = 0;
911 /* This will cause a retry with a
912 * 6-byte command.
913 */
914 scsi_requeue_command(q, cmd);
915 return;
916 } else {
917 scsi_end_request(cmd, 0, this_count, 1);
918 return;
919 }
920 break;
921 case NOT_READY:
922 /* If the device is in the process of becoming
923 * ready, or has a temporary blockage, retry.
924 */
925 if (sshdr.asc == 0x04) {
926 switch (sshdr.ascq) {
927 case 0x01: /* becoming ready */
928 case 0x04: /* format in progress */
929 case 0x05: /* rebuild in progress */
930 case 0x06: /* recalculation in progress */
931 case 0x07: /* operation in progress */
932 case 0x08: /* Long write in progress */
933 case 0x09: /* self test in progress */
934 scsi_requeue_command(q, cmd);
935 return;
936 default:
937 break;
938 }
939 }
940 if (!(req->cmd_flags & REQ_QUIET)) {
941 scmd_printk(KERN_INFO, cmd,
942 "Device not ready: ");
943 scsi_print_sense_hdr("", &sshdr);
944 }
945 scsi_end_request(cmd, 0, this_count, 1);
946 return;
947 case VOLUME_OVERFLOW:
948 if (!(req->cmd_flags & REQ_QUIET)) {
949 scmd_printk(KERN_INFO, cmd,
950 "Volume overflow, CDB: ");
951 __scsi_print_command(cmd->cmnd);
952 scsi_print_sense("", cmd);
953 }
954 /* See SSC3rXX or current. */
955 scsi_end_request(cmd, 0, this_count, 1);
956 return;
957 default:
958 break;
959 }
960 }
961 if (host_byte(result) == DID_RESET) {
962 /* Third party bus reset or reset for error recovery
963 * reasons. Just retry the request and see what
964 * happens.
965 */
966 scsi_requeue_command(q, cmd);
967 return;
968 }
969 if (result) {
970 if (!(req->cmd_flags & REQ_QUIET)) {
971 scsi_print_result(cmd);
972 if (driver_byte(result) & DRIVER_SENSE)
973 scsi_print_sense("", cmd);
974 }
975 }
976 scsi_end_request(cmd, 0, this_count, !result);
977 }
Yet another outrageous function. But I do not want to say any more about it. Jump straight to the last line, scsi_end_request(), from drivers/scsi/scsi_lib.c:
632 /*
633 * Function: scsi_end_request()
634 *
635 * Purpose: Post-processing of completed commands (usually invoked at end
636 * of upper level post-processing and scsi_io_completion).
637 *
638 * Arguments: cmd - command that is complete.
639 * uptodate - 1 if I/O indicates success, <= 0 for I/O error.
640 * bytes - number of bytes of completed I/O
641 * requeue - indicates whether we should requeue leftovers.
642 *
643 * Lock status: Assumed that lock is not held upon entry.
644 *
645 * Returns: cmd if requeue required, NULL otherwise.
646 *
647 * Notes: This is called for block device requests in order to
648 * mark some number of sectors as complete.
649 *
650 * We are guaranteeing that the request queue will be goosed
651 * at some point during this call.
652 * Notes: If cmd was requeued, upon return it will be a stale pointer.
653 */
654 static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
655 int bytes, int requeue)
656 {
657 request_queue_t *q = cmd->device->request_queue;
658 struct request *req = cmd->request;
659 unsigned long flags;
660
661 /*
662 * If there are blocks left over at the end, set up the command
663 * to queue the remainder of them.
664 */
665 if (end_that_request_chunk(req, uptodate, bytes)) {
666 int leftover = (req->hard_nr_sectors << 9);
667
668 if (blk_pc_request(req))
669 leftover = req->data_len;
670
671 /* kill remainder if no retrys */
672 if (!uptodate && blk_noretry_request(req))
673 end_that_request_chunk(req, 0, leftover);
674 else {
675 if (requeue) {
676 /*
677 * Bleah. Leftovers again. Stick the
678 * leftovers in the front of the
679 * queue, and goose the queue again.
680 */
681 scsi_requeue_command(q, cmd);
682 cmd = NULL;
683 }
684 return cmd;
685 }
686 }
687
688 add_disk_randomness(req->rq_disk);
689
690 spin_lock_irqsave(q->queue_lock, flags);
691 if (blk_rq_tagged(req))
692 blk_queue_end_tag(q, req);
693 end_that_request_last(req, uptodate);
694 spin_unlock_irqrestore(q->queue_lock, flags);
695
696 /*
697 * This will goose the queue request function at the end, so we don't
698 * need to worry about launching another command.
699 */
700 scsi_next_command(cmd);
701 return NULL;
702 }
What we care about most is end_that_request_last, at line 693.
3618 /*
3619 * queue lock must be held
3620 */
3621 void end_that_request_last(struct request *req, int uptodate)
3622 {
3623 struct gendisk *disk = req->rq_disk;
3624 int error;
3625
3626 /*
3627 * extend uptodate bool to allow < 0 value to be direct io error
3628 */
3629 error = 0;
3630 if (end_io_error(uptodate))
3631 error = !uptodate ? -EIO : uptodate;
3632
3633 if (unlikely(laptop_mode) && blk_fs_request(req))
3634 laptop_io_completion();
3635
3636 /*
3637 * Account IO completion. bar_rq isn't accounted as a normal
3638 * IO on queueing nor completion. Accounting the containing
3639 * request is enough.
3640 */
3641 if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {
3642 unsigned long duration = jiffies - req->start_time;
3643 const int rw = rq_data_dir(req);
3644
3645 __disk_stat_inc(disk, ios[rw]);
3646 __disk_stat_add(disk, ticks[rw], duration);
3647 disk_round_stats(disk);
3648 disk->in_flight--;
3649 }
3650 if (req->end_io)
3651 req->end_io(req, error);
3652 else
3653 __blk_put_request(req->q, req);
3654 }
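Now, the end_io at line 3651 is the most critical piece of code. Perhaps you have long forgotten that we have seen end_io before, but never mind, I am here. In blk_execute_rq_nowait() there was this line: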
2596 rq->end_io = done;
And done is the fifth parameter of that function. When we called it back in blk_execute_rq, this is what we wrote:
2636 blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);
In other words, rq->end_io was set to blk_end_sync_rq.
2786 /**
2787 * blk_end_sync_rq - executes a completion event on a request
2788 * @rq: request to complete
2789 * @error: end io status of the request
2790 */
2791 void blk_end_sync_rq(struct request *rq, int error)
2792 {
2793 struct completion *waiting = rq->end_io_data;
2794
2795 rq->end_io_data = NULL;
2796 __blk_put_request(rq->q, rq);
2797
2798 /*
2799 * complete last, if this is a stack request the process (and thus
2800 * the rq pointer) could be invalid right after this complete()
2801 */
2802 complete(waiting);
2803 }
At last we have found our dear, lovely, beloved, deeply loved, most loved complete(). But how can we be sure that this waiting is that wait? Compare the two. Back in blk_execute_rq we had:
2635 rq->end_io_data = &wait;
And right now we have:
2793 struct completion *waiting = rq->end_io_data;
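So we know we have not got the wrong target. After all, we know full well that you may kiss the wrong person, but you may not lose your temper at the wrong person, and still less write code against the wrong one.
At this point blk_execute_rq is woken up and returns promptly. Right behind it, scsi_execute returns and then scsi_execute_req returns. At this moment a SCSI command has finally gone from nothing to something and, in the end, back to nothing. It went through the transformation from SCSI command into request, and the tempering from request back into SCSI command. In the end it fulfilled its mission. For it, life is an illusion, and parting or death is the only ending.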
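The way end_io and end_io_data tie the sleeper to its waker can be sketched in plain C. This is a single-threaded illustration under stated assumptions: all model_* names are hypothetical, the completion is just a counter, and the completion path is invoked directly at the point where the kernel would instead sleep in wait_for_completion() until the softirq path ran.

```c
#include <assert.h>
#include <stddef.h>

struct model_completion { int done; };

struct model_rq;
typedef void (*model_end_io_fn)(struct model_rq *rq, int error);

struct model_rq {
    model_end_io_fn end_io; /* callback run when the request finishes */
    void *end_io_data;      /* private data for that callback */
};

static void model_complete(struct model_completion *c)
{
    c->done++;
}

/* Plays the role of blk_end_sync_rq(): signal whoever armed end_io_data. */
static void model_end_sync_rq(struct model_rq *rq, int error)
{
    struct model_completion *waiting = rq->end_io_data;

    (void)error;
    assert(waiting != NULL);
    rq->end_io_data = NULL; /* the kernel also drops its request reference here */
    model_complete(waiting);
}

/* Plays the role of blk_execute_rq_nowait(): remember the callback. */
static void model_execute_rq_nowait(struct model_rq *rq, model_end_io_fn done)
{
    rq->end_io = done;
}

/* Tail of end_that_request_last(): call end_io if one was installed. */
static void model_end_request_last(struct model_rq *rq, int error)
{
    if (rq->end_io)
        rq->end_io(rq, error);
}

/*
 * Plays the role of blk_execute_rq(): arm a completion on the stack, submit,
 * then "wait". Returns how many times the local completion was signalled.
 */
static int model_execute_rq(struct model_rq *rq)
{
    struct model_completion wait = { 0 };

    rq->end_io_data = &wait;                        /* like line 2635 */
    model_execute_rq_nowait(rq, model_end_sync_rq); /* like line 2636 */
    model_end_request_last(rq, 0); /* stands in for the asynchronous path */
    return wait.done;
}
```

The pointer stored in end_io_data is the only link between the two ends, so the completion signalled by the callback is necessarily the one the submitter armed.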