SCSI Layer IO Processing Flow & Error Handling

1. IO Dispatch Flow

On the IO path, the SCSI layer sits directly below the block layer.

1.1 SQ

SCSI SQ registers its IO hook function with the block layer:

	scsi_old_alloc_queue()
	|
	+---> q->request_fn = scsi_request_fn;

SQ IO dispatch path:

    scsi_request_fn()
    |
    +---> blk_peek_request()
    |     |
    |     +---> scsi_prep_fn()
    |           (Allocate and set up scsi_cmnd)
    |
    +---> cmd->scsi_done = scsi_done; // set the callback the driver calls when the IO completes
    |
    +---> scsi_dispatch_cmd()
          |
          +---> hostt->queue_command()

1.2 MQ

SCSI MQ registers its IO hook functions with the block layer:

	static const struct blk_mq_ops scsi_mq_ops = {
		.queue_rq	= scsi_queue_rq,
		...
	};

MQ IO dispatch path:

    scsi_queue_rq()
    |
    +---> cmd->scsi_done = scsi_mq_done; // set the callback the driver calls when the IO completes
    |
    +---> scsi_dispatch_cmd()
          |
          +---> hostt->queue_command()
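Both paths follow the same pattern: the midlayer installs a completion callback in the command and then calls the LLD's queue_command. The sketch below is a minimal user-space model of that pattern, not kernel code; the toy_* types and the stub driver toy_lld_queue_command() are hypothetical, and the stub completes synchronously, whereas a real LLD usually calls the callback later from its interrupt handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the midlayer sets a done callback, then calls the LLD. */
struct toy_cmnd;
typedef void (*done_fn)(struct toy_cmnd *cmd);

struct toy_cmnd {
    int tag;
    int result;          /* filled in by the LLD on completion */
    done_fn scsi_done;   /* installed by the midlayer before dispatch */
};

struct toy_host_template {
    int (*queue_command)(struct toy_cmnd *cmd);
};

static int sq_completions, mq_completions;

/* Stand-ins for scsi_done() and scsi_mq_done(). */
static void toy_scsi_done(struct toy_cmnd *cmd)    { (void)cmd; sq_completions++; }
static void toy_scsi_mq_done(struct toy_cmnd *cmd) { (void)cmd; mq_completions++; }

/* Hypothetical LLD: reports success and calls back immediately. */
static int toy_lld_queue_command(struct toy_cmnd *cmd)
{
    cmd->result = 0;        /* pretend the HBA reported success */
    cmd->scsi_done(cmd);    /* a real LLD usually does this from its IRQ handler */
    return 0;
}

/* Common dispatch: pick the SQ or MQ callback, then hand off to the LLD. */
static int toy_dispatch(const struct toy_host_template *hostt,
                        struct toy_cmnd *cmd, int use_mq)
{
    assert(hostt != NULL && hostt->queue_command != NULL);
    cmd->scsi_done = use_mq ? toy_scsi_mq_done : toy_scsi_done;
    return hostt->queue_command(cmd);
}
```

Because the LLD only ever calls cmd->scsi_done, the same driver code serves both paths; the midlayer decides where the completion goes.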

2. IO Completion

After the low-level driver (LLDD) completes an IO, it calls the done callback provided by the SCSI layer: scsi_done (SQ) or scsi_mq_done (MQ).

2.1 SQ

	scsi_done()
	|
	+---> blk_complete_request()
	      |
	      +---> raises the block softirq (BLOCK_SOFTIRQ)
	            |
	            +---> blk_done_softirq()
	                  |
	                  +---> scsi_softirq_done()
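The point of the SQ path is that scsi_done() itself does very little: it queues the request and raises a softirq, and the real completion work runs later in softirq context. A toy model of that split, assuming plain globals in place of the per-CPU done list and the softirq machinery (all toy_* names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DONE 16

struct toy_req { int tag; int finished; };

/* Simulated per-CPU completion list and "softirq pending" flag. */
static struct toy_req *done_list[MAX_DONE];
static size_t done_count;
static int softirq_pending;

/* Stand-in for scsi_softirq_done(); the real one decides the disposition. */
static void toy_softirq_done(struct toy_req *rq) { rq->finished = 1; }

/* IRQ side, like blk_complete_request(): queue the request, raise the softirq. */
static void toy_complete_request(struct toy_req *rq)
{
    assert(done_count < MAX_DONE);
    done_list[done_count++] = rq;
    softirq_pending = 1;
}

/* Softirq side, like blk_done_softirq(): drain the list and finish each request. */
static size_t toy_done_softirq(void)
{
    size_t n = done_count;
    for (size_t i = 0; i < n; i++)
        toy_softirq_done(done_list[i]);
    done_count = 0;
    softirq_pending = 0;
    return n;
}
```

Keeping the interrupt-side work this small is what lets the LLD's interrupt handler return quickly.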

2.2 MQ

	scsi_mq_done()
	|
	+---> blk_mq_complete_request()
	      |
	      +---> scsi_softirq_done()

2.3 Common handling

    scsi_softirq_done()
    |
    +---> scsi_decide_disposition() // decide the next step from the IO status returned by the driver
    |      Takes a look at the scsi_cmnd->result and sense data to determine
    |      what is the best course of action to take. While reading this
    |      function code, one should not confuse SUCCESS as meaning the command
    |      was successful, or FAILED to mean the command failed etc. The return
    |      value of this function merely indicates the course of action to take
    |
    +---> case SUCCESS:            // IO handled; return it to the block layer
    |      (Finish off the command to block layer. E.g. the device may be
    |      offline, and hence complete the command - the block layer may retry
    |      on its own later, but that doesn't concern the SCSI ML)
    |      |
    |      +---> scsi_finish_command()
    |            |
    |            +---> scsi_io_completion()
    |                  |
    |                  +---> blk_finish_request()
    |
    +---> case NEEDS_RETRY/ADD_TO_MLQUEUE:  // driver asked to retry or requeue the IO; it waits to be dispatched again
    |     (Requeue the command to request queue. E.g. the device HW was
    |      busy, and thus SCSI ML knows that retrying may help)
    |      |
    |      +---> scsi_queue_insert()
    |            |
    |            +---> blk_requeue_request()
    |
    +---> case FAILED/default:   // IO failed; add it to the EH list for error recovery
          (Schedule the scsi_cmnd for EH. E.g. there was a bus error that
          might need bus reset. Or we got CHECK_CONDITION and we need to issue
          REQ_SENSE to get more info about the failure. etc)
          |
          +---> scsi_eh_scmd_add()
                (Add scsi_cmnd to the host EH queue)
                |
                +---> scsi_eh_wakeup()

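IO completion therefore follows one of three paths:

a. The IO was handled successfully and is returned to the block layer.

b. The driver asked for the IO to be retried or requeued; it waits to be dispatched again.

c. The IO failed; it is added to the error handling list for error recovery.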
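The three outcomes can be pictured as a switch over the disposition. This is a simplified sketch, not the kernel's scsi_decide_disposition(): the enum values reuse the kernel's names, while toy_maybe_retry() models, as an assumption, the retries/allowed budget check that decides whether a retryable error still gets NEEDS_RETRY.

```c
#include <assert.h>

enum disposition { SUCCESS, NEEDS_RETRY, ADD_TO_MLQUEUE, FAILED };
enum outcome { TO_BLOCK_LAYER, REQUEUED, TO_ERROR_HANDLER };

struct toy_cmnd { int retries; int allowed; };

/* A retryable condition yields NEEDS_RETRY only while retries remain;
 * afterwards the command is finished as SUCCESS ("finish it", not "it worked"). */
static enum disposition toy_maybe_retry(struct toy_cmnd *cmd)
{
    assert(cmd->allowed >= 0);
    return (++cmd->retries <= cmd->allowed) ? NEEDS_RETRY : SUCCESS;
}

/* Map a disposition to one of the three paths a/b/c above. */
static enum outcome toy_handle(enum disposition d)
{
    switch (d) {
    case SUCCESS:
        return TO_BLOCK_LAYER;     /* a: scsi_finish_command() */
    case NEEDS_RETRY:
    case ADD_TO_MLQUEUE:
        return REQUEUED;           /* b: scsi_queue_insert() */
    case FAILED:
    default:
        return TO_ERROR_HANDLER;   /* c: scsi_eh_scmd_add() */
    }
}
```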

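3. IO Timeout

The block layer provides a timeout mechanism for block devices: when a block device allocates its request_queue, it registers a timeout handler (registering one is optional; DM devices, for example, do not). The default timeout is usually 30 s and can be changed through: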

/sys/class/scsi_device/<#:#:#:#>/device/timeout
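The attribute can be changed with echo as root; doing the same from C is sketched below. The device address 0:0:0:0 in the comment is only an example, the value is in seconds, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Example real path (needs root): /sys/class/scsi_device/0:0:0:0/device/timeout */

static int write_timeout(const char *path, unsigned seconds)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%u\n", seconds) > 0;
    /* stdio buffers the data, so a sysfs write error surfaces when fclose() flushes it */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

static int read_timeout(const char *path, unsigned *seconds)
{
    assert(path != NULL && seconds != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%u", seconds);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```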

3.1 SQ

	scsi_old_alloc_queue()
	|
	+---> blk_init_allocated_queue()
	|     |
	|     +---> INIT_WORK(&q->timeout_work, blk_timeout_work);  // timeout handling work
	|
	+---> blk_queue_rq_timed_out(q, scsi_times_out);  // register the timeout handler

For SQ devices, every request the block layer dispatches to the SCSI layer is added to the timeout_list list, and blk_timeout_work checks the IOs on this list for timeouts.
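The SQ scan can be modelled as a walk over a linked list of in-flight requests, each with an absolute deadline. The sketch assumes abstract ticks in place of jiffies and uses a signed-difference comparison in the style of the kernel's time_after()/time_after_eq() so that it tolerates counter wrap-around; the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_req {
    unsigned long deadline;   /* start time + timeout, in ticks */
    int timed_out;
    struct toy_req *next;     /* stands in for the timeout_list linkage */
};

/* Wrap-safe "a is after b", same idea as the kernel's time_after(a, b). */
static int after(unsigned long a, unsigned long b) { return (long)(b - a) < 0; }

/* Mark expired requests and report the earliest remaining deadline,
 * which the real timeout work uses to re-arm its timer. */
static size_t toy_timeout_scan(struct toy_req *list, unsigned long now,
                               unsigned long *next_deadline, int *have_next)
{
    size_t expired = 0;
    assert(next_deadline != NULL && have_next != NULL);
    *have_next = 0;
    for (struct toy_req *rq = list; rq; rq = rq->next) {
        if (!after(rq->deadline, now)) {          /* now >= deadline */
            rq->timed_out = 1;                    /* real code calls the timeout handler */
            expired++;
        } else if (!*have_next || after(*next_deadline, rq->deadline)) {
            *next_deadline = rq->deadline;        /* earliest pending deadline */
            *have_next = 1;
        }
    }
    return expired;
}
```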

3.2 MQ

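Register the timeout handler:

	static const struct blk_mq_ops scsi_mq_ops = {
		...
		.timeout	= scsi_timeout,    // register the timeout handler
	};

	blk_mq_init_allocated_queue()
	|
	+---> INIT_WORK(&q->timeout_work, blk_mq_timeout_work); // timeout handling work

For MQ devices, requests are preallocated when the request_queue is initialized and are managed by tags, so blk_mq_timeout_work iterates over the tags and finds the in-flight requests through the tag bitmap:

	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);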
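By contrast, the MQ scan does not keep a list; it walks the set bits of the tag bitmap and looks each request up in the preallocated array. A simplified sketch, assuming a single 64-entry tag space and hypothetical toy_* names (the kernel uses sbitmap-based, per-hardware-queue tag sets):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_TAGS 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((NR_TAGS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct toy_rq { unsigned long deadline; int timed_out; };

struct toy_tags {
    unsigned long busy[NR_WORDS];   /* bit set => tag allocated (request in flight) */
    struct toy_rq rqs[NR_TAGS];     /* preallocated when the queue is set up */
};

static void toy_set_busy(struct toy_tags *t, unsigned tag, unsigned long deadline)
{
    assert(tag < NR_TAGS);
    t->busy[tag / BITS_PER_WORD] |= 1UL << (tag % BITS_PER_WORD);
    t->rqs[tag].deadline = deadline;
    t->rqs[tag].timed_out = 0;
}

/* Visit only allocated tags, like blk_mq_queue_tag_busy_iter() + blk_mq_check_expired(). */
static size_t toy_check_expired(struct toy_tags *t, unsigned long now)
{
    size_t expired = 0;
    for (unsigned tag = 0; tag < NR_TAGS; tag++) {
        if (!(t->busy[tag / BITS_PER_WORD] & (1UL << (tag % BITS_PER_WORD))))
            continue;                                /* free tag: nothing in flight */
        struct toy_rq *rq = &t->rqs[tag];
        if ((long)(now - rq->deadline) >= 0) {       /* time_after_eq(now, deadline) */
            rq->timed_out = 1;                       /* real code calls ops->timeout */
            expired++;
        }
    }
    return expired;
}
```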

3.3 Common timeout handling

    scsi_times_out()
    |
    |     // if the transport class registered a timeout handler, run it first
    +---> scsi_transport_template->eh_timed_out() - Successful? If not...
    |     (Gives transportt a chance to deal with it)
    |     // next, if the LLD registered a timeout handler, run the driver's handler
    +---> scsi_host_template->eh_timed_out() - Successful? If not...
    |     (Gives hostt a chance to deal with it)
    |     // if neither handled the timeout, schedule an abort
    +---> scsi_abort_command() - Successful? If not...
    |     (Schedule an ABORT of the scsi_cmnd. The abort handler will also
    |      requeue it if needed)
    |
    |     // if the abort fails, add the command to the SCSI EH list for the error handler thread
    +---> scsi_eh_scmd_add()
          (Schedule the scsi_cmnd for EH. This'll definitely work. Because if it
           doesn't work, the EH handler will mark the device as offline, which
           counts as a good fix :-))
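The escalation order can be read as a chain of handlers in which the first one that takes responsibility wins. A hedged sketch with simplified return codes (the kernel uses enum blk_eh_timer_return), where the abort step is reduced to a flag saying whether scheduling it succeeded; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_ret { NOT_HANDLED, HANDLED };
enum toy_path { BY_TRANSPORT, BY_HOST, BY_ABORT, BY_EH };

struct toy_cmnd;
typedef enum toy_ret (*timeout_fn)(struct toy_cmnd *cmd);

struct toy_cmnd {
    timeout_fn transport_eh_timed_out;   /* optional, may be NULL */
    timeout_fn host_eh_timed_out;        /* optional, may be NULL */
    int abort_can_be_scheduled;          /* outcome of scheduling the abort */
};

/* Same order as scsi_times_out(): transport, host driver, abort, then EH. */
static enum toy_path toy_times_out(struct toy_cmnd *cmd)
{
    assert(cmd != NULL);
    if (cmd->transport_eh_timed_out &&
        cmd->transport_eh_timed_out(cmd) == HANDLED)
        return BY_TRANSPORT;
    if (cmd->host_eh_timed_out &&
        cmd->host_eh_timed_out(cmd) == HANDLED)
        return BY_HOST;
    if (cmd->abort_can_be_scheduled)
        return BY_ABORT;     /* the abort handler runs later and may requeue */
    return BY_EH;            /* scsi_eh_scmd_add() */
}

/* Hypothetical handlers for experimenting with the chain. */
static enum toy_ret toy_decline(struct toy_cmnd *c) { (void)c; return NOT_HANDLED; }
static enum toy_ret toy_accept(struct toy_cmnd *c)  { (void)c; return HANDLED; }
```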

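4. IO Error Handling

4.1 Registering the SCSI error handler thread

During host initialization, each host starts a kernel thread named scsi_eh_#, where # is the host_no.

Use ps aux | grep scsi_eh to list the SCSI error handler threads on the current system.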

    shost->ehandler = kthread_run(scsi_error_handler, shost, "scsi_eh_%d", shost->host_no);
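ps aux | grep scsi_eh is the quickest check. A rough C equivalent, assuming Linux with /proc mounted, walks the numeric /proc entries and matches each task's comm against scsi_eh_<host_no>; the helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Parse "scsi_eh_<n>" (optionally followed by '\n') and return n, else -1. */
static int parse_scsi_eh(const char *comm)
{
    int host_no = -1, end = 0;
    assert(comm != NULL);
    if (sscanf(comm, "scsi_eh_%d%n", &host_no, &end) != 1 || host_no < 0)
        return -1;
    if (comm[end] != '\0' && strcmp(comm + end, "\n") != 0)
        return -1;
    return host_no;
}

/* Print and count scsi_eh_* kernel threads; returns -1 if /proc is unavailable. */
static int count_scsi_eh_threads(void)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                          /* only numeric PID directories */
        char path[288], comm[32];
        snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                          /* the task may have exited */
        if (fgets(comm, sizeof(comm), f)) {
            int host_no = parse_scsi_eh(comm);
            if (host_no >= 0) {
                printf("scsi_eh thread for host%d (pid %s)\n", host_no, de->d_name);
                count++;
            }
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```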

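4.2 Waking up the error handler thread

The thread is woken up along two paths:

a. An IO is handed over to error handling (the IO completed as failed, or it timed out)

   scsi_eh_scmd_add()
   |
   +---> scsi_host_set_state(shost, SHOST_RECOVERY)   // put the host into the RECOVERY state
   |
   +---> scsi_eh_wakeup()

b. scsi_schedule_eh() is called explicitly to wake the thread

   scsi_schedule_eh()
   |
   +---> scsi_host_set_state(shost, SHOST_RECOVERY)   // put the host into the RECOVERY state
   |
   +---> scsi_eh_wakeup()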

4.3 Error handling

    scsi_error_handler()
    |
    +--> shost->transportt->eh_strategy_handler(shost) // the transport class provides its own error handler (e.g. libsas)
    |
    +--> scsi_eh_get_sense() - Are we done? if not..   // otherwise scsi_unjam_host() runs the SCSI layer's generic recovery below
    |   (For the commands that have CHECK_CONDITION, get sense_info)
    |    |
    |    +--> scsi_request_sense()
    |    |   (Use scsi_send_eh_cmnd() to send a "hijacked" REQ_SENSE cmnd)
    |    |
    |    +--> scsi_decide_disposition()
    |    |
    |    +--> Arrange to finish the scsi_cmnd if SUCCESS (by setting
    |         retries=allowed)
    |
    +--> scsi_eh_abort_cmds() - Are we done? If not...
    |   (Abort the commands that had timed out)
    |    |
    |    +--> scsi_try_to_abort_cmd()
    |    |    (Results in call to hostt->eh_abort_handler() which is responsible
    |    |     making the LLD and the HW forget about the scsi_cmnd)
    |    |
    |    +--> scsi_eh_test_devices()
    |         (Test if the device is responding now by sending appropriate EH
    |          commands (STU / TEST_UNIT_READY). Again, sending these EH
    |          commands involves hijacking the original scsi_cmnd, and later
    |          restoring the context)
    |
    +--> scsi_eh_ready_devs() - Are we done? if not...  // reset-based recovery
    |    (Take actions of increasing severity in order to recover)
    |    |
    |    +--> scsi_eh_bus_device_reset()  // device reset (LUN reset)
    |    |   (Reset the scsi_device. Results in call to
    |    |    hostt->eh_device_reset_handler())
    |    |
    |    +--> scsi_eh_target_reset()   // target reset
    |    |   (Reset the scsi_target. Results in call to
    |    |    hostt->eh_target_reset_handler())
    |    |
    |    +--> scsi_eh_bus_reset()  // bus reset
    |    |   (Reset the SCSI bus (channel). Results in call to
    |    |    hostt->eh_bus_reset_handler())
    |    |
    |    +--> scsi_eh_host_reset()  // host reset
    |    |   (Reset the Scsi_Host. Results in call to
    |    |    hostt->eh_host_reset_handler())
    |    |   // if all of the resets above fail, the disk is set offline
    |    +--> If nothing has worked - scsi_eh_offline_sdevs()
    |         (The device is not recoverable, put it offline)
    |   // after the steps above, the commands on the EH list have been moved to the done list; process them here
    +--> scsi_eh_flush_done_q()
         (For all the EH commands on the done_q, either requeue them (via
          scsi_queue_insert()) if eligible, or finish them up to block layer
          (via scsi_finish_command()))

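Almost every step above checks the host's eh_deadline field: if the deadline is enabled and has expired, the step returns immediately without performing its action, and recovery falls through to a host reset. eh_deadline is off (disabled) by default.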

To set it, write to /sys/class/scsi_host/host#/eh_deadline.

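Once an IO is added to the error handling list, the host is put into the RECOVERY state. In this state no new IO can be issued to any disk under that host, so IO on all of them drops to zero. After error handling completes, the RECOVERY state is cleared and new IO can be issued again.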
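The stall can be seen from the dispatch side: new commands are simply not issued while the host is in a recovery state. A minimal sketch; the enum is a subset of the kernel's enum scsi_host_state and the gate mirrors the states tested by scsi_host_in_recovery(), while toy_dispatch_all() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_host_state {
    SHOST_RUNNING, SHOST_RECOVERY, SHOST_CANCEL_RECOVERY, SHOST_DEL_RECOVERY,
};

static bool toy_host_in_recovery(enum toy_host_state s)
{
    return s == SHOST_RECOVERY || s == SHOST_CANCEL_RECOVERY ||
           s == SHOST_DEL_RECOVERY;
}

/* Returns how many of `pending` queued requests get issued; the rest
 * stay queued in the block layer until recovery ends. */
static int toy_dispatch_all(enum toy_host_state s, int pending)
{
    int sent = 0;
    assert(pending >= 0);
    while (pending > 0 && !toy_host_in_recovery(s)) {
        pending--;
        sent++;
    }
    return sent;
}
```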

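4.3.1 libsas error handling

libsas registers its own error handling function and does not use the error handling logic provided by the SCSI layer.

The SCSI layer handles errors at the scsi_cmnd level, while libsas works one level lower: each scsi_cmnd corresponds to one sas_task, and libsas performs error handling on the sas_task.

a. Registration:

 stt->eh_strategy_handler = sas_scsi_recover_host;

b. Error handling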

  sas_scsi_recover_host()
  |
  +---> sas_eh_handle_sas_errors()
  |     |
  |     +---> sas_scsi_find_task()
  |     |     |
  |     |     +---> lldd_abort_task(task)  // abort the task
  |     |     |
  |     |     +---> lldd_query_task()      // query the task state
  |     |
  |     +---> case TASK_IS_DONE:    sas_eh_finish_cmd(cmd)  // completed before the abort, i.e. finished normally
  |     |
  |     +---> case TASK_IS_ABORTED: sas_eh_finish_cmd(cmd)  // the abort succeeded
  |     |
  |     +---> case TASK_IS_AT_LU:   // needs LU recovery, similar to a SCSI device reset
  |     |     |
  |     |     +---> sas_recover_lu()
  |     |
  |     +---> case TASK_IS_NOT_AT_LU/TASK_ABORT_FAILED:  // I_T nexus recovery
  |     |     |
  |     |     +---> sas_recover_I_T(task->dev)  // performs a phy reset
  |     |
  |     +---> try_to_reset_cmd_device(cmd)  // other cases
  |     |     |
  |     |     +---> eh_device_reset_handler()  // device reset, if the driver registered one
  |     |     |
  |     |     +---> eh_target_reset_handler()  // target reset, if the driver registered one
  |     |
  |     +---> i->dft->lldd_clear_nexus_port()  // if the driver registered one
  |     |
  |     +---> i->dft->lldd_clear_nexus_ha()    // if the driver registered one: HA-level recovery, similar to a SCSI host reset
  |
  +---> sas_ata_eh(shost, &eh_work_q, &ha->eh_done_q)  // SATA-specific error recovery
  |     |
  |     +---> ata_scsi_cmd_error_handler()
  |
  |     // if error IOs remain after the recovery above, fall back to the SCSI layer's error handling
  +---> scsi_eh_ready_devs()
  |
  +---> sas_ata_strategy_handler()  // SATA-specific error recovery
        |
        +---> ata_scsi_port_error_handler()
			
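1) sas_recover_lu() performs a LUN reset.
2) sas_recover_I_T() performs a phy reset: a hard reset for SAS disks, a link reset for SATA disks.
3) lldd_clear_nexus_ha() performs HA-level recovery, clearing the nexus for the whole host adapter (comparable to a SCSI host reset).
4) sas_ata_eh()/sas_ata_strategy_handler() are SATA-only error handling,
   so SATA disks go through more error handling than SAS disks and take longer to recover.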
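The escalation inside sas_eh_handle_sas_errors() can be sketched as a function of the task state and of which recovery actions would succeed. This is an illustrative model: the TASK_* names follow libsas, while the ok bitmask and toy_sas_recover() are hypothetical. In this model a failed LU reset escalates to I_T recovery, and a failed I_T recovery falls through to the port-level and HA-level nexus clears.

```c
#include <assert.h>

enum task_disposition {
    TASK_IS_DONE, TASK_IS_ABORTED, TASK_IS_AT_LU, TASK_IS_NOT_AT_LU,
    TASK_ABORT_FAILED,
};

enum sas_step {
    STEP_FINISH, STEP_LU_RESET, STEP_I_T_RESET,
    STEP_CLEAR_NEXUS_PORT, STEP_CLEAR_NEXUS_HA, STEP_NONE,
};

/* `ok` has bit (1u << step) set for each recovery step that would succeed. */
static enum sas_step toy_sas_recover(enum task_disposition res, unsigned ok)
{
    switch (res) {
    case TASK_IS_DONE:
    case TASK_IS_ABORTED:
        return STEP_FINISH;                    /* sas_eh_finish_cmd() */
    case TASK_IS_AT_LU:
        if (ok & (1u << STEP_LU_RESET))
            return STEP_LU_RESET;              /* sas_recover_lu() */
        /* fall through */
    case TASK_IS_NOT_AT_LU:
    case TASK_ABORT_FAILED:
        if (ok & (1u << STEP_I_T_RESET))
            return STEP_I_T_RESET;             /* sas_recover_I_T() */
        break;
    }
    /* try_to_reset_cmd_device() is attempted here as well, then the
     * nexus clears, if the LLDD provides them */
    if (ok & (1u << STEP_CLEAR_NEXUS_PORT))
        return STEP_CLEAR_NEXUS_PORT;
    if (ok & (1u << STEP_CLEAR_NEXUS_HA))
        return STEP_CLEAR_NEXUS_HA;
    assert(res != TASK_IS_DONE && res != TASK_IS_ABORTED);
    return STEP_NONE;                          /* could not be recovered */
}
```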
		

4.3.2 libata error handling

(To be added)

5. IO Retry

IO is retried in the following five cases:

5.1 blk_timeout_work detects an IO timeout and runs timeout handling; once the abort succeeds, the IO is requeued and retried

    scsi_times_out()
        -> scsi_abort_command()
            -> schedules scmd_eh_abort_handler()
                -> scsi_queue_insert()
                    -> blk_requeue_request()

5.2 After the SCSI error handler thread has finished recovering the host, the IO is requeued

    scsi_error_handler()
        -> scsi_unjam_host()
            -> scsi_eh_flush_done_q()
                -> scsi_queue_insert()
                    -> blk_requeue_request()

5.3 After completing an IO, the driver explicitly reports that it needs to be retried (e.g. the driver is temporarily busy)

    scsi_softirq_done()
		-> scsi_decide_disposition() returns NEEDS_RETRY
			-> scsi_queue_insert()               
				-> blk_requeue_request()			
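A driver asking for retries cannot keep a command alive forever. On completion, when the disposition is not SUCCESS, scsi_softirq_done() also checks whether the command has existed longer than (allowed + 1) * timeout and, if so, finishes it. A sketch of that check in abstract ticks (the function name is hypothetical); with a 30 s timeout and allowed = 5, the budget would be 180 s.

```c
#include <assert.h>
#include <stdbool.h>

/* True once the command has outlived (allowed + 1) * timeout ticks. */
static bool toy_retry_budget_exhausted(unsigned long alloc_time,
                                       unsigned long now,
                                       unsigned allowed,
                                       unsigned long timeout)
{
    assert(timeout > 0);
    unsigned long wait_for = (allowed + 1UL) * timeout;
    /* time_before(alloc_time + wait_for, now), written wrap-safe */
    return (long)((alloc_time + wait_for) - now) < 0;
}
```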

5.4 When the block layer dispatches an IO to the SCSI layer while the SCSI device or host is busy, the IO is requeued

	scsi_request_fn()
		-> case not_ready:  // device busy or host busy
			-> blk_requeue_request()

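5.5 scsi_finish_command() is called from several places; if the result returned by the driver is non-zero, scsi_io_completion_action() may requeue the IO, depending on the error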

	scsi_finish_command()
		-> scsi_io_completion()
			-> scsi_io_completion_action()
				-> blk_requeue_request()

References:

[1]: Documentation/scsi: Documentation about scsi_cmnd lifecycle
