"Abort command Issued" messages appear in /var/log/messages file

Environment

  • Red Hat Enterprise Linux (RHEL)
    • 5
    • 6
    • 7
  • Red Hat's qla2xxx driver
  • QLogic FC HBAs
  • Fibre Channel SAN

Issue

  • We are seeing SAN errors across all of our systems (SuSE, Solaris, even Windows).
  • Performance on our database servers is degraded and applications are slow to respond.
  • Some systems are crashing after these errors.
  • What do the "Abort command issued" error messages mean?

kernel: qla2xxx 0000:46:00.0: scsi(1:0:105): Abort command issued -- 1 3e7bbc46 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:101): Abort command issued -- 1 3e7c1ec0 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:103): Abort command issued -- 1 3e7d02b8 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:115): Abort command issued -- 1 3e7d37a9 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:109): Abort command issued -- 1 3e7d44cd 2002.
  • What does the "qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus" message mean?

kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.
kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.
kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.

Resolution

  • These messages indicate that an error condition is being returned from the SAN.
  • Check the FC switch, FC cabling, zoning, and storage array for problems.
  • Engage the storage vendor to review the FC switch logs for error counters and CRC errors.

Root Cause

  • The error message qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 -- 1 2002 breaks down as follows:
    • qla2xxx is the name of the driver (kernel module).
    • [0000:04:00.0] is the PCI address of the HBA.
    • 801c is a hexadecimal ID that uniquely identifies the place in the driver code where the message originated.
    • 1 is the SCSI host number.
    • Abort command issued nexus=1:0:0 means the driver aborted the command that was in progress to SCSI target 1:0:0.
    • The 1 after the dashes means the driver waited for the device to respond.
    • 2002 is the return code of the abort. It corresponds to SUCCESS (0x2002) in the kernel's SCSI error handling, so the abort succeeded.
  • Multiple underlying issues can cause abort messages and a slow SAN.
  • Initial areas to investigate include SAN related components, such as the switches or storage targets.
  • Command aborts are almost always caused by command timeouts. When a command times out, the first course of action is to abort it so that any references to it are erased. A command timeout can have many causes: SAN congestion, a flaky target, bad hardware somewhere in the path, or an overloaded target that is dropping commands.
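
The field breakdown above can be sketched as a small shell function. This is only an illustration: the function name parse_abort and the labels pci, msgid, host, nexus, wait and status are hypothetical names chosen here, not driver terminology, and the pattern handles only the newer bracketed message format. It assumes a sed that supports -E.

```shell
# Hypothetical helper: split a newer-format qla2xxx abort message into the
# fields described above. Prints nothing if the line does not match.
parse_abort() {
    printf '%s\n' "$1" | sed -n -E \
        's/.*qla2xxx \[([0-9a-fA-F:.]+)\]-([0-9a-fA-F]+):([0-9]+): Abort command issued nexus=([0-9]+:[0-9]+:[0-9]+) -- +([0-9]+) ([0-9a-fA-F]+).*/pci=\1 msgid=\2 host=\3 nexus=\4 wait=\5 status=\6/p'
}

# Example call using one of the sample messages from this article.
parse_abort 'kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.'
```

A status of 2002 means the abort succeeded; any other value would be worth reporting to the vendor along with the nexus it belongs to.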

Diagnostic Steps

  • Before enabling extended logging on the qla2xxx driver, note the following caution:

     **CAUTION:** Turning on extended error logging under moderate to heavy IO loads can cause lockups! 
    The debug code logs information to `/var/log/messages`  about IO being processed. These debug messages 
    cause additional IO, which in turn causes more logging. This can get to the point of essentially locking up the 
    system. It is strongly suggested that the messages file be moved off any QLogic-controlled disks to a local 
    disk or via the network to a remote logging point to avoid this issue.
    
  • Enable extended logging for the qla2xxx driver to try to capture any additional error messages when the issue occurs

    $ chmod u+w /sys/module/qla2xxx/parameters/ql2xextended_error_logging
    $ echo "1" > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
    
  • Check for additional error logging in /var/log/messages when the issue occurs:

    Mar 14 00:04:51 hostname kernel: qla2xxx_eh_abort(1): aborting sp ffff8102c5614680 from RISC. pid=1048458952.
    Mar 14 00:04:51 hostname kernel: scsi(1): ABORT status detected 0x5-0x0.
    Mar 14 00:04:51 hostname kernel: qla2xxx 0000:46:00.0: scsi(1:0:109): Abort command issued -- 1 3e7e36c8 2002.
    
  • Increase SCSI extended event logging to get more information from the SCSI layer. This can be enabled without a reboot using sysctl:

    $ sysctl -w dev.scsi.logging_level=0x1003
    
    • Note: Do not use other values, especially larger ones such as 0xffff, unless you know exactly what each bit does. Other values can flood the logs with so many messages that the important ones are overwritten before they are saved to disk, and can also create huge log files.
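
To see what a given logging_level value turns on, the sketch below decodes it into per-facility levels. It assumes the layout used in the kernel's scsi_logging.h: ten facilities of three bits each, in the order listed in the loop. The function name decode_scsi_logging_level is hypothetical; check the layout against your kernel source before relying on it.

```shell
# Hypothetical helper: print every SCSI logging facility whose 3-bit level
# is non-zero in the given logging_level value (decimal or 0x-prefixed hex).
# Facility order and width are an assumption taken from scsi_logging.h.
decode_scsi_logging_level() {
    level=$(( $1 ))
    shift_bits=0
    for name in ERROR TIMEOUT SCAN MLQUEUE MLCOMPLETE \
                LLQUEUE LLCOMPLETE HLQUEUE HLCOMPLETE IOCTL; do
        value=$(( (level >> shift_bits) & 7 ))
        if [ "$value" -ne 0 ]; then
            printf '%s=%d\n' "$name" "$value"
        fi
        shift_bits=$(( shift_bits + 3 ))
    done
}

decode_scsi_logging_level 0x1003
```

Under that assumed layout, 0x1003 reports ERROR=3 and MLCOMPLETE=1, while 0xffff would enable six facilities at once, which is consistent with the warning about log flooding.
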
  • Open cases with the SAN and fabric switch vendors involved.

  • With the SCSI logging_level and ql2xextended_error_logging both set, wait for a few events to occur, then upload a fresh /var/log/messages file from the affected systems.

  • Check how many HBAs are installed and whether the errors are spread evenly across them or confined to one HBA:

    • Check the PCI addresses of the HBAs:

      $ lspci | grep -i qlogic
      02:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
      46:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
      
    • The 02:00.0 and 46:00.0 at the beginning of each output line are the PCI addresses of these cards.

  • Check the number of errors logged for each card in /var/log/messages, using the PCI addresses found above:

    $ grep -F 02:00.0 /var/log/messages | grep -c qla2xxx   # 02:00.0 is a sample PCI address of a QLogic HBA
    4
    $ grep -F 46:00.0 /var/log/messages | grep -c qla2xxx   # 46:00.0 is a sample PCI address of a QLogic HBA
    86
    
  • Check whether anything is unusual about the fabric and paths behind the device that logs the most errors (46:00.0 in this example; use the value that corresponds to your own environment).
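
When a system has more than two HBAs, the per-card counts can be collected in one pass rather than one grep per address. The sketch below is one way to do it; the function name count_qla_by_pci is hypothetical, and it assumes each qla2xxx message carries the full domain:bus:device.function address (for example 0000:46:00.0), as in the samples in this article. Note that lspci prints the address without the leading 0000: domain by default.

```shell
# Hypothetical helper: count qla2xxx messages per PCI address in a log file,
# most frequent first. Each output line is "<count> <pci address>".
count_qla_by_pci() {
    awk '
        /qla2xxx/ && match($0, /[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F]\.[0-9a-fA-F]/) {
            count[substr($0, RSTART, RLENGTH)]++   # tally by matched address
        }
        END { for (addr in count) print count[addr], addr }
    ' "$1" | sort -rn
}

# Typical use (reading /var/log/messages usually requires root):
#   count_qla_by_pci /var/log/messages
```

A strongly lopsided result points at the fabric, cabling, or paths behind one card rather than at the storage array as a whole.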
