"Abort command Issued" messages appear in /var/log/messages file

victoruu

Category: Linux KB. Tags: Redhat Knowledgebase

Environment

- Red Hat Enterprise Linux (RHEL) 5, 6, 7
- Red Hat's qla2xxx driver
- QLogic FC HBAs
- Fibre Channel SAN

Issue

We are noticing SAN errors across all our different systems (SuSE, Solaris, even Windows).
Performance on our database servers is degraded and applications are slow to respond.
Some systems crash after these errors.
What do the "Abort command issued" error messages mean?

kernel: qla2xxx 0000:46:00.0: scsi(1:0:105): Abort command issued -- 1 3e7bbc46 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:101): Abort command issued -- 1 3e7c1ec0 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:103): Abort command issued -- 1 3e7d02b8 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:115): Abort command issued -- 1 3e7d37a9 2002.
kernel: qla2xxx 0000:46:00.0: scsi(1:0:109): Abort command issued -- 1 3e7d44cd 2002.

What is the meaning of the "qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus" message?

kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.
kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.
kernel: qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 --  1 2002.

Resolution

These errors indicate an error condition being returned from the SAN.
Check the FC switch, FC cabling, zoning, and storage array for problems.
It is also advisable to engage the storage vendor to review the FC switch logs for error counters and CRC errors.
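
While waiting on the switch and array vendors, the host side exposes some link counters of its own. The following is a minimal sketch, assuming the FC transport class publishes per-host statistics under /sys/class/fc_host/hostN/statistics/. The function name show_fc_error_counters and the choice of counters are illustrative and not part of the original article.

```shell
# Print selected link error counters for every FC host under a sysfs base directory.
# The base directory defaults to /sys/class/fc_host; the optional argument
# exists only so the function can be pointed at a test directory.
show_fc_error_counters() {
    base=${1:-/sys/class/fc_host}
    for host in "$base"/host*; do
        # Skip entries without a statistics directory (or an unmatched glob).
        [ -d "$host/statistics" ] || continue
        for counter in invalid_crc_count link_failure_count loss_of_sync_count loss_of_signal_count; do
            file="$host/statistics/$counter"
            # Output format: <host> <counter name> <value as reported by sysfs>
            [ -r "$file" ] && printf '%s %s %s\n' "${host##*/}" "$counter" "$(cat "$file")"
        done
    done
    return 0
}
```

Calling `show_fc_error_counters` before and after a burst of abort messages and comparing the values can show whether one path is accumulating CRC or link errors, which would point toward cabling, SFP, or switch-port problems on that path.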

Root Cause

The error message qla2xxx [0000:04:00.0]-801c:1: Abort command issued nexus=1:0:0 -- 1 2002 breaks down as follows:
- qla2xxx is the name of the driver (kernel module).
- [0000:04:00.0] is the PCI address of the HBA.
- 801c is a hexadecimal ID that uniquely identifies the place in the driver code where the message originated.
- 1 is the SCSI host number.
- Abort command issued nexus=1:0:0 means the driver aborted a command that was in progress to the SCSI device 1:0:0 (host:target:LUN).
- The last 1 means the driver spent time waiting for the device to respond.
- 2002 is the return code in hexadecimal (0x2002, SUCCESS in the SCSI midlayer) and means the abort succeeded.
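
The field breakdown above can be sketched as a small filter that splits the newer bracketed message format into labelled fields. This is an illustrative sketch only: the function name parse_qla_abort and the output labels (pci, msgid, host, nexus, wait, ret) are assumptions made for this example, and the pattern matches only the "nexus=" form of the message shown in this article.

```shell
# Split new-format qla2xxx abort messages into labelled fields.
# Reads log lines on stdin; prints one summary line per matching message
# and silently drops lines that do not match.
parse_qla_abort() {
    sed -n -E 's/.*qla2xxx \[([0-9a-fA-F:.]+)\]-([0-9a-fA-F]+):([0-9]+): Abort command issued nexus=([0-9]+:[0-9]+:[0-9]+) -- +([0-9]+) ([0-9a-fA-F]+)\..*/pci=\1 msgid=\2 host=\3 nexus=\4 wait=\5 ret=0x\6/p'
}
```

For example, `grep 'Abort command issued' /var/log/messages | parse_qla_abort` would list the affected HBA and nexus for each abort, which makes it easier to see whether the aborts cluster on one device.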
Multiple underlying issues can cause abort messages and a slow SAN.
Initial areas to investigate are the SAN components, such as the switches and storage targets.
Command aborts are almost always caused by command timeouts. When a command times out, the first course of action is to abort it so that any references to it are erased. A command timeout can have many causes: SAN congestion, a flaky target, bad hardware somewhere in the path, or an overloaded target that is dropping commands.

Diagnostic Steps

Enable extended logging on the qla2xxx driver

**CAUTION:** Turning on extended error logging under moderate to heavy I/O loads can cause lockups! The debug code logs information about the I/O being processed to `/var/log/messages`. These debug messages cause additional I/O, which in turn causes more logging, and this can escalate to the point of essentially locking up the system. It is strongly suggested that the messages file be moved off any QLogic-controlled disks, either to a local disk or over the network to a remote logging host, to avoid this issue.

Enable extended logging for the qla2xxx driver to capture any additional error messages when the issue occurs:

$ chmod u+w /sys/module/qla2xxx/parameters/ql2xextended_error_logging
$ echo "1" > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
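
Given the caution above, it is worth recording the previous setting so that extended logging can be switched back off once a few events have been captured. The following is a minimal sketch: the helper name set_qla_logging is hypothetical, the second argument exists only so the function can be tried against an ordinary file, and on a real system it must be run as root (possibly after the chmod shown above).

```shell
# Write a new value to the qla2xxx extended-logging parameter and report
# the value it held before. Usage: set_qla_logging VALUE [PARAM_FILE]
set_qla_logging() {
    value=$1
    param=${2:-/sys/module/qla2xxx/parameters/ql2xextended_error_logging}
    # Remember the current setting so it can be restored later.
    previous=$(cat "$param") || return 1
    echo "$value" > "$param" || return 1
    echo "previous=$previous current=$(cat "$param")"
}
```

Calling `set_qla_logging 1` enables the logging as in the commands above, and calling it again with the reported previous value restores the earlier state.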

Check for additional error logging in /var/log/messages when the issue occurs:

Mar 14 00:04:51 hostname kernel: qla2xxx_eh_abort(1): aborting sp ffff8102c5614680 from RISC. pid=1048458952.
Mar 14 00:04:51 hostname kernel: scsi(1): ABORT status detected 0x5-0x0.
Mar 14 00:04:51 hostname kernel: qla2xxx 0000:46:00.0: scsi(1:0:109): Abort command issued -- 1 3e7e36c8 2002.

Increase the SCSI logging level to get more information from the SCSI layer. This can be enabled without a reboot using sysctl:

```
$ sysctl -w dev.scsi.logging_level=0x1003
```
- Note: Don't use other values, especially larger values such as 0xffff, unless you know exactly what each bit does. Other values can flood the logs with so many messages that the important ones are overwritten before they are ever saved to disk, and can also create huge log files.
Please open cases with the SAN and fabric switch vendors involved.
With the SCSI logging_level and ql2xextended_error_logging set, wait for a few events to occur and upload a fresh /var/log/messages file from the affected systems.
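
To see what a given logging_level value switches on, the value can be decoded into its per-facility levels. The sketch below assumes the layout used in the kernel's drivers/scsi/scsi_logging.h, where each facility occupies three bits starting with error at bit 0; the function name decode_scsi_logging_level and the short facility labels are illustrative.

```shell
# Decode a SCSI logging_level value into its non-zero 3-bit facility fields.
# Assumed field order, lowest bits first: error, timeout, scan, mlqueue,
# mlcomplete, llqueue, llcomplete, hlqueue, hlcomplete, ioctl.
decode_scsi_logging_level() {
    level=$(( $1 ))          # accepts decimal or 0x-prefixed hex
    shift_bits=0
    for field in error timeout scan mlqueue mlcomplete llqueue llcomplete hlqueue hlcomplete ioctl; do
        value=$(( (level >> shift_bits) & 7 ))
        [ "$value" -ne 0 ] && echo "$field=$value"
        shift_bits=$(( shift_bits + 3 ))
    done
    return 0
}
```

Under that assumption, `decode_scsi_logging_level 0x1003` prints `error=3` and `mlcomplete=1`, that is, error logging at level 3 plus mid-layer completion logging at level 1, which is why this value stays comparatively quiet.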

Check how many HBAs are installed and whether the errors are spread across all of them or confined to one HBA:

Check the HBA PCI addresses:

$ lspci | grep -i qlogic
02:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
46:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)

The 02:00.0 and 46:00.0 at the beginning of each output line are the PCI addresses of these cards.

Check the number of errors logged in /var/log/messages for each card, based on the PCI addresses found above:

$ grep -F 02:00.0 /var/log/messages | grep qla2xxx | wc -l    # sample PCI address of a QLogic HBA; substitute your own
4
$ grep -F 46:00.0 /var/log/messages | grep qla2xxx | wc -l    # sample PCI address of a QLogic HBA; substitute your own
86

If one HBA logs far more errors than the other, check whether anything is different about the fabric and paths behind that device (46:00.0 in this example; use the address that corresponds to your own environment).
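
The per-card counts can also be gathered in one pass instead of one grep per address. The following is a sketch under the assumption that the abort messages use one of the two formats shown in this article; the function name count_qla_aborts is illustrative, and unlike the grep commands above it counts only "Abort command issued" lines rather than every qla2xxx message.

```shell
# Count "Abort command issued" messages per PCI address in a log file.
# Handles both "qla2xxx 0000:46:00.0: ..." and "qla2xxx [0000:04:00.0]-801c:1: ...".
# Output: one "<pci address> <count>" line per HBA, sorted by address.
count_qla_aborts() {
    logfile=${1:-/var/log/messages}
    grep 'Abort command issued' "$logfile" |
        sed -n -E 's/.*qla2xxx \[?([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]).*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

A heavily skewed result, such as nearly all aborts on a single address, suggests looking first at the cable, SFP, switch port, and zoning behind that HBA.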
