Last edited by jieyancai on 2013-11-25 21:56
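Two hosts: IBM 3850 + Emulex Fibre Channel HBAs + DS5020 array
Operating system: Red Hat 6.3 x64 + Red Hat Cluster + Oracle HA
Each host has two fibre cables, directly attached to controller A and controller B of the storage array respectively.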
#uname -a
Linux bimsb 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
#lspci | grep -i fibre
0e:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
8b:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
# lsmod|grep lpfc
lpfc 653690 8
scsi_transport_fc 55235 4 bnx2fc,fcoe,libfc,lpfc
# cat /etc/multipath.conf
# multipath.conf written by anaconda
defaults {
user_friendly_names yes
}
blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^dcssblk[0-9]*"
devnode "^sda"
device {
vendor "DGC"
product "LUNZ"
}
device {
vendor "IBM"
product "S/390.*"
}
# don't count normal SATA devices as multipaths
device {
vendor "ATA"
}
# don't count 3ware devices as multipaths
device {
vendor "3ware"
}
device {
vendor "AMCC"
}
# nor highpoint devices
device {
vendor "HPT"
}
wwid "3600605b0069550d0195ee4cd10aa9437"
device {
vendor IBM_SATA
product DEVICE_81Y3672
}
wwid "*"
}
blacklist_exceptions {
wwid "360080e50002cacde000003bc51ca525f"
wwid "360080e50002cacde000003b951ca5251"
wwid "360080e50002cacde000003be51ca5272"
}
multipaths {
multipath {
uid 0
gid 0
wwid "360080e50002cacde000003bc51ca525f"
mode 0600
}
multipath {
uid 0
gid 0
wwid "360080e50002cacde000003b951ca5251"
mode 0600
}
multipath {
uid 0
gid 0
wwid "360080e50002cacde000003be51ca5272"
mode 0600
}
}
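The configuration above blacklists every WWID with wwid "*" and then re-admits three LUNs through blacklist_exceptions, each of which also has a multipaths entry. A minimal Python sketch for checking that the two lists agree is below. The function name section_wwids and the line-based parsing are my own assumptions for illustration; it is not a full multipath.conf parser and only handles the simple layout shown here.

```python
import re

def section_wwids(conf, section):
    """Return the WWIDs listed inside one top-level section of multipath.conf text."""
    wwids, depth, inside = [], 0, False
    for raw in conf.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Only an exact top-level name opens the section we care about,
        # so "blacklist" does not match "blacklist_exceptions".
        if depth == 0 and re.fullmatch(re.escape(section) + r"\s*\{", line):
            inside = True
        depth += line.count("{") - line.count("}")
        if depth == 0:
            inside = False
            continue
        m = re.match(r'wwid\s+"?([^"\s]+)"?', line)
        if inside and m:
            wwids.append(m.group(1))
    return wwids

# Shortened sample in the same shape as the configuration in this post.
SAMPLE = '''
blacklist {
    wwid "*"
}
blacklist_exceptions {
    wwid "360080e50002cacde000003bc51ca525f"
}
multipaths {
    multipath {
        wwid "360080e50002cacde000003bc51ca525f"
        mode 0600
    }
}
'''

if __name__ == "__main__":
    exceptions = set(section_wwids(SAMPLE, "blacklist_exceptions"))
    configured = set(section_wwids(SAMPLE, "multipaths"))
    print("only in blacklist_exceptions:", exceptions - configured)
    print("only in multipaths:", configured - exceptions)
```

On a real host the text would come from reading /etc/multipath.conf instead of SAMPLE.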
#multipath -ll
mpathc (360080e50002cacde000003b951ca5251) dm-0 IBM,1814 FAStT
size=500G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
`- 2:0:0:1 sdd 8:48 active ghost running
mpathb (360080e50002cacde000003bc51ca525f) dm-1 IBM,1814 FAStT
size=200G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:2 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
`- 2:0:0:2 sde 8:64 active ghost running
Multipath is installed and configured. Is this output normal?
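As far as I know, with the rdac hardware handler a path in the ghost state is the passive path to the controller that does not currently own the LUN, so one ready path and one ghost path per LUN is the usual shape on an active/passive array. A minimal Python sketch for summarizing the checker state of every path is below. The function name summarize_paths and both regular expressions are my own assumptions, written against the output format shown in this post.

```python
import re

# Path lines look like: "| `- 1:0:0:1 sdb 8:16 active ready running"
PATH_RE = re.compile(
    r"(\d+:\d+:\d+:\d+)\s+(sd[a-z]+)\s+\d+:\d+\s+(\w+)\s+(\w+)\s+(\w+)"
)
# Map header lines look like: "mpathc (3600...) dm-0 IBM,1814 FAStT"
MAP_RE = re.compile(r"^(\S+)\s+\(([0-9a-f]+)\)\s+dm-\d+")

def summarize_paths(text):
    """Return {map_name: [(device, checker_state), ...]} from `multipath -ll` text."""
    maps = {}
    current = None
    for line in text.splitlines():
        m = MAP_RE.match(line.strip())
        if m:
            current = m.group(1)
            maps[current] = []
            continue
        # finditer also copes with two tree lines pasted onto one line
        for p in PATH_RE.finditer(line):
            if current is not None:
                maps[current].append((p.group(2), p.group(4)))
    return maps

SAMPLE = """\
mpathc (360080e50002cacde000003b951ca5251) dm-0 IBM,1814 FAStT
size=500G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
`- 2:0:0:1 sdd 8:48 active ghost running
"""

if __name__ == "__main__":
    for name, paths in summarize_paths(SAMPLE).items():
        print(name, paths)
```

A map whose paths are all ghost, or that lists a faulty path, would be the thing to look at more closely.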
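The problem right now: whenever the database does I/O, the system log fills with errors and the database becomes extremely slow, although the database itself reports no errors.
The errors in the system log are as follows: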
tail -100 /var/log/messages
Node 1:
Nov 14 15:51:48 bimsa rgmanager[14077]: [script] Executing /etc/init.d/dbora status
Nov 14 15:52:18 bimsa rgmanager[14831]: [script] Executing /etc/init.d/dbora status
Nov 14 15:52:19 bimsa rgmanager[14875]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:52:49 bimsa rgmanager[15823]: [script] Executing /etc/init.d/dbora status
Nov 14 15:52:59 bimsa rgmanager[16067]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:53:28 bimsa rgmanager[16876]: [script] Executing /etc/init.d/dbora status
Nov 14 15:53:29 bimsa rgmanager[16950]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:54:08 bimsa rgmanager[17842]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:54:08 bimsa rgmanager[17968]: [script] Executing /etc/init.d/dbora status
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4262 received Data: x4262 x20 x80000 x0 x0
Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37754 at 1384415667 secs on host2:code 3=link_down datalen 4 data=0
Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37755 at 1384415667 secs on host2:code 2=link_up datalen 4 data=0
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:1303 Link Up Event x4263 received Data: x4263 x1 x20 x2 x0 x0 0
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):0100 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37756 at 1384415667 secs on host2:code 65535=vendor_unique datalen 36 data=512
Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37757 at 1384415667 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37758 at 1384415667 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:54:38 bimsa rgmanager[18881]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:54:39 bimsa rgmanager[18955]: [script] Executing /etc/init.d/dbora status
Nov 14 15:55:18 bimsa rgmanager[19852]: [script] Executing /etc/init.d/dbora status
Nov 14 15:55:18 bimsa rgmanager[19900]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:55:30 bimsa fcoemon: FC_HOST_EVENT 37759 at 1384415730 secs on host2:code 3=link_down datalen 4 data=0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4264 received Data: x4264 x20 x80000 x0 x0
Nov 14 15:55:30 bimsa fcoemon: FC_HOST_EVENT 37760 at 1384415730 secs on host2:code 2=link_up datalen 4 data=0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:1303 Link Up Event x4265 received Data: x4265 x1 x20 x2 x0 x0 0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:30 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):0100 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:30 bimsa fcoemon: FC_HOST_EVENT 37761 at 1384415730 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:55:30 bimsa fcoemon: FC_HOST_EVENT 37762 at 1384415730 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:55:39 bimsa fcoemon: FC_HOST_EVENT 37763 at 1384415739 secs on host2:code 3=link_down datalen 4 data=0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4266 received Data: x4266 x20 x80000 x0 x0
Nov 14 15:55:39 bimsa fcoemon: FC_HOST_EVENT 37764 at 1384415739 secs on host2:code 2=link_up datalen 4 data=0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:1303 Link Up Event x4267 received Data: x4267 x1 x20 x2 x0 x0 0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:39 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):0100 FLOGI failure Status:x3/x18 TMO:x0
Nov 14 15:55:39 bimsa fcoemon: FC_HOST_EVENT 37765 at 1384415739 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:55:39 bimsa fcoemon: FC_HOST_EVENT 37766 at 1384415739 secs on host2:code 65535=vendor_unique datalen 32 data=512
Nov 14 15:55:48 bimsa rgmanager[20746]: [script] Executing /etc/init.d/weblogic status
Nov 14 15:55:49 bimsa rgmanager[20899]: [script] Executing /etc/init.d/dbora status
Nov 14 15:56:02 bimsa fcoemon: FC_HOST_EVENT 37767 at 1384415762 secs on host2:code 3=link_down datalen 4 data=0
Nov 14 15:56:02 bimsa kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4268 received Data: x4268 x20 x80000 x0 x0
Nov 14 15:56:02 bimsa fcoemon: FC_HOST_EVENT 37768 at 1384415762 secs on host2:code 2=link_up datalen 4 data=0
Nov 14 15:56:02 bimsa kernel: lpfc 0000:8b:00.0: 1:1303 Link Up Event x4269 received Data: x4269 x1 x20 x2 x0 x0 0
Nov 14 15:56:02 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
......
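In the node 1 excerpt, every Link Down, Link Up and FLOGI failure message comes from the same HBA port, 0000:8b:00.0 (host2), which is a hint to look at that port's link first. A minimal Python sketch for counting these events per PCI address is below. The function name count_link_events and the regular expression are my own assumptions, based only on the message format visible in this log.

```python
import re
from collections import Counter

# lpfc kernel messages look like:
#   "... kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4262 received ..."
LPFC_RE = re.compile(
    r"kernel: lpfc (\S+): \S+ (Link Down Event|Link Up Event|FLOGI failure)"
)

def count_link_events(lines):
    """Count link down/up and FLOGI failure messages per HBA PCI address."""
    counts = Counter()
    for line in lines:
        m = LPFC_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# A few lines copied from the node 1 log in this post.
SAMPLE_LINES = [
    "Nov 14 15:54:08 bimsa rgmanager[17968]: [script] Executing /etc/init.d/dbora status",
    "Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:1305 Link Down Event x4262 received Data: x4262 x20 x80000 x0 x0",
    "Nov 14 15:54:27 bimsa fcoemon: FC_HOST_EVENT 37754 at 1384415667 secs on host2:code 3=link_down datalen 4 data=0",
    "Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:1303 Link Up Event x4263 received Data: x4263 x1 x20 x2 x0 x0 0",
    "Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):2858 FLOGI failure Status:x3/x18 TMO:x0",
    "Nov 14 15:54:27 bimsa kernel: lpfc 0000:8b:00.0: 1:(0):0100 FLOGI failure Status:x3/x18 TMO:x0",
]

if __name__ == "__main__":
    for (pci, event), n in sorted(count_link_events(SAMPLE_LINES).items()):
        print(pci, event, n)
```

For the real log, the lines can be passed in directly from open("/var/log/messages").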
Node 2:
Nov 14 12:34:22 bimsb multipathd: mpathc: sdd - rdac checker reports path is ghost
Nov 14 12:34:22 bimsb multipathd: 8:48: reinstated
Nov 14 13:17:56 bimsb xinetd[5365]: START: telnet pid=65332 from=::ffff:60.250.69.98
Nov 14 13:17:56 bimsb telnetd[65332]: ttloop: peer died: EOF
Nov 14 13:17:56 bimsb xinetd[5365]: EXIT: telnet status=1 pid=65332 duration=0(sec)
Nov 14 13:39:48 bimsb multipathd: mpathb: sdc - rdac checker reports path is ghost
Nov 14 13:39:48 bimsb multipathd: 8:32: reinstated
Nov 14 13:39:48 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 13:39:49 bimsb multipathd: mpathb: sde - rdac checker reports path is up
Nov 14 13:39:49 bimsb multipathd: 8:64: reinstated
Nov 14 13:40:23 bimsb multipathd: mpathb: sdc - rdac checker reports path is up
Nov 14 13:40:23 bimsb multipathd: 8:32: reinstated
Nov 14 13:40:23 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 13:40:24 bimsb multipathd: mpathb: sde - rdac checker reports path is ghost
Nov 14 13:40:24 bimsb multipathd: 8:64: reinstated
Nov 14 14:30:19 bimsb multipathd: mpathb: sdc - rdac checker reports path is ghost
Nov 14 14:30:19 bimsb multipathd: 8:32: reinstated
Nov 14 14:30:19 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 14:30:20 bimsb multipathd: mpathb: sde - rdac checker reports path is up
Nov 14 14:30:20 bimsb multipathd: 8:64: reinstated
Nov 14 14:30:54 bimsb multipathd: mpathb: sdc - rdac checker reports path is up
Nov 14 14:30:54 bimsb multipathd: 8:32: reinstated
Nov 14 14:30:54 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 14:30:55 bimsb multipathd: mpathb: sde - rdac checker reports path is ghost
Nov 14 14:30:55 bimsb multipathd: 8:64: reinstated
Nov 14 14:34:19 bimsb xinetd[5365]: START: telnet pid=5266 from=::ffff:60.250.69.98
Nov 14 14:34:20 bimsb telnetd[5266]: ttloop: peer died: EOF
Nov 14 14:34:20 bimsb xinetd[5365]: EXIT: telnet status=1 pid=5266 duration=1(sec)
Nov 14 14:52:09 bimsb multipathd: mpathb: sdc - rdac checker reports path is ghost
Nov 14 14:52:09 bimsb multipathd: 8:32: reinstated
Nov 14 14:52:09 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 14:52:10 bimsb multipathd: mpathb: sde - rdac checker reports path is up
Nov 14 14:52:10 bimsb multipathd: 8:64: reinstated
Nov 14 14:52:45 bimsb multipathd: mpathb: sdc - rdac checker reports path is up
Nov 14 14:52:45 bimsb multipathd: 8:32: reinstated
Nov 14 14:52:45 bimsb multipathd: mpathb: load table [0 419430400 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:32 1 round-robin 0 1 1 8:64 1]
Nov 14 14:52:46 bimsb multipathd: mpathb: sde - rdac checker reports path is ghost
Nov 14 14:52:46 bimsb multipathd: 8:64: reinstated
Nov 14 15:09:11 bimsb fcoemon: FC_HOST_EVENT 195 at 1384412951 secs on host2:code 65535=vendor_unique datalen 28 data=512
Nov 14 15:09:11 bimsb kernel: lpfc 0000:8b:00.0: 1:(0):0713 SCSI layer issued Device Reset (0, 1) return x2002
Nov 14 15:09:11 bimsb multipathd: mpathc: sdd - rdac checker reports path is down
Nov 14 15:09:11 bimsb multipathd: checker failed path 8:48 in map mpathc
Nov 14 15:09:11 bimsb multipathd: mpathc: remaining active paths: 1
Nov 14 15:09:11 bimsb kernel: device-mapper: multipath: Failing path 8:48.
Nov 14 15:09:16 bimsb multipathd: mpathc: sdd - rdac checker reports path is ghost
Nov 14 15:09:16 bimsb multipathd: 8:48: reinstated
Nov 14 15:09:16 bimsb multipathd: mpathc: remaining active paths: 2
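In the node 2 log, sdc and sde of mpathb keep swapping between ghost and up. That would be consistent with ownership of the LUN moving back and forth between the two controllers, though that reading is my assumption and not something the log states. A minimal Python sketch for extracting the state history of each path and counting how often it flips is below. The names path_state_history and count_flips are hypothetical helpers, and the regular expression follows the multipathd message format seen in this post.

```python
import re
from collections import defaultdict

# multipathd checker messages look like:
#   "... multipathd: mpathb: sdc - rdac checker reports path is ghost"
CHECKER_RE = re.compile(
    r"multipathd: (\S+): (sd[a-z]+) - rdac checker reports path is (\w+)"
)

def path_state_history(lines):
    """Return {(map, device): [state, ...]} in the order the log reports them."""
    history = defaultdict(list)
    for line in lines:
        m = CHECKER_RE.search(line)
        if m:
            history[(m.group(1), m.group(2))].append(m.group(3))
    return dict(history)

def count_flips(states):
    """Number of times two consecutive reported states differ."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

# A few lines copied from the node 2 log in this post.
SAMPLE_LINES = [
    "Nov 14 13:39:48 bimsb multipathd: mpathb: sdc - rdac checker reports path is ghost",
    "Nov 14 13:39:48 bimsb multipathd: 8:32: reinstated",
    "Nov 14 13:39:49 bimsb multipathd: mpathb: sde - rdac checker reports path is up",
    "Nov 14 13:40:23 bimsb multipathd: mpathb: sdc - rdac checker reports path is up",
    "Nov 14 13:40:24 bimsb multipathd: mpathb: sde - rdac checker reports path is ghost",
]

if __name__ == "__main__":
    for key, states in path_state_history(SAMPLE_LINES).items():
        print(key, states, "flips:", count_flips(states))
```

A path that flips many times within a few hours is worth correlating with the lpfc link messages from the same period.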
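Is this a multipath configuration problem, or something else?
The indicator lights on the HBAs all look normal, and the storage and the hosts show no faults.
The problem has been solved.
The fibre cables were at fault. The cables the customer originally deployed were yellow single-mode fibre; after replacing them with red multimode fibre, everything works normally.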
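Since the indicator lights looked normal throughout, it can help to also read the kernel's own view of each FC port. On Linux the fc_host transport class exposes attributes such as port_state and speed under /sys/class/fc_host/hostN/. A minimal Python sketch for reading them is below. The function name read_fc_hosts and the root parameter are my own assumptions, added so the code can be pointed at a test directory.

```python
import os

def read_fc_hosts(root="/sys/class/fc_host"):
    """Return {host: {attr: value}} with port_state and speed of each FC host."""
    result = {}
    if not os.path.isdir(root):
        return result  # no FC HBAs, or not a Linux host
    for host in sorted(os.listdir(root)):
        attrs = {}
        for attr in ("port_state", "speed"):
            path = os.path.join(root, host, attr)
            try:
                with open(path) as f:
                    attrs[attr] = f.read().strip()
            except OSError:
                attrs[attr] = None  # attribute missing or unreadable
        result[host] = attrs
    return result

if __name__ == "__main__":
    for host, attrs in read_fc_hosts().items():
        print(host, attrs)
```

A single reading can miss a link that drops and comes back within the same second, as in the node 1 log, so this complements the system log and does not replace it.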
Multipath status after installing the latest driver:
[root@bimsa ~]# multipath -ll
mpathc (360080e50002cacde000003b951ca5251) dm-1 IBM,1814 FAStT
size=500G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=active
| `- 1:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
`- 2:0:0:1 sdd 8:48 active ready running
mpathb (360080e50002cacde000003bc51ca525f) dm-0 IBM,1814 FAStT
size=200G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=active
| `- 1:0:0:2 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
`- 2:0:0:2 sde 8:64 active ready running