ORACLE RAC: several nodes suddenly went down. Cause: localhost kernel: end_request: I/O error, dev sdi, sector 873749760


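Background:

On May 22, 2019, shortly after 12:00, another vendor's calls to our application interface suddenly began failing. At the same time, opening menu items in our own business system also returned errors.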

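The front end showed an error. Analysis of the application logs and of the data source in the middleware console showed that the node the application was connected to had gone down.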

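Step 1: run srvctl status database -d orcl to check the database status;

Step 2: run srvctl start instance -d orcl -i orcl1 to try to start the instance on this node; the command reported a problem with the clusterware;

Step 3: run /u01/11.2.0/grid/bin/crsctl check cluster -all, which showed that the clusterware was down;

Step 4: as root, run /u01/11.2.0/grid/bin/crsctl stop crs -f to force-stop the CRS resources;

Step 5: as root, run /u01/11.2.0/grid/bin/crsctl start crs to start the CRS resources on this node;

Step 6: as the grid user, run srvctl start instance -d orcl -i orcl1 to start node 1; the node 1 instance came up and business returned to normal.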
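The recovery sequence can be written down as an ordered plan before anyone touches the cluster. The following is only a sketch: the Grid home, database name and instance name are taken from the steps above, the function names `recovery_plan` and `format_plan` are hypothetical, the OS user for the first two commands is an assumption (the steps only name root and grid for the later ones), and the start command assumes the 11.2 `srvctl start instance` syntax. It prints commands rather than running them.

```python
import shlex
from typing import List, Tuple

GRID_HOME = "/u01/11.2.0/grid"  # Grid home used in the steps above

Step = Tuple[str, List[str]]  # (os_user, argv)


def recovery_plan(db: str = "orcl", instance: str = "orcl1") -> List[Step]:
    """Return the restart sequence as (os_user, argv) pairs, in order."""
    crsctl = f"{GRID_HOME}/bin/crsctl"
    return [
        # Assumption: the two read-only checks are run as grid.
        ("grid", ["srvctl", "status", "database", "-d", db]),
        ("grid", [crsctl, "check", "cluster", "-all"]),
        # Stopping and starting CRS requires root.
        ("root", [crsctl, "stop", "crs", "-f"]),
        ("root", [crsctl, "start", "crs"]),
        # Finally bring the instance on node 1 back up.
        ("grid", ["srvctl", "start", "instance", "-d", db, "-i", instance]),
    ]


def format_plan(plan: List[Step]) -> List[str]:
    """Render each step as '[user] command' for review or a runbook."""
    return [
        f"[{user}] " + " ".join(shlex.quote(arg) for arg in argv)
        for user, argv in plan
    ]


for line in format_plan(recovery_plan()):
    print(line)
```

Keeping the plan as data makes it easy to review which commands need root before a forced `crsctl stop crs -f` is issued on a production node.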

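With the routine restart above, database node 1 came back, which was frankly lucky. The actual cause still had to be tracked down.

If a routine restart had not worked, the logs would have needed deeper analysis.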

Root cause analysis:

Step 1: check the alert log alert_orcl1.log

Tue May 21 12:33:44 2019
WARNING: Write Failed. group:1 disk:7 AU:4389 offset:901120 size:131072
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw4_88627.trc:
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 131072
WARNING: failed to write mirror side 1 of virtual extent 2183 logical extent 0 of file 264 in group 1 on disk 7 allocation unit 4389 
KCF: read, write or open error, block=0x443ee online=1
        file=3 '+DATA/orcl/datafile/undotbs1.264.997101273'
        error=15081 txt: ''
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw4_88627.trc: -- this points to the trace file for further details.

The alert log only suggests that something is wrong with a disk in the DATA disk group; it does not show what exactly.
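When the alert log is long, the failing group and disk numbers can be pulled out mechanically. This is a minimal sketch that assumes the `WARNING: Write Failed.` line keeps the exact layout shown above; the names `parse_write_failures` and `ora_codes` are hypothetical helpers, not part of any Oracle tooling.

```python
import re
from typing import Dict, List

# Layout copied from the alert log line shown above.
WRITE_FAILED = re.compile(
    r"WARNING: Write Failed\. group:(?P<group>\d+) disk:(?P<disk>\d+) "
    r"AU:(?P<au>\d+) offset:(?P<offset>\d+) size:(?P<size>\d+)"
)


def parse_write_failures(text: str) -> List[Dict[str, int]]:
    """Return one dict per 'Write Failed' warning found in the text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in WRITE_FAILED.finditer(text)
    ]


def ora_codes(text: str) -> List[str]:
    """Return the distinct ORA- error codes in order of first appearance."""
    return list(dict.fromkeys(re.findall(r"ORA-\d{5}", text)))


sample = (
    "WARNING: Write Failed. group:1 disk:7 AU:4389 offset:901120 size:131072\n"
    "ORA-15080: synchronous I/O operation to a disk failed\n"
    "ORA-27061: waiting for async I/Os failed\n"
)
print(parse_write_failures(sample))
print(ora_codes(sample))
```

The extracted group and disk numbers can then be matched against `GROUP_NUMBER` and `DISK_NUMBER` in `V$ASM_DISK` to look up the disk's `PATH`.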

Step 2: check orcl1_dbw4_88627.trc

WARNING: Write Failed. group:1 disk:5 AU:4389 offset:393216 size:131072
path:/dev/asm-diskk -- the corresponding disk really did have a problem: reads and writes to it failed with I/O errors.
     incarnation:0xd622c620 asynchronous result:'I/O error'
     subsys:System iop:0x7ffff4440148 bufp:0x1ddbd7f000 osderr:0x0 osderr1:0x0
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 131072
WARNING: failed to write mirror side 1 of virtual extent 2184 logical extent 0 of file 264 in group 1 on disk 5 allocation unit 4389 
KCF: read, write or open error, block=0x44430 online=1
        file=3 '+DATA/orcl/datafile/undotbs1.264.997101273'
        error=15081 txt: ''
Encountered write error

The trace confirms that a disk failed, but not what went wrong with it; for that, the operating system log is needed.
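The trace names the disk by its ASM path (/dev/asm-diskk), while the operating system log uses kernel names such as sdX, so the two have to be tied together. One possible way, sketched below, is to compare block device major/minor numbers. This assumes the /dev/asm-* entries are device nodes or symlinks for plain SCSI disks (the article does not say how they were created; multipath setups would need a different approach), and all function names are hypothetical.

```python
import glob
import os
import stat
from typing import Dict, Optional, Tuple

DevNum = Tuple[int, int]  # (major, minor)


def device_number(path: str) -> Optional[DevNum]:
    """Return (major, minor) of a block device node, following symlinks."""
    try:
        info = os.stat(path)
    except OSError:
        return None
    if not stat.S_ISBLK(info.st_mode):
        return None
    return os.major(info.st_rdev), os.minor(info.st_rdev)


def scan(pattern: str) -> Dict[str, DevNum]:
    """Map every block device matching the glob pattern to its number."""
    found = {}
    for path in glob.glob(pattern):
        number = device_number(path)
        if number is not None:
            found[path] = number
    return found


def match_by_device_number(
    asm: Dict[str, DevNum], kernel: Dict[str, DevNum]
) -> Dict[str, str]:
    """Pair each ASM path with the kernel device sharing its (major, minor)."""
    by_number = {number: path for path, number in kernel.items()}
    return {
        path: by_number[number]
        for path, number in asm.items()
        if number in by_number
    }
```

On the database host this would be called as `match_by_device_number(scan("/dev/asm-*"), scan("/dev/sd*"))`; the result shows which kernel device name to search for in /var/log/messages.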

Step 3: check the operating system log, /var/log/messages

May 21 03:28:45 localhost auditd[1926]: Audit daemon rotating log files
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi] CDB: Write(10): 2a 00 34 14 5d 00 00 00 10 00
May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 873749760
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi] CDB: Write(10): 2a 00 34 14 5c 00 00 00 10 00
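May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 873749504 -- the proverbial bad sector, although hostbyte=DID_BAD_TARGET on the preceding lines suggests the SCSI target itself stopped responding.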
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi] CDB: Read(10): 28 00 2d 3c f8 40 00 04 00 00
May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 758970432
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:38 localhost kernel: sd 33:0:6:0: [sdi] CDB: Read(10): 28 00 2d 3c fc 40 00 03 c0 00
May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 758971456
May 21 12:33:42 localhost kernel: sd 33:0:6:0: [sdi]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:42 localhost kernel: sd 33:0:6:0: [sdi] CDB: Read(10): 28 00 2d cb 17 c0 00 00 10 00
May 21 12:33:42 localhost kernel: end_request: I/O error, dev sdi, sector 768284608
May 21 12:33:43 localhost kernel: sd 33:0:8:0: [sdj]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:43 localhost kernel: sd 33:0:8:0: [sdj] CDB: Read(10): 28 00 11 9d a0 60 00 00 20 00
May 21 12:33:43 localhost kernel: end_request: I/O error, dev sdj, sector 295542880
May 21 12:33:43 localhost kernel: sd 33:0:8:0: [sdj]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 21 12:33:43 localhost kernel: sd 33:0:8:0: [sdj] CDB: Write(10): 2a 00 0f 94 c0 14 00 00 02 00
May 21 12:33:43 localhost kernel: end_request: I/O error, dev sdj, sector 261406740
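Each failed request in the log pairs a SCSI command block (CDB) with the sector that end_request reports. As a cross-check, the 10-byte Read(10)/Write(10) layout (opcode, 4-byte big-endian LBA, 2-byte transfer length) can be decoded with a few lines. The function name `decode_cdb10` is hypothetical, and the LBA equals the kernel sector number only when the device uses 512-byte logical blocks, which the matching numbers here suggest.

```python
from typing import Dict, Union

# Only the two opcodes that appear in the log above.
OPCODES = {0x28: "Read(10)", 0x2A: "Write(10)"}


def decode_cdb10(hex_bytes: str) -> Dict[str, Union[str, int]]:
    """Decode a 10-byte SCSI CDB printed as space-separated hex bytes."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "op": OPCODES.get(cdb[0], hex(cdb[0])),
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# First Write(10) from the log; its LBA matches "sector 873749760".
print(decode_cdb10("2a 00 34 14 5d 00 00 00 10 00"))
```

Decoding the first Write(10) gives LBA 873749760 with a length of 16 blocks, the same sector the kernel reports for sdi, so the CDB and end_request lines describe the same failed request.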

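Finally, after consulting related material, some advice: if you are responsible for hardware operations, keep routine monitoring in place; if you are responsible for application operations, report the incident to the customer and have the customer coordinate with the hardware vendor to resolve it.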
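As one way to do that routine monitoring, the system log can be scanned for kernel I/O errors and summarized per device. This is a sketch only: it assumes the `end_request: I/O error` wording seen in this incident (newer kernels word the message differently, so the pattern would need adjusting), and the names `count_io_errors` and `devices_over_threshold` as well as the threshold value are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# Message format as it appears in /var/log/messages in this incident.
IO_ERROR = re.compile(
    r"kernel: end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def count_io_errors(lines: Iterable[str]) -> Counter:
    """Count kernel I/O error lines per block device."""
    counts: Counter = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


def devices_over_threshold(counts: Counter, threshold: int = 1) -> List[str]:
    """Return the devices with at least `threshold` errors, sorted by name."""
    return sorted(dev for dev, n in counts.items() if n >= threshold)


sample = [
    "May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 873749760",
    "May 21 12:33:38 localhost kernel: end_request: I/O error, dev sdi, sector 873749504",
    "May 21 12:33:43 localhost kernel: end_request: I/O error, dev sdj, sector 295542880",
]
print(devices_over_threshold(count_io_errors(sample)))
```

In practice the lines would come from iterating over the open /var/log/messages file on a schedule, with the returned device list fed into whatever alerting is already in place.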

Conclusion: after a period of monitoring, this disk fault has not recurred so far; monitoring continues.

Shared for study and reference.

Author: 张陈亚