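Editor's note (墨墨导读): This article walks through the troubleshooting of a puzzling failure in which an Oracle cluster could not start after a hardware power loss.
1. Problem Description
Symptom: after a hardware power loss, the Oracle cluster fails to start.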
[root@rac2 ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac2 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
2. Troubleshooting
Checking the cluster components shows that ora.asm is OFFLINE, and ora.crsd is OFFLINE as well:
[root@rac2 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2
ora.crf
      1        ONLINE  ONLINE       rac2
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  ONLINE       rac2                     OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  INTERMEDIATE rac2
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2
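The listing is long, and what matters here is the set of resources whose TARGET is ONLINE while their STATE is something else. The following is a minimal sketch of filtering for those rows. It assumes the usual layout in which `crsctl stat res -t -init` prints each resource name on its own line, followed by an indented row holding the instance number, TARGET, and STATE. The function name `find_unhealthy` and the `SAMPLE` text (a subset of the rows above) are illustrative, not part of any Oracle tooling.

```python
import re

# A subset of the rows shown above, in the assumed multi-line crsctl layout.
SAMPLE = """\
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       rac2
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  INTERMEDIATE rac2
"""

# Indented detail row: instance number, then TARGET, then STATE.
ROW = re.compile(r"^\s+\d+\s+(\S+)\s+(\S+)")


def find_unhealthy(output):
    """Return (name, target, state) for resources with TARGET=ONLINE but STATE!=ONLINE."""
    problems = []
    name = None
    for line in output.splitlines():
        if line.startswith("ora."):
            # A resource name line; remember it for the detail row that follows.
            name = line.strip()
            continue
        match = ROW.match(line)
        if match and name:
            target, state = match.group(1), match.group(2)
            if target == "ONLINE" and state != "ONLINE":
                problems.append((name, target, state))
    return problems


if __name__ == "__main__":
    for name, target, state in find_unhealthy(SAMPLE):
        print(f"{name}: target={target} state={state}")
```

Run against the sample, this reports ora.asm, ora.crsd, and ora.evmd. ora.diskmon is skipped because its target state is itself OFFLINE, which matches the CRS-2767 message in the alert log.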
The Grid Infrastructure alert log indicates that the disk group was not mounted:
[ohasd(4329)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2018-05-08 04:12:24.940:
[cssd(4576)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2018-05-08 04:12:26.188:
[cssd(4576)]CRS-1605:CSSD voting file is online: /dev/asmdisk/oraasm-OCR_0000; details in /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log.
2018-05-08 04:12:28.723:
[cssd(4576)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2018-05-08 04:12:30.617:
[ctssd(4660)]CRS-2401:The Cluster Time Synchronization Service started on host rac2.
2018-05-08 04:12:30.617:
[ctssd(4660)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2018-05-08 04:12:32.348:
[ohasd(4329)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2018-05-08 04:12:32.348:
[ohasd(4329)]CRS-2769:Unable to failover resource 'ora.diskmon'.
The ASM alert log shows an ORA-00600 [kfrValAcd30] error:
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 6 for pid 23, osid 5727
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asmdisk/oraasm-ASM_0000
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asmdisk/oraasm-ASM_0001
NOTE: F1X0 found on disk 1 au 2 fcn 0.0
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asmdisk/oraasm-ASM_0002
NOTE: F1X0 found on disk 2 au 2 fcn 0.0
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/asmd