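Our company runs a two-node Oracle 11g R2 (11.2.0.4.0) RAC database. The storage is an HP EVA4000, attached directly over Fibre Channel.
A storage failure brought the database down. After the storage was repaired, GI (Grid Infrastructure) failed to start.
Test environment used to reproduce the problem: CentOS 6.6 + Oracle 11g R2 (11.2.0.4.0) + GI, two nodes
First, check the crsd log: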
2017-10-17 09:34:30.961: [ CRSD][3873916704] Logging level for Module: OCRASM 1
2017-10-17 09:34:30.961: [ CRSMAIN][3873916704] Checking the OCR device
2017-10-17 09:34:30.961: [ CRSMAIN][3873916704] Sync-up with OCR
2017-10-17 09:34:30.961: [ CRSMAIN][3873916704] Connecting to the CSS Daemon
2017-10-17 09:34:30.961: [ CRSMAIN][3873916704] Getting local node number
2017-10-17 09:34:30.967: [ CRSMAIN][3867465472] Policy Engine is not initialized yet!
2017-10-17 09:34:30.967: [ CRSMAIN][3873916704] Initializing OCR
[ CLWAL][3873916704]clsw_Initialize: OLR initlevel [70000]
2017-10-17 09:34:31.290: [ OCRASM][3873916704]proprasmo: Error in open/create file in dg [DATA]
[ OCRASM][3873916704]SLOS : SLOS: cat=8, opn=kgfoOpen01, dep=15056, loc=kgfokge
2017-10-17 09:34:31.290: [ OCRASM][3873916704]ASM Error Stack :
2017-10-17 09:34:31.331: [ OCRASM][3873916704]proprasmo: kgfoCheckMount returned [6]
2017-10-17 09:34:31.331: [ OCRASM][3873916704]proprasmo: The ASM disk group DATA is not found or not mounted
2017-10-17 09:34:31.332: [ OCRRAW][3873916704]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2017-10-17 09:34:31.332: [ OCRRAW][3873916704]proprioo: No OCR/OLR devices are usable
2017-10-17 09:34:31.332: [ OCRASM][3873916704]proprasmcl: asmhandle is NULL
2017-10-17 09:34:31.333: [ GIPC][3873916704] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
2017-10-17 09:34:31.336: [ default][3873916704]clsvactversion:4: Retrieving Active Version from local storage.
2017-10-17 09:34:31.340: [ CSSCLNT][3873916704]clssgsgrppubdata: group (ocr_node-cluster) not found
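The log shows that the storage failure caused CRS to restart and that OCR initialization then failed: the ASM disk group DATA was not found or not mounted, so the OCR location +DATA was marked UNAVAILABLE.
After the disks were recovered, none of the cluster nodes could start normally.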
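The relevant lines are easy to miss in a long crsd log. A minimal sketch of a filter that keeps only the failure messages from the OCRASM and OCRRAW modules is shown here. The function name ocr_errors is hypothetical (it is not an Oracle tool), and the keyword list is an assumption based only on the messages in the excerpt above, so other failures may need other keywords.

```shell
#!/bin/sh
# Sketch: print the OCR-related failure lines from a crsd log.
# ocr_errors is a hypothetical helper name, not part of Grid Infrastructure.
# $1 is the path to the crsd log; the caller supplies it.

ocr_errors() {
    # First grep: keep lines tagged with the OCRASM or OCRRAW module,
    # allowing for the padding spaces after the opening bracket.
    # Second grep: keep only lines whose text reports a failure
    # (keyword list assumed from the log excerpt above).
    grep -E '\[ *(OCRASM|OCRRAW)\]' "$1" |
        grep -E -i 'error|failed|not (found|mounted)|unavailable|no ocr'
}
```

On an 11.2 installation the crsd log is usually found under $GRID_HOME/log/<hostname>/crsd/crsd.log; that location is an assumption here, so pass whatever path applies to your system, for example: ocr_errors /path/to/crsd.log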
Analysis:
1. Determine at which stage the startup fails
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-