I. Environment:
Red Hat Enterprise Linux 5.8 + Oracle 11.2.0.4 + RAC
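II. Problem description:
The physical disk holding the OCR (Oracle Cluster Registry) and the voting disk (voting disks manage information about node membership) failed. After a restore of OCR_VOTE from the automatic backup, the cluster services could not start and reported the following errors: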
ohasd.log:
[ohasd(18298)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]]. Details at (:OHAS00106:) in /u01/app/11.2.0/grid/log/kawjrmdb001l/ohasd/ohasd.log.
[client(18359)]CRS-10001:CRS-10132: No msg for has:crs-10132 [10][60]
ossd.log
2014-09-10 14:48:29.907: [ CRSOCR][2428572496] OCR context init failure. Error: PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]
2014-09-10 14:48:29.908: [ default][2428572496] Created alert : (:OHAS00106:) : OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]
2014-09-10 14:48:29.908: [ default][2428572496][PANIC] OHASD exiting; Could not init OLR
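The failure signature in these logs is easy to scan for. Here is a minimal sketch, assuming the log lines look like the ones quoted above; the function name olr_init_failed is hypothetical, and the caller supplies the path to the log file.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the given log file contains the
# OLR initialization failure quoted in this article, 1 otherwise.
olr_init_failed() {
    # $1: path to a log file (for example a copy of ohasd.log)
    grep -q -e 'PROCL-24' -e 'Could not init OLR' "$1"
}

# Demo on a temporary file built from two of the log lines quoted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2014-09-10 14:48:29.907: [ CRSOCR][2428572496] OCR context init failure. Error: PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]
2014-09-10 14:48:29.908: [ default][2428572496][PANIC] OHASD exiting; Could not init OLR
EOF

if olr_init_failed "$sample"; then
    echo "OLR initialization failure found"
fi
rm -f "$sample"
```

Matching on the error code and the PANIC text, rather than on timestamps or process ids, keeps the check usable across restarts.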
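III. Analysis:
Starting with 11gR2, the OCR and voting disk can be stored in an ASM disk group, as they are in this environment. The OCR records the cluster configuration and the voting disk is the cluster's quorum disk; both are critical. If the OCR or voting disk is damaged, the cluster services, including the databases, cannot start. Fortunately, Clusterware backs up the OCR automatically every 4 hours, and the ocrconfig -showbackup command lists the available backup files.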
OLR:OLR resides on every node in the cluster and manages Oracle Clusterware configuration information for each particular node
IV. Solution:
1. Find the full path of the automatic backup:
$ ocrconfig -showbackup
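When several backups are listed, picking the newest one can be scripted. The sketch below assumes each backup line has the layout "node date time path" with the date as YYYY/MM/DD and a path without spaces; the function name latest_ocr_backup is hypothetical, and the timestamps plus the backup01.ocr and day.ocr entries in the demo are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin and
# print the path of the most recent backup file.
latest_ocr_backup() {
    # Keep only lines whose last field ends in .ocr, emit "date time path",
    # sort as plain strings (YYYY/MM/DD HH:MM:SS sorts chronologically),
    # then print the path from the newest line.
    awk 'NF >= 4 && $NF ~ /\.ocr$/ { print $2, $3, $NF }' |
        LC_ALL=C sort | tail -n 1 | awk '{ print $3 }'
}

# Demo with made-up timestamps (the line layout is an assumption).
latest_ocr_backup <<'EOF'
kawjrmdb001l     2014/09/10 06:48:28     /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup01.ocr
kawjrmdb001l     2014/09/10 10:48:29     /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr
kawjrmdb001l     2014/09/09 02:48:26     /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/day.ocr
EOF
# prints /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr
```

On a real node the helper would be fed from the command itself: ocrconfig -showbackup | latest_ocr_backup.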
2. Restore the OCR and voting disk
# crsctl stop crs -f
# /u01/app/11.2.0/grid/bin/ocrconfig -local -restore /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr
3. Start the cluster stack
# crsctl start crs -excl
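CRS fails to start; the errors are the ones listed earlier in this article. Note that the -local option makes ocrconfig operate on the OLR rather than the OCR, which is the likely reason the OLR could no longer be initialized.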
4. Fixing the OLR initialization failure
1. Remove the OLR configuration
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force
Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node
2. Run the root.sh script
# $GRID_HOME/root.sh (ignore any error messages)
./root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.gipcd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.cssdmonitor' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.diskmon' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.diskmon' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cssd' on 'kawjrmdb001l' succeeded
ASM created and started successfully.
Disk Group OCR_VOTE created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk a9be444f48c84facbfb04d9fbd60f955.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a9be444f48c84facbfb04d9fbd60f955 (/dev/oracleasm/disks/OCR_VOTE) [OCR_VOTE]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.OCR_VOTE.dg' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.OCR_VOTE.dg' on 'kawjrmdb001l' succeeded
/u01/app/11.2.0/grid/bin/srvctl start nodeapps -n kawjrmdb001l ... failed
FirstNode configuration failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 9380.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
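Since root.sh ends with a failure here, it helps to confirm that the parts needed for the next steps did succeed. The sketch below checks captured root.sh output for two lines quoted above; treating exactly these two lines as the milestones that matter is my assumption, and the function name check_rootsh_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether captured root.sh output contains
# the milestone lines (chosen by assumption) and return 1 if any is missing.
check_rootsh_log() {
    # $1: file holding the captured root.sh output
    rc=0
    for marker in \
        'OLR initialization - successful' \
        'CRS-4266: Voting file(s) successfully replaced'
    do
        # -F matches the marker as a fixed string, so "(s)" is literal
        if grep -qF -e "$marker" "$1"; then
            echo "ok:      $marker"
        else
            echo "missing: $marker"
            rc=1
        fi
    done
    return "$rc"
}

# Demo on a temporary file containing both lines from the output above.
log=$(mktemp)
printf '%s\n' 'OLR initialization - successful' \
    'CRS-4266: Voting file(s) successfully replaced' > "$log"
check_rootsh_log "$log"
rm -f "$log"
```

Capturing the output first, for example with tee, keeps the full log available for later inspection.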
3. Stop the cluster stack
# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.crsd' on 'kawjrmdb001l'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.OCR_VOTE.dg' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.OCR_VOTE.dg' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'kawjrmdb001l' has completed
CRS-2677: Stop of 'ora.crsd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.ctssd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.evmd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.evmd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.crf' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l' has completed
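With this much output it is easy to miss a resource that never reported a successful stop. A minimal sketch, assuming the CRS-2673 (attempt) and CRS-2677 (success) line formats shown above; the function name unstopped_resources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl stop crs` output on stdin and print
# every resource that has a CRS-2673 attempt without a later CRS-2677 success.
unstopped_resources() {
    # Splitting on the single quote puts the resource name in field 2.
    awk -F"'" '
        /^CRS-2673:/ { pending[$2] = 1 }
        /^CRS-2677:/ { delete pending[$2] }
        END { for (r in pending) print r }
    ' | LC_ALL=C sort
}

# Demo on a deliberately truncated excerpt: ora.gipcd has no success line.
unstopped_resources <<'EOF'
CRS-2673: Attempting to stop 'ora.crsd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.crsd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'kawjrmdb001l'
EOF
# prints ora.gipcd
```

An empty result on the full output means every attempted stop was confirmed.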
5. Restore the OCR and voting disk
1. Start CRS in exclusive mode
crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.gipcd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.cssdmonitor' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.diskmon' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.diskmon' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.crsd' on 'kawjrmdb001l' succeeded
2. Stop the crsd process
crsctl stop resource ora.crsd -init
CRS-2673: Attempting to stop 'ora.crsd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.crsd' on 'kawjrmdb001l' succeeded
3. Restore the OCR from the backup
# /u01/app/11.2.0/grid/bin/ocrconfig -restore /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3124
Available space (kbytes) : 258996
ID : 742521882
Device/File Name : +OCR_VOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
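The two summary lines at the end of this output are what a script would look for after a restore. A minimal sketch, assuming the wording shown above; the function name ocr_is_healthy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read ocrcheck output on stdin and succeed only if
# both the registry integrity and the logical corruption checks passed.
ocr_is_healthy() {
    awk '
        /Cluster registry integrity check succeeded/ { integrity = 1 }
        /Logical corruption check succeeded/         { logical = 1 }
        END { exit !(integrity && logical) }
    '
}

# Demo on the two summary lines from the output above.
if printf '%s\n' \
    'Cluster registry integrity check succeeded' \
    'Logical corruption check succeeded' | ocr_is_healthy
then
    echo "OCR looks healthy"
fi
# prints OCR looks healthy
```

On a real node the input would come from the command itself: ocrcheck | ocr_is_healthy.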
4. Restart CRS
# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.ctssd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l' has completed
CRS-4133: Oracle High Availability Services has been stopped.
# crsctl start crs (run on all nodes)
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  OFFLINE      kawjrmdb001l
               ONLINE  OFFLINE      kawjrmdb002l
ora.LISTENER.lsnr
               ONLINE  OFFLINE      kawjrmdb001l
               ONLINE  OFFLINE      kawjrmdb002l
ora.OCR_VOTE.dg
               ONLINE  ONLINE       kawjrmdb001l
               ONLINE  ONLINE       kawjrmdb002l
ora.asm
               ONLINE  ONLINE       kawjrmdb001l             Started
               ONLINE  ONLINE       kawjrmdb002l             Started
ora.gsd
               OFFLINE OFFLINE      kawjrmdb001l
               OFFLINE OFFLINE      kawjrmdb002l
ora.net1.network
               ONLINE  OFFLINE      kawjrmdb001l
               ONLINE  OFFLINE      kawjrmdb002l
ora.ons
               ONLINE  OFFLINE      kawjrmdb001l
               ONLINE  OFFLINE      kawjrmdb002l
ora.registry.acfs
               ONLINE  ONLINE       kawjrmdb001l
               ONLINE  ONLINE       kawjrmdb002l
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  OFFLINE
ora.cvu
      1        ONLINE  OFFLINE
ora.filesrv.db
      1        ONLINE  OFFLINE                               Instance Shutdown
      2        ONLINE  OFFLINE                               Instance Shutdown
ora.fjrcpmis.db
      1        ONLINE  OFFLINE                               Instance Shutdown
      2        ONLINE  OFFLINE                               Instance Shutdown
ora.kawjrmdb001l.vip
      1        ONLINE  OFFLINE
ora.kawjrmdb002l.vip
      1        ONLINE  OFFLINE
ora.oc4j
      1        ONLINE  ONLINE       kawjrmdb001l
ora.scan1.vip
      1        ONLINE  OFFLINE
At this point the OCR and voting disk have been restored, and the cluster services start successfully.
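The status listing above still shows a number of resources whose target is ONLINE while their state is OFFLINE, so a quick filter is useful for following up on them. A minimal sketch, assuming the crsctl stat res -t layout shown above (a resource name line starting with "ora." followed by its status lines); the function name lagging_resources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print
# each resource that has at least one line with TARGET=ONLINE, STATE=OFFLINE.
lagging_resources() {
    awk '
        /^ora\./ { res = $1; next }      # remember the current resource name
        {
            # look for the adjacent pair "ONLINE OFFLINE" on a status line
            for (i = 1; i < NF; i++)
                if ($i == "ONLINE" && $(i + 1) == "OFFLINE") { print res; break }
        }
    ' | LC_ALL=C sort -u
}

# Demo on a short excerpt of the listing above.
lagging_resources <<'EOF'
ora.DATA.dg
               ONLINE  OFFLINE      kawjrmdb001l
ora.asm
               ONLINE  ONLINE       kawjrmdb001l             Started
ora.filesrv.db
      1        ONLINE  OFFLINE                               Instance Shutdown
EOF
# prints:
# ora.DATA.dg
# ora.filesrv.db
```

Resources with target OFFLINE, such as ora.gsd in the listing, are skipped on purpose because they are not expected to be running.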
V. Lessons learned
Keep critical devices and files redundant wherever possible: OCR, voting disks, control files, redo log files, and so on.
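One way to keep an eye on voting disk redundancy is to read the "Located N voting disk(s)." summary line that appears in the root.sh output earlier in this article. The sketch below is only an illustration: the function name voting_disk_count is hypothetical, and the threshold of 3 is my assumption about what counts as redundant, not a value taken from this environment.

```shell
#!/bin/sh
# Hypothetical helper: read voting disk listing output on stdin and print
# the number N from its "Located N voting disk(s)." summary line.
voting_disk_count() {
    sed -n 's/^Located \([0-9][0-9]*\) voting disk.*/\1/p'
}

min_disks=3   # assumption: fewer than 3 is treated as not redundant

# Demo on the summary line from the root.sh output in this article.
n=$(printf '%s\n' 'Located 1 voting disk(s).' | voting_disk_count)
if [ "${n:-0}" -lt "$min_disks" ]; then
    echo "warning: ${n:-0} voting disk(s) found, fewer than $min_disks"
fi
# prints warning: 1 voting disk(s) found, fewer than 3
```

On a running cluster the same line is printed by crsctl query css votedisk, so the helper can be fed from that command.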
-------------------------------------------------------------------------------------------------
This article comes from my technical blog: http://blog.csdn.net/robo23
Please cite the original link when reposting; otherwise legal action may be taken.