RAC cluster services fail to start after OCR corruption: CRS-0704, CRS-10132: No msg for has:crs-10132 [10][60], Could not init OLR

I. Environment:

 Red Hat Enterprise Linux 5.8 + Oracle 11.2.0.4 + RAC

 

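II. Problem description:

The physical disk holding the OCR (Oracle Cluster Registry) and the voting disk (voting disks manage information about node membership) was damaged. After an attempt to restore OCR_VOTE from the automatic backup, the cluster services could not start normally and reported the following errors: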

 

 Clusterware alert log:

[ohasd(18298)]CRS-0704:Oracle High Availability Service aborted due to Oracle Local Registry error [PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]]. Details at (:OHAS00106:) in /u01/app/11.2.0/grid/log/kawjrmdb001l/ohasd/ohasd.log.
[client(18359)]CRS-10001:CRS-10132: No msg for has:crs-10132 [10][60]

 

ohasd.log:

2014-09-10 14:48:29.907: [  CRSOCR][2428572496] OCR context init failure.  Error: PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]
2014-09-10 14:48:29.908: [ default][2428572496] Created alert : (:OHAS00106:) :  OLR initialization failed, error: PROCL-24: Error in the messaging layer Messaging error [gipcretAddressInUse] [20]
2014-09-10 14:48:29.908: [ default][2428572496][PANIC] OHASD exiting; Could not init OLR
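
When the log is long, it can help to pull out just the distinct PROCL error codes. The following is a minimal sketch rather than an Oracle-supplied tool; the function name `procl_codes` is made up for illustration, and it assumes the log text is supplied on standard input.

```shell
#!/bin/sh
# procl_codes: read clusterware log text on stdin and print each distinct
# PROCL-nn error code once. (Hypothetical helper name, for illustration.)
procl_codes() {
    # Keep only lines containing a PROCL code and reduce them to the code.
    sed -n 's/.*\(PROCL-[0-9][0-9]*\).*/\1/p' | sort -u
}
```

On the node in this article it could be fed the log named in the alert message, for example `procl_codes < /u01/app/11.2.0/grid/log/kawjrmdb001l/ohasd/ohasd.log`.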

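III. Analysis:

Starting with 11gR2, the OCR and voting disks are stored in ASM disk groups. The OCR records the cluster's configuration, and the voting disk is the cluster's quorum disk; both are critical. If the OCR or the voting disk is damaged, the cluster services, and with them the databases, cannot start. Fortunately, the clusterware backs up the OCR automatically every 4 hours, and the backup files can be listed with the command ocrconfig -showbackup.

OLR (Oracle Local Registry): the OLR resides on every node in the cluster and manages Oracle Clusterware configuration information for each particular node.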

 

IV. Solution:

1. Find the full path of the automatic backups:

$ ocrconfig -showbackup
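
Choosing the newest backup from that listing can be scripted. This is a minimal sketch under an assumption: each backup line of `ocrconfig -showbackup` is taken to consist of four whitespace-separated fields (node name, date as YYYY/MM/DD, time as HH:MM:SS, file path). The function name `latest_ocr_backup` is hypothetical.

```shell
#!/bin/sh
# latest_ocr_backup: read `ocrconfig -showbackup` output on stdin and print
# the path of the most recent backup file.
# Assumed line layout: <node> <YYYY/MM/DD> <HH:MM:SS> <path ending in .ocr>
latest_ocr_backup() {
    awk 'NF == 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ && $4 ~ /\.ocr$/ {
             print $2, $3, $4      # date and time first, so sort orders by age
         }' |
        sort | tail -n 1 | awk '{ print $3 }'
}
```

On a live node it would be used as `ocrconfig -showbackup | latest_ocr_backup`. Lines that do not match the assumed layout, such as informational messages, are ignored.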

2. Restore the OCR and voting disk

# crsctl stop crs -f

# /u01/app/11.2.0/grid/bin/ocrconfig -local -restore /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr

3. Start the cluster stack

# crsctl start crs -excl

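CRS failed to start; the errors are those shown under "Problem description" above. Note that the -local option used in step 2 makes ocrconfig operate on the OLR rather than the OCR, which is consistent with the "Could not init OLR" failure.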

4. Resolving the OLR initialization failure

1. Remove the OLR configuration

# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force

Using configuration parameter file: ./crsconfig_params
PRCR-1119 : Failed to look up CRS resources of ora.cluster_vip_net1.type type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Removing Trace File Analyzer
Successfully deconfigured Oracle clusterware stack on this node

 
 

2. Run the root.sh script

# $GRID_HOME/root.sh    (ignore any error messages)

./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.gipcd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.cssdmonitor' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.diskmon' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.diskmon' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cssd' on 'kawjrmdb001l' succeeded

ASM created and started successfully.

Disk Group OCR_VOTE created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk a9be444f48c84facbfb04d9fbd60f955.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a9be444f48c84facbfb04d9fbd60f955 (/dev/oracleasm/disks/OCR_VOTE) [OCR_VOTE]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.OCR_VOTE.dg' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.OCR_VOTE.dg' on 'kawjrmdb001l' succeeded
/u01/app/11.2.0/grid/bin/srvctl start nodeapps -n kawjrmdb001l ... failed
FirstNode configuration failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 9380.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

 

3. Stop the cluster stack

# crsctl stop crs

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.crsd' on 'kawjrmdb001l'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.OCR_VOTE.dg' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.OCR_VOTE.dg' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'kawjrmdb001l' has completed
CRS-2677: Stop of 'ora.crsd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.ctssd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.evmd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.evmd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.crf' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l' has completed

5. Restore the OCR and voting disk

1. Start CRS in exclusive mode

# crsctl start crs -excl

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.gipcd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.cssdmonitor' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.diskmon' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.diskmon' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'kawjrmdb001l'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'kawjrmdb001l'
CRS-2676: Start of 'ora.crsd' on 'kawjrmdb001l' succeeded

2. Stop the crsd process

# crsctl stop resource ora.crsd -init

CRS-2673: Attempting to stop 'ora.crsd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.crsd' on 'kawjrmdb001l' succeeded

3. Restore the OCR from the backup

# /u01/app/11.2.0/grid/bin/ocrconfig -restore /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr

$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3124
         Available space (kbytes) :     258996
         ID                       :  742521882
         Device/File Name         :  +OCR_VOTE
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
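
The two success messages in this output can be checked by a script before moving on. A minimal sketch, assuming the `ocrcheck` output is supplied on standard input and that the message wording matches the output above; the function name `ocr_is_healthy` is hypothetical.

```shell
#!/bin/sh
# ocr_is_healthy: read `ocrcheck` output on stdin; return 0 only if both the
# registry integrity check and the logical corruption check report success.
ocr_is_healthy() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'Cluster registry integrity check succeeded' &&
        printf '%s\n' "$out" | grep -q 'Logical corruption check succeeded'
}
```

Typical use would be `ocrcheck | ocr_is_healthy && echo "OCR looks consistent"`. If either message is missing, the function returns non-zero.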

4. Restart CRS

# crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.ctssd' on 'kawjrmdb001l'
CRS-2673: Attempting to stop 'ora.asm' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.ctssd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.asm' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.cssd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.drivers.acfs' on 'kawjrmdb001l' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'kawjrmdb001l' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'kawjrmdb001l'
CRS-2677: Stop of 'ora.gpnpd' on 'kawjrmdb001l' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'kawjrmdb001l' has completed
CRS-4133: Oracle High Availability Services has been stopped.

 

# crsctl start crs    (run on all nodes)
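
The order of operations in this section matters, so it can be useful to print the sequence for review before typing it in as root. The sketch below only prints the commands used in steps 1 to 4 above and never runs them; the function name `print_ocr_restore_plan` and its single argument (the backup file path) are assumptions made for illustration.

```shell
#!/bin/sh
# print_ocr_restore_plan BACKUP_FILE
# Prints, without executing, the OCR restore sequence from this section.
print_ocr_restore_plan() {
    backup=$1
    if [ -z "$backup" ]; then
        echo "usage: print_ocr_restore_plan BACKUP_FILE" >&2
        return 1
    fi
    cat <<EOF
crsctl start crs -excl
crsctl stop resource ora.crsd -init
ocrconfig -restore $backup
ocrcheck
crsctl stop crs -f
crsctl start crs
EOF
}
```

For example, `print_ocr_restore_plan /u01/app/11.2.0/grid/cdata/kawjrmd-cluster/backup00.ocr` lists the six commands with the backup path filled in. The final `crsctl start crs` is the one to repeat on every node.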

$ crsctl stat res -t 

--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  OFFLINE      kawjrmdb001l                                
               ONLINE  OFFLINE      kawjrmdb002l                                
ora.LISTENER.lsnr
               ONLINE  OFFLINE      kawjrmdb001l                                
               ONLINE  OFFLINE      kawjrmdb002l                                
ora.OCR_VOTE.dg
               ONLINE  ONLINE       kawjrmdb001l                                
               ONLINE  ONLINE       kawjrmdb002l                                
ora.asm
               ONLINE  ONLINE       kawjrmdb001l             Started            
               ONLINE  ONLINE       kawjrmdb002l             Started            
ora.gsd
               OFFLINE OFFLINE      kawjrmdb001l                                
               OFFLINE OFFLINE      kawjrmdb002l                                
ora.net1.network
               ONLINE  OFFLINE      kawjrmdb001l                                
               ONLINE  OFFLINE      kawjrmdb002l                                
ora.ons
               ONLINE  OFFLINE      kawjrmdb001l                                
               ONLINE  OFFLINE      kawjrmdb002l                                
ora.registry.acfs
               ONLINE  ONLINE       kawjrmdb001l                                
               ONLINE  ONLINE       kawjrmdb002l                                
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  OFFLINE                                                  
ora.cvu
      1        ONLINE  OFFLINE                                                  
ora.filesrv.db
      1        ONLINE  OFFLINE                               Instance Shutdown  
      2        ONLINE  OFFLINE                               Instance Shutdown  
ora.fjrcpmis.db
      1        ONLINE  OFFLINE                               Instance Shutdown  
      2        ONLINE  OFFLINE                               Instance Shutdown  
ora.kawjrmdb001l.vip
      1        ONLINE  OFFLINE                                                  
ora.kawjrmdb002l.vip
      1        ONLINE  OFFLINE                                                  
ora.oc4j
      1        ONLINE  ONLINE       kawjrmdb001l                                
ora.scan1.vip
      1        ONLINE  OFFLINE                                                  
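
In a listing like this, the resources worth following up are those whose TARGET is ONLINE while their STATE is still OFFLINE. The following is a minimal sketch that extracts them, assuming the 11.2 `crsctl stat res -t` layout shown above (a resource name line starting with `ora.`, followed by indented status lines, with a leading instance number for cluster resources). The function name `offline_resources` is hypothetical.

```shell
#!/bin/sh
# offline_resources: read `crsctl stat res -t` output on stdin and print the
# name of every resource that has TARGET=ONLINE but STATE=OFFLINE.
offline_resources() {
    awk '
        /^ora\./ { name = $1; next }      # a resource name line
        name != "" {
            # Cluster resources carry a leading instance number.
            if ($1 ~ /^[0-9]+$/) { target = $2; state = $3 }
            else                 { target = $1; state = $2 }
            if (target == "ONLINE" && state == "OFFLINE" && !seen[name]++)
                print name
        }
    '
}
```

Usage would be `crsctl stat res -t | offline_resources`. Each resource is printed once, even when it is offline on several nodes.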

 

At this point the OCR and voting disk have been restored, and the cluster services start successfully.

 

V. Lessons learned

Keep critical devices and files redundant wherever possible, for example the OCR, voting disks, control files, and redo log files.

-------------------------------------------------------------------------------------------------

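This article comes from my technical blog: http://blog.csdn.net/robo23

When reposting, please cite the link to the original article; otherwise legal action may be taken.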

 
