关于10gR2(10.2.0.1)clusterware的bug修复方法和对CRS进程的一些理解


10gR2(10.2.0.1)同时启动2个节点主机,经常会出现一些莫名的问题,会莫名其妙的变成OFFLINE和UNKNOWN(之前做完实验关机前,所有资源的状态都是ONLINE的),如:

 

[root@RAC1 raw]# crs_stat -t

Name           Type           Target    State    Host       

------------------------------------------------------------

ora....B1.inst application    ONLINE   OFFLINE              

ora....B2.inst application    ONLINE   OFFLINE              

ora.RACDB.db   application    ONLINE   UNKNOWN   rac1       

ora....SM1.asm application    ONLINE   UNKNOWN   rac1       

ora....C1.lsnr application    ONLINE   UNKNOWN   rac1       

ora.rac1.gsd   application    ONLINE   UNKNOWN   rac1       

ora.rac1.ons   application    ONLINE   UNKNOWN   rac1       

ora.rac1.vip   application   ONLINE    ONLINE   rac1       

ora....SM2.asm application    ONLINE   OFFLINE              

ora....C2.lsnr application    ONLINE   OFFLINE              

ora.rac2.gsd   application    ONLINE   ONLINE    rac2       

ora.rac2.ons   application    ONLINE   ONLINE    rac2       

ora.rac2.vip   application    ONLINE   UNKNOWN   rac1       

 

怀疑这个可能是10.2.0.1版本在crs上的一个bug,看到很多人在运用同一版本的时候,都遇到过类似的问题,都是靠升级到10.2.0.4或10.2.0.5解决的。其实不升级数据库版本,也是可以避免这个问题的,只要顺序开启节点1和节点2的crs,不要同时启动,就可以使所有资源ONLINE了。

 

首先关闭节点2的crs,然后关闭节点1的crs,关闭节点1crs的时候,顺便trace一下crsd.log日志文件的内容,有助于我们更清楚的了解整个crs关闭,启动,以及注册和注销资源的过程,加深对crs的理解。

(以下命令运行在已经关闭了节点2上的crs之后:)

[root@RAC1 raw]# crsctl stop crs

Stopping resources.

Successfully stopped CRS resources

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

 

克隆一个节点1的会话,并执行trace跟踪crsd.log:

[root@RAC1 ~]# tail -10f /u01/app/oracle/product/10.2.0/crs_1/log/rac1/crsd/crsd.log

 

2013-07-18 10:37:14.579: [  CRSRES][2666597264]0StopResource: setting CLIvalues

2013-07-18 10:37:14.583: [  CRSRES][2687576976]0StopResource: setting CLIvalues

2013-07-18 10:37:14.587: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:14.592: [  CRSRES][2666597264]0Attempting to stop`ora.rac1.gsd` on member `rac1`

2013-07-18 10:37:14.643: [  CRSRES][2687576976]0Attempting to stop`ora.rac1.ons` on member `rac1`

2013-07-18 10:37:14.666: [  CRSRES][2719046544]0Attempting to stop`ora.RACDB.db` on member `rac1`

2013-07-18 10:37:14.939: [  CRSRES][2719046544]0Stop of `ora.RACDB.db` onmember `rac1` succeeded.

2013-07-18 10:37:15.044: [  CRSRES][2666597264]0Stop of `ora.rac1.gsd` onmember `rac1` succeeded.

2013-07-18 10:37:15.048: [  CRSRES][2687576976]0Stop of `ora.rac1.ons` onmember `rac1` succeeded.

2013-07-18 10:37:15.054: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:15.065: [  CRSRES][2687576976]0StopResource: setting CLIvalues

2013-07-18 10:37:15.083: [  CRSRES][2719046544]0Attempting to stop`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`

2013-07-18 10:37:15.204: [  CRSRES][2687576976]0StopResource: setting CLIvalues

2013-07-18 10:37:15.256: [  CRSRES][2687576976]0StopResource: setting CLIvalues

2013-07-18 10:37:15.621: [  CRSRES][2687576976]0StopResource: setting CLIvalues

2013-07-18 10:37:15.650: [  CRSRES][2687576976]0Attempting to stop`ora.rac2.vip` on member `rac1`

2013-07-18 10:37:15.929: [  CRSRES][2687576976]0Stop of `ora.rac2.vip` onmember `rac1` succeeded.

2013-07-18 10:37:15.961: [  CRSRES][2719046544]0Stop of`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.

2013-07-18 10:37:15.967: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:15.969: [  CRSRES][2719046544]0Attempting to stop`ora.RACDB.RACDB1.inst` on member `rac1`

2013-07-18 10:37:22.261: [  CRSRES][2719046544]0Stop of`ora.RACDB.RACDB1.inst` on member `rac1` succeeded.

2013-07-18 10:37:22.269: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:22.337: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:22.515: [  CRSRES][2719046544]0Attempting to stop`ora.rac1.ASM1.asm` on member `rac1`

2013-07-18 10:37:23.894: [  CRSRES][2719046544]0Stop of`ora.rac1.ASM1.asm` on member `rac1` succeeded.

2013-07-18 10:37:23.906: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:23.954: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:23.967: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:23.991: [  CRSRES][2719046544]0StopResource: setting CLIvalues

2013-07-18 10:37:23.996: [  CRSRES][2719046544]0Attempting to stop`ora.rac1.vip` on member `rac1`

2013-07-18 10:37:24.971: [  CRSRES][2719046544]0Stop of `ora.rac1.vip` onmember `rac1` succeeded.

2013-07-18 10:37:24.985: [CSSCLNT][2855414672]clssgsGroupGetStatus: CSS shutting down.

 

2013-07-18 10:37:24.985: [CSSCLNT][2855414672]clssgsGroupGetStatus: returning 22

 

2013-07-18 10:37:24.985: [CRSMAIN][2855414672]0CRSD exiting on CSS shutdown request

2013-07-18 10:37:24.985: [default][2855414672]Terminating clsd session

 

从以上crsd.log日志的trace信息可以发现,单实例执行crsctl stop crs,各资源关闭的先后顺序依次为:ora.RACDB.db->ora.rac1.gsd->ora.rac1.ons->ora.rac1.LISTENER_RAC1.lsnr->ora.rac2.vip->ora.RACDB.RACDB1.inst->ora.rac1.ASM1.asm->ora.rac1.vip

因为是单实例,所以当关闭该节点crs时,数据库就首先关闭了(因为没有实例对其提供服务),这里关闭监听以后首先关闭的是节点2上的vip,是节点2出现故障并关闭以后failover它的vip到节点1的,如果节点2没有关闭的话,那么数据库是不应该立即关闭的(因为节点1crs关闭后,仍然有节点2上的crs可以保证有数据库实例为数据库提供服务)

 

[root@RAC1 raw]# crsctl start crs

Attempting to start CRS stack

The CRS stack will be started shortly

[root@RAC1 raw]# crsctl check crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM

 

继续在克隆的会话中查看crsd.log的trace信息:

 

2013-07-18 10:50:58.397: [default][1299232][ENTER]0

Oracle Database 10g CRS Release 10.2.0.1.0Production Copyright 1996, 2004, Oracle. All rights reserved

2013-07-18 10:50:58.398: [default][1299232]0CRS Daemon Starting

2013-07-18 10:50:58.398: [CRSMAIN][1299232]0Checking the OCR device

2013-07-18 10:50:58.648: [CRSMAIN][1299232]0Connecting to the CSS Daemon

2013-07-18 10:50:59.277: [COMMCRS][120855440]clsc_connect: (0x8770410) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

 

2013-07-18 10:50:59.277: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:50:59.277: [  CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:51:00.948: [ COMMCRS][120855440]clsc_connect:(0x86dbec8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

 

2013-07-18 10:51:00.948: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:51:00.949: [  CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:51:02.477: [COMMCRS][120855440]clsc_connect: (0x872f8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

 

2013-07-18 10:51:02.477: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:51:02.477: [  CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:51:04.085: [COMMCRS][120855440]clsc_connect: (0x872f8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

 

2013-07-18 10:51:04.085: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:51:04.085: [  CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:51:08.452: [    CRSD][1299232]0Daemon Version: 10.2.0.1.0Active Version: 10.2.0.1.0

2013-07-18 10:51:08.452: [    CRSD][1299232]0Active Version and SoftwareVersion are same

2013-07-18 10:51:08.452: [CRSMAIN][1299232]0Initializing OCR

2013-07-18 10:51:08.865: [  OCRRAW][1299232]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)

2013-07-18 10:51:08.923: [  OCRMAS][3044309904]th_master:12: I AM THE NEWOCR MASTER at incar 1. Node Number = 1

2013-07-18 10:51:09.110: [  OCRRAW][3044309904]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)

2013-07-18 10:51:09.167: [    CRSD][1299232]0ENV Logging level forModule: allcomp  0

2013-07-18 10:51:09.168: [    CRSD][1299232]0ENV Logging level forModule: default  0

2013-07-18 10:51:09.170: [    CRSD][1299232]0ENV Logging level for Module:COMMCRS  0

2013-07-18 10:51:09.172: [    CRSD][1299232]0ENV Logging level forModule: COMMNS  0

2013-07-18 10:51:09.172: [    CRSD][1299232]0ENV Logging level forModule: CRSUI  0

2013-07-18 10:51:09.173: [    CRSD][1299232]0ENV Logging level forModule: CRSCOMM  0

2013-07-18 10:51:09.175: [    CRSD][1299232]0ENV Logging level forModule: CRSRTI  0

2013-07-18 10:51:09.176: [    CRSD][1299232]0ENV Logging level forModule: CRSMAIN  0

2013-07-18 10:51:09.176: [    CRSD][1299232]0ENV Logging level for Module:CRSPLACE  0

2013-07-18 10:51:09.177: [    CRSD][1299232]0ENV Logging level forModule: CRSAPP  0

2013-07-18 10:51:09.178: [    CRSD][1299232]0ENV Logging level forModule: CRSRES  0

2013-07-18 10:51:09.179: [    CRSD][1299232]0ENV Logging level for Module:CRSOCR  0

2013-07-18 10:51:09.180: [    CRSD][1299232]0ENV Logging level forModule: CRSTIMER  0

2013-07-18 10:51:09.180: [    CRSD][1299232]0ENV Logging level forModule: CRSEVT  0

2013-07-18 10:51:09.181: [    CRSD][1299232]0ENV Logging level for Module:CRSD  0

2013-07-18 10:51:09.182: [    CRSD][1299232]0ENV Logging level forModule: CLUCLS  0

2013-07-18 10:51:09.183: [    CRSD][1299232]0ENV Logging level forModule: OCRRAW  0

2013-07-18 10:51:09.184: [    CRSD][1299232]0ENV Logging level forModule: OCROSD  0

2013-07-18 10:51:09.185: [    CRSD][1299232]0ENV Logging level forModule: CSSCLNT  0

2013-07-18 10:51:09.186: [    CRSD][1299232]0ENV Logging level forModule: OCRAPI  0

2013-07-18 10:51:09.186: [    CRSD][1299232]0ENV Logging level forModule: OCRUTL  0

2013-07-18 10:51:09.187: [    CRSD][1299232]0ENV Logging level forModule: OCRMSG  0

2013-07-18 10:51:09.188: [    CRSD][1299232]0ENV Logging level forModule: OCRCLI  0

2013-07-18 10:51:09.190: [    CRSD][1299232]0ENV Logging level forModule: OCRCAC  0

2013-07-18 10:51:09.190: [    CRSD][1299232]0ENV Logging level forModule: OCRSRV  0

2013-07-18 10:51:09.191: [    CRSD][1299232]0ENV Logging level forModule: OCRMAS  0

2013-07-18 10:51:09.192: [CRSMAIN][1299232]0Filename is /u01/app/oracle/product/10.2.0/crs_1/crs/init/rac1.pid

[ clsdmt][2845002640]Listening to(ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CRSD))

2013-07-18 10:51:09.239: [CRSMAIN][1299232]0Using Authorizer location: /u01/app/oracle/product/10.2.0/crs_1/crs/auth/

2013-07-18 10:51:09.283: [CRSMAIN][1299232]0Initializing RTI

2013-07-18 10:51:09.283:[CRSTIMER][2824022928]0Timer Thread Starting.

2013-07-18 10:51:09.285: [  CRSRES][1299232]0Parameter SECURITY = 1,running in USER Mode

2013-07-18 10:51:09.285: [CRSMAIN][1299232]0Initializing EVMMgr

2013-07-18 10:51:10.443: [CRSMAIN][1299232]0CRSD locked during state recovery, please wait.

2013-07-18 10:51:10.576: [CRSMAIN][1299232]0CRSD recovered, unlocked.

2013-07-18 10:51:10.577: [CRSMAIN][1299232]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2013-07-18 10:51:10.601: [CRSMAIN][1299232]0CRSD UI socket on:(ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))

2013-07-18 10:51:10.603: [CRSMAIN][1299232]0E2E socket on:(ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49896))

2013-07-18 10:51:10.603: [CRSMAIN][1299232]0Starting Threads

2013-07-18 10:51:10.603: [CRSMAIN][1299232]0CRS Daemon Started.

2013-07-18 10:51:11.395: [  CRSRES][2719124368]0startRunnable: settingCLI values

2013-07-18 10:51:11.404: [  CRSRES][2719124368]0Attempting to start`ora.rac1.vip` on member `rac1`

2013-07-18 10:51:11.635: [  CRSRES][2708634512]0startRunnable: settingCLI values

2013-07-18 10:51:11.642: [  CRSRES][2708634512]0Attempting to start`ora.rac2.vip` on member `rac1`

2013-07-18 10:51:19.395: [  CRSRES][2708634512]0Start of `ora.rac2.vip`on member `rac1` succeeded.

2013-07-18 10:51:19.409: [  CRSRES][2719124368]0Start of `ora.rac1.vip`on member `rac1` succeeded.

2013-07-18 10:51:19.603: [  CRSRES][2719124368]0startRunnable: settingCLI values

2013-07-18 10:51:19.611: [  CRSRES][2719124368]0Attempting to start`ora.rac1.ASM1.asm` on member `rac1`

2013-07-18 10:51:19.740: [  OCRSRV][3023330192]th_select_handler: Failedto retrieve procctx from ht. constr = [149875736] retval lht [-27] Signal CV.

2013-07-18 10:51:28.855: [  CRSRES][2677164944]0startRunnable: settingCLI values

2013-07-18 10:51:28.869: [  CRSRES][2677164944]0Attempting to start`ora.rac1.ons` on member `rac1`

2013-07-18 10:51:30.435: [  CRSRES][2677164944]0Start of `ora.rac1.ons`on member `rac1` succeeded.

2013-07-18 10:51:30.516: [  CRSRES][2719124368]0Start of`ora.rac1.ASM1.asm` on member `rac1` succeeded.

2013-07-18 10:51:30.645: [  CRSRES][2719124368]0startRunnable: settingCLI values

2013-07-18 10:51:30.651: [  CRSRES][2719124368]0Attempting to start`ora.RACDB.RACDB1.inst` on member `rac1`

2013-07-18 10:51:39.617: [  OCRSRV][3023330192]th_select_handler: Failedto retrieve procctx from ht. constr = [151180672] retval lht [-27] Signal CV.

2013-07-18 10:52:03.746: [  CRSRES][2719124368]0Start of`ora.RACDB.RACDB1.inst` on member `rac1` succeeded.

2013-07-18 10:52:04.972: [  CRSRES][2719124368]0startRunnable: settingCLI values

2013-07-18 10:52:04.979: [  CRSRES][2719124368]0Attempting to start`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`

2013-07-18 10:52:07.030: [  CRSRES][2719124368]0Start of`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.

2013-07-18 10:52:07.035: [  CRSRES][2719124368]0Skip online resource:ora.rac1.ons

2013-07-18 10:52:08.714: [  CRSRES][2708634512]0startRunnable: settingCLI values

2013-07-18 10:52:09.700: [  CRSRES][2687654800]0startRunnable: settingCLI values

2013-07-18 10:52:10.388: [  CRSRES][2708634512]0Attempting to start`ora.rac1.gsd` on member `rac1`

2013-07-18 10:52:10.714: [  CRSRES][2687654800]0Attempting to start`ora.RACDB.db` on member `rac1`

2013-07-18 10:52:14.828: [  CRSRES][2708634512]0Start of `ora.rac1.gsd`on member `rac1` succeeded.

2013-07-18 10:52:14.843: [  CRSRES][2687654800]0Start of `ora.RACDB.db`on member `rac1` succeeded.

2013-07-18 10:52:24.856: [  CRSRES][2687654800]0CRS-1002: Resource'ora.RACDB.db' is already running on member 'rac1'

 

2013-07-18 10:52:24.860: [  CRSRES][2719124368]0CRS-1002: Resource'ora.rac1.LISTENER_RAC1.lsnr' is already running on member 'rac1'

 

分析下整个流程:显示版本信息->开启crsd守护进程->检测OCRdevice->连接cssd守护进程->检查版本信息->初始化OCR(因为OCR存放RAC中所有节点的信息,非常重要)并确定磁盘路径等信息->统计vote票数“my votes (2), total votes (2)”(此处我的理解是:自己的票数=总的票数才不会被整个RACkill,此处有一个“I AM THE NEW OCR MASTER”)->log一系列OENV并初始化几个重要进程->开启crs资源。

再来看下资源的启动顺序:首先依次启动2个vip(vip1,vip2),其中一个是从节点2上failover过来的,然后依次是asm->inst->lsnr->ons->gsd->db

注意,数据库资源是放在最后才启动的,很有意思,和之前关闭顺序正好是相反的。

 

查看一下现在的crs资源状态,节点1上的全部资源都启动了,现在就可以作为一个单实例的RAC正常维护数据库了。

[root@RAC1 raw]# crs_stat -t

Name           Type           Target    State    Host        

------------------------------------------------------------

ora....B1.inst application    ONLINE   ONLINE    rac1       

ora....B2.inst application    ONLINE   OFFLINE              

ora.RACDB.db   application    ONLINE   ONLINE    rac1       

ora....SM1.asm application    ONLINE   ONLINE    rac1       

ora....C1.lsnr application    ONLINE   ONLINE    rac1       

ora.rac1.gsd   application    ONLINE   ONLINE    rac1       

ora.rac1.ons   application    ONLINE   ONLINE    rac1       

ora.rac1.vip   application    ONLINE   ONLINE    rac1       

ora....SM2.asm application    ONLINE   OFFLINE              

ora....C2.lsnr application    ONLINE   OFFLINE              

ora.rac2.gsd   application    ONLINE   OFFLINE              

ora.rac2.ons   application    ONLINE   OFFLINE              

ora.rac2.vip   application    ONLINE   ONLINE    rac1  

 

此时开启节点2的crs并trace相应目录下crsd.log的过程

 

[root@RAC1 raw]# crsctl start crs

Attempting to start CRS stack

The CRS stack will be started shortly

[root@RAC1 raw]# crsctl check crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM

 

[root@RAC2 ~]# tail -10f/u01/app/oracle/product/10.2.0/crs_1/log/rac2/crsd/crsd.log

2013-07-18 10:30:28.654: [  CRSEVT][2730077072]0CAAMonitorHandler ::0:Action Script /u01/app/oracle/product/10.2.0/crs_1/bin/racgwrap(stop)timed out for ora.rac2.ons! (timeout=600)

2013-07-18 10:30:28.654: [  CRSAPP][2730077072]0StopResource error forora.rac2.ons error code = -2

2013-07-18 10:30:28.678: [  CRSRES][2730077072][ALERT]0`ora.rac2.ons` onmember `rac2` has experienced an unrecoverable failure.

2013-07-18 10:30:28.679: [  CRSRES][2730077072]0Human interventionrequired to resume its availability.

2013-07-18 10:30:28.692: [CSSCLNT][2876935056]clssgsGroupGetStatus: CSS shutting down.

 

2013-07-18 10:30:28.692: [CSSCLNT][2876935056]clssgsGroupGetStatus: returning 22

 

2013-07-18 10:30:28.692: [CRSMAIN][2876935056]0CRSD exiting on CSS shutdown request

2013-07-18 10:30:28.693: [default][2876935056]Terminating clsd session

 

(以上为之前残留在crsd.log文件中的关闭rac2上的crs的一些信息)

 

2013-07-18 10:56:33.643: [default][3740448][ENTER]0

Oracle Database 10g CRS Release 10.2.0.1.0Production Copyright 1996, 2004, Oracle. All rights reserved

2013-07-18 10:56:33.643: [default][3740448]0CRS Daemon Starting

2013-07-18 10:56:33.644: [CRSMAIN][3740448]0Checking the OCR device

2013-07-18 10:56:33.975: [CRSMAIN][3740448]0Connecting to the CSS Daemon

2013-07-18 10:56:34.569: [COMMCRS][35040144]clsc_connect: (0xa6bb410) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

 

2013-07-18 10:56:34.569: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:56:34.569: [  CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:56:36.240: [COMMCRS][35040144]clsc_connect: (0xa626ec8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

 

2013-07-18 10:56:36.240: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:56:36.241: [  CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:56:37.848: [COMMCRS][35040144]clsc_connect: (0xa67a8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

 

2013-07-18 10:56:37.848: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9

 

2013-07-18 10:56:37.848: [  CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..

 

2013-07-18 10:56:43.473: [    CRSD][3740448]0Daemon Version: 10.2.0.1.0Active Version: 10.2.0.1.0

2013-07-18 10:56:43.473: [    CRSD][3740448]0Active Version and SoftwareVersion are same

2013-07-18 10:56:43.473: [CRSMAIN][3740448]0Initializing OCR

2013-07-18 10:56:43.670: [  OCRRAW][3740448]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)

2013-07-18 10:56:43.790: [    CRSD][3740448]0ENV Logging level forModule: allcomp  0

2013-07-18 10:56:43.798: [    CRSD][3740448]0ENV Logging level forModule: default  0

2013-07-18 10:56:43.908: [    CRSD][3740448]0ENV Logging level forModule: COMMCRS  0

2013-07-18 10:56:43.916: [    CRSD][3740448]0ENV Logging level forModule: COMMNS  0

2013-07-18 10:56:44.088: [    CRSD][3740448]0ENV Logging level forModule: CRSUI  0

2013-07-18 10:56:44.097: [    CRSD][3740448]0ENV Logging level forModule: CRSCOMM  0

2013-07-18 10:56:44.208: [    CRSD][3740448]0ENV Logging level forModule: CRSRTI  0

2013-07-18 10:56:44.216: [    CRSD][3740448]0ENV Logging level forModule: CRSMAIN  0

2013-07-18 10:56:44.347: [    CRSD][3740448]0ENV Logging level forModule: CRSPLACE  0

2013-07-18 10:56:44.355: [    CRSD][3740448]0ENV Logging level forModule: CRSAPP  0

2013-07-18 10:56:44.465: [    CRSD][3740448]0ENV Logging level forModule: CRSRES  0

2013-07-18 10:56:44.474: [    CRSD][3740448]0ENV Logging level forModule: CRSOCR  0

2013-07-18 10:56:44.603: [    CRSD][3740448]0ENV Logging level forModule: CRSTIMER  0

2013-07-18 10:56:44.612: [    CRSD][3740448]0ENV Logging level forModule: CRSEVT  0

2013-07-18 10:56:44.721: [    CRSD][3740448]0ENV Logging level forModule: CRSD  0

2013-07-18 10:56:44.734: [    CRSD][3740448]0ENV Logging level forModule: CLUCLS  0

2013-07-18 10:56:44.839: [    CRSD][3740448]0ENV Logging level forModule: OCRRAW  0

2013-07-18 10:56:44.848: [    CRSD][3740448]0ENV Logging level forModule: OCROSD  0

2013-07-18 10:56:44.957: [    CRSD][3740448]0ENV Logging level forModule: CSSCLNT  0

2013-07-18 10:56:44.965: [    CRSD][3740448]0ENV Logging level forModule: OCRAPI  0

2013-07-18 10:56:45.000: [    CRSD][3740448]0ENV Logging level forModule: OCRUTL  0

2013-07-18 10:56:45.010: [    CRSD][3740448]0ENV Logging level forModule: OCRMSG  0

2013-07-18 10:56:45.020: [    CRSD][3740448]0ENV Logging level forModule: OCRCLI  0

2013-07-18 10:56:45.029: [    CRSD][3740448]0ENV Logging level forModule: OCRCAC  0

2013-07-18 10:56:45.038: [    CRSD][3740448]0ENV Logging level forModule: OCRSRV  0

2013-07-18 10:56:45.048: [    CRSD][3740448]0ENV Logging level for Module:OCRMAS  0

2013-07-18 10:56:45.048: [CRSMAIN][3740448]0Filename is /u01/app/oracle/product/10.2.0/crs_1/crs/init/rac2.pid

[ clsdmt][2886654864]Listening to(ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))

2013-07-18 10:56:45.104: [CRSMAIN][3740448]0Using Authorizer location: /u01/app/oracle/product/10.2.0/crs_1/crs/auth/

2013-07-18 10:56:45.627: [CRSMAIN][3740448]0Initializing RTI

2013-07-18 10:56:45.627:[CRSTIMER][2865675152]0Timer Thread Starting.

2013-07-18 10:56:45.628: [  CRSRES][3740448]0Parameter SECURITY = 1,running in USER Mode

2013-07-18 10:56:45.628: [CRSMAIN][3740448]0Initializing EVMMgr

2013-07-18 10:56:45.812: [COMMCRS][2855185296]clsc_connect: (0xabaaea0) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

 

2013-07-18 10:56:46.497: [COMMCRS][2855185296]clsc_connect: (0xabaafb0) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

 

2013-07-18 10:56:48.832: [CRSMAIN][3740448]0CRSD locked during state recovery, please wait.

2013-07-18 10:56:48.964: [ CRSMAIN][3740448]0CRSDrecovered, unlocked.

2013-07-18 10:56:48.964: [CRSMAIN][3740448]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2013-07-18 10:56:49.007: [CRSMAIN][3740448]0CRSD UI socket on:(ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))

2013-07-18 10:56:49.046: [CRSMAIN][3740448]0E2E socket on:(ADDRESS=(PROTOCOL=tcp)(HOST=rac2-priv)(PORT=49896))

2013-07-18 10:56:49.046: [CRSMAIN][3740448]0Starting Threads

2013-07-18 10:56:49.046: [CRSMAIN][3740448]0CRS Daemon Started.

2013-07-18 10:56:50.511: [  CRSRES][2760776592]0StopResource: setting CLIvalues

2013-07-18 10:56:50.581: [  CRSRES][2760776592]0Attempting to stop`ora.rac2.vip` on member `rac1`

2013-07-18 10:56:50.836: [  CRSRES][2760776592]0Stop of `ora.rac2.vip` onmember `rac1` succeeded.

2013-07-18 10:56:50.845: [  CRSRES][2760776592]0startRunnable: settingCLI values

2013-07-18 10:56:50.846: [  CRSRES][2760776592]0Attempting to start`ora.rac2.vip` on member `rac2`

2013-07-18 10:56:58.252: [  CRSRES][2760776592]0Start of `ora.rac2.vip` onmember `rac2` succeeded.

2013-07-18 10:56:59.079: [  CRSRES][2760776592]0startRunnable: settingCLI values

2013-07-18 10:56:59.138: [  CRSRES][2760776592]0Attempting to start`ora.rac2.ASM2.asm` on member `rac2`

2013-07-18 10:57:11.612: [  OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [183125624] retval lht [-27] Signal CV.

2013-07-18 10:57:21.767: [  CRSRES][2729307024]0startRunnable: settingCLI values

2013-07-18 10:57:21.812: [  CRSRES][2729307024]0Attempting to start`ora.rac2.ons` on member `rac2`

2013-07-18 10:57:23.276: [  CRSRES][2729307024]0Start of `ora.rac2.ons`on member `rac2` succeeded.

2013-07-18 10:57:23.325: [  CRSRES][2760776592]0Start of`ora.rac2.ASM2.asm` on member `rac2` succeeded.

2013-07-18 10:57:24.094: [  CRSRES][2760776592]0startRunnable: settingCLI values

2013-07-18 10:57:24.130: [  CRSRES][2760776592]0Attempting to start`ora.RACDB.RACDB2.inst` on member `rac2`

2013-07-18 10:57:37.294: [  OCRUTL][2949594000]u_freem: mem passed isnull

2013-07-18 10:57:37.449: [  OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [182966616] retval lht [-27] Signal CV.

2013-07-18 10:58:02.769: [  CRSRES][2760776592]0Start of`ora.RACDB.RACDB2.inst` on member `rac2` succeeded.

2013-07-18 10:58:03.641: [  CRSRES][2760776592]0startRunnable: settingCLI values

2013-07-18 10:58:03.712: [  CRSRES][2760776592]0Attempting to start`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`

2013-07-18 10:58:04.926: [  CRSRES][2760776592]0Start of `ora.rac2.LISTENER_RAC2.lsnr`on member `rac2` succeeded.

2013-07-18 10:58:04.929: [  CRSRES][2750286736]0Skip online resource:ora.rac2.ons

2013-07-18 10:58:05.348: [  CRSRES][2760776592]0startRunnable: settingCLI values

2013-07-18 10:58:05.384: [  CRSRES][2760776592]0Attempting to start`ora.rac2.gsd` on member `rac2`

2013-07-18 10:58:06.175: [  CRSRES][2760776592]0Start of `ora.rac2.gsd`on member `rac2` succeeded.

2013-07-18 10:58:07.769: [  OCRUTL][2991553424]u_freem: mem passed isnull

2013-07-18 10:58:16.245: [  CRSRES][2750286736]0CRS-1002: Resource'ora.rac2.LISTENER_RAC2.lsnr' is already running on member 'rac2'

 

以上内容中,除了没有“IAM THE NEW OCR MASTER”之外,和节点1上crs启动时基本一致

此时再查看全部crs资源状态,发现全部是ONLINE了

 

[root@RAC1 raw]# crs_stat -t

Name           Type           Target    State    Host       

------------------------------------------------------------

ora....B1.inst application    ONLINE   ONLINE    rac1       

ora....B2.inst application    ONLINE   ONLINE    rac2       

ora.RACDB.db   application    ONLINE   ONLINE    rac1       

ora....SM1.asm application    ONLINE   ONLINE    rac1       

ora....C1.lsnr application    ONLINE   ONLINE    rac1       

ora.rac1.gsd   application    ONLINE   ONLINE    rac1       

ora.rac1.ons   application    ONLINE   ONLINE    rac1       

ora.rac1.vip   application    ONLINE   ONLINE    rac1       

ora....SM2.asm application    ONLINE   ONLINE    rac2       

ora....C2.lsnr application    ONLINE   ONLINE    rac2       

ora.rac2.gsd   application    ONLINE   ONLINE    rac2       

ora.rac2.ons   application    ONLINE   ONLINE    rac2       

ora.rac2.vip   application    ONLINE   ONLINE    rac2       

[root@RAC1 raw]#

 

再来测试以下之前的理论(关闭RAC中最后一个节点上的crs关闭时,数据库资源首先被回收)是否正确,再停止节点2上的crs一次,看看数据库资源是不是会从crs资源中被注销

 

[root@RAC2 ~]# crsctl stop crs

Stopping resources.

Successfully stopped CRS resources

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

 

2013-07-18 11:17:40.572: [  OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [182558112] retval lht [-27] Signal CV.

2013-07-18 11:19:27.914: [  CRSRES][2760776592]0StopResource: setting CLIvalues

2013-07-18 11:19:27.949: [  CRSRES][2760776592]0Attempting to stop`ora.rac2.gsd` on member `rac2`

2013-07-18 11:19:28.066: [  CRSRES][2760776592]0Stop of `ora.rac2.gsd` onmember `rac2` succeeded.

2013-07-18 11:19:28.196: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:28.204: [  CRSRES][2729307024]0Attempting to stop`ora.rac2.ons` on member `rac2`

2013-07-18 11:19:28.550: [  CRSRES][2729307024]0Stop of `ora.rac2.ons` onmember `rac2` succeeded.

2013-07-18 11:19:28.805: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:28.821: [  CRSRES][2729307024]0Attempting to stop`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`

2013-07-18 11:19:29.737: [  CRSRES][2729307024]0Stop of`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.

2013-07-18 11:19:29.993: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:30.032: [  CRSRES][2729307024]0Attempting to stop`ora.RACDB.RACDB2.inst` on member `rac2`

2013-07-18 11:19:31.691: [  CRSRES][2729307024]0Stop of`ora.RACDB.RACDB2.inst` on member `rac2` succeeded.

2013-07-18 11:19:32.124: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:32.252: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:32.261: [  CRSRES][2729307024]0Attempting to stop`ora.rac2.ASM2.asm` on member `rac2`

2013-07-18 11:19:33.963: [  CRSRES][2729307024]0Stop of`ora.rac2.ASM2.asm` on member `rac2` succeeded.

2013-07-18 11:19:34.710: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:36.269: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:36.407: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:36.622: [  CRSRES][2729307024]0StopResource: setting CLIvalues

2013-07-18 11:19:37.069: [  CRSRES][2729307024]0Attempting to stop`ora.rac2.vip` on member `rac2`

2013-07-18 11:19:37.715: [  CRSRES][2729307024]0Stop of `ora.rac2.vip` onmember `rac2` succeeded.

2013-07-18 11:19:37.863: [CSSCLNT][2897144720]clssgsGroupGetStatus: CSS shutting down.

 

2013-07-18 11:19:37.863: [CSSCLNT][2897144720]clssgsGroupGetStatus: returning 22

 

2013-07-18 11:19:37.863: [CRSMAIN][2897144720]0CRSD exiting on CSS shutdown request

2013-07-18 11:19:37.864: [default][2897144720]Terminating clsd session

 

从以上信息中可以了解到,数据库资源并没有被回收,此时仍然由节点1上的数据库实例对其进行服务,其他的停止顺序和之前节点1上的是一致的:gsd->ons->lsnr->inst->asm->vip

注意:此时只有一个vip资源,并且只是节点2自己的,而没有节点1的vip,作为2节点的RAC,此时节点1仍然是正常工作的,并不需要failover到节点2上。

 

总结:碰到crs故障,crsdlog文件内容有助于我们分析整个crs流程,如果是非crs自身问题造成的RAC故障,如ocr磁盘,voting disk故障或者interconnect故障造成的脑裂,可以通过查看其他系列的log文件进行分析,从而一一排错,这里就不列举了。

-------------------------------------------------------------------------------------------------------

By aaron8219 Chinaunix Blog:http://blog.chinaunix.net/uid/24612962.html

原创内容,转载请注明链接,谢谢!

http://blog.csdn.net/aaron8219/article/details/11935429

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值