10gR2(10.2.0.1)同时启动2个节点主机,经常会出现一些莫名的问题,会莫名其妙的变成OFFLINE和UNKNOWN(之前做完实验关机前,所有资源的状态都是ONLINE的),如:
[root@RAC1 raw]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.inst application ONLINE OFFLINE
ora....B2.inst application ONLINE OFFLINE
ora.RACDB.db application ONLINE UNKNOWN rac1
ora....SM1.asm application ONLINE UNKNOWN rac1
ora....C1.lsnr application ONLINE UNKNOWN rac1
ora.rac1.gsd application ONLINE UNKNOWN rac1
ora.rac1.ons application ONLINE UNKNOWN rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE OFFLINE
ora....C2.lsnr application ONLINE OFFLINE
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE UNKNOWN rac1
怀疑这个可能是10.2.0.1版本在crs上的一个bug,看到很多人在运用同一版本的时候,都遇到过类似的问题,都是靠升级到10.2.0.4或10.2.0.5解决的。其实不升级数据库版本,也是可以避免这个问题的,只要顺序开启节点1和节点2的crs,不要同时启动,就可以使所有资源ONLINE了。
首先关闭节点2的crs,然后关闭节点1的crs,关闭节点1crs的时候,顺便trace一下crsd.log日志文件的内容,有助于我们更清楚的了解整个crs关闭,启动,以及注册和注销资源的过程,加深对crs的理解。
(以下命令运行在已经关闭了节点2上的crs之后:)
[root@RAC1 raw]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
克隆一个节点1的会话,并执行trace跟踪crsd.log:
[root@RAC1 ~]# tail -10f /u01/app/oracle/product/10.2.0/crs_1/log/rac1/crsd/crsd.log
2013-07-18 10:37:14.579: [ CRSRES][2666597264]0StopResource: setting CLIvalues
2013-07-18 10:37:14.583: [ CRSRES][2687576976]0StopResource: setting CLIvalues
2013-07-18 10:37:14.587: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:14.592: [ CRSRES][2666597264]0Attempting to stop`ora.rac1.gsd` on member `rac1`
2013-07-18 10:37:14.643: [ CRSRES][2687576976]0Attempting to stop`ora.rac1.ons` on member `rac1`
2013-07-18 10:37:14.666: [ CRSRES][2719046544]0Attempting to stop`ora.RACDB.db` on member `rac1`
2013-07-18 10:37:14.939: [ CRSRES][2719046544]0Stop of `ora.RACDB.db` onmember `rac1` succeeded.
2013-07-18 10:37:15.044: [ CRSRES][2666597264]0Stop of `ora.rac1.gsd` onmember `rac1` succeeded.
2013-07-18 10:37:15.048: [ CRSRES][2687576976]0Stop of `ora.rac1.ons` onmember `rac1` succeeded.
2013-07-18 10:37:15.054: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:15.065: [ CRSRES][2687576976]0StopResource: setting CLIvalues
2013-07-18 10:37:15.083: [ CRSRES][2719046544]0Attempting to stop`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
2013-07-18 10:37:15.204: [ CRSRES][2687576976]0StopResource: setting CLIvalues
2013-07-18 10:37:15.256: [ CRSRES][2687576976]0StopResource: setting CLIvalues
2013-07-18 10:37:15.621: [ CRSRES][2687576976]0StopResource: setting CLIvalues
2013-07-18 10:37:15.650: [ CRSRES][2687576976]0Attempting to stop`ora.rac2.vip` on member `rac1`
2013-07-18 10:37:15.929: [ CRSRES][2687576976]0Stop of `ora.rac2.vip` onmember `rac1` succeeded.
2013-07-18 10:37:15.961: [ CRSRES][2719046544]0Stop of`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
2013-07-18 10:37:15.967: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:15.969: [ CRSRES][2719046544]0Attempting to stop`ora.RACDB.RACDB1.inst` on member `rac1`
2013-07-18 10:37:22.261: [ CRSRES][2719046544]0Stop of`ora.RACDB.RACDB1.inst` on member `rac1` succeeded.
2013-07-18 10:37:22.269: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:22.337: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:22.515: [ CRSRES][2719046544]0Attempting to stop`ora.rac1.ASM1.asm` on member `rac1`
2013-07-18 10:37:23.894: [ CRSRES][2719046544]0Stop of`ora.rac1.ASM1.asm` on member `rac1` succeeded.
2013-07-18 10:37:23.906: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:23.954: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:23.967: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:23.991: [ CRSRES][2719046544]0StopResource: setting CLIvalues
2013-07-18 10:37:23.996: [ CRSRES][2719046544]0Attempting to stop`ora.rac1.vip` on member `rac1`
2013-07-18 10:37:24.971: [ CRSRES][2719046544]0Stop of `ora.rac1.vip` onmember `rac1` succeeded.
2013-07-18 10:37:24.985: [CSSCLNT][2855414672]clssgsGroupGetStatus: CSS shutting down.
2013-07-18 10:37:24.985: [CSSCLNT][2855414672]clssgsGroupGetStatus: returning 22
2013-07-18 10:37:24.985: [CRSMAIN][2855414672]0CRSD exiting on CSS shutdown request
2013-07-18 10:37:24.985: [default][2855414672]Terminating clsd session
从以上crsd.log日志的trace信息可以发现,单实例执行crsctl stop crs,各资源关闭的先后顺序依次为:ora.RACDB.db->ora.rac1.gsd->ora.rac1.ons->ora.rac1.LISTENER_RAC1.lsnr->ora.rac2.vip->ora.RACDB.RACDB1.inst->ora.rac1.ASM1.asm->ora.rac1.vip
因为是单实例,所以当关闭该节点crs时,数据库就首先关闭了(因为没有实例对其提供服务),这里关闭监听以后首先关闭的是节点2上的vip,是节点2出现故障并关闭以后failover它的vip到节点1的,如果节点2没有关闭的话,那么数据库是不应该立即关闭的(因为节点1crs关闭后,仍然有节点2上的crs可以保证有数据库实例为数据库提供服务)
[root@RAC1 raw]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@RAC1 raw]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
继续在克隆的会话中查看crsd.log的trace信息:
2013-07-18 10:50:58.397: [default][1299232][ENTER]0
Oracle Database 10g CRS Release 10.2.0.1.0Production Copyright 1996, 2004, Oracle. All rights reserved
2013-07-18 10:50:58.398: [default][1299232]0CRS Daemon Starting
2013-07-18 10:50:58.398: [CRSMAIN][1299232]0Checking the OCR device
2013-07-18 10:50:58.648: [CRSMAIN][1299232]0Connecting to the CSS Daemon
2013-07-18 10:50:59.277: [COMMCRS][120855440]clsc_connect: (0x8770410) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2013-07-18 10:50:59.277: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9
2013-07-18 10:50:59.277: [ CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:51:00.948: [ COMMCRS][120855440]clsc_connect:(0x86dbec8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2013-07-18 10:51:00.948: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9
2013-07-18 10:51:00.949: [ CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:51:02.477: [COMMCRS][120855440]clsc_connect: (0x872f8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2013-07-18 10:51:02.477: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9
2013-07-18 10:51:02.477: [ CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:51:04.085: [COMMCRS][120855440]clsc_connect: (0x872f8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2013-07-18 10:51:04.085: [CSSCLNT][1299232]clsssInitNative: connect failed, rc 9
2013-07-18 10:51:04.085: [ CRSRTI][1299232]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:51:08.452: [ CRSD][1299232]0Daemon Version: 10.2.0.1.0Active Version: 10.2.0.1.0
2013-07-18 10:51:08.452: [ CRSD][1299232]0Active Version and SoftwareVersion are same
2013-07-18 10:51:08.452: [CRSMAIN][1299232]0Initializing OCR
2013-07-18 10:51:08.865: [ OCRRAW][1299232]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)
2013-07-18 10:51:08.923: [ OCRMAS][3044309904]th_master:12: I AM THE NEWOCR MASTER at incar 1. Node Number = 1
2013-07-18 10:51:09.110: [ OCRRAW][3044309904]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)
2013-07-18 10:51:09.167: [ CRSD][1299232]0ENV Logging level forModule: allcomp 0
2013-07-18 10:51:09.168: [ CRSD][1299232]0ENV Logging level forModule: default 0
2013-07-18 10:51:09.170: [ CRSD][1299232]0ENV Logging level for Module:COMMCRS 0
2013-07-18 10:51:09.172: [ CRSD][1299232]0ENV Logging level forModule: COMMNS 0
2013-07-18 10:51:09.172: [ CRSD][1299232]0ENV Logging level forModule: CRSUI 0
2013-07-18 10:51:09.173: [ CRSD][1299232]0ENV Logging level forModule: CRSCOMM 0
2013-07-18 10:51:09.175: [ CRSD][1299232]0ENV Logging level forModule: CRSRTI 0
2013-07-18 10:51:09.176: [ CRSD][1299232]0ENV Logging level forModule: CRSMAIN 0
2013-07-18 10:51:09.176: [ CRSD][1299232]0ENV Logging level for Module:CRSPLACE 0
2013-07-18 10:51:09.177: [ CRSD][1299232]0ENV Logging level forModule: CRSAPP 0
2013-07-18 10:51:09.178: [ CRSD][1299232]0ENV Logging level forModule: CRSRES 0
2013-07-18 10:51:09.179: [ CRSD][1299232]0ENV Logging level for Module:CRSOCR 0
2013-07-18 10:51:09.180: [ CRSD][1299232]0ENV Logging level forModule: CRSTIMER 0
2013-07-18 10:51:09.180: [ CRSD][1299232]0ENV Logging level forModule: CRSEVT 0
2013-07-18 10:51:09.181: [ CRSD][1299232]0ENV Logging level for Module:CRSD 0
2013-07-18 10:51:09.182: [ CRSD][1299232]0ENV Logging level forModule: CLUCLS 0
2013-07-18 10:51:09.183: [ CRSD][1299232]0ENV Logging level forModule: OCRRAW 0
2013-07-18 10:51:09.184: [ CRSD][1299232]0ENV Logging level forModule: OCROSD 0
2013-07-18 10:51:09.185: [ CRSD][1299232]0ENV Logging level forModule: CSSCLNT 0
2013-07-18 10:51:09.186: [ CRSD][1299232]0ENV Logging level forModule: OCRAPI 0
2013-07-18 10:51:09.186: [ CRSD][1299232]0ENV Logging level forModule: OCRUTL 0
2013-07-18 10:51:09.187: [ CRSD][1299232]0ENV Logging level forModule: OCRMSG 0
2013-07-18 10:51:09.188: [ CRSD][1299232]0ENV Logging level forModule: OCRCLI 0
2013-07-18 10:51:09.190: [ CRSD][1299232]0ENV Logging level forModule: OCRCAC 0
2013-07-18 10:51:09.190: [ CRSD][1299232]0ENV Logging level forModule: OCRSRV 0
2013-07-18 10:51:09.191: [ CRSD][1299232]0ENV Logging level forModule: OCRMAS 0
2013-07-18 10:51:09.192: [CRSMAIN][1299232]0Filename is /u01/app/oracle/product/10.2.0/crs_1/crs/init/rac1.pid
[ clsdmt][2845002640]Listening to(ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CRSD))
2013-07-18 10:51:09.239: [CRSMAIN][1299232]0Using Authorizer location: /u01/app/oracle/product/10.2.0/crs_1/crs/auth/
2013-07-18 10:51:09.283: [CRSMAIN][1299232]0Initializing RTI
2013-07-18 10:51:09.283:[CRSTIMER][2824022928]0Timer Thread Starting.
2013-07-18 10:51:09.285: [ CRSRES][1299232]0Parameter SECURITY = 1,running in USER Mode
2013-07-18 10:51:09.285: [CRSMAIN][1299232]0Initializing EVMMgr
2013-07-18 10:51:10.443: [CRSMAIN][1299232]0CRSD locked during state recovery, please wait.
2013-07-18 10:51:10.576: [CRSMAIN][1299232]0CRSD recovered, unlocked.
2013-07-18 10:51:10.577: [CRSMAIN][1299232]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2013-07-18 10:51:10.601: [CRSMAIN][1299232]0CRSD UI socket on:(ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2013-07-18 10:51:10.603: [CRSMAIN][1299232]0E2E socket on:(ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49896))
2013-07-18 10:51:10.603: [CRSMAIN][1299232]0Starting Threads
2013-07-18 10:51:10.603: [CRSMAIN][1299232]0CRS Daemon Started.
2013-07-18 10:51:11.395: [ CRSRES][2719124368]0startRunnable: settingCLI values
2013-07-18 10:51:11.404: [ CRSRES][2719124368]0Attempting to start`ora.rac1.vip` on member `rac1`
2013-07-18 10:51:11.635: [ CRSRES][2708634512]0startRunnable: settingCLI values
2013-07-18 10:51:11.642: [ CRSRES][2708634512]0Attempting to start`ora.rac2.vip` on member `rac1`
2013-07-18 10:51:19.395: [ CRSRES][2708634512]0Start of `ora.rac2.vip`on member `rac1` succeeded.
2013-07-18 10:51:19.409: [ CRSRES][2719124368]0Start of `ora.rac1.vip`on member `rac1` succeeded.
2013-07-18 10:51:19.603: [ CRSRES][2719124368]0startRunnable: settingCLI values
2013-07-18 10:51:19.611: [ CRSRES][2719124368]0Attempting to start`ora.rac1.ASM1.asm` on member `rac1`
2013-07-18 10:51:19.740: [ OCRSRV][3023330192]th_select_handler: Failedto retrieve procctx from ht. constr = [149875736] retval lht [-27] Signal CV.
2013-07-18 10:51:28.855: [ CRSRES][2677164944]0startRunnable: settingCLI values
2013-07-18 10:51:28.869: [ CRSRES][2677164944]0Attempting to start`ora.rac1.ons` on member `rac1`
2013-07-18 10:51:30.435: [ CRSRES][2677164944]0Start of `ora.rac1.ons`on member `rac1` succeeded.
2013-07-18 10:51:30.516: [ CRSRES][2719124368]0Start of`ora.rac1.ASM1.asm` on member `rac1` succeeded.
2013-07-18 10:51:30.645: [ CRSRES][2719124368]0startRunnable: settingCLI values
2013-07-18 10:51:30.651: [ CRSRES][2719124368]0Attempting to start`ora.RACDB.RACDB1.inst` on member `rac1`
2013-07-18 10:51:39.617: [ OCRSRV][3023330192]th_select_handler: Failedto retrieve procctx from ht. constr = [151180672] retval lht [-27] Signal CV.
2013-07-18 10:52:03.746: [ CRSRES][2719124368]0Start of`ora.RACDB.RACDB1.inst` on member `rac1` succeeded.
2013-07-18 10:52:04.972: [ CRSRES][2719124368]0startRunnable: settingCLI values
2013-07-18 10:52:04.979: [ CRSRES][2719124368]0Attempting to start`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
2013-07-18 10:52:07.030: [ CRSRES][2719124368]0Start of`ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
2013-07-18 10:52:07.035: [ CRSRES][2719124368]0Skip online resource:ora.rac1.ons
2013-07-18 10:52:08.714: [ CRSRES][2708634512]0startRunnable: settingCLI values
2013-07-18 10:52:09.700: [ CRSRES][2687654800]0startRunnable: settingCLI values
2013-07-18 10:52:10.388: [ CRSRES][2708634512]0Attempting to start`ora.rac1.gsd` on member `rac1`
2013-07-18 10:52:10.714: [ CRSRES][2687654800]0Attempting to start`ora.RACDB.db` on member `rac1`
2013-07-18 10:52:14.828: [ CRSRES][2708634512]0Start of `ora.rac1.gsd`on member `rac1` succeeded.
2013-07-18 10:52:14.843: [ CRSRES][2687654800]0Start of `ora.RACDB.db`on member `rac1` succeeded.
2013-07-18 10:52:24.856: [ CRSRES][2687654800]0CRS-1002: Resource'ora.RACDB.db' is already running on member 'rac1'
2013-07-18 10:52:24.860: [ CRSRES][2719124368]0CRS-1002: Resource'ora.rac1.LISTENER_RAC1.lsnr' is already running on member 'rac1'
分析下整个流程:显示版本信息->开启crsd守护进程->检测OCRdevice->连接cssd守护进程->检查版本信息->初始化OCR(因为OCR存放RAC中所有节点的信息,非常重要)并确定磁盘路径等信息->统计vote票数“my votes (2), total votes (2)”(此处我的理解是:自己的票数=总的票数才不会被整个RACkill,此处有一个“I AM THE NEW OCR MASTER”)->log一系列OENV并初始化几个重要进程->开启crs资源。
再来看下资源的启动顺序:首先依次启动2个vip(vip1,vip2),其中一个是从节点2上failover过来的,然后依次是asm->inst->lsnr->ons->gsd->db
注意,数据库资源是放在最后才启动的,很有意思,和之前关闭顺序正好是相反的。
查看一下现在的crs资源状态,节点1上的全部资源都启动了,现在就可以作为一个单实例的RAC正常维护数据库了。
[root@RAC1 raw]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.inst application ONLINE ONLINE rac1
ora....B2.inst application ONLINE OFFLINE
ora.RACDB.db application ONLINE ONLINE rac1
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE OFFLINE
ora....C2.lsnr application ONLINE OFFLINE
ora.rac2.gsd application ONLINE OFFLINE
ora.rac2.ons application ONLINE OFFLINE
ora.rac2.vip application ONLINE ONLINE rac1
此时开启节点2的crs并trace相应目录下crsd.log的过程
[root@RAC1 raw]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@RAC1 raw]# crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
[root@RAC2 ~]# tail -10f/u01/app/oracle/product/10.2.0/crs_1/log/rac2/crsd/crsd.log
2013-07-18 10:30:28.654: [ CRSEVT][2730077072]0CAAMonitorHandler ::0:Action Script /u01/app/oracle/product/10.2.0/crs_1/bin/racgwrap(stop)timed out for ora.rac2.ons! (timeout=600)
2013-07-18 10:30:28.654: [ CRSAPP][2730077072]0StopResource error forora.rac2.ons error code = -2
2013-07-18 10:30:28.678: [ CRSRES][2730077072][ALERT]0`ora.rac2.ons` onmember `rac2` has experienced an unrecoverable failure.
2013-07-18 10:30:28.679: [ CRSRES][2730077072]0Human interventionrequired to resume its availability.
2013-07-18 10:30:28.692: [CSSCLNT][2876935056]clssgsGroupGetStatus: CSS shutting down.
2013-07-18 10:30:28.692: [CSSCLNT][2876935056]clssgsGroupGetStatus: returning 22
2013-07-18 10:30:28.692: [CRSMAIN][2876935056]0CRSD exiting on CSS shutdown request
2013-07-18 10:30:28.693: [default][2876935056]Terminating clsd session
(以上为之前残留在crsd.log文件中的关闭rac2上的crs的一些信息)
2013-07-18 10:56:33.643: [default][3740448][ENTER]0
Oracle Database 10g CRS Release 10.2.0.1.0Production Copyright 1996, 2004, Oracle. All rights reserved
2013-07-18 10:56:33.643: [default][3740448]0CRS Daemon Starting
2013-07-18 10:56:33.644: [CRSMAIN][3740448]0Checking the OCR device
2013-07-18 10:56:33.975: [CRSMAIN][3740448]0Connecting to the CSS Daemon
2013-07-18 10:56:34.569: [COMMCRS][35040144]clsc_connect: (0xa6bb410) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2013-07-18 10:56:34.569: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9
2013-07-18 10:56:34.569: [ CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:56:36.240: [COMMCRS][35040144]clsc_connect: (0xa626ec8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2013-07-18 10:56:36.240: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9
2013-07-18 10:56:36.241: [ CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:56:37.848: [COMMCRS][35040144]clsc_connect: (0xa67a8d8) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2013-07-18 10:56:37.848: [CSSCLNT][3740448]clsssInitNative: connect failed, rc 9
2013-07-18 10:56:37.848: [ CRSRTI][3740448]0CSS is not ready. Receivedstatus 3 from CSS. Waiting for good status ..
2013-07-18 10:56:43.473: [ CRSD][3740448]0Daemon Version: 10.2.0.1.0Active Version: 10.2.0.1.0
2013-07-18 10:56:43.473: [ CRSD][3740448]0Active Version and SoftwareVersion are same
2013-07-18 10:56:43.473: [CRSMAIN][3740448]0Initializing OCR
2013-07-18 10:56:43.670: [ OCRRAW][3740448]proprioo: for disk 0(/dev/raw/raw1), id match (1), my id set (1669906634,1028247821) total id sets(1), 1st set (1669906634,1028247821), 2nd set (0,0) my votes (2), total votes(2)
2013-07-18 10:56:43.790: [ CRSD][3740448]0ENV Logging level forModule: allcomp 0
2013-07-18 10:56:43.798: [ CRSD][3740448]0ENV Logging level forModule: default 0
2013-07-18 10:56:43.908: [ CRSD][3740448]0ENV Logging level forModule: COMMCRS 0
2013-07-18 10:56:43.916: [ CRSD][3740448]0ENV Logging level forModule: COMMNS 0
2013-07-18 10:56:44.088: [ CRSD][3740448]0ENV Logging level forModule: CRSUI 0
2013-07-18 10:56:44.097: [ CRSD][3740448]0ENV Logging level forModule: CRSCOMM 0
2013-07-18 10:56:44.208: [ CRSD][3740448]0ENV Logging level forModule: CRSRTI 0
2013-07-18 10:56:44.216: [ CRSD][3740448]0ENV Logging level forModule: CRSMAIN 0
2013-07-18 10:56:44.347: [ CRSD][3740448]0ENV Logging level forModule: CRSPLACE 0
2013-07-18 10:56:44.355: [ CRSD][3740448]0ENV Logging level forModule: CRSAPP 0
2013-07-18 10:56:44.465: [ CRSD][3740448]0ENV Logging level forModule: CRSRES 0
2013-07-18 10:56:44.474: [ CRSD][3740448]0ENV Logging level forModule: CRSOCR 0
2013-07-18 10:56:44.603: [ CRSD][3740448]0ENV Logging level forModule: CRSTIMER 0
2013-07-18 10:56:44.612: [ CRSD][3740448]0ENV Logging level forModule: CRSEVT 0
2013-07-18 10:56:44.721: [ CRSD][3740448]0ENV Logging level forModule: CRSD 0
2013-07-18 10:56:44.734: [ CRSD][3740448]0ENV Logging level forModule: CLUCLS 0
2013-07-18 10:56:44.839: [ CRSD][3740448]0ENV Logging level forModule: OCRRAW 0
2013-07-18 10:56:44.848: [ CRSD][3740448]0ENV Logging level forModule: OCROSD 0
2013-07-18 10:56:44.957: [ CRSD][3740448]0ENV Logging level forModule: CSSCLNT 0
2013-07-18 10:56:44.965: [ CRSD][3740448]0ENV Logging level forModule: OCRAPI 0
2013-07-18 10:56:45.000: [ CRSD][3740448]0ENV Logging level forModule: OCRUTL 0
2013-07-18 10:56:45.010: [ CRSD][3740448]0ENV Logging level forModule: OCRMSG 0
2013-07-18 10:56:45.020: [ CRSD][3740448]0ENV Logging level forModule: OCRCLI 0
2013-07-18 10:56:45.029: [ CRSD][3740448]0ENV Logging level forModule: OCRCAC 0
2013-07-18 10:56:45.038: [ CRSD][3740448]0ENV Logging level forModule: OCRSRV 0
2013-07-18 10:56:45.048: [ CRSD][3740448]0ENV Logging level for Module:OCRMAS 0
2013-07-18 10:56:45.048: [CRSMAIN][3740448]0Filename is /u01/app/oracle/product/10.2.0/crs_1/crs/init/rac2.pid
[ clsdmt][2886654864]Listening to(ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))
2013-07-18 10:56:45.104: [CRSMAIN][3740448]0Using Authorizer location: /u01/app/oracle/product/10.2.0/crs_1/crs/auth/
2013-07-18 10:56:45.627: [CRSMAIN][3740448]0Initializing RTI
2013-07-18 10:56:45.627:[CRSTIMER][2865675152]0Timer Thread Starting.
2013-07-18 10:56:45.628: [ CRSRES][3740448]0Parameter SECURITY = 1,running in USER Mode
2013-07-18 10:56:45.628: [CRSMAIN][3740448]0Initializing EVMMgr
2013-07-18 10:56:45.812: [COMMCRS][2855185296]clsc_connect: (0xabaaea0) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2013-07-18 10:56:46.497: [COMMCRS][2855185296]clsc_connect: (0xabaafb0) no listener at(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2013-07-18 10:56:48.832: [CRSMAIN][3740448]0CRSD locked during state recovery, please wait.
2013-07-18 10:56:48.964: [ CRSMAIN][3740448]0CRSDrecovered, unlocked.
2013-07-18 10:56:48.964: [CRSMAIN][3740448]0QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))
2013-07-18 10:56:49.007: [CRSMAIN][3740448]0CRSD UI socket on:(ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))
2013-07-18 10:56:49.046: [CRSMAIN][3740448]0E2E socket on:(ADDRESS=(PROTOCOL=tcp)(HOST=rac2-priv)(PORT=49896))
2013-07-18 10:56:49.046: [CRSMAIN][3740448]0Starting Threads
2013-07-18 10:56:49.046: [CRSMAIN][3740448]0CRS Daemon Started.
2013-07-18 10:56:50.511: [ CRSRES][2760776592]0StopResource: setting CLIvalues
2013-07-18 10:56:50.581: [ CRSRES][2760776592]0Attempting to stop`ora.rac2.vip` on member `rac1`
2013-07-18 10:56:50.836: [ CRSRES][2760776592]0Stop of `ora.rac2.vip` onmember `rac1` succeeded.
2013-07-18 10:56:50.845: [ CRSRES][2760776592]0startRunnable: settingCLI values
2013-07-18 10:56:50.846: [ CRSRES][2760776592]0Attempting to start`ora.rac2.vip` on member `rac2`
2013-07-18 10:56:58.252: [ CRSRES][2760776592]0Start of `ora.rac2.vip` onmember `rac2` succeeded.
2013-07-18 10:56:59.079: [ CRSRES][2760776592]0startRunnable: settingCLI values
2013-07-18 10:56:59.138: [ CRSRES][2760776592]0Attempting to start`ora.rac2.ASM2.asm` on member `rac2`
2013-07-18 10:57:11.612: [ OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [183125624] retval lht [-27] Signal CV.
2013-07-18 10:57:21.767: [ CRSRES][2729307024]0startRunnable: settingCLI values
2013-07-18 10:57:21.812: [ CRSRES][2729307024]0Attempting to start`ora.rac2.ons` on member `rac2`
2013-07-18 10:57:23.276: [ CRSRES][2729307024]0Start of `ora.rac2.ons`on member `rac2` succeeded.
2013-07-18 10:57:23.325: [ CRSRES][2760776592]0Start of`ora.rac2.ASM2.asm` on member `rac2` succeeded.
2013-07-18 10:57:24.094: [ CRSRES][2760776592]0startRunnable: settingCLI values
2013-07-18 10:57:24.130: [ CRSRES][2760776592]0Attempting to start`ora.RACDB.RACDB2.inst` on member `rac2`
2013-07-18 10:57:37.294: [ OCRUTL][2949594000]u_freem: mem passed isnull
2013-07-18 10:57:37.449: [ OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [182966616] retval lht [-27] Signal CV.
2013-07-18 10:58:02.769: [ CRSRES][2760776592]0Start of`ora.RACDB.RACDB2.inst` on member `rac2` succeeded.
2013-07-18 10:58:03.641: [ CRSRES][2760776592]0startRunnable: settingCLI values
2013-07-18 10:58:03.712: [ CRSRES][2760776592]0Attempting to start`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
2013-07-18 10:58:04.926: [ CRSRES][2760776592]0Start of `ora.rac2.LISTENER_RAC2.lsnr`on member `rac2` succeeded.
2013-07-18 10:58:04.929: [ CRSRES][2750286736]0Skip online resource:ora.rac2.ons
2013-07-18 10:58:05.348: [ CRSRES][2760776592]0startRunnable: settingCLI values
2013-07-18 10:58:05.384: [ CRSRES][2760776592]0Attempting to start`ora.rac2.gsd` on member `rac2`
2013-07-18 10:58:06.175: [ CRSRES][2760776592]0Start of `ora.rac2.gsd`on member `rac2` succeeded.
2013-07-18 10:58:07.769: [ OCRUTL][2991553424]u_freem: mem passed isnull
2013-07-18 10:58:16.245: [ CRSRES][2750286736]0CRS-1002: Resource'ora.rac2.LISTENER_RAC2.lsnr' is already running on member 'rac2'
以上内容中,除了没有“IAM THE NEW OCR MASTER”之外,和节点1上crs启动时基本一致
此时再查看全部crs资源状态,发现全部是ONLINE了
[root@RAC1 raw]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.inst application ONLINE ONLINE rac1
ora....B2.inst application ONLINE ONLINE rac2
ora.RACDB.db application ONLINE ONLINE rac1
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
[root@RAC1 raw]#
再来测试以下之前的理论(关闭RAC中最后一个节点上的crs关闭时,数据库资源首先被回收)是否正确,再停止节点2上的crs一次,看看数据库资源是不是会从crs资源中被注销
[root@RAC2 ~]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
2013-07-18 11:17:40.572: [ OCRSRV][3064982416]th_select_handler: Failedto retrieve procctx from ht. constr = [182558112] retval lht [-27] Signal CV.
2013-07-18 11:19:27.914: [ CRSRES][2760776592]0StopResource: setting CLIvalues
2013-07-18 11:19:27.949: [ CRSRES][2760776592]0Attempting to stop`ora.rac2.gsd` on member `rac2`
2013-07-18 11:19:28.066: [ CRSRES][2760776592]0Stop of `ora.rac2.gsd` onmember `rac2` succeeded.
2013-07-18 11:19:28.196: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:28.204: [ CRSRES][2729307024]0Attempting to stop`ora.rac2.ons` on member `rac2`
2013-07-18 11:19:28.550: [ CRSRES][2729307024]0Stop of `ora.rac2.ons` onmember `rac2` succeeded.
2013-07-18 11:19:28.805: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:28.821: [ CRSRES][2729307024]0Attempting to stop`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
2013-07-18 11:19:29.737: [ CRSRES][2729307024]0Stop of`ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
2013-07-18 11:19:29.993: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:30.032: [ CRSRES][2729307024]0Attempting to stop`ora.RACDB.RACDB2.inst` on member `rac2`
2013-07-18 11:19:31.691: [ CRSRES][2729307024]0Stop of`ora.RACDB.RACDB2.inst` on member `rac2` succeeded.
2013-07-18 11:19:32.124: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:32.252: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:32.261: [ CRSRES][2729307024]0Attempting to stop`ora.rac2.ASM2.asm` on member `rac2`
2013-07-18 11:19:33.963: [ CRSRES][2729307024]0Stop of`ora.rac2.ASM2.asm` on member `rac2` succeeded.
2013-07-18 11:19:34.710: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:36.269: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:36.407: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:36.622: [ CRSRES][2729307024]0StopResource: setting CLIvalues
2013-07-18 11:19:37.069: [ CRSRES][2729307024]0Attempting to stop`ora.rac2.vip` on member `rac2`
2013-07-18 11:19:37.715: [ CRSRES][2729307024]0Stop of `ora.rac2.vip` onmember `rac2` succeeded.
2013-07-18 11:19:37.863: [CSSCLNT][2897144720]clssgsGroupGetStatus: CSS shutting down.
2013-07-18 11:19:37.863: [CSSCLNT][2897144720]clssgsGroupGetStatus: returning 22
2013-07-18 11:19:37.863: [CRSMAIN][2897144720]0CRSD exiting on CSS shutdown request
2013-07-18 11:19:37.864: [default][2897144720]Terminating clsd session
从以上信息中可以了解到,数据库资源并没有被回收,此时仍然由节点1上的数据库实例对其进行服务,其他的停止顺序和之前节点1上的是一致的:gsd->ons->lsnr->inst->asm->vip
注意:此时只有一个vip资源,并且只是节点2自己的,而没有节点1的vip,作为2节点的RAC,此时节点1仍然是正常工作的,并不需要failover到节点2上。
总结:碰到crs故障,crsdlog文件内容有助于我们分析整个crs流程,如果是非crs自身问题造成的RAC故障,如ocr磁盘,voting disk故障或者interconnect故障造成的脑裂,可以通过查看其他系列的log文件进行分析,从而一一排错,这里就不列举了。
-------------------------------------------------------------------------------------------------------
By aaron8219 Chinaunix Blog:http://blog.chinaunix.net/uid/24612962.html
原创内容,转载请注明链接,谢谢!