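After upgrading Clusterware and the Oracle 10gR2 software to 10.2.0.4 and rebooting, CRS on node 1 would not start: running crsctl start crs made the system reboot immediately.
Below are the CRS and CSS log entries.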
crsd.log:
2012-12-25 08:11:56.757: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
2012-12-25 08:11:56.757: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-12-25 08:11:58.252: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2012-12-25 08:11:58.252: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
2012-12-25 08:11:58.252: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-12-25 08:11:59.789: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2012-12-25 08:11:59.789: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
2012-12-25 08:11:59.789: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-12-25 08:12:01.586: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2012-12-25 08:12:01.586: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
2012-12-25 08:12:01.586: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2012-12-25 08:12:04.174: [ COMMCRS][1099401536]clsc_connect: (0xe18010) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
2012-12-25 08:12:04.174: [ CSSCLNT][1226828528]clsssInitNative: connect failed, rc 9
2012-12-25 08:12:04.175: [ CRSRTI][1226828528]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
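The crsd.log excerpt above repeats the same three messages every couple of seconds. A minimal sketch for summarizing such a log is shown here; the function name css_wait_summary is hypothetical, and the patterns assume the messages appear in the log file with their normal spacing ("CSS is not ready", "(KEY=...)").

```shell
#!/bin/sh
# Hypothetical helper: summarize repeated CSS connection failures in a crsd.log.
# Usage: css_wait_summary /path/to/crsd.log
css_wait_summary() {
    log="$1"
    # Count how many times crsd reported that CSS was not ready.
    # "|| true" keeps the function usable under "set -e" when there are no matches.
    waits=$(grep -c 'CSS is not ready' "$log" || true)
    echo "css_not_ready=$waits"
    # Print each distinct IPC key mentioned in the log, one per line.
    sed -n 's/.*(KEY=\([^)]*\)).*/\1/p' "$log" | sort -u
}
```

A steadily growing count against a single OCSSD_LL key suggests crsd is simply waiting for ocssd, so the ocssd.log is the next place to look.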
ocssd.log:
[ CSSD]2012-12-25 09:58:03.233 >USER: Copyright 2012, Oracle version 10.2.0.4.0
[ CSSD]2012-12-25 09:58:03.233 >USER: CSS daemon log for node rac1, number 1, in cluster crs
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
[ CSSD]2012-12-25 09:58:03.337 [547869936] >TRACE: clssscmain: local-only set to false
[ CSSD]2012-12-25 09:58:03.351 [547869936] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster
[ CSSD]2012-12-25 09:58:03.386 [547869936] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster
[ CSSD]2012-12-25 09:58:04.159 [1138325824] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
[ CSSD]2012-12-25 09:58:04.159 [1138325824] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2012-12-25 09:58:04.341 [547869936] >TRACE: clssnmNMInitialize: misscount set to (300)
[ CSSD]2012-12-25 09:58:04.342 [547869936] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 150000 ms, reconfig start (misscount) 300000 ms
[ CSSD]2012-12-25 09:58:04.350 [547869936] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw4)
[ CSSD]2012-12-25 09:58:04.350 [1138325824] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw4)
[ CSSD]2012-12-25 09:58:06.389 [1138325824] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw4)
[ CSSD]2012-12-25 09:58:06.457 [547869936] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-12-25 09:58:06.522 [1148815680] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw4) initial sleep interval (1000)ms
[ CSSD]2012-12-25 09:58:06.531 [1169795392] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49895))
[ CSSD]2012-12-25 09:58:06.542 [1169795392] >TRACE: clssnmClusterListener: Probing node rac2 (2), probcon(0x1422bd90)
[ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: MSGSRC 2, type 6, node 2, flags 0x0001, con 0x1422bd90, probe 0x1422bd90
[ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: node 2, rac2, con(0x1422bd90), probcon(0x1422bd90), ninfcon((nil)), node unique 1356444601, prev unique 0, msg unique 1356444601 node state 0
[ CSSD]2012-12-25 09:58:06.582 [1169795392] >TRACE: clssnmConnComplete: connected to node 2 (con 0x1422bd90), ninfcon (0x1422bd90), state (0), flag (1037)
[ CSSD]2012-12-25 09:58:06.594 [1138325824] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(2797) LATS(207944) Disk lastSeqNo(2797)
[ CSSD]2012-12-25 09:58:06.756 [1092946240] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-12-25 09:58:06.756 [1092946240] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
[ CSSD]2012-12-25 09:58:06.817 [1201264960] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=20)(HOST=10.0.0.154)(PORT=33670))
[ CSSD]2012-12-25 09:58:08.725 [1169795392] >TRACE: clssnmHandleSync: diskTimeout set to (297000)ms
[ CSSD]2012-12-25 09:58:08.725 [1169795392] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[rac2] seq[0] sync[2]
[ CSSD]2012-12-25 09:58:08.725 [1232734528] >TRACE: clssnmRcfgMgrThread: initial lastleader(2) unique(1356444601)
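The nodes could all ping each other, yet the logs still pointed to an inter-node communication problem. I restored the OCR, but the problem persisted. The full procedure I followed is recorded below.
1. Stop CRS on both nodes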
# crsctl stop crs
2. Run the crs/root.sh script on each node
-- Node 1
[root@rac1 ~]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Oracle CRS stack is already configured and will be running under init(1M)
To get past the "already configured" message above, delete /etc/oracle/scls_scr//oracle/cssfatal on both nodes, then rerun the crs/root.sh script.
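The step above deletes the cssfatal file outright. A more cautious variant, which is an assumption of this sketch and not what was done here, is to move the file into a backup directory outside the Oracle tree so it can be put back if needed. The function name set_aside and the backup directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move a control file into a backup directory
# instead of deleting it. Usage: set_aside <file> <backup_dir>
set_aside() {
    f="$1"
    bakdir="$2"
    if [ ! -e "$f" ]; then
        echo "skip: $f not found"
        return 0
    fi
    # Timestamp the copy so repeated runs do not overwrite earlier backups.
    bak="$bakdir/$(basename "$f").$(date +%Y%m%d%H%M%S)"
    mv "$f" "$bak" && echo "moved: $f -> $bak"
}
```

Run it as root on each node, passing the cssfatal path from the step above as the first argument, and only then rerun root.sh.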
-- Node 1
[root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw4
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
CSS is inactive on these nodes.
rac2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
-- Node 2
[root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
rac2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Creating VIP application resource on (2) nodes...
Creating GSD application resource on (2) nodes...
Creating ONS application resource on (2) nodes...
Starting VIP application resource on (2) nodes...
Starting GSD application resource on (2) nodes...
Starting ONS application resource on (2) nodes...
Done.
[root@rac2 ~]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
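The health check above can also be scripted. The sketch below reads the output of crsctl check crs on standard input and succeeds only if CSS, CRS and EVM all report "appears healthy", matching the three lines shown above. The function name crs_healthy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if all three daemons report healthy.
# Reads "crsctl check crs" output from standard input.
crs_healthy() {
    out=$(cat)
    for d in CSS CRS EVM; do
        # Each daemon must have its own "<name> appears healthy" line.
        printf '%s\n' "$out" | grep -q "^$d appears healthy" || return 1
    done
    return 0
}
```

For example, `crsctl check crs | crs_healthy && echo "stack is up"` prints the message only when all three lines are present.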
3. On each node, run the two scripts from the Clusterware upgrade
-- Node 1
[root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rac1 oracle]# /u01/app/oracle/product/10.2.0/crs/install/root102.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
10204 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
[root@rac1 oracle]# /etc/init.d/init.crs enable
Automatic startup enabled for system boot.
-- Node 2
[root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
You have new mail in /var/spool/mail/root
[root@rac2 ~]# /u01/app/oracle/product/10.2.0/crs/install/root102.sh
WARNING: directory '/u01/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
10204 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 2: rac2 rac2-priv rac2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
[root@rac2 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
[root@rac2 ~]# /etc/init.d/init.crs enable
Automatic startup enabled for system boot.
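With more resources registered, scanning the crs_stat -t table by eye gets tedious. A minimal sketch that lists only the resources whose State column is not ONLINE follows; it assumes the five-column layout shown above (Name, Type, Target, State, Host) and that resource names start with "ora.". The function name crs_not_online is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list resources whose State (4th column) is not ONLINE.
# Reads "crs_stat -t" output from standard input.
crs_not_online() {
    # Header and separator lines are skipped because they do not start with "ora.".
    awk '$1 ~ /^ora\./ && $4 != "ONLINE" { print $1, $4 }'
}
```

Usage: `crs_stat -t | crs_not_online`. No output means every listed resource is ONLINE.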
4. Add ASM
[oracle@rac1 db_1]$ srvctl add asm -n rac1 -i ASM1 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 db_1]$ srvctl add asm -n rac2 -i ASM2 -o /u01/app/oracle/product/10.2.0/db_1
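For clusters with more nodes, the same registration commands can be generated in a loop. The sketch below only prints the commands as a dry run and does not execute them; it assumes instances are named ASM1, ASM2, ... in node order, as in the two commands above. The function name asm_add_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one "srvctl add asm" command per node (dry run).
# Usage: asm_add_cmds <oracle_home> <node1> [<node2> ...]
asm_add_cmds() {
    home="$1"
    shift
    i=1
    for node in "$@"; do
        # Instance names follow the ASM<n> pattern used above.
        echo "srvctl add asm -n $node -i ASM$i -o $home"
        i=$((i + 1))
    done
}
```

Calling `asm_add_cmds /u01/app/oracle/product/10.2.0/db_1 rac1 rac2` prints the two commands used in this step; review them before running them as the oracle user.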
5. Then create the database.