oracle+故障切换,CRS故障切换异常

本帖最后由 ftc007 于 2012-6-7 09:53 编辑

以上是这个过程的CRS日志:

2012-06-05 18:06:55.023    --开始关机

[cssd(10293)]CRS-1654:Clean up of CRSD resources finished successfully.

2012-06-05 18:06:55.024

[cssd(10293)]CRS-1655:CSSD on node oracle02 detected a problem and started to shutdown.  --关闭本节点的crs

2012-06-05 18:06:55.035

[/opt/app/11.2.0/grid/bin/orarootagent.bin(10867)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117

5f60efcaccfa40b26873a6a9cb266119.gif {0:1:4} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/orarootagent_root/orarootagent_root.log.

2012-06-05 18:06:55.034

[/opt/app/11.2.0/grid/bin/oraagent.bin(10864)]CRS-5822:Agent '/opt/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117

5f60efcaccfa40b26873a6a9cb266119.gif {0:2:6} in /opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log.

2012-06-05 18:06:55.117

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:55.195

[cssd(10293)]CRS-1660:The CSS daemon shutdown has completed

2012-06-05 18:06:55.512

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:06:55.531

[ohasd(9758)]CRS-2765:Resource 'ora.evmd' has failed on server 'oracle02'.

2012-06-05 18:06:55.539

[ohasd(9758)]CRS-2765:Resource 'ora.ctssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.154

[crsd(11278)]CRS-0805:Cluster Ready Service aborted due to failure to communicate with Cluster Synchronization Service with error [3]. Details at (:CRSD00109

5f60efcaccfa40b26873a6a9cb266119.gif in /opt/app/11.2.0/grid/log/oracle02/crsd/crsd.log.

2012-06-05 18:06:56.528

[ohasd(9758)]CRS-2765:Resource 'ora.cssd' has failed on server 'oracle02'.

2012-06-05 18:06:56.546

[ohasd(9758)]CRS-2765:Resource 'ora.crsd' has failed on server 'oracle02'.

2012-06-05 18:06:56.558

[ohasd(9758)]CRS-2765:Resource 'ora.cluster_interconnect.haip' has failed on server 'oracle02'.

2012-06-05 18:06:57.323

[ctssd(11289)]CRS-2402:The Cluster Time Synchronization Service aborted on host oracle02. Details at (:ctss_css_init1

5f60efcaccfa40b26873a6a9cb266119.gif in /opt/app/11.2.0/grid/log/oracle02/ctssd/octssd.log.

2012-06-05 18:06:58.279

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

5f60efcaccfa40b26873a6a9cb266119.gif" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:06:58.466

[ohasd(9758)]CRS-2765:Resource 'ora.asm' has failed on server 'oracle02'.

2012-06-05 18:07:06.870

[cssd(11333)]CRS-1713:CSSD daemon is started in clustered mode  --准备启动crs,在启动之后,它也需要重新检测各节点的健康状况,下面5行,就是这个过程,但比较久,不正常

2012-06-05 18:07:21.817

[ohasd(9758)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'oracle02'.

2012-06-05 18:07:22.548

[cssd(11333)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:07:23.872

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.888

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:07:23.898

[cssd(11333)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

^[[A^[[A2012-06-05 18:09:26.767

[cssd(11333)]CRS-1625:Node oracle01, number 1, was manually shut down     --节点2来到这里才意识到节点1崩溃,这个好像有点奇怪,其实主要是这里之前花的时间比较多,整个过程花了三分钟,后面都比较快

2012-06-05 18:09:30.670

[cssd(11333)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:10:30.793

[cssd(11333)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011

5f60efcaccfa40b26873a6a9cb266119.gif in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log

2012-06-05 18:10:34.327

[cssd(11623)]CRS-1713:CSSD daemon is started in clustered mode

2012-06-05 18:10:36.708

[ohasd(9758)]CRS-2765:Resource 'ora.diskmon' has failed on server 'oracle02'.

2012-06-05 18:10:50.103

[cssd(11623)]CRS-1707:Lease acquisition for node oracle02 number 2 completed

2012-06-05 18:10:51.524

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL3; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.532

[cssd(11623)]CRS-1625:Node oracle01, number 1, was manually shut down

2012-06-05 18:10:51.552

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL2; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:51.576

[cssd(11623)]CRS-1605:CSSD voting file is online: ORCL:CRSVOL1; details in /opt/app/11.2.0/grid/log/oracle02/cssd/ocssd.log.

2012-06-05 18:10:59.113

[cssd(11623)]CRS-1601:CSSD Reconfiguration complete. Active nodes are oracle02 .

2012-06-05 18:11:01.715

[ctssd(11714)]CRS-2407:The new Cluster Time Synchronization Service reference node is host oracle02.

2012-06-05 18:11:01.714

[ctssd(11714)]CRS-2401:The Cluster Time Synchronization Service started on host oracle02.

2012-06-05 18:11:02.428

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

5f60efcaccfa40b26873a6a9cb266119.gif" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:02.680

[/opt/app/11.2.0/grid/bin/oraagent.bin(10199)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006

5f60efcaccfa40b26873a6a9cb266119.gif" in "/opt/app/11.2.0/grid/log/oracle02/agent/ohasd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:22.096

[crsd(11870)]CRS-1012:The OCR service started on node oracle02.

2012-06-05 18:11:22.897

[evmd(11286)]CRS-1401:EVMD started on node oracle02.

2012-06-05 18:11:24.284

[crsd(11870)]CRS-1201:CRSD started on node oracle02.

2012-06-05 18:11:25.832

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010

5f60efcaccfa40b26873a6a9cb266119.gif" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:25.842

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010

5f60efcaccfa40b26873a6a9cb266119.gif" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:27.438

[/opt/app/11.2.0/grid/bin/oraagent.bin(11981)]CRS-5016:Process "/opt/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/opt/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/opt/app/11.2.0/grid/log/oracle02/agent/crsd/oraagent_grid/oraagent_grid.log"

2012-06-05 18:11:27.575

[crsd(11870)]CRS-2772:Server 'oracle02' has been assigned to pool 'Free'.

从日志来分析,感觉好像是共享存储的切换比较慢造成的,以下是ocssd的日志信息:

2012-06-05 18:09:25.930: [    CSSD][1086961984]clssgmDiscEndpcl: gipcDestroy 0x1faa

2012-06-05 18:09:26.151: [    CSSD][1118615872]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:26.767: [    CSSD][1107577152]clssnmvDHBValidateNCopy: node 1, oracle01, has a disk HB, but no network HB, DHB has

rcfg 234372137, wrtcnt, 26905, LATS 4294594190, lastSeqNo 26900, uniqueness 1338889661, timestamp 1338890966/536754

2012-06-05 18:09:26.767: [    CSSD][1107577152]clssnmReadDskHeartbeat:manual shutdown of nodename oracle01, nodenum 1 epoch 13388909

66 msec 536754

2012-06-05 18:09:27.153: [    CSSD][1118615872]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:28.155: [    CSSD][1118615872]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:29.157: [    CSSD][1118615872]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:29.661: [    CSSD][1121769792]clssnmSendingThread: sending join msg to all nodes

2012-06-05 18:09:29.661: [    CSSD][1121769792]clssnmSendingThread: sent 4 join msgs to all nodes

2012-06-05 18:09:29.940: [    CSSD][1086961984]clssscSelect: cookie accept request 0x2aaaac024f00

2012-06-05 18:09:29.940: [    CSSD][1086961984]clssgmAllocProc: (0x1cc455c0) allocated

2012-06-05 18:09:29.941: [    CSSD][1086961984]clssgmClientConnectMsg: properties of cmProc 0x1cc455c0 - 1,2,3,4,5

2012-06-05 18:09:29.941: [    CSSD][1086961984]clssgmClientConnectMsg: Connect from con(0x2039) proc(0x1cc455c0) pid(11200) version

11:2:1:4, properties: 1,2,3,4,5

2012-06-05 18:09:29.941: [    CSSD][1086961984]clssgmClientConnectMsg: msg flags 0x0000

2012-06-05 18:09:29.942: [    CSSD][1086961984]clssscSelect: cookie accept request 0x1cc455c0

2012-06-05 18:09:29.942: [    CSSD][1086961984]clssscevtypSHRCON: getting client with cmproc 0x1cc455c0

2012-06-05 18:09:29.942: [    CSSD][1086961984]clssgmRegisterClient: proc(3/0x1cc455c0), client(1/0x1d146830)

2012-06-05 18:09:29.943: [    CSSD][1086961984]clssgmJoinGrock: global grock CRF- new client 0x1d146830 with con 0x2068, requested n

um -1, flags 0x4000600

2012-06-05 18:09:29.943: [    CSSD][1086961984]clssgmJoinGrock: ignoring grock join for client not requiring fencing until group inf

ormation has been received from the master; group name CRF-, member number -1, flags 0x4000600

2012-06-05 18:09:29.943: [    CSSD][1086961984]clssgmDiscEndpcl: gipcDestroy 0x2068

2012-06-05 18:09:29.943: [    CSSD][1086961984]clssgmDeadProc: proc 0x1cc455c0

2012-06-05 18:09:29.944: [    CSSD][1086961984]clssgmDestroyProc: cleaning up proc(0x1cc455c0) con(0x2039) skgpid  ospid 11200 with

0 clients, refcount 0

2012-06-05 18:09:29.944: [    CSSD][1086961984]clssgmDiscEndpcl: gipcDestroy 0x2039

2012-06-05 18:09:30.159: [    CSSD][1118615872]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0

2012-06-05 18:09:30.662: [    CSSD][1123346752]clssnmRcfgMgrThread: Local Join

2012-06-05 18:09:30.662: [    CSSD][1123346752]clssnmLocalJoinEvent: begin on node(2), waittime 193000

2012-06-05 18:09:30.662: [    CSSD][1123346752]clssnmLocalJoinEvent: set curtime (4294598080) for my node

2012-06-05 18:09:30.662: [    CSSD][1123346752]clssnmLocalJoinEvent: scanning 32 nodes

2012-06-05 18:09:30.662: [    CSSD][1123346752]clssnmLocalJoinEvent: Node oracle01, number 1, was shut down

2012-06-05 18:09:30.664: [    CSSD][1123346752]clssnmLocalJoinEvent: Starting initial cluster reconfig

2012-06-05 18:09:30.664: [    CSSD][1123346752]clssnmDoSyncUpdate: Initiating sync 0

2012-06-05 18:09:30.664: [    CSSD][1123346752]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 2, from -1, changes 1

2012-06-05 18:09:30.664: [    CSSD][1123346752]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 2

00000

2012-06-05 18:09:30.664: [    CSSD][1123346752]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will ta

ke effect when the sync is completed.

从以上描述来看,节点2等待了193秒后才找到共享磁盘,然后才成功把CRS服务起来,感觉可能是这个方向有异常,大家意见怎样?楼上的朋友怎么看?

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值