Why Grid Infrastructure Rebootless Node Fencing Fails (Doc ID 1502282.1)

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

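Purpose:

Rebootless fencing was introduced in 11.2.0.2 Grid Infrastructure. When an eviction occurs, it attempts to stop GI gracefully on the evicted node instead of rebooting the node. If rebootless fencing fails, the evicted node is still rebooted. This document lists common causes of rebootless fencing failure.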

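Details:

1. Resources fail to stop.

If one or more resources cannot be stopped, rebootless fencing fails and the node is rebooted.

In this example, rebootless fencing failed on node 2 after a split brain, so node 2 was rebooted:

Evicted node: <GI_HOME>/log/<node>/alert<node>.log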

.. 
2012-09-11 12:04:34.363
[cssd(18834)]CRS-1610:Network communication with node racnode1 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.020 seconds
2012-09-11 12:04:36.379
[cssd(18834)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /ocw/grid/log/racnode2/cssd/ocssd.log.
2012-09-11 12:04:36.379
[cssd(18834)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /ocw/grid/log/racnode2/cssd/ocssd.log
2012-09-11 12:04:36.399
[cssd(18834)]CRS-1652:Starting clean up of CRSD resources.
2012-09-11 12:04:36.586
[crsd(26115)]CRS-5833:Cleaning resource 'zDRMON.sh.racnode2 1 1' failed as part of reboot-less node fencing
2012-09-11 12:04:36.588
[cssd(18834)]CRS-1653:The clean up of the CRSD resources failed.                    ##>>user resource fails to be cleaned
2012-09-11 12:04:37.042
[ohasd(16821)]CRS-2765:Resource 'ora.evmd' has failed on server 'racnode2'.
2012-09-11 12:04:37.052
[/ocw/grid/bin/scriptagent.bin(27696)]CRS-5822:Agent '/ocw/grid/bin/scriptagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:4:10} in /ocw/grid/log/racnode2/agent/crsd/scriptagent_oracle/scriptagent_oracle.log.
2012-09-11 12:04:37.062
[ohasd(16821)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'.                 ##>>node rebooted after this message, in some cases, this message won't be there
2012-09-11 12:10:47.356
[ohasd(16677)]CRS-2112:The OLR service started on node racnode2.
2012-09-11 12:10:47.521
[ohasd(16677)]CRS-1301:Oracle High Availability Service started on node racnode2.
2012-09-11 12:10:47.539
[ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component: cssagent, with time stamp: L-2012-09-11-12:04:37.140      ##>>reboot advisory shows both cssdagent and cssdmonitor took the action to reboot
[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-09-11 12:10:47.594
[ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component: cssmonit, with time stamp: L-2012-09-11-12:04:37.139
[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-09-11 12:10:47.605
[ohasd(16677)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 2 were announced and 0 errors occurred
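
The alert log excerpt above can be scanned mechanically for the two messages that mark a failed rebootless fencing attempt: CRS-5833 (which names the resource that could not be cleaned) and CRS-1653 (the overall cleanup failure). The following is a minimal Python sketch, not an Oracle tool. It assumes the alert log layout seen in this sample, where a timestamp line precedes each message line; the function names `parse_alert_log` and `failed_cleanups` are hypothetical.

```python
import re

# Layout assumption taken from the sample: "[process(pid)]CRS-nnnn:text"
MSG_RE = re.compile(
    r"^\[(?P<proc>[^\]]+?)\((?P<pid>\d+)\)\](?P<code>CRS-\d+):(?P<text>.*)$"
)
# A line holding only a timestamp, e.g. "2012-09-11 12:04:36.586"
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
RESOURCE_RE = re.compile(r"Cleaning resource '([^']+)' failed")


def parse_alert_log(lines):
    """Yield (timestamp, process, code, text) for each CRS message line."""
    last_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            last_ts = line  # applies to the message line(s) that follow
            continue
        m = MSG_RE.match(line)
        if m:
            yield last_ts, m.group("proc"), m.group("code"), m.group("text")


def failed_cleanups(lines):
    """Return ([(timestamp, resource)], cleanup_failed) from an alert log."""
    resources, cleanup_failed = [], False
    for ts, _proc, code, text in parse_alert_log(lines):
        if code == "CRS-5833":
            m = RESOURCE_RE.search(text)
            if m:
                resources.append((ts, m.group(1)))
        elif code == "CRS-1653":
            cleanup_failed = True
    return resources, cleanup_failed


sample = """\
2012-09-11 12:04:36.399
[cssd(18834)]CRS-1652:Starting clean up of CRSD resources.
2012-09-11 12:04:36.586
[crsd(26115)]CRS-5833:Cleaning resource 'zDRMON.sh.racnode2 1 1' failed as part of reboot-less node fencing
2012-09-11 12:04:36.588
[cssd(18834)]CRS-1653:The clean up of the CRSD resources failed.
""".splitlines()

resources, cleanup_failed = failed_cleanups(sample)
print(resources, cleanup_failed)
```

The timestamp returned with each resource is a convenient anchor when searching the other logs for the same moment.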

When a resource fails to stop, cssdagent, cssdmonitor, or both will attempt to reboot the node. Sample logs follow.

<GI_HOME>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log

2012-09-11 12:04:36.400: [ USRTHRD][1095805248]clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcInternalSend: connection not valid for send operation endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:37.035: [ CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)

2012-09-11 12:04:37.036: [CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8

2012-09-11 12:04:37.036: [ USRTHRD][1077418304]clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.036: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7

2012-09-11 12:04:37.036: [CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3

2012-09-11 12:04:37.139: [ USRTHRD][1097382208]clsnwork_process_work: sync completed
2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnSyncComplete: posting omon

<GI_HOME>/agent/ohasd/oracssdagent_root/oracssdagent_root.log

2012-09-11 12:04:36.400: [ USRTHRD][1095805248]clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcInternalSend: connection not valid for send operation endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7

2012-09-11 12:04:37.035: [CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3

2012-09-11 12:04:37.036: [CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)

2012-09-11 12:04:37.036: [CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8

2012-09-11 12:04:37.036: [ USRTHRD][1077418304]clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
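
In this sample, the agent logs record `clssgsGroupGetStatus: returning 8`, and the reboot advisories in the alert log report `unexpected failure 8 received from CSS`. The short Python sketch below pulls both numbers out so they can be compared across files. Treating the two values as the same status is an inference from this one sample, not something the note states, and the helper name `extract_codes` is hypothetical.

```python
import re

# Patterns copied from the message texts shown in the sample logs.
RETURN_RE = re.compile(r"clssgsGroupGetStatus: returning (\d+)")
ADVISORY_RE = re.compile(r"unexpected failure (\d+) received from CSS")


def extract_codes(lines, pattern):
    """Collect the integer captured by `pattern` from every matching line."""
    codes = []
    for line in lines:
        match = pattern.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


# Lines taken from the excerpts in this document.
agent_log = [
    "2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)",
    "2012-09-11 12:04:37.036: [CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8",
]
alert_log = [
    "[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS",
]

agent_codes = extract_codes(agent_log, RETURN_RE)
advisory_codes = extract_codes(alert_log, ADVISORY_RE)
print(agent_codes, advisory_codes)
```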

Because a CRSD resource (a user resource) failed to stop, crsd.log is a good starting point for further debugging.
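
One way to start in crsd.log is to narrow it to the few seconds around the CRS-5833 timestamp from the alert log. The Python sketch below shows that filtering step. It assumes crsd.log lines begin with the same `YYYY-MM-DD HH:MM:SS.mmm` prefix as the agent logs shown earlier, which this document does not confirm; the function names are hypothetical, and the sample lines are taken from the cssdmonitor excerpt purely to exercise the filter.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23  # length of "2012-09-11 12:04:36.586"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def lines_near(lines, event_time, seconds=5.0):
    """Keep lines whose timestamp is within `seconds` of `event_time`.

    Lines without a timestamp (continuations, blanks) follow the decision
    made for the most recent timestamped line.
    """
    event = datetime.strptime(event_time, TS_FORMAT)
    window = timedelta(seconds=seconds)
    selected, in_window = [], False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            in_window = abs(ts - event) <= window
        if in_window:
            selected.append(line.rstrip("\n"))
    return selected


sample = [
    "2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got posted",
    "2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed",
    "2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnSyncComplete: posting omon",
]

# Event time is the CRS-5833 timestamp from the alert log excerpt.
nearby = lines_near(sample, "2012-09-11 12:04:36.586", seconds=0.5)
print(len(nearby))
```

In practice the list would come from reading the real log file, and a window of a few seconds before the event is usually more interesting than the time after it.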
