This Oracle cluster environment consists of two nodes.
OS: AIX 6.1
CRS/Oracle: 10.2.0.5.0
Storage: IBM-8100
node1(ocssd.log):
[ CSSD]2012-07-20 09:31:43.527 [2058] >TRACE: clssgmDeleteClientListener: cleanup for proc(111f15fb0) con(111f17e90) pid()
[ CSSD]2012-07-20 09:31:44.009 [3343] >WARNING: clssnmPollingThread: node pmmdpdb2 (2) at 50% heartbeat fatal, eviction in 14.602 seconds seedhbimpd 0
-- Node 1 starts the misscount countdown against node 2
[ CSSD]2012-07-20 09:31:44.009 [3343] >TRACE: clssnmPollingThread: node pmmdpdb2 (2) is impending reconfig, flag 17421, misstime 15398
[ CSSD]2012-07-20 09:31:44.009 [3343] >TRACE: clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[ CSSD]2012-07-20 09:31:44.925 [3600] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2012-07-20 09:31:44.925 [3600] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2012-07-20 09:31:45.010 [3343] >WARNING: clssnmPollingThread: node pmmdpdb2 (2) at 50% heartbeat fatal, eviction in 13.600 seconds seedhbimpd 1
[ CSSD]2012-07-20 09:31:48.950 [3600] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2012-07-20 09:31:48.950 [3600] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2012-07-20 09:31:52.016 [3343] >WARNING: clssnmPollingThread: node pmmdpdb2 (2) at 75% heartbeat fatal, eviction in 6.595 seconds seedhbimpd 1
[ CSSD]2012-07-20 09:31:52.960 [3600] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2012-07-20 09:31:52.960 [3600] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
......
[ CSSD]2012-07-20 09:31:58.620 [3857] >TRACE: clssnmCheckDskInfo: Checking disk info...
-- Node 2 is evicted
[ CSSD]2012-07-20 09:31:58.620 [3857] >TRACE: clssnmCheckDskSleepTime: Node 2, pmmdpdb2, is alive, DHB (1342747917, 1068431812) more than disk timeout of 27000 after the last NHB (1342747888, 1068402589)
[ CSSD]2012-07-20 09:31:58.621 [3857] >TRACE: clssnmCheckDskInfo: My cohort: 1
[ CSSD]2012-07-20 09:31:58.621 [3857] >TRACE: clssnmRemove: Start
[ CSSD]2012-07-20 09:31:58.621 [3857] >TRACE: clssnmrRemoveNode: Evicting node 2, pmmdpdb2, birth 4, death 5, stateflags 0x4001, unique 1341077868, prev unique 1341077868
[ CSSD]2012-07-20 09:31:58.621 [3857] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2012-07-20 09:31:58.621 [3857] >TRACE: clssnmWaitOnEvictions: node 2, undead 1 seedhbimpd 1
......
[ CSSD]2012-07-20 09:31:58.635 [1] >TRACE: clssgmSuspendAllGrocks: done
[ CSSD]2012-07-20 09:31:58.635 [1] >TRACE: clssgmUpdateEventValue: CmInfo State val 2, changes 38
[ CSSD]2012-07-20 09:31:58.635 [1] >TRACE: clssgmUpdateEventValue: ConnectedNodes val 4, changes 17
[ CSSD]2012-07-20 09:31:58.635 [1] >TRACE: clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(4)
[ CSSD]2012-07-20 09:31:58.635 [1] >TRACE: clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(4)
[ CSSD]2012-07-20 09:31:58.636 [1] >TRACE: clssgmStartNMMon: completed node cleanup
[ CSSD]2012-07-20 09:31:59.129 [1030] >TRACE: clssnmvReadDskHeartbeat: node(2) is down. rcfg(5) wrtcnt(1659807) LATS(1068416948) Disk lastSeqNo(1659807)
[ CSSD]2012-07-20 09:31:59.619 [3857] >TRACE: clssnmWaitOnEvictions: node 2, undead 1 seedhbimpd 1
...
[ CSSD]2012-07-20 09:32:29.601 [1801] >TRACE: clssnmDeactivateNode: node 2 (pmmdpdb2) left cluster
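The timing in the node1 log above can be checked with a little arithmetic. The following is a minimal Python sketch, not an Oracle tool: the regular expressions are written only against the message formats shown in this log, and it assumes that the second number in each DHB/NHB pair is a millisecond tick counter (an assumption that is consistent with the first numbers, which differ by 29 seconds).

```python
import re

# Matches the clssnmPollingThread heartbeat warnings as they appear in the log above.
WARN_RE = re.compile(
    r"clssnmPollingThread: node (?P<name>\S+) \((?P<num>\d+)\) at (?P<pct>\d+)% "
    r"heartbeat fatal, eviction in (?P<secs>\d+\.\d+) seconds"
)

# Matches the DHB/NHB comparison printed by clssnmCheckDskSleepTime.
DHB_RE = re.compile(
    r"DHB \((?P<dhb_sec>\d+), (?P<dhb_ms>\d+)\) more than disk timeout of "
    r"(?P<timeout>\d+) after the last NHB \((?P<nhb_sec>\d+), (?P<nhb_ms>\d+)\)"
)


def heartbeat_warnings(lines):
    """Return (node_number, percent, seconds_to_eviction) for each warning line."""
    found = []
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            found.append((int(m["num"]), int(m["pct"]), float(m["secs"])))
    return found


def dhb_nhb_gap_ms(line):
    """Return (gap_ms, timeout_ms) from a clssnmCheckDskSleepTime line, or None.

    Assumption: the second value in each pair is a millisecond tick counter.
    """
    m = DHB_RE.search(line)
    if m is None:
        return None
    return int(m["dhb_ms"]) - int(m["nhb_ms"]), int(m["timeout"])


# Two lines copied from the node1 log above.
warning_line = (
    "[ CSSD]2012-07-20 09:31:44.009 [3343] >WARNING: clssnmPollingThread: "
    "node pmmdpdb2 (2) at 50% heartbeat fatal, eviction in 14.602 seconds seedhbimpd 0"
)
disk_line = (
    "[ CSSD]2012-07-20 09:31:58.620 [3857] >TRACE: clssnmCheckDskSleepTime: "
    "Node 2, pmmdpdb2, is alive, DHB (1342747917, 1068431812) more than disk "
    "timeout of 27000 after the last NHB (1342747888, 1068402589)"
)

print(heartbeat_warnings([warning_line]))
print(dhb_nhb_gap_ms(disk_line))
```

Under that assumption, the last disk heartbeat from node 2 is 29223 ms newer than its last network heartbeat, which exceeds the 27000 ms disk timeout. Node 2 was therefore still writing to the voting disk while its network heartbeat was missing. The first warning is also consistent with a 30-second misscount: misstime 15398 ms plus the remaining 14.602 seconds adds up to 30 seconds.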
node2(ocssd.log):
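-- Both cohorts contain a single node. Node 2 has a higher node number than node 1, so the cohort led by node 1 is chosen to survive, and node 2 aborts itself to avoid split-brain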
[ CSSD]2012-07-20 09:31:57.832 [3857] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2012-07-20 09:31:57.832 [3857] >TRACE: clssnmCheckDskSleepTime: Node 1, pmmdpdb1, is alive, DHB (1342747917, 1068415338) more than disk timeout of 27000 after the last NHB (1342747887, 1068385646)
[ CSSD]2012-07-20 09:31:57.832 [3857] >TRACE: clssnmCheckDskInfo: My cohort: 2
[ CSSD]2012-07-20 09:31:57.832 [3857] >TRACE: clssnmCheckDskInfo: Surviving cohort: 1
[ CSSD]2012-07-20 09:31:57.832 [3857] >TRACE: clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, pmmdpdb2, is smaller than cohort of 1 nodes led by node 1, pmmdpdb1, based on map type 2
[ CSSD]2012-07-20 09:31:57.832 [3857] >ERROR: ###################################
[ CSSD]2012-07-20 09:31:57.833 [3857] >ERROR: clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
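The decision node 2 makes in the log above can be illustrated with a simplified model. This Python sketch is not Oracle's implementation; it only encodes the rule the messages suggest: the larger cohort survives, and when the cohorts are the same size, the one holding the lowest node number survives. The function names are hypothetical, and the "map type" mentioned in the log is ignored.

```python
def surviving_cohort(cohort_a, cohort_b):
    """Pick the cohort that survives a split.

    Simplified rule: the larger cohort wins; on a tie, the cohort that
    contains the lowest node number wins.
    """
    if len(cohort_a) != len(cohort_b):
        return cohort_a if len(cohort_a) > len(cohort_b) else cohort_b
    return cohort_a if min(cohort_a) < min(cohort_b) else cohort_b


def must_abort(local_node, cohort_a, cohort_b):
    """Return True if the local node is not part of the surviving cohort."""
    return local_node not in surviving_cohort(cohort_a, cohort_b)


# The situation in this incident: two cohorts of one node each.
print(surviving_cohort({1}, {2}))
print(must_abort(2, {1}, {2}))
```

With cohorts {1} and {2}, the model selects the cohort containing node 1, so node 2 is the one that must abort, which matches "Surviving cohort: 1" in the node2 log.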
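Actions to take:
Check whether the private network link is working properly
Check whether the private switch is working properly
Check whether the physical network adapters are working properly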
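As a first pass at the link check in the list above, private-interconnect reachability can be probed from each node. This is only a sketch: it shells out to `ping -c`, the host names in the usage note are hypothetical placeholders for the real private interconnect names, and a successful ping says nothing about intermittent loss or about the health of the switch and adapter hardware.

```python
import subprocess


def ping_ok(host, count=3, runner=subprocess.run):
    """Return True if `ping -c <count> <host>` exits with status 0.

    `runner` is injectable so the function can be exercised without a network.
    """
    result = runner(
        ["ping", "-c", str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_interconnect(hosts, runner=subprocess.run):
    """Map each private-interconnect host name to its reachability."""
    return {host: ping_ok(host, runner=runner) for host in hosts}
```

For example, `check_interconnect(["pmmdpdb1-priv", "pmmdpdb2-priv"])` (placeholder names) returns a dictionary of host name to `True` or `False`. Running it from both nodes helps narrow a failure down to one direction or one side of the link.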
From the ITPUB blog, link: http://blog.itpub.net/23378530/viewspace-739392/. If you repost this article, please credit the source; otherwise legal liability will be pursued.