oracle数据库脑裂,了解Oracle RAC Brain Split Resolution集群脑裂协议

开始检测voting disk上的信息

[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2

发现其他子集群,包含1号节点且1号节点为该子集群的leader,为最小子集群;3号与2号节点组成最大子集群,2号节点为leader节点

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Start

[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Evicting node 1, vrh1, birth 9, death 10,

impendingrcfg 1, stateflags 0x40d

发起对1号节点的驱逐

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmSendShutdown: req to node 1, kill time 443294

[    CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 1, vrh1, undead 1

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)

clssnmCheckKillStatus检查1号节点是否down了

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(2)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 10 complete!

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]

[    CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)

[    CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE:   clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)

[    CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

.........................省略若干行.........................

[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)

[    CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10

[    CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)

[    CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

更新阶段

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 3, master node number 2

[    CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

另一场景为1号节点未加入集群,2号节点的网络失败,因2号节点的member number较小故其通过voting disk向3号节点发起驱逐

以下为2号节点的ocssd.log日志

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 28.890 seconds

[    CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 14.860 seconds

[    CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 13.860 seconds

[    CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 5.840 seconds

[    CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 4.840 seconds

[    CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 3.840 seconds

[    CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 2.840 seconds

[    CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 1.840 seconds

[    CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 0.830 seconds

[    CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.021 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2

检查voting disk后发现子集群3为最小"子集群"(3号节点的node number较2号大);2号节点为最大子集群

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Start [ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmSendShutdown: req to node 3, kill time 1643084

发起对3号节点的驱逐和shutdown request

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 3, vrh3, undead 1

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmDeactivateNode: node 3 (vrh3) left cluster

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: SYNC(13) from node(2) completed

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 13 complete!

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (13)

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

................省略若干行..............

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 13

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

完成reconfiguration

[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(13), with status(1)

以下为3号节点的ocssd.log日志:

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 1.867300e-268artbeat fatal, eviction in 29.360 seconds

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 1.867300e-268artbeat fatal, eviction in 14.330 seconds

[    CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 5.310 seconds

[    CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 4.300 seconds

[    CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 3.300 seconds

[    CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 2.300 seconds

[    CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 1.300 seconds

[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 0.300 seconds

[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]

[    CSSD]2011-04-23 17:43:15.912 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain. [ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)0b1331709591d260c1c78e86d0c51c18.png

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值