当实际的NM Reconfiguration集群重置情况发生时所有的active节点和正在加入集群的节点都会参与到reconfig中,那些没有应答(ack)的节点都将不再被归入新的集群关系中。实际上reconfig重置包括多个阶段:
1.初始化阶段 — reconfig manager(由集群成员号最低的节点担任)向其他节点发送启动reconfig的信号
2.投票阶段 — 节点向reconfig manager发送该节点所了解的成员关系
3.脑裂检查阶段 — reconfig manager检查是否脑裂
4.驱逐阶段 — reconfig manager驱逐非成员节点
5.更新阶段 — reconfig manager向成员节点发送权威成员关系信息
在脑裂检查阶段Reconfig Manager会找出那些没有Network Heartbeat而有Disk Heartbeat的节点,并通过Network Heartbeat(如果可能的话)和Disk Heartbeat的信息来计算所有竞争子集群(subcluster)内的节点数目,并依据以下2种因素决定哪个子集群应当存活下去:
拥有最多节点数目的子集群(Sub-cluster with largest number of Nodes)
若子集群内数目相等则为拥有最低节点号的子集群(Sub-cluster with lowest node number),举例来说在一个2节点的RAC环境中总是1号节点会获胜。
采用Stonith algorithm 的IO fencing(remote power reset)
STONITH算法是一种常用的I/O Fencing algorithm,是RAC中必要的远程关闭节点的接口。其想法十分简单,当某个节点上运行的软件希望确保本集群内的其他节点不能使用某种资源时,拔出其他节点的插座即可。这是一种简单、可靠且有些残酷的算法。Stonith 的优势是其没有特定的硬件需求,也不会限制集群的扩展性。
Oracle Clusterware的Process Monitor模块负责实现IO fencing,保证不会因节点/实例的不协调工作而产生corruption。Process Monitor的工作具体由hangcheck timer或者oprocd 完成, 在Linux平台上10.2.0.4 之前都没有oprocd的存在(其他Unix平台在10.2.0.1就有了),在安装RAC之前需要额外安装hangcheck timer软件以保证IO fencing, 到10.2.0.4 时Linux上也有了oprocd,具体见 一文。 这些负责IO fencing的进程一般都会被锁定在内存中(locked in memory)、实时运行(Real time)、休眠固定的时间(Sleep a fixed time)、以root身份运行;若进程唤醒后发现时间已经太晚,那么它会强制reboot;若这些进程发生失败,则也会重启,所以在RAC环境中oprocd是一个很重要的进程,不要失去手动去kill这个进程。
在完成脑裂检查后进入驱逐阶段,被驱逐节点会收到发送给它们的驱逐信息(如果网络可用的话),若无法发送信息则会通过写出驱逐通知到voting disk上的”kill block”来达到驱逐通知的目的。同时还会等待被驱逐节点表示其已收到驱逐通知,这种表示可能是通过网络通信的方式也可能是投票磁盘上的状态信息。
可以看到Oracle CSS中Brain Split Check时会尽可能地保证最大的一个子集群存活下来以保证RAC系统具有最高的可用性。
实际案例日志
1号节点网络失败,2,3号节点形成子集群;2,3节点通过voting disk向1号节点发起驱逐:
以下为1号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 29.610 seconds
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 3.280308e-268artbeat fatal, eviction in 29.150 seconds
对2,3号节点发起misscount计时
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 28.610 seconds
[ CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 14.580 seconds
[ CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 3.280308e-268artbeat fatal, eviction in 14.120 seconds
[ CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 13.580 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 5.560 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 5.100 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 4.560 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 4.100 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 3.560 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 3.100 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 2.560 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 2.100 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 1.550 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 1.090 seconds
2号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 29.800 seconds
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 28.790 seconds
[ CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 14.800 seconds
[ CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 13.800 seconds
[ CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 5.800 seconds
[ CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 4.800 seconds
[ CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 3.800 seconds
[ CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 2.800 seconds
[ CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 1.790 seconds
[ CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 0.800 seconds
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
clssnmHandleSyn应答3号节点发送的同步信息
[ CSSD]2011-04-23 17:12:35.399 [5073104] >USER: NMEVENT_SUSPEND [00][00][00][0e]
发生Node Monitoring SUSPEND事件
[ CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10)
通过clssnmSendVoteInfo向3号节点发送投票信息Vote mesg
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster