ES---Master选举和Bully算法

最新推荐文章于 2023-12-25 23:43:42 发布

San6653

最新推荐文章于 2023-12-25 23:43:42 发布

阅读量1.2k

点赞数

分类专栏： JAVA基础

原文链接：https://mp.csdn.net/postedit/103368883

版权

JAVA基础专栏收录该内容

5 篇文章 1 订阅

订阅专栏

bully算法是一个分布式系统中动态选择master节点的算法，进程号最大的非失效的节点将被选为master。

算法用三种消息类型：

1）选举消息 （Election Message: Sent to announce election.）
2）应答消息（Answer (Alive) Message: Responds to the Election message.）
3）选举成功消息 （Coordinator (Victory) Message: Sent by winner of the election to announce victory.）

当一个进程P从失败中恢复，或者接收到主节点失效信息，进程P将做以下事情：

1）如果进程P有最大的进程ID，那么它则会向其他节点广播Coordinator (Victory) Message。否则进程P向进程号比它大的进程发送Election Message
2）如果进程P发送Election Message后，没有接收到应答，它就会向其他节点广播Coordinator (Victory) Message，并成为master。
3）如果进程P接收到比它进程号更高的进程的Answer (Alive) Message信息，那么它这次的选举就失败了，等待接收其他节点的Coordinator (Victory) Message。
4）如果进程P接收到比它进程号低的进程的Election message，那么它会向该节点返回一个Answer (Alive) Message，并启动选举进程，并向比它更高的进程发送Election Message。
5）如果进程P接收到Coordinator (Victory) Message，那么它就会把发送这条消息的节点看作为master 进程。

ES Master选举过程

当master节点失效之后，各个节点find master的过程如下： 1）每个节点会ping配置文件中discovery.zen.ping.unicast.hosts的IP，找到所有存活的节点并过滤 2）找到非本身的active master

 List<DiscoveryNode> activeMasters = new ArrayList<>();
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        // We can't include the local node in pingMasters list, otherwise we may up electing ourselves without
        // any check / verifications from other nodes in ZenDiscover#innerJoinCluster()
        if (pingResponse.master() != null && !localNode.equals(pingResponse.master())) {
            activeMasters.add(pingResponse.master());
        }
    }

3）找到所有的可成为master的节点集合masterCandidates ，包含自己

// nodes discovered during pinging
    List<ElectMasterService.MasterCandidate> masterCandidates = new ArrayList<>();
    for (ZenPing.PingResponse pingResponse : pingResponses) {
        if (pingResponse.node().isMasterNode()) {
            masterCandidates.add(new ElectMasterService.MasterCandidate(pingResponse.node(), pingResponse.getClusterStateVersion()));
        }
    }

4）如果activeMasters 为空，不存在活着的master节点，同时当前活着的节点满足配置中discovery.zen.minimum_master_nodes的数量，就从masterCandidates 挑出ID最小的节点，让其成为master节点。如果activeMasters 不为空，则从中选择最小的ID成为Master节点。

if (activeMasters.isEmpty()) {
        if (electMaster.hasEnoughCandidates(masterCandidates)) {
            final ElectMasterService.MasterCandidate winner = electMaster.electMaster(masterCandidates);
            logger.trace("candidate {} won election", winner);
            return winner.getNode();
        } else {
            // if we don't have enough master nodes, we bail, because there are not enough master to elect from
            logger.warn("not enough master nodes discovered during pinging (found [{}], but needed [{}]), pinging again",
                        masterCandidates, electMaster.minimumMasterNodes());
            return null;
        }
    } else {
        assert !activeMasters.contains(localNode) : "local node should never be elected as master when other nodes indicate an active master";
        // lets tie break between discovered nodes
        return electMaster.tieBreakActiveMasters(activeMasters);
    }

electMaster.electMaster方法和electMaster.tieBreakActiveMasters方法都是从集合中选取最小节点的ID：

public MasterCandidate electMaster(Collection<MasterCandidate> candidates) {
    assert hasEnoughCandidates(candidates);
    List<MasterCandidate> sortedCandidates = new ArrayList<>(candidates);
    sortedCandidates.sort(MasterCandidate::compare);
    return sortedCandidates.get(0);
}
public DiscoveryNode tieBreakActiveMasters(Collection<DiscoveryNode> activeMasters) {
    return activeMasters.stream().min(ElectMasterService::compareNodes).get();
}

总结：Bully算法

1）如果进程P有最大的进程ID，那么它则会向其他节点广播Coordinator (Victory) Message。否则进程P向进程号比它大的进程发送Election Message
2）如果进程P发送Election Message后，没有接收到应答，它就会向其他节点广播Coordinator (Victory) Message，并成为master。
3）如果进程P接收到比它进程号更高的进程的Answer (Alive) Message信息，那么它这次的选举就失败了，等待接收其他节点的Coordinator (Victory) Message。
4）如果进程P接收到比它进程号低的进程的Election message，那么它会向该节点返回一个Answer (Alive) Message，并启动选举进程，并向比它更高的进程发送Election Message。
5）如果进程P接收到Coordinator (Victory) Message，那么它就会把发送这条消息的节点看作为master 进程。

总结：ES Master选举过程

产生脑裂现象：

在es的集群中,多个master选举出一个现役(active)master后,由于现役master网络波动但没有宕机,导致其他主节点连接不到判断宕机,其他主节点执行选举逻辑,生成新的master,当网络波动消失,集群将会被多个master同时管理(meta data),最终集群中的数据会错乱--脑裂