ElasticSearch Cluster Election

Author: 泮小俊233

Category: ElasticSearch source code study. Tags: ElasticSearch, cluster

Article link: https://blog.csdn.net/panxj856856/article/details/81561633

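The previous article looked at how, before an ES cluster election, each node uses ping() to collect information about the other nodes in the cluster. This time we walk through the overall flow by which an ES cluster elects its master.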

In the start() method of Node, the call to discovery.startInitialJoin() is where the node begins joining the cluster and taking part in the election.

    @Override
    public void startInitialJoin() {
        // start the join thread from a cluster state update. See {@link JoinThreadControl} for details.
        synchronized (stateMutex) {
            // do the join on a different thread, the caller of this method waits for 30s anyhow till it is discovered
            joinThreadControl.startNewThreadIfNotRunning();
        }
    }

The method takes the stateMutex lock and calls startNewThreadIfNotRunning(), so that at most one control thread is ever performing the join.

        public void startNewThreadIfNotRunning() {
            assert Thread.holdsLock(stateMutex);
            if (joinThreadActive()) {
                return;
            }
            threadPool.generic().execute(new Runnable() {
                @Override
                public void run() {
                    Thread currentThread = Thread.currentThread();
                    if (!currentJoinThread.compareAndSet(null, currentThread)) {
                        return;
                    }
                    while (running.get() && joinThreadActive(currentThread)) {
                        try {
                            innerJoinCluster();
                            return;
                        } catch (Exception e) {
                            logger.error("unexpected error while joining cluster, trying again", e);
                            // Because we catch any exception here, we want to know in
                            // tests if an uncaught exception got to this point and the test infra uncaught exception
                            // leak detection can catch this. In practise no uncaught exception should leak
                            assert ExceptionsHelper.reThrowIfNotNull(e);
                        }
                    }
                    // cleaning the current thread from currentJoinThread is done by explicit calls.
                }
            });
        }

As the code shows, the method first asserts that the calling thread currently holds stateMutex.

Next, if a join thread is already active, there is nothing to create: the method returns immediately, which preserves uniqueness.

        public boolean joinThreadActive() {
            Thread currentThread = currentJoinThread.get();
            return running.get() && currentThread != null && currentThread.isAlive();
        }

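Otherwise a task is submitted to the generic thread pool, and the thread that runs it stores itself in the currentJoinThread member with a CAS. That member is an atomic reference, which keeps the hand-off thread-safe; if the CAS fails because another thread got there first, the task simply returns.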

        private final AtomicBoolean running = new AtomicBoolean(false);
        private final AtomicReference<Thread> currentJoinThread = new AtomicReference<>();
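
To make the CAS hand-off easier to see in isolation, here is a minimal sketch of the same idea. SingleRunnerGuard and its method names are hypothetical and are not part of Elasticsearch; only the compareAndSet(null, thread) pattern is taken from the code above. The release method is an assumption about how an explicit cleanup could look.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for the join-thread bookkeeping; not Elasticsearch code.
final class SingleRunnerGuard {
    private final AtomicReference<Thread> current = new AtomicReference<>();

    // Succeeds only while the slot is empty, so exactly one thread can claim it.
    boolean tryClaim(Thread t) {
        return current.compareAndSet(null, t);
    }

    // Clears the slot, but only if it is still owned by the given thread (assumed cleanup).
    boolean release(Thread t) {
        return current.compareAndSet(t, null);
    }

    // True if the given thread is the one currently recorded in the slot.
    boolean isOwner(Thread t) {
        return current.get() == t;
    }
}
```

A second thread that calls tryClaim while the slot is taken gets false and can simply return, which is what the early return after the failed compareAndSet does in the quoted code.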

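That single thread then calls innerJoinCluster() in a loop to start joining the cluster. It returns as soon as one call completes without an exception, and it retries after an exception for as long as the control is running and this thread is still the current join thread.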
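
The shape of that loop can be sketched on its own as follows. RetryLoop, runUntilSuccess and the stillActive parameter are illustrative names, not Elasticsearch APIs; stillActive stands in for the running.get() && joinThreadActive(currentThread) condition, and the attempt counter is added only so the behaviour can be observed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a "retry until one attempt succeeds" loop; not Elasticsearch code.
final class RetryLoop {
    // Runs the task until one attempt completes without throwing,
    // or until stillActive reports false. Returns the number of attempts made.
    static int runUntilSuccess(Runnable task, BooleanSupplier stillActive) {
        int attempts = 0;
        while (stillActive.getAsBoolean()) {
            attempts++;
            try {
                task.run();
                return attempts; // success: leave the loop, like the return after innerJoinCluster()
            } catch (RuntimeException e) {
                // The quoted code logs the error here and tries again; this sketch just retries.
            }
        }
        return attempts; // the guard went false before any attempt succeeded
    }
}
```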

Inside innerJoinCluster(), the first step is to call findMaster() in a loop until the current node has settled on a master.

        while (masterNode == null && joinThreadControl.joinThreadActive(currentThread)) {
            masterNode = findMaster();
        }

findMaster starts by calling pingAndWait, which sends pings to the other nodes of the same cluster, waits for and collects their replies, and yields fullPingResponses.

        List<ZenPing.PingResponse> fullPingResponses = pingAndWait(pingTimeout).toList();

The pingAndWait method is where this ties in with what we analyzed in the previous article.

    private ZenPing.PingCollection pingAndWait(TimeValue timeout) {
        final CompletableFuture<ZenPing.PingCollection> response = new CompletableFuture<>();
        try {
            zenPing.ping(response::complete, timeout);
        } catch (Exception ex) {
            // logged later
            response.completeExceptionally(ex);
        }

        try {
            return response.get();
        } catch (InterruptedException e) {
            logger.trace("pingAndWait interrupted");
            return new ZenPing.PingCollection();
        } catch (ExecutionException e) {
            logger.warn("Ping execution failed", e);
            return new ZenPing.PingCollection();
        }
    }

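The method first constructs a CompletableFuture and then calls ping to send ping requests to the other nodes, passing response::complete as the callback. The ping implementation invokes that callback through accept as its final step, once the replies from the other nodes have come in. The caller then blocks on response.get() until the result is available; if the wait is interrupted or the ping failed, it returns an empty PingCollection instead.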
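
The callback-to-blocking bridge used here is a general CompletableFuture pattern, and a stripped-down version may make it clearer. In the sketch below, BlockingBridge, asyncPing and the node names are all made up for illustration; asyncPing is only a stand-in for zenPing.ping and delivers a fixed list from another thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;

// Hypothetical sketch of turning a callback-style API into a blocking call; not Elasticsearch code.
final class BlockingBridge {
    // Stand-in for an asynchronous ping: hands its result to the callback from another thread.
    static void asyncPing(Consumer<List<String>> onDone) {
        Thread t = new Thread(() -> onDone.accept(Arrays.asList("node-1", "node-2")));
        t.start();
    }

    // Blocks the caller until the callback fires; falls back to an empty list on failure.
    static List<String> pingAndWait() {
        CompletableFuture<List<String>> response = new CompletableFuture<>();
        try {
            asyncPing(response::complete); // the callback completes the future
        } catch (RuntimeException ex) {
            response.completeExceptionally(ex); // surfaces below as ExecutionException
        }
        try {
            return response.get(); // waits here until complete(...) has been called
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restoring the flag is a choice of this sketch
            return Collections.emptyList();
        } catch (ExecutionException e) {
            return Collections.emptyList();
        }
    }
}
```

Completing the future exceptionally when the ping call itself throws means the blocking get() cannot hang on a callback that will never arrive.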

At this point we have the collection of replies that the other nodes in the cluster sent to the election ping.

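The current node's own election information is then added to the collected responses; the assertion first checks that the local node is not already among them. Because the node has only just started joining and has not proposed a master yet, the master in its own entry is null.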

        assert fullPingResponses.stream().map(ZenPing.PingResponse::node)
            .filter(n -> n.equals(localNode)).findAny().isPresent() == false;

        fullPingResponses.add(new ZenPing.PingResponse(localNode, null, this.clusterState()));

Of course, no master node has been elected at this point.
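
As a rough illustration of this step, the sketch below appends a local entry with a null master to a list of responses. LocalResponse, Ping and withLocal are hypothetical and far simpler than ZenPing.PingResponse, which also carries the cluster state. The sketch throws where the original uses an assert, purely so the check is visible without enabling assertions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, pared-down model of "add the local node's own response"; not Elasticsearch code.
final class LocalResponse {
    // A reply names the responding node and the master it currently believes in.
    static final class Ping {
        final String node;
        final String master; // null when the node has not proposed a master yet

        Ping(String node, String master) {
            this.node = node;
            this.master = master;
        }
    }

    // Returns a copy of the responses with the local node's own entry appended.
    static List<Ping> withLocal(List<Ping> responses, String localNode) {
        boolean alreadyPresent = responses.stream().anyMatch(p -> p.node.equals(localNode));
        if (alreadyPresent) {
            throw new IllegalStateException("local node must not appear in ping responses");
        }
        List<Ping> full = new ArrayList<>(responses);
        full.add(new Ping(localNode, null)); // no master proposed yet
        return full;
    }
}
```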
