Source Code Walkthrough of ping in Elasticsearch Cluster Election

tydhot

Category: elasticsearch. Tags: elasticsearch, source code, cluster

Article link: https://blog.csdn.net/weixin_40318210/article/details/81489151

During master election in Elasticsearch, nodes exchange their election information with one another through pings.

The implementing class is UnicastZenPing. Its constructor reads discovery.zen.ping.unicast.hosts from the configuration and stores the addresses of the other nodes in the cluster.

if (DISCOVERY_ZEN_PING_UNICAST_HOSTS_SETTING.exists(settings)) {
    configuredHosts = DISCOVERY_ZEN_PING_UNICAST_HOSTS_SETTING.get(settings);
    // we only limit to 1 addresses, makes no sense to ping 100 ports
    limitPortCounts = LIMIT_FOREIGN_PORTS_COUNT;
}

When a node takes part in an election and needs information about the other nodes, it obtains it through the ping() method.

@Override
public void ping(final Consumer<PingCollection> resultsConsumer, final TimeValue duration) {
    ping(resultsConsumer, duration, duration);
}

The resultsConsumer argument is in practice CompletableFuture::complete, which handles the completion of one ping operation.
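To illustrate how a CompletableFuture can act as the results consumer, here is a minimal sketch. The class PingFutureSketch and the method fakePing are hypothetical stand-ins, not Elasticsearch code, and a list of strings takes the place of PingCollection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class PingFutureSketch {
    // Hypothetical stand-in for ping(): hands the collected results to the consumer when done.
    static void fakePing(Consumer<List<String>> resultsConsumer) {
        resultsConsumer.accept(Arrays.asList("node-a", "node-b"));
    }

    static List<String> pingAndWait() {
        CompletableFuture<List<String>> future = new CompletableFuture<>();
        // The method reference adapts complete(T) to Consumer<T>; its boolean result is discarded.
        fakePing(future::complete);
        // join() blocks until the consumer has been invoked.
        return future.join();
    }

    public static void main(String[] args) {
        System.out.println(pingAndWait());
    }
}
```

The caller can block on the future (or chain further stages on it) without the pinging code knowing anything about futures.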

Inside ping(), the nodes of the cluster are first resolved through the resolveHostsLists() method.

final List<DiscoveryNode> seedNodes;
try {
    seedNodes = resolveHostsLists(
        unicastZenPingExecutorService,
        logger,
        configuredHosts,
        limitPortCounts,
        transportService,
        UNICAST_NODE_PREFIX,
        resolveTimeout);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

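resolveHostsLists() creates one task per configured host and submits them all with a timeout, so that the hosts are resolved concurrently into TransportAddress instances that transportService can use directly.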

final List<Callable<TransportAddress[]>> callables =
    hosts
        .stream()
        .map(hn -> (Callable<TransportAddress[]>) () -> transportService.addressesFromString(hn, limitPortCounts))
        .collect(Collectors.toList());
final List<Future<TransportAddress[]>> futures =
    executorService.invokeAll(callables, resolveTimeout.nanos(), TimeUnit.NANOSECONDS);

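Next, the local publish address and bound addresses are collected, and every future is iterated to wait for its resolution to finish. Each resolved address that is neither the local publish address nor one of the local bound addresses is wrapped in a DiscoveryNode and added to the result list.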

localAddresses.add(transportService.boundAddress().publishAddress());
localAddresses.addAll(Arrays.asList(transportService.boundAddress().boundAddresses()));
// ExecutorService#invokeAll guarantees that the futures are returned in the iteration order of the tasks so we can associate the
// hostname with the corresponding task by iterating together
final Iterator<String> it = hosts.iterator();
for (final Future<TransportAddress[]> future : futures) {
    final String hostname = it.next();
    if (!future.isCancelled()) {
        assert future.isDone();
        try {
            final TransportAddress[] addresses = future.get();
            logger.trace("resolved host [{}] to {}", hostname, addresses);
            for (int addressId = 0; addressId < addresses.length; addressId++) {
                final TransportAddress address = addresses[addressId];
                // no point in pinging ourselves
                if (localAddresses.contains(address) == false) {
                    discoveryNodes.add(
                        new DiscoveryNode(
                            nodeId_prefix + hostname + "_" + addressId + "#",
                            address,
                            emptyMap(),
                            emptySet(),
                            Version.CURRENT.minimumCompatibilityVersion()));
                }
            }
        } catch (final ExecutionException e) {
            assert e.getCause() != null;
            final String message = "failed to resolve host [" + hostname + "]";
            logger.warn(message, e.getCause());
        }
    } else {
        logger.warn("timed out after [{}] resolving host [{}]", resolveTimeout, hostname);
    }
}
return discoveryNodes;
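The resolution pattern above (invokeAll with a timeout, futures paired with hosts by iteration order, local addresses filtered out) can be sketched in plain Java. This is a simplified illustration under stated assumptions: ResolveHostsSketch, the resolver function and the string addresses are hypothetical and replace transportService, TransportAddress and DiscoveryNode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Collectors;

class ResolveHostsSketch {
    // resolver is a hypothetical stand-in for transportService.addressesFromString(...).
    static List<String> resolve(List<String> hosts, Function<String, String[]> resolver,
                                Set<String> localAddresses, long timeoutMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, hosts.size()));
        try {
            List<Callable<String[]>> callables = hosts.stream()
                .map(h -> (Callable<String[]>) () -> resolver.apply(h))
                .collect(Collectors.toList());
            // invokeAll returns futures in task order and cancels tasks that exceed the timeout.
            List<Future<String[]>> futures = executor.invokeAll(callables, timeoutMillis, TimeUnit.MILLISECONDS);
            List<String> resolved = new ArrayList<>();
            Iterator<String> it = hosts.iterator();
            for (Future<String[]> future : futures) {
                String host = it.next();
                if (future.isCancelled()) {
                    System.err.println("timed out resolving host [" + host + "]");
                    continue;
                }
                try {
                    for (String address : future.get()) {
                        // No point in pinging ourselves.
                        if (!localAddresses.contains(address)) {
                            resolved.add(address);
                        }
                    }
                } catch (ExecutionException e) {
                    System.err.println("failed to resolve host [" + host + "]: " + e.getCause());
                }
            }
            return resolved;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A host that fails or times out is only logged, so one bad entry in the host list does not prevent the remaining hosts from being used.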

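Once the configured hosts have been resolved, further nodes can also be discovered through other nodes in the cluster. After that, every master-eligible node in the cluster state clusterState (that is, every node whose master attribute is true) is added to seedNodes as well.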

A ConnectionProfile is then built. It sets the connection type used for pinging to REG and sets the connect and handshake timeouts, which are 3 seconds by default.

After that, a PingingRound is constructed as the abstraction of one ping operation.

final ConnectionProfile connectionProfile =
    ConnectionProfile.buildSingleChannelProfile(TransportRequestOptions.Type.REG, requestDuration, requestDuration);
final PingingRound pingingRound = new PingingRound(pingingRoundIdGenerator.incrementAndGet(), seedNodes, resultsConsumer,
    nodes.getLocalNode(), connectionProfile);
activePingingRounds.put(pingingRound.id(), pingingRound);

A PingingRound represents one ping operation of the local node. It holds the connection type and timeout of the operation, the nodes to ping, and the id of the round.

Next, a pingSender is constructed. It is run three times, at fixed intervals, and each run sends a ping request to all nodes through the sendPings method.

final AbstractRunnable pingSender = new AbstractRunnable() {
    @Override
    public void onFailure(Exception e) {
        if (e instanceof AlreadyClosedException == false) {
            logger.warn("unexpected error while pinging", e);
        }
    }

    @Override
    protected void doRun() throws Exception {
        sendPings(requestDuration, pingingRound);
    }
};
threadPool.generic().execute(pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3), ThreadPool.Names.GENERIC, pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3 * 2), ThreadPool.Names.GENERIC, pingSender);

With the default settings, the sendPings() method of pingSender runs once per second.
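The three-sends-per-round timing can be sketched with a plain ScheduledExecutorService. This is an illustration only: PingScheduleSketch and its methods are hypothetical, and a counter stands in for the real sendPings() call. The delay arithmetic mirrors the scheduling code shown above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PingScheduleSketch {
    // Delays of the three sends: immediately, after 1/3 and after 2/3 of the round duration.
    static long[] pingDelaysMillis(long scheduleMillis) {
        return new long[] { 0L, scheduleMillis / 3, scheduleMillis / 3 * 2 };
    }

    // Runs one round and returns how many times the sender fired.
    static int runRound(long scheduleMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger sent = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(3);
        // Stand-in for pingSender: the real one would call sendPings(...).
        Runnable pingSender = () -> {
            sent.incrementAndGet();
            done.countDown();
        };
        try {
            for (long delay : pingDelaysMillis(scheduleMillis)) {
                scheduler.schedule(pingSender, delay, TimeUnit.MILLISECONDS);
            }
            done.await(scheduleMillis + 5_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return sent.get();
    }
}
```

For a 3000 ms round, pingDelaysMillis gives 0, 1000 and 2000 ms, which matches the one-ping-per-second behaviour described above.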

final UnicastPingRequest pingRequest = new UnicastPingRequest();
pingRequest.id = pingingRound.id();
pingRequest.timeout = timeout;
ClusterState lastState = contextProvider.clusterState();

pingRequest.pingResponse = createPingResponse(lastState);

Set<DiscoveryNode> nodesFromResponses = temporalResponses.stream().map(pingResponse -> {
    assert clusterName.equals(pingResponse.clusterName()) :
        "got a ping request from a different cluster. expected " + clusterName + " got " + pingResponse.clusterName();
    return pingResponse.node();
}).collect(Collectors.toSet());

// dedup by address
final Map<TransportAddress, DiscoveryNode> uniqueNodesByAddress =
    Stream.concat(pingingRound.getSeedNodes().stream(), nodesFromResponses.stream())
        .collect(Collectors.toMap(DiscoveryNode::getAddress, Function.identity(), (n1, n2) -> n1));


// resolve what we can via the latest cluster state
final Set<DiscoveryNode> nodesToPing = uniqueNodesByAddress.values().stream()
    .map(node -> {
        DiscoveryNode foundNode = lastState.nodes().findByAddress(node.getAddress());
        if (foundNode == null) {
            return node;
        } else {
            return foundNode;
        }
    }).collect(Collectors.toSet());

nodesToPing.forEach(node -> sendPingRequestToNode(node, timeout, pingingRound, pingRequest));

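sendPings() starts from the seedNodes obtained earlier and adds the nodes of the same cluster that have previously pinged the current node (taken from temporalResponses).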
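The de-duplication and resolution steps in the snippet above can be tried in isolation. The sketch below is a simplified model, not the Elasticsearch classes: DedupSketch and Node are hypothetical, addresses are plain strings, and a map keyed by address stands in for the lookup against the latest cluster state.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DedupSketch {
    // Hypothetical, minimal stand-in for DiscoveryNode.
    static final class Node {
        final String id;
        final String address;

        Node(String id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    // Merge both sources; when two nodes share an address, the first one seen (a seed node) wins.
    static Map<String, Node> dedupByAddress(List<Node> seedNodes, List<Node> nodesFromResponses) {
        return Stream.concat(seedNodes.stream(), nodesFromResponses.stream())
            .collect(Collectors.toMap(n -> n.address, Function.identity(), (n1, n2) -> n1));
    }

    // Swap in the fully known node for an address when the cluster state has one.
    static Set<Node> resolveKnown(Map<String, Node> unique, Map<String, Node> knownByAddress) {
        return unique.values().stream()
            .map(n -> knownByAddress.getOrDefault(n.address, n))
            .collect(Collectors.toSet());
    }
}
```

The merge function (n1, n2) -> n1 is what prevents toMap from throwing on duplicate keys. The second step then replaces placeholder seed nodes with the real node whenever the address is already known.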

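At the same time, a pingRequest is built whose id is the id of the pingingRound, so three pingRequests with this id will be sent in total. The data about the current node, together with the master node of its cluster (null if none has been elected yet), is packed into a pingResponse that is carried inside the pingRequest.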

The node set obtained above is then iterated, and sendPingRequestToNode() is called for each node to send the pingRequest for this pingingRound to it.

transportService.sendRequest(connection, ACTION_NAME, pingRequest,
    TransportRequestOptions.builder().withTimeout((long) (timeout.millis() * 1.25)).build(),
    getPingResponseHandler(pingingRound, node));

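sendPingRequestToNode() does its work asynchronously and ultimately sends the ping request to the discovery/zen/unicast action on the target node. It also uses getPingResponseHandler() to install a response handler for this pingingRound, which processes the target node's reply to the request.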

@Override
public void handleResponse(UnicastPingResponse response) {
    logger.trace("[{}] received response from {}: {}", pingingRound.id(), node, Arrays.toString(response.pingResponses));
    if (pingingRound.isClosed()) {
        if (logger.isTraceEnabled()) {
            logger.trace("[{}] skipping received response from {}. already closed", pingingRound.id(), node);
        }
    } else {
        Stream.of(response.pingResponses).forEach(pingingRound::addPingResponseToCollection);
    }
}

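When handling a response, the handler first checks whether the current pingingRound has already been closed. If it is still open, the election data of each node in the response is stored in the round's Map, keyed by node and holding that node's election choice.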

A pingingRound lasts 3 seconds by default. So once the three ping tasks of the round have been scheduled, one more task is scheduled to fire after 3 seconds and end the round's collection of pingResponses.

@Override
public void close() {
    List<Connection> toClose = null;
    synchronized (this) {
        if (closed.compareAndSet(false, true)) {
            activePingingRounds.remove(id);
            toClose = new ArrayList<>(tempConnections.values());
            tempConnections.clear();
        }
    }
    if (toClose != null) {
        // we actually closed
        try {
            pingListener.accept(pingCollection);
        } finally {
            IOUtils.closeWhileHandlingException(toClose);
        }
    }
}

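On close, the closed flag is changed from false to true with a CAS inside a synchronized block, the round's id is removed from the active pingingRounds, and the connections currently open to other nodes in the cluster are collected for closing.

The accept method of the consumer that was passed in at the very beginning is then called to return the results of the ping, after which the collected connections are closed.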
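The close-exactly-once guarantee comes from compareAndSet: only the caller that flips the flag goes on to notify the listener. A minimal sketch follows, assuming a hypothetical CloseOnceSketch class with a string result in place of the PingCollection. It leaves out the connection handling and the synchronized block of the real method.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class CloseOnceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Consumer<String> listener;

    CloseOnceSketch(Consumer<String> listener) {
        this.listener = listener;
    }

    boolean isClosed() {
        return closed.get();
    }

    void close() {
        // Only the first caller sees false -> true; later or concurrent calls fall through.
        if (closed.compareAndSet(false, true)) {
            listener.accept("collected results");
        }
    }

    public static void main(String[] args) {
        CloseOnceSketch round = new CloseOnceSketch(r -> System.out.println("delivered: " + r));
        round.close();
        round.close(); // no second delivery
    }
}
```

Because delivery is tied to winning the CAS, the results consumer cannot be completed twice even if the timeout task and another caller close the round at the same moment.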

That covers the flow on the node that initiates the ping.


transportService.registerRequestHandler(ACTION_NAME, UnicastPingRequest::new, ThreadPool.Names.SAME,
    new UnicastPingRequestHandler());

In the constructor of UnicastZenPing, every node registers a request handler for discovery/zen/unicast to process incoming ping requests.

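When UnicastPingRequestHandler receives a ping request from another node, it first checks whether the sender belongs to the same cluster. If it does, it sends back the pingResponse produced by handlePingRequest.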

private UnicastPingResponse handlePingRequest(final UnicastPingRequest request) {
    assert clusterName.equals(request.pingResponse.clusterName()) :
        "got a ping request from a different cluster. expected " + clusterName + " got " + request.pingResponse.clusterName();
    temporalResponses.add(request.pingResponse);
    // add to any ongoing pinging
    activePingingRounds.values().forEach(p -> p.addPingResponseToCollection(request.pingResponse));
    threadPool.schedule(TimeValue.timeValueMillis(request.timeout.millis() * 2), ThreadPool.Names.SAME,
        () -> temporalResponses.remove(request.pingResponse));

    List<PingResponse> pingResponses = CollectionUtils.iterableAsArrayList(temporalResponses);
    pingResponses.add(createPingResponse(contextProvider.clusterState()));

    UnicastPingResponse unicastPingResponse = new UnicastPingResponse();
    unicastPingResponse.id = request.id;
    unicastPingResponse.pingResponses = pingResponses.toArray(new PingResponse[pingResponses.size()]);

    return unicastPingResponse;
}

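temporalResponses holds the node data (PingResponse) carried in the pingRequests sent by other nodes. The pingResponse of the current request is first added to it, and is then added to every currently active pingingRound so that it becomes part of the results those rounds collect.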

Next, a task is scheduled at twice the timeout of the request to remove this pingResponse from temporalResponses again, which keeps the data fresh.

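Finally, a response of the same kind is built: the election data of the current node, together with the pingResponses received from other nodes, is returned to the sender.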

As described above, the sender parses this data in its response handler.

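One more point: the id of a pingingRound also serves as a version number for a node's data. When ping results for the same node conflict, the one with the larger id wins, and an equal id replaces the existing entry as well.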

public synchronized boolean addPing(PingResponse ping) {
    PingResponse existingResponse = pings.get(ping.node());
    if (existingResponse == null || existingResponse.id() <= ping.id()) {
        pings.put(ping.node(), ping);
        return true;
    }
    return false;
}
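The effect of the `<=` comparison in addPing can be checked with a small standalone model. PingCollectionSketch and Ping below are hypothetical simplifications in which a node is identified by a plain string.

```java
import java.util.HashMap;
import java.util.Map;

class PingCollectionSketch {
    // Hypothetical, minimal stand-in for PingResponse: a node name plus the id used as a version.
    static final class Ping {
        final String node;
        final long id;

        Ping(String node, long id) {
            this.node = node;
            this.id = id;
        }
    }

    private final Map<String, Ping> pings = new HashMap<>();

    // Accept the ping unless a strictly newer one is already stored for the same node.
    synchronized boolean addPing(Ping ping) {
        Ping existing = pings.get(ping.node);
        if (existing == null || existing.id <= ping.id) {
            pings.put(ping.node, ping);
            return true;
        }
        return false;
    }

    synchronized long idFor(String node) {
        return pings.get(node).id;
    }

    public static void main(String[] args) {
        PingCollectionSketch collection = new PingCollectionSketch();
        collection.addPing(new Ping("node-a", 2));
        // An older response for the same node is rejected.
        System.out.println(collection.addPing(new Ping("node-a", 1)));
    }
}
```

Under this rule a stale response that arrives late cannot overwrite newer data for the same node, while a response with the same id simply refreshes the stored entry.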