Source code analysis of ping in Elasticsearch cluster master election

During master election in Elasticsearch, nodes exchange the information they need to cast their votes through ping.

 

 

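The implementation class is UnicastZenPing. Its constructor reads discovery.zen.ping.unicast.hosts from the configuration and stores the addresses of the other nodes listed there. When the setting is present, limitPortCounts is set to LIMIT_FOREIGN_PORTS_COUNT, so, as the comment in the code says, only one port is tried per configured address.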

if (DISCOVERY_ZEN_PING_UNICAST_HOSTS_SETTING.exists(settings)) {
    configuredHosts = DISCOVERY_ZEN_PING_UNICAST_HOSTS_SETTING.get(settings);
    // we only limit to 1 addresses, makes no sense to ping 100 ports
    limitPortCounts = LIMIT_FOREIGN_PORTS_COUNT;
} 
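The effect of limitPortCounts can be pictured with a small helper. This is a simplified sketch, not the Elasticsearch API. It assumes that a host written without a port is expanded into consecutive ports starting at an assumed base port (9300 here), capped at limitPortCounts, and that a host with an explicit port is used as-is. `HostPortExpansion` and `expand` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the Elasticsearch API: turns one configured host
// string into "host:port" candidates, trying at most limitPortCounts ports.
final class HostPortExpansion {
    // Assumed default first transport port; only a parameter in this sketch.
    static final int DEFAULT_BASE_PORT = 9300;

    // Assumes IPv4 addresses or hostnames; IPv6 literals are not handled here.
    static List<String> expand(String host, int basePort, int limitPortCounts) {
        List<String> result = new ArrayList<>();
        if (host.lastIndexOf(':') >= 0) {
            // An explicit port was configured: use exactly that address.
            result.add(host);
            return result;
        }
        // No port given: try consecutive ports, but never more than the limit.
        for (int i = 0; i < limitPortCounts; i++) {
            result.add(host + ":" + (basePort + i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("10.0.0.1", DEFAULT_BASE_PORT, 1));      // [10.0.0.1:9300]
        System.out.println(expand("10.0.0.2:9301", DEFAULT_BASE_PORT, 1)); // [10.0.0.2:9301]
    }
}
```

With a limit of 1, every configured seed host costs exactly one probe, which is the point of the comment "makes no sense to ping 100 ports".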

 

When a node takes part in an election and needs information about the other nodes, it obtains it by calling ping().

@Override
public void ping(final Consumer<PingCollection> resultsConsumer, final TimeValue duration) {
    ping(resultsConsumer, duration, duration);
}

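Its parameter resultsConsumer is in practice the complete method of a CompletableFuture, passed as a method reference (CompletableFuture::complete). It receives the results when a ping round ends.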
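The consumer-as-future pattern lets the caller block until a round finishes. Below is a minimal sketch of that idea. `fakePing` is a hypothetical stand-in for ping() that delivers results from another thread. A string list stands in for PingCollection.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

final class PingAndWaitSketch {
    // Hypothetical stand-in for ping(): hands the collected results to the
    // consumer asynchronously once the "round" is finished.
    static void fakePing(Consumer<List<String>> resultsConsumer) {
        new Thread(() -> resultsConsumer.accept(List.of("node-a", "node-b"))).start();
    }

    static List<String> pingAndWait(long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<List<String>> response = new CompletableFuture<>();
        // response::complete fits Consumer<List<String>> (its boolean result is ignored).
        fakePing(response::complete);
        // Block the caller until the round has delivered its results.
        return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pingAndWait(1000)); // [node-a, node-b]
    }
}
```

The round itself stays fully asynchronous. Only the caller that wants a synchronous answer pays for the blocking get().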

 

In ping(), the nodes of the cluster are first resolved through resolveHostsLists().

final List<DiscoveryNode> seedNodes;
try {
    seedNodes = resolveHostsLists(
        unicastZenPingExecutorService,
        logger,
        configuredHosts,
        limitPortCounts,
        transportService,
        UNICAST_NODE_PREFIX,
        resolveTimeout);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}

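For every configured host, resolveHostsLists() creates a Callable task. It submits all of them with invokeAll(), which resolves the host strings concurrently into TransportAddress objects that transportService can use. invokeAll() returns once every task has finished or resolveTimeout has elapsed, and tasks still running at that point are cancelled.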

final List<Callable<TransportAddress[]>> callables =
    hosts
        .stream()
        .map(hn -> (Callable<TransportAddress[]>) () -> transportService.addressesFromString(hn, limitPortCounts))
        .collect(Collectors.toList());
final List<Future<TransportAddress[]>> futures =
    executorService.invokeAll(callables, resolveTimeout.nanos(), TimeUnit.NANOSECONDS);
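The same invokeAll-with-timeout pattern can be shown with the plain JDK. This is a sketch under assumptions: `resolve` is a hypothetical resolver that sleeps for hosts named "slow…" so that the cancellation path is visible, and results are plain strings instead of TransportAddress[].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ConcurrentResolveSketch {
    // Hypothetical resolver: "slow" hosts sleep past the deadline.
    static String resolve(String host) throws InterruptedException {
        if (host.startsWith("slow")) {
            Thread.sleep(5_000);
        }
        return host + "/resolved";
    }

    static List<String> resolveAll(List<String> hosts, long timeoutMillis) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> callables = new ArrayList<>();
            for (String h : hosts) {
                callables.add(() -> resolve(h));
            }
            // Blocks until all tasks finish or the timeout expires; unfinished
            // tasks are cancelled. Futures come back in the order of the tasks.
            List<Future<String>> futures = executor.invokeAll(callables, timeoutMillis, TimeUnit.MILLISECONDS);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                Future<String> f = futures.get(i);
                if (f.isCancelled()) {
                    results.add("timeout:" + hosts.get(i));
                    continue;
                }
                try {
                    results.add(f.get()); // already done, so this does not block
                } catch (ExecutionException e) {
                    results.add("failed:" + hosts.get(i));
                }
            }
            return results;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // [a/resolved, timeout:slow-b, c/resolved]
        System.out.println(resolveAll(List.of("a", "slow-b", "c"), 500));
    }
}
```

Because the futures keep task order, the index `i` links each result back to its host. The original code relies on the same guarantee when it iterates hosts and futures together.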

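Next, the method collects this node's publish address and bound addresses and then iterates over all the futures. Because invokeAll() has already waited, each future is either done or cancelled. For every resolved address, it filters out this node's own publish and bound addresses, wraps the rest into DiscoveryNode objects, and adds them to the returned list. Cancelled futures are logged as resolution timeouts.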

localAddresses.add(transportService.boundAddress().publishAddress());
localAddresses.addAll(Arrays.asList(transportService.boundAddress().boundAddresses()));
// ExecutorService#invokeAll guarantees that the futures are returned in the iteration order of the tasks so we can associate the
// hostname with the corresponding task by iterating together
final Iterator<String> it = hosts.iterator();
for (final Future<TransportAddress[]> future : futures) {
    final String hostname = it.next();
    if (!future.isCancelled()) {
        assert future.isDone();
        try {
            final TransportAddress[] addresses = future.get();
            logger.trace("resolved host [{}] to {}", hostname, addresses);
            for (int addressId = 0; addressId < addresses.length; addressId++) {
                final TransportAddress address = addresses[addressId];
                // no point in pinging ourselves
                if (localAddresses.contains(address) == false) {
                    discoveryNodes.add(
                        new DiscoveryNode(
                            nodeId_prefix + hostname + "_" + addressId + "#",
                            address,
                            emptyMap(),
                            emptySet(),
                            Version.CURRENT.minimumCompatibilityVersion()));
                }
            }
        } catch (final ExecutionException e) {
            assert e.getCause() != null;
            final String message = "failed to resolve host [" + hostname + "]";
            logger.warn(message, e.getCause());
        }
    } else {
        logger.warn("timed out after [{}] resolving host [{}]", resolveTimeout, hostname);
    }
}
return discoveryNodes;
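The filtering and naming step reduces to a simple loop. This is a sketch with hypothetical types: `SeedNode` is a simplified stand-in for DiscoveryNode, addresses are "host:port" strings, and the prefix "#seed_" is made up for illustration. The id shape prefix + hostname + "_" + addressId + "#" follows the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SeedNodeFilterSketch {
    // Hypothetical simplified node: an id plus a "host:port" address.
    record SeedNode(String id, String address) {}

    static List<SeedNode> toSeedNodes(String prefix, String hostname,
                                      List<String> resolved, Set<String> localAddresses) {
        List<SeedNode> nodes = new ArrayList<>();
        for (int addressId = 0; addressId < resolved.size(); addressId++) {
            String address = resolved.get(addressId);
            // Skip our own publish/bound addresses: no point in pinging ourselves.
            if (!localAddresses.contains(address)) {
                // addressId keeps the original index, even when earlier entries are skipped.
                nodes.add(new SeedNode(prefix + hostname + "_" + addressId + "#", address));
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<SeedNode> nodes = toSeedNodes("#seed_", "es1",
            List.of("10.0.0.1:9300", "10.0.0.1:9301"), Set.of("10.0.0.1:9300"));
        System.out.println(nodes.get(0).id()); // #seed_es1_1#
    }
}
```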

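After the configured hosts have been resolved, more nodes can be discovered through other nodes. Then, all master-eligible nodes (those whose master attribute is true) in the last known cluster state (clusterState) are added to seedNodes as well.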
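Adding the master-eligible nodes from the last known cluster state is a simple filter-and-append. Below is a minimal sketch with a hypothetical `Node` record in place of DiscoveryNode. Duplicates are deliberately left in, because in the original code deduplication by address happens later, in sendPings().

```java
import java.util.ArrayList;
import java.util.List;

final class MasterSeedSketch {
    // Hypothetical simplified node with only the attribute this step needs.
    record Node(String id, boolean masterEligible) {}

    // Appends every master-eligible node of the last known cluster state to the seeds.
    static List<Node> withKnownMasters(List<Node> resolvedSeeds, List<Node> lastKnownClusterNodes) {
        List<Node> seeds = new ArrayList<>(resolvedSeeds);
        for (Node n : lastKnownClusterNodes) {
            if (n.masterEligible()) {
                seeds.add(n);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        List<Node> seeds = withKnownMasters(
            List.of(new Node("seed-0", false)),
            List.of(new Node("m1", true), new Node("data1", false), new Node("m2", true)));
        System.out.println(seeds.size()); // 3
    }
}
```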

 

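Then a ConnectionProfile is built. It sets the connection type used for ping to REG, along with the connect and handshake timeouts, both of which default to 3 seconds.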

After that, PingingRound, the abstraction of a single ping round, is created and registered in activePingingRounds.

final ConnectionProfile connectionProfile =
    ConnectionProfile.buildSingleChannelProfile(TransportRequestOptions.Type.REG, requestDuration, requestDuration);
final PingingRound pingingRound = new PingingRound(pingingRoundIdGenerator.incrementAndGet(), seedNodes, resultsConsumer,
    nodes.getLocalNode(), connectionProfile);
activePingingRounds.put(pingingRound.id(), pingingRound);

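A PingingRound represents one ping operation issued by this node. It holds the connection profile (connection type and timeouts), the nodes to ping, the consumer that receives the results, and the round's id.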
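The id generation and registration can be sketched with an AtomicInteger and a concurrent map. `Round` is a hypothetical simplified class that keeps only the id and the seed nodes. It is not the real PingingRound.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class PingingRoundRegistrySketch {
    // Hypothetical simplified round: just an id and the nodes it will ping.
    static final class Round {
        final int id;
        final List<String> seedNodes;

        Round(int id, List<String> seedNodes) {
            this.id = id;
            this.seedNodes = List.copyOf(seedNodes);
        }
    }

    private final AtomicInteger idGenerator = new AtomicInteger();
    // Rounds that are still collecting responses, keyed by id.
    final Map<Integer, Round> activeRounds = new ConcurrentHashMap<>();

    Round startRound(List<String> seedNodes) {
        // incrementAndGet gives every round a unique, increasing id, even across threads.
        Round round = new Round(idGenerator.incrementAndGet(), seedNodes);
        activeRounds.put(round.id, round);
        return round;
    }

    public static void main(String[] args) {
        PingingRoundRegistrySketch registry = new PingingRoundRegistrySketch();
        System.out.println(registry.startRound(List.of("10.0.0.2:9300")).id); // 1
        System.out.println(registry.startRound(List.of("10.0.0.3:9300")).id); // 2
    }
}
```

Keeping active rounds in a map is what later allows an incoming ping request to be fed into every round that is still open.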

 

Next, a pingSender is built. It calls sendPings() three times at fixed intervals, and each call sends a ping request to all target nodes.

final AbstractRunnable pingSender = new AbstractRunnable() {
    @Override
    public void onFailure(Exception e) {
        if (e instanceof AlreadyClosedException == false) {
            logger.warn("unexpected error while pinging", e);
        }
    }

    @Override
    protected void doRun() throws Exception {
        sendPings(requestDuration, pingingRound);
    }
};
threadPool.generic().execute(pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3), ThreadPool.Names.GENERIC, pingSender);
threadPool.schedule(TimeValue.timeValueMillis(scheduleDuration.millis() / 3 * 2), ThreadPool.Names.GENERIC, pingSender);

With the default 3-second timeout, pingSender's sendPings() runs once per second: immediately, after 1 second, and after 2 seconds.
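The timing can be reproduced with a ScheduledExecutorService. This is a sketch: it schedules all three runs, including the first with a delay of 0, whereas the original submits the first run with execute(). The offset arithmetic uses the same integer division as the original, so a 3000 ms timeout yields 0, 1000 and 2000 ms. `runRound` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreePingScheduleSketch {
    // Same arithmetic as the original: 0, T/3 and T/3*2 (integer division).
    static List<Long> offsetsMillis(long scheduleMillis) {
        return List.of(0L, scheduleMillis / 3, scheduleMillis / 3 * 2);
    }

    // Runs sendPings at each offset and waits until all three runs have finished.
    static int runRound(long scheduleMillis, Runnable sendPings) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger sent = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(3);
        try {
            for (long offset : offsetsMillis(scheduleMillis)) {
                scheduler.schedule(() -> {
                    try {
                        sendPings.run();
                        sent.incrementAndGet();
                    } finally {
                        done.countDown(); // count down even if sendPings throws
                    }
                }, offset, TimeUnit.MILLISECONDS);
            }
            done.await();
        } finally {
            scheduler.shutdownNow();
        }
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(offsetsMillis(3000)); // [0, 1000, 2000]
        System.out.println(runRound(300, () -> System.out.println("sendPings")));
    }
}
```

Spreading three sends across the round gives nodes that start slightly later, or whose first reply was lost, two more chances to be seen before the round closes.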

final UnicastPingRequest pingRequest = new UnicastPingRequest();
pingRequest.id = pingingRound.id();
pingRequest.timeout = timeout;
ClusterState lastState = contextProvider.clusterState();

pingRequest.pingResponse = createPingResponse(lastState);

Set<DiscoveryNode> nodesFromResponses = temporalResponses.stream().map(pingResponse -> {
    assert clusterName.equals(pingResponse.clusterName()) :
        "got a ping request from a different cluster. expected " + clusterName + " got " + pingResponse.clusterName();
    return pingResponse.node();
}).collect(Collectors.toSet());

// dedup by address
final Map<TransportAddress, DiscoveryNode> uniqueNodesByAddress =
    Stream.concat(pingingRound.getSeedNodes().stream(), nodesFromResponses.stream())
        .collect(Collectors.toMap(DiscoveryNode::getAddress, Function.identity(), (n1, n2) -> n1));


// resolve what we can via the latest cluster state
final Set<DiscoveryNode> nodesToPing = uniqueNodesByAddress.values().stream()
    .map(node -> {
        DiscoveryNode foundNode = lastState.nodes().findByAddress(node.getAddress());
        if (foundNode == null) {
            return node;
        } else {
            return foundNode;
        }
    }).collect(Collectors.toSet());

nodesToPing.forEach(node -> sendPingRequestToNode(node, timeout, pingingRound, pingRequest));

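In sendPings(), the seedNodes obtained earlier are combined with the nodes of the same cluster that have recently sent ping requests to this node (taken from temporalResponses). The combined set is deduplicated by address, and each node is replaced by its entry in the latest cluster state when one exists.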
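The dedup-then-resolve step maps directly onto Collectors.toMap with a merge function. Below is a sketch with a hypothetical `Node` record and a plain map standing in for lastState.nodes().findByAddress(). As in the original merge function (n1, n2) -> n1, the first entry for an address wins, and seed nodes come first in the concatenated stream.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NodesToPingSketch {
    // Hypothetical simplified node identified by a "host:port" address.
    record Node(String name, String address) {}

    static Set<Node> nodesToPing(List<Node> seedNodes, Set<Node> nodesFromResponses,
                                 Map<String, Node> clusterStateByAddress) {
        // Dedup by address; on a collision keep the first entry, like (n1, n2) -> n1.
        Map<String, Node> uniqueByAddress = Stream.concat(seedNodes.stream(), nodesFromResponses.stream())
            .collect(Collectors.toMap(Node::address, Function.identity(), (n1, n2) -> n1));
        // Prefer the cluster-state entry for the same address when one exists.
        return uniqueByAddress.values().stream()
            .map(n -> clusterStateByAddress.getOrDefault(n.address(), n))
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Node> result = nodesToPing(
            List.of(new Node("#seed_0#", "10.0.0.2:9300")),
            Set.of(new Node("es2", "10.0.0.2:9300"), new Node("es3", "10.0.0.3:9300")),
            Map.of("10.0.0.3:9300", new Node("es3-known", "10.0.0.3:9300")));
        System.out.println(result.size()); // 2
    }
}
```

Resolving through the cluster state matters because a seed node built from a bare address carries a placeholder id, while the cluster state knows the node's real identity.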

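Each call also builds a pingRequest whose id is the PingingRound's id, so three pingRequests with this id are sent over the round. This node's own information, together with the master of its cluster (null if none has been elected yet), is packed into a PingResponse that is carried inside the pingRequest.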

Finally, the method iterates over the resulting node set and calls sendPingRequestToNode() for each node, passing the pingRequest and the pingingRound.

transportService.sendRequest(connection, ACTION_NAME, pingRequest,
    TransportRequestOptions.builder().withTimeout((long) (timeout.millis() * 1.25)).build(),
    getPingResponseHandler(pingingRound, node));

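sendPingRequestToNode() starts a task on the generic thread pool that eventually sends the ping request to the target node's discovery/zen/unicast action. The transport timeout is 1.25 times the ping timeout. The method also uses getPingResponseHandler() to attach a response handler bound to this pingingRound, which processes the target node's reply to this request.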

@Override
public void handleResponse(UnicastPingResponse response) {
    logger.trace("[{}] received response from {}: {}", pingingRound.id(), node, Arrays.toString(response.pingResponses));
    if (pingingRound.isClosed()) {
        if (logger.isTraceEnabled()) {
            logger.trace("[{}] skipping received response from {}. already closed", pingingRound.id(), node);
        }
    } else {
        Stream.of(response.pingResponses).forEach(pingingRound::addPingResponseToCollection);
    }
}

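When handling a response, the handler first checks whether the pingingRound has already been closed. If it has not, every PingResponse in the reply (a node's election-related data) is added to the round's collection, a map keyed by node that stores what each node reported.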
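The close check is what makes late replies harmless. Below is a minimal sketch of that guard. The payload is simplified to a hypothetical map from node name to the master that node reports, with null allowed for "no master yet". A HashMap guarded by synchronized methods is used because ConcurrentHashMap rejects null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

final class RoundResponseSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // node name -> master reported by that node (null = not elected yet)
    private final Map<String, String> collected = new HashMap<>();

    boolean isClosed() { return closed.get(); }

    void close() { closed.set(true); }

    // Mirrors handleResponse: drop replies that arrive after the round closed,
    // otherwise record every entry of the reply.
    synchronized void handleResponse(Map<String, String> pingResponses) {
        if (isClosed()) {
            return; // late response, ignored
        }
        collected.putAll(pingResponses);
    }

    synchronized Map<String, String> snapshot() { return new HashMap<>(collected); }

    public static void main(String[] args) {
        RoundResponseSketch round = new RoundResponseSketch();
        round.handleResponse(Map.of("es1", "es1"));
        round.close();
        round.handleResponse(Map.of("es2", "es1")); // ignored
        System.out.println(round.snapshot()); // {es1=es1}
    }
}
```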

 

A pingingRound lasts 3 seconds by default. So after scheduling the three pings, ping() also schedules a task that fires 3 seconds later to stop collecting pingResponses for the round.

@Override
public void close() {
    List<Connection> toClose = null;
    synchronized (this) {
        if (closed.compareAndSet(false, true)) {
            activePingingRounds.remove(id);
            toClose = new ArrayList<>(tempConnections.values());
            tempConnections.clear();
        }
    }
    if (toClose != null) {
        // we actually closed
        try {
            pingListener.accept(pingCollection);
        } finally {
            IOUtils.closeWhileHandlingException(toClose);
        }
    }
}

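During close(), inside a synchronized block, closed is switched from false to true with a CAS, the round's id is removed from activePingingRounds, and the round's temporary connections to other nodes are taken out of tempConnections. Those connections are closed after the block.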

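Next, close() calls accept() on the consumer passed in at the very beginning, which hands over the results of the ping round. The connections are closed in the finally block even if the consumer throws.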
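The close-exactly-once pattern can be isolated in a small class. This is a sketch, not PingingRound itself. `CloseOnceSketch` is hypothetical, and swallowing IOException in the loop only imitates the intent of IOUtils.closeWhileHandlingException.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

final class CloseOnceSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final List<Closeable> tempConnections = new ArrayList<>();
    private final List<String> results = new ArrayList<>();
    private final Consumer<List<String>> listener;

    CloseOnceSketch(Consumer<List<String>> listener) { this.listener = listener; }

    synchronized void addTempConnection(Closeable c) { tempConnections.add(c); }

    synchronized void addResult(String r) {
        if (!closed.get()) {
            results.add(r);
        }
    }

    void close() {
        List<Closeable> toClose = null;
        synchronized (this) {
            // Only the first caller flips false -> true and takes over the cleanup.
            if (closed.compareAndSet(false, true)) {
                toClose = new ArrayList<>(tempConnections);
                tempConnections.clear();
            }
        }
        if (toClose != null) {
            try {
                listener.accept(List.copyOf(results)); // deliver results exactly once
            } finally {
                for (Closeable c : toClose) {
                    try {
                        c.close();
                    } catch (IOException ignored) {
                        // keep closing the remaining connections
                    }
                }
            }
        }
    }
}
```

Calling close() twice, for example once from the timeout task and once during shutdown, still invokes the listener and closes each connection only once.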

 

That covers the side of the node that initiates the ping.


transportService.registerRequestHandler(ACTION_NAME, UnicastPingRequest::new, ThreadPool.Names.SAME,
    new UnicastPingRequestHandler());

In the UnicastZenPing constructor, every node registers a request handler for discovery/zen/unicast to process incoming ping requests.
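Registering a handler under an action name amounts to a name-to-handler map on the receiving side. Below is a minimal sketch of that dispatch idea. `ActionRegistrySketch`, its string-based handlers, and the choice to reject duplicate registrations are assumptions of this sketch, not the TransportService API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ActionRegistrySketch {
    // action name -> handler turning a request into a response (strings for simplicity)
    private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();

    void registerRequestHandler(String action, Function<String, String> handler) {
        // In this sketch, registering the same action twice is treated as an error.
        if (handlers.putIfAbsent(action, handler) != null) {
            throw new IllegalArgumentException("handler already registered for " + action);
        }
    }

    // Looks up the handler by action name, as a transport layer would on an incoming request.
    String dispatch(String action, String request) {
        Function<String, String> handler = handlers.get(action);
        if (handler == null) {
            throw new IllegalStateException("no handler for " + action);
        }
        return handler.apply(request);
    }

    public static void main(String[] args) {
        ActionRegistrySketch registry = new ActionRegistrySketch();
        registry.registerRequestHandler("discovery/zen/unicast", req -> "pong:" + req);
        System.out.println(registry.dispatch("discovery/zen/unicast", "ping")); // pong:ping
    }
}
```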

 

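When UnicastPingRequestHandler receives a ping request from another node, it first checks whether the sender belongs to the same cluster. If it does, the handler replies with the UnicastPingResponse produced by handlePingRequest().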

private UnicastPingResponse handlePingRequest(final UnicastPingRequest request) {
    assert clusterName.equals(request.pingResponse.clusterName()) :
        "got a ping request from a different cluster. expected " + clusterName + " got " + request.pingResponse.clusterName();
    temporalResponses.add(request.pingResponse);
    // add to any ongoing pinging
    activePingingRounds.values().forEach(p -> p.addPingResponseToCollection(request.pingResponse));
    threadPool.schedule(TimeValue.timeValueMillis(request.timeout.millis() * 2), ThreadPool.Names.SAME,
        () -> temporalResponses.remove(request.pingResponse));

    List<PingResponse> pingResponses = CollectionUtils.iterableAsArrayList(temporalResponses);
    pingResponses.add(createPingResponse(contextProvider.clusterState()));

    UnicastPingResponse unicastPingResponse = new UnicastPingResponse();
    unicastPingResponse.id = request.id;
    unicastPingResponse.pingResponses = pingResponses.toArray(new PingResponse[pingResponses.size()]);

    return unicastPingResponse;
}

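temporalResponses stores the node data (PingResponse) carried in the pingRequests sent by other nodes. The method first adds the current request's pingResponse to it, then adds the same pingResponse to every active pingingRound, so that it becomes part of the results those rounds collect.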

Then it schedules a task that removes this pingResponse from temporalResponses after twice the request's timeout, so stale data does not linger.

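Finally, it builds the reply: this node's own election data, created the same way as a pingResponse, is appended to the pingResponses received from other nodes, and the whole list is returned to the sender.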
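The remember-then-expire behaviour of temporalResponses can be sketched with a concurrent queue and a scheduled removal. Strings stand in for PingResponse, and `TemporalResponsesSketch` is hypothetical. The scheduler uses a daemon thread so it cannot keep the JVM alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class TemporalResponsesSketch {
    final Queue<String> temporalResponses = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Mirrors handlePingRequest: remember the sender's data, expire it after
    // 2 * timeout, and reply with everything remembered plus our own entry.
    List<String> handlePingRequest(String senderInfo, long timeoutMillis, String localInfo) {
        temporalResponses.add(senderInfo);
        scheduler.schedule(() -> temporalResponses.remove(senderInfo), timeoutMillis * 2, TimeUnit.MILLISECONDS);
        List<String> reply = new ArrayList<>(temporalResponses);
        reply.add(localInfo);
        return reply;
    }

    void shutdown() { scheduler.shutdownNow(); }

    public static void main(String[] args) throws InterruptedException {
        TemporalResponsesSketch node = new TemporalResponsesSketch();
        System.out.println(node.handlePingRequest("es2", 50, "es1")); // [es2, es1]
        Thread.sleep(300);
        System.out.println(node.temporalResponses.isEmpty()); // true
        node.shutdown();
    }
}
```

Because the reply includes everything still remembered, a node learns about peers it never pinged directly. This is why sendPings() later merges nodesFromResponses into its target set.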

 

As described above, the sending node parses this data in its response handler.

 

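Also, the id carried by each PingResponse (ping.id() in the code below) acts as a version number for a node's data. When ping results for the same node conflict, addPing() keeps the one with the larger id, and an equal id also replaces the stored entry.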

public synchronized boolean addPing(PingResponse ping) {
    PingResponse existingResponse = pings.get(ping.node());
    if (existingResponse == null || existingResponse.id() <= ping.id()) {
        pings.put(ping.node(), ping);
        return true;
    }
    return false;
}
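The version rule of addPing() can be exercised in isolation. Below is a sketch with a hypothetical `Ping` record in place of PingResponse. The comparison `existing.id() <= ping.id()` is copied from the original.

```java
import java.util.HashMap;
import java.util.Map;

final class PingCollectionSketch {
    // Hypothetical simplified PingResponse: who answered, its id, and the master it reports.
    record Ping(String node, long id, String master) {}

    private final Map<String, Ping> pings = new HashMap<>();

    // Same rule as addPing: a newer-or-equal id replaces the stored entry.
    synchronized boolean addPing(Ping ping) {
        Ping existing = pings.get(ping.node());
        if (existing == null || existing.id() <= ping.id()) {
            pings.put(ping.node(), ping);
            return true;
        }
        return false;
    }

    synchronized Ping get(String node) { return pings.get(node); }

    public static void main(String[] args) {
        PingCollectionSketch c = new PingCollectionSketch();
        System.out.println(c.addPing(new Ping("es2", 5, "es1"))); // true
        System.out.println(c.addPing(new Ping("es2", 3, null)));  // false: older id
        System.out.println(c.addPing(new Ping("es2", 7, "es3"))); // true
        System.out.println(c.get("es2").master());                // es3
    }
}
```

Keying by node means each node contributes one opinion per round, and an out-of-order older reply cannot overwrite a newer one.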

 
