Nacos Source Code Walkthrough 06: Service Health Checks

Health Check Scenarios

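Ephemeral Node Health Detection

1. Server-side active probing: every 3 s the server checks all clients that have not communicated for more than 20 s and sends each a ClientDetectionRequest. If the client responds successfully within 1 s, the check passes (note that the ConnectionManager excerpt later in this article waits up to 3000 ms on its latch); otherwise the server calls unregister to remove the Connection.
2. The client also probes every 5 s whether a long-lived connection that has been idle for more than 5 s is still alive. When the connection is broken, the client reconnects to the server on its own initiative.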
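
The server-side selection step in item 1 can be sketched as a small filter over last-active timestamps. This is a minimal illustration only: the class `IdleConnectionScan`, the method `findIdle`, and the map-based input are hypothetical and are not part of Nacos; only the 20 s threshold comes from the text above.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not Nacos code): picks connections that have been idle too long. */
final class IdleConnectionScan {
    /** 20 s idle threshold, as described in the text. */
    static final long KEEP_ALIVE_MILLIS = 20_000L;

    /**
     * Returns the ids whose last activity is at least keepAliveMillis old.
     * lastActiveById maps a connection id to its last-active timestamp in milliseconds.
     */
    static Set<String> findIdle(Map<String, Long> lastActiveById, long nowMillis, long keepAliveMillis) {
        Set<String> idle = new HashSet<>();
        for (Map.Entry<String, Long> entry : lastActiveById.entrySet()) {
            if (nowMillis - entry.getValue() >= keepAliveMillis) {
                idle.add(entry.getKey());
            }
        }
        return idle;
    }
}
```

Only the connections returned by such a scan would then receive a detection request; connections with recent traffic are skipped entirely.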

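Persistent Node Heartbeat Detection

A heartbeat executor sends an HTTP request to the Nacos Server every five seconds.
If the server replies "server not found", the client sends a registration request to the Nacos Server to re-register.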
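
One heartbeat round as described above might look like the following sketch. Everything here is an assumption made for illustration: `HeartbeatSketch`, `beatOnce`, and the string marker `SERVER_NOT_FOUND` are hypothetical names, and the actual response code Nacos uses is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch (not Nacos code) of one heartbeat round with re-registration. */
final class HeartbeatSketch {
    /** Assumed marker for the "server not found" reply. */
    static final String SERVER_NOT_FOUND = "server not found";

    /** Five-second interval from the text; a scheduler would use it between rounds. */
    static final long INTERVAL_MILLIS = 5_000L;

    /**
     * Sends one heartbeat and re-registers when the server does not know the instance.
     * Returns the list of actions taken, in order.
     */
    static List<String> beatOnce(Supplier<String> sendBeat, Runnable register) {
        List<String> actions = new ArrayList<>();
        actions.add("beat");
        String reply = sendBeat.get();
        if (SERVER_NOT_FOUND.equals(reply)) {
            register.run();
            actions.add("register");
        }
        return actions;
    }
}
```

In a real client, `beatOnce` would be scheduled repeatedly, for example with `ScheduledExecutorService.scheduleWithFixedDelay` and `INTERVAL_MILLIS` as the delay.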

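Persistent Node Probing

Nacos supports active probing only for instances registered as persistent nodes.
Three probe types are supported: HTTP, TCP, and MySQL.
HTTP: the instance is marked healthy if the probe returns status code 200.
TCP: the instance is marked healthy if a Channel connection can be established.
MySQL: the probe verifies that the current node is the primary (master), which suits primary/replica failover scenarios.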
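
To make the TCP case concrete, here is a minimal connect-based probe. It is a sketch under stated assumptions: the text says Nacos uses a Channel connection, whereas this example swaps in a plain blocking `java.net.Socket` for brevity, and `TcpProbeSketch` and `isReachable` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch (not Nacos code): a TCP probe that only checks whether a connection can be opened. */
final class TcpProbeSketch {
    /**
     * Returns true if a TCP connection to host:port is established within timeoutMillis.
     * Any I/O failure (refused, unreachable, timed out) counts as unhealthy.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The probe says nothing about application-level health; it only shows that something is accepting connections on the port, which is the trade-off of a TCP check compared with the HTTP one.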

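Server-Side Probing

ConnectionManager.start() is annotated with @PostConstruct, so it runs after the bean has been constructed. It schedules a task dedicated to expelling unhealthy connections: every 3 s the task checks all clients that have not communicated for more than 20 s and sends each a ClientDetectionRequest. If the client responds successfully in time (the latch in the code below waits up to 3000 ms), the check passes; otherwise unregister is called to remove the Connection.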

public class ConnectionManager extends Subscriber<ConnectionLimitRuleChangeEvent> {
@PostConstruct
public void start() {
 
     // Start UnHealthy Connection Expel Task.
     RpcScheduledExecutor.COMMON_SERVER_EXECUTOR.scheduleWithFixedDelay(new Runnable() {
         @Override
         public void run() {
             try {
 
                 // ..................
 
                 // Collects the ids of outdated (idle) connections.
                 Set<String> outDatedConnections = new HashSet<>();
                 long now = System.currentTimeMillis();
                 for (Map.Entry<String, Connection> entry : entries) {
                     Connection client = entry.getValue();
                     String clientIp = client.getMetaInfo().getClientIp();
                     AtomicInteger integer = expelForIp.get(clientIp);
                     if (integer != null && integer.intValue() > 0) {
                         integer.decrementAndGet();
                         expelClient.add(client.getMetaInfo().getConnectionId());
                         expelCount--;
                         // Key point: compare against the last active time.
                         // Whenever the server handles a request it calls refreshActiveTime(requestMeta.getConnectionId()),
                         // which updates lastActiveTime.
                     } else if (now - client.getMetaInfo().getLastActiveTime() >= KEEP_ALIVE_TIME) {
                         outDatedConnections.add(client.getMetaInfo().getConnectionId());
                     }
 
                 }
 
                 // ...........
 
                 // Resolve the redirect address, then send reset requests to the expelled clients.
                 String serverIp = null;
                 String serverPort = null;
                 if (StringUtils.isNotBlank(redirectAddress) && redirectAddress.contains(Constants.COLON)) {
                     String[] split = redirectAddress.split(Constants.COLON);
                     serverIp = split[0];
                     serverPort = split[1];
                 }
 
                 for (String expelledClientId : expelClient) {
                     try {
                         Connection connection = getConnection(expelledClientId);
                         if (connection != null) {
                             ConnectResetRequest connectResetRequest = new ConnectResetRequest();
                             connectResetRequest.setServerIp(serverIp);
                             connectResetRequest.setServerPort(serverPort);
                             connection.asyncRequest(connectResetRequest, null);
                             Loggers.REMOTE_DIGEST
                                     .info("Send connection reset request , connection id = {},recommendServerIp={}, recommendServerPort={}",
                                             expelledClientId, connectResetRequest.getServerIp(),
                                             connectResetRequest.getServerPort());
                         }
 
                     } catch (ConnectionAlreadyClosedException e) {
                         // The connection really is closed, so remove it.
                         unregister(expelledClientId);
                     } catch (Exception e) {
                         Loggers.REMOTE_DIGEST.error("Error occurs when expel connection, expelledClientId:{}", expelledClientId, e);
                     }
                 }
 
                 //4.client active detection.
                 // The server sends a detection request to each idle client.
                 // A successful response shows the client is still alive.
                 if (CollectionUtils.isNotEmpty(outDatedConnections)) {
                     Set<String> successConnections = new HashSet<>();
                     final CountDownLatch latch = new CountDownLatch(outDatedConnections.size());
                     for (String outDateConnectionId : outDatedConnections) {
                         try {
                             Connection connection = getConnection(outDateConnectionId);
                             if (connection != null) {
                                 ClientDetectionRequest clientDetectionRequest = new ClientDetectionRequest();
                                 connection.asyncRequest(clientDetectionRequest, new RequestCallBack() {
                                     @Override
                                     public void onResponse(Response response) {
                                         latch.countDown();
                                         // Still alive: refresh the last active time and record the id as successful.
                                         if (response != null && response.isSuccess()) {
                                             connection.freshActiveTime();
                                             successConnections.add(outDateConnectionId);
                                         }
                                     }
 
                                     @Override
                                     public void onException(Throwable e) {
                                         latch.countDown();
                                     }
                                 });
                             } else {
                                 latch.countDown();
                             }
                         } catch (ConnectionAlreadyClosedException e) {
                             latch.countDown();
                         } catch (Exception e) {
                             latch.countDown();
                         }
                     }
                     latch.await(3000L, TimeUnit.MILLISECONDS);
                     for (String outDateConnectionId : outDatedConnections) {
                         // Remove every connection that is not in the success set.
                         if (!successConnections.contains(outDateConnectionId)) {
                             unregister(outDateConnectionId);
                         }
                     }
                 }
 
                 //reset loader client
                 if (isLoaderClient) {
                     loadClient = -1;
                     redirectAddress = null;
                 }
             } catch (Throwable e) {
                 Loggers.REMOTE.error("Error occurs during connection check... ", e);
             }
         }
     }, 1000L, 3000L, TimeUnit.MILLISECONDS);
 
}
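
The detection round above boils down to a "fan out, wait on a latch, remove the non-responders" pattern. The sketch below isolates that pattern. It is not the Nacos implementation: `DetectionRoundSketch` and `findDead` are hypothetical names, and a `CompletableFuture<Boolean>` stands in for the asynchronous detection response.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch (not Nacos code) of one latch-based detection round. */
final class DetectionRoundSketch {
    /**
     * Waits up to timeoutMillis for all probes and returns the ids that did NOT
     * answer successfully in that time. Each future stands in for one async response.
     */
    static Set<String> findDead(Map<String, CompletableFuture<Boolean>> probes, long timeoutMillis)
            throws InterruptedException {
        // Thread-safe set, because callbacks may run on other threads.
        Set<String> alive = ConcurrentHashMap.newKeySet();
        CountDownLatch latch = new CountDownLatch(probes.size());
        probes.forEach((id, future) -> future.whenComplete((ok, error) -> {
            // Record success before counting down so the waiting thread sees it.
            if (error == null && Boolean.TRUE.equals(ok)) {
                alive.add(id);
            }
            latch.countDown();
        }));
        // Returns early once every probe has completed, otherwise after the timeout.
        latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        Set<String> dead = new HashSet<>(probes.keySet());
        dead.removeAll(alive);
        return dead;
    }
}
```

A probe that never completes is treated exactly like one that fails: after the timeout it is simply absent from the success set, so its id ends up in the returned set.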

Connection Reset

                 // Resolve the redirect address, then send reset requests to the expelled clients.
                 String serverIp = null;
                 String serverPort = null;
                 if (StringUtils.isNotBlank(redirectAddress) && redirectAddress.contains(Constants.COLON)) {
                     String[] split = redirectAddress.split(Constants.COLON);
                     serverIp = split[0];
                     serverPort = split[1];
                 }
 
                 for (String expelledClientId : expelClient) {
                     try {
                         Connection connection = getConnection(expelledClientId);
                         if (connection != null) {
                             // The async request is sent without a callback, so no response is required.
                             ConnectResetRequest connectResetRequest = new ConnectResetRequest();
                             connectResetRequest.setServerIp(serverIp);
                             connectResetRequest.setServerPort(serverPort);
                             connection.asyncRequest(connectResetRequest, null);
                             Loggers.REMOTE_DIGEST
                                     .info("Send connection reset request , connection id = {},recommendServerIp={}, recommendServerPort={}",
                                             expelledClientId, connectResetRequest.getServerIp(),
                                             connectResetRequest.getServerPort());
                         }
 
                     } catch (ConnectionAlreadyClosedException e) {
                         // The connection really is closed, so remove it.
                         unregister(expelledClientId);
                     } catch (Exception e) {
                         Loggers.REMOTE_DIGEST.error("Error occurs when expel connection, expelledClientId:{}", expelledClientId, e);
                     }
                 }

Client-Side Handling of Connection Events

RpcClient

class ConnectResetRequestHandler implements ServerRequestHandler {
    private final BlockingQueue<ReconnectContext> reconnectionSignal = new ArrayBlockingQueue<>(1);
    
    @Override
    public Response requestReply(Request request) {
        
        if (request instanceof ConnectResetRequest) {
            synchronized (RpcClient.this) {
                if (isRunning()) {
                    ConnectResetRequest connectResetRequest = (ConnectResetRequest) request;
                    if (StringUtils.isNotBlank(connectResetRequest.getServerIp())) {
                        // Resolve the recommended server from the reset request.
                        ServerInfo serverInfo = resolveServerInfo(connectResetRequest.getServerIp() + Constants.COLON + connectResetRequest.getServerPort());
                        // Inlined body of switchServerAsync(serverInfo, false):
                        // enqueue a reconnect signal; the loop that consumes it is shown below.
                        reconnectionSignal.offer(new ReconnectContext(serverInfo, false));
                    } else {
                        // No recommended server: the same enqueue, with a null server.
                        switchServerAsync(null, false);
                    }
                }
            }
            return new ConnectResetResponse();
        }
        return null;
    }
}

RpcClient

 public final void start() throws NacosException {
  ......
          clientEventExecutor.submit(() -> {
            while (true) {
                try {
                    // Exit the loop once the client has been shut down.
                    if (isShutdown()) {
                        break;
                    }
                    // Take the context that was previously offered to reconnectionSignal.
                    ReconnectContext reconnectContext = reconnectionSignal
                            .poll(keepAliveTime, TimeUnit.MILLISECONDS);
                    if (reconnectContext == null) {
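                        // Run a health check only if at least keepAliveTime (5 s) has passed since the last activity.
                        if (System.currentTimeMillis() - lastActiveTimeStamp >= keepAliveTime) {
                            // Send a HealthCheckRequest; on success the server simply returns a Response.
                            boolean isHealthy = healthCheck();
                            // Unhealthy case.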
                            if (!isHealthy) {
                                if (currentConnection == null) {
                                    continue;
                                }
                                LoggerUtils.printIfInfoEnabled(LOGGER,
                                        "[{}] Server healthy check fail, currentConnection = {}", name,
                                        currentConnection.getConnectionId());
                                
                                RpcClientStatus rpcClientStatus = RpcClient.this.rpcClientStatus.get();
                                // If the client has been shut down, stop the reconnect task.
                                if (RpcClientStatus.SHUTDOWN.equals(rpcClientStatus)) {
                                    break;
                                }
                                // Mark the client status as UNHEALTHY.
                                boolean statusFLowSuccess = RpcClient.this.rpcClientStatus
                                        .compareAndSet(rpcClientStatus, RpcClientStatus.UNHEALTHY);
                                if (statusFLowSuccess) {
                                    reconnectContext = new ReconnectContext(null, false);
                                } else {
                                    continue;
                                }
                                
                            } else {
                                // Still alive: refresh the last-active timestamp.
                                lastActiveTimeStamp = System.currentTimeMillis();
                                continue;
                            }
                        } else {
                            continue;
                        }
                        
                    }
                    
                    if (reconnectContext.serverInfo != null) {
                        // clear recommend server if server is not in server list.
                        boolean serverExist = false;
                        // Iterate over the configured server list.
                        for (String server : getServerListFactory().getServerList()) {
                            ServerInfo serverInfo = resolveServerInfo(server);
                            if (serverInfo.getServerIp().equals(reconnectContext.serverInfo.getServerIp())) {
                                serverExist = true;
                                // Use the port of the matching server-list entry for the recommended server.
                                reconnectContext.serverInfo.serverPort = serverInfo.serverPort;
                                break;
                            }
                        }
                        // If the recommended server is not in the server list, clear serverInfo.
                        if (!serverExist) {
                            LoggerUtils.printIfInfoEnabled(LOGGER,
                                    "[{}] Recommend server is not in server list, ignore recommend server {}", name,
                                    reconnectContext.serverInfo.getAddress());
                            
                            reconnectContext.serverInfo = null;
                            
                        }
                    }
                    // Reconnect.
                    reconnect(reconnectContext.serverInfo, reconnectContext.onRequestFail);
                } catch (Throwable throwable) {
                    // Do nothing
                }
            }
        });
  ......
 }
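
The branching at the top of this loop hinges on `BlockingQueue.poll` with a timeout: a queued signal means "reconnect now", while a timeout means "maybe run a health check". The sketch below isolates just that decision. It is an illustration, not the Nacos code: `ReconnectLoopSketch`, `Action`, `nextAction`, and the injected clock are hypothetical, and a `String` stands in for `ReconnectContext`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch (not Nacos code) of one iteration of the reconnect loop's decision. */
final class ReconnectLoopSketch {
    enum Action { RECONNECT, HEALTH_CHECK, WAIT }

    /** A capacity-1 queue: at most one reconnect signal can be pending at a time. */
    static BlockingQueue<String> newSignalQueue() {
        return new ArrayBlockingQueue<>(1);
    }

    /**
     * Waits up to keepAliveMillis for a reconnect signal. With no signal, asks for a
     * health check only if the connection has been idle for at least keepAliveMillis.
     */
    static Action nextAction(BlockingQueue<String> signal, long keepAliveMillis,
                             long lastActiveMillis, LongSupplier clock) throws InterruptedException {
        String target = signal.poll(keepAliveMillis, TimeUnit.MILLISECONDS);
        if (target != null) {
            return Action.RECONNECT;
        }
        if (clock.getAsLong() - lastActiveMillis >= keepAliveMillis) {
            return Action.HEALTH_CHECK;
        }
        return Action.WAIT;
    }
}
```

Because the queue has capacity 1 and `offer` does not block, a second reset request arriving while one is already pending is simply dropped, which keeps repeated signals from piling up. In the real loop a failed health check then creates its own reconnect context; that step is left out here.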

Sending a Health-Check Request

private boolean healthCheck() {
    HealthCheckRequest healthCheckRequest = new HealthCheckRequest();
    if (this.currentConnection == null) {
        return false;
    }
    try {
        // Send the request with a 3000 ms timeout.
        Response response = this.currentConnection.request(healthCheckRequest, 3000L);
        // not only check server is ok ,also check connection is register.
        // A null response counts as unhealthy.
        return response != null && response.isSuccess();
    } catch (NacosException e) {
        // ignore
    }
    return false;
}

On the server, HealthCheckRequestHandler is an empty implementation that simply returns a new HealthCheckResponse.

@Component
public class HealthCheckRequestHandler extends RequestHandler<HealthCheckRequest, HealthCheckResponse> {
    @Override
    @TpsControl(pointName = "HealthCheck")
    public HealthCheckResponse handle(HealthCheckRequest request, RequestMeta meta) {
        return new HealthCheckResponse();
    }
}
 
public class HealthCheckResponse extends Response {
}

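Health Detection

As the following code from ConnectionManager.start() shows, the server sends a detection request, ClientDetectionRequest, to each idle client. A successful response means the client is still alive and does not need to be expelled.

                 //4.client active detection.
                 // The server sends a detection request to each idle client.
                 // A successful response shows the client is still alive.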
                 if (CollectionUtils.isNotEmpty(outDatedConnections)) {
                     Set<String> successConnections = new HashSet<>();
                     final CountDownLatch latch = new CountDownLatch(outDatedConnections.size());
                     for (String outDateConnectionId : outDatedConnections) {
                         try {
                             Connection connection = getConnection(outDateConnectionId);
                             if (connection != null) {
                                 ClientDetectionRequest clientDetectionRequest = new ClientDetectionRequest();
                                 connection.asyncRequest(clientDetectionRequest, new RequestCallBack() {
                                     @Override
                                     public void onResponse(Response response) {
                                         latch.countDown();
                                         // Still alive: refresh the last active time and record the id as successful.
                                         if (response != null && response.isSuccess()) {
                                             connection.freshActiveTime();
                                             successConnections.add(outDateConnectionId);
                                         }
                                     }
 
                                     @Override
                                     public void onException(Throwable e) {
                                         latch.countDown();
                                     }
                                 });
                             } else {
                                 latch.countDown();
                             }
                         } catch (ConnectionAlreadyClosedException e) {
                             latch.countDown();
                         } catch (Exception e) {
                             latch.countDown();
                         }
                     }
                     latch.await(3000L, TimeUnit.MILLISECONDS);
                     for (String outDateConnectionId : outDatedConnections) {
                         // Remove every connection that is not in the success set.
                         if (!successConnections.contains(outDateConnectionId)) {
                             unregister(outDateConnectionId);
                         }
                     }
                 }
 
                 //reset loader client
                 if (isLoaderClient) {
                     loadClient = -1;
                     redirectAddress = null;
                 }
             } catch (Throwable e) {
                 Loggers.REMOTE.error("Error occurs during connection check... ", e);
             }
         }
     }, 1000L, 3000L, TimeUnit.MILLISECONDS);
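
On the client side, RpcClient registers a handler that answers a ClientDetectionRequest with a ClientDetectionResponse:

        registerServerRequestHandler(request -> {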
            if (request instanceof ClientDetectionRequest) {
                return new ClientDetectionResponse();
            }
            
            return null;
        });

Unregistering the Connection

    public synchronized void unregister(String connectionId) {
        // Remove the connection by its connection id.
        Connection remove = this.connections.remove(connectionId);
        if (remove != null) {
            String clientIp = remove.getMetaInfo().clientIp;
            AtomicInteger atomicInteger = connectionForClientIp.get(clientIp);
            if (atomicInteger != null) {
                int count = atomicInteger.decrementAndGet();
                if (count <= 0) {
                    connectionForClientIp.remove(clientIp);
                }
            }
            remove.close();
            Loggers.REMOTE_DIGEST.info("[{}]Connection unregistered successfully. ", connectionId);
            clientConnectionEventListenerRegistry.notifyClientDisConnected(remove);
        }
    }
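
The per-IP bookkeeping in unregister (decrement the counter, drop the entry when it reaches zero) can be isolated as follows. This is a minimal sketch: `IpConnectionCounter` and its methods are hypothetical names, and only the decrement-and-remove logic mirrors the method above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (not Nacos code) of per-IP connection counting. */
final class IpConnectionCounter {
    private final Map<String, AtomicInteger> countByIp = new ConcurrentHashMap<>();

    /** Called when a connection from the given IP is registered. */
    synchronized void onRegister(String ip) {
        countByIp.computeIfAbsent(ip, key -> new AtomicInteger()).incrementAndGet();
    }

    /** Called when a connection is unregistered; removes the entry once no connections remain. */
    synchronized void onUnregister(String ip) {
        AtomicInteger counter = countByIp.get(ip);
        if (counter != null && counter.decrementAndGet() <= 0) {
            countByIp.remove(ip);
        }
    }

    /** Current number of tracked connections for the IP (0 if none). */
    int count(String ip) {
        AtomicInteger counter = countByIp.get(ip);
        return counter == null ? 0 : counter.get();
    }

    /** Whether the map still holds an entry for the IP. */
    boolean tracks(String ip) {
        return countByIp.containsKey(ip);
    }
}
```

Removing the entry at zero keeps the map from growing with IPs that no longer have any open connection.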

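notifyClientDisConnected sends the disconnect event to every registered ClientConnectionEventListener. The one to focus on is ConnectionBasedClientManager; the other two (ConfigConnectionEventListener and RpcAckCallbackInitorOrCleaner) only clear caches.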

    public void notifyClientDisConnected(final Connection connection) {
        
        for (ClientConnectionEventListener clientConnectionEventListener : clientConnectionEventListeners) {
            try {
                clientConnectionEventListener.clientDisConnected(connection);
            } catch (Throwable throwable) {
                Loggers.REMOTE.info("[NotifyClientDisConnected] failed for listener {}",
                        clientConnectionEventListener.getName(), throwable);
            }
        }
        
    }

ConnectionBasedClientManager

    // The cleaner task is scheduled in the constructor, so it starts as soon as the bean is instantiated.
    public ConnectionBasedClientManager() {
        GlobalExecutor
                .scheduleExpiredClientCleaner(new ExpiredClientCleaner(this), 0, Constants.DEFAULT_HEART_BEAT_INTERVAL,
                        TimeUnit.MILLISECONDS);
    }
    private static class ExpiredClientCleaner implements Runnable {
        
        private final ConnectionBasedClientManager clientManager;
        
        public ExpiredClientCleaner(ConnectionBasedClientManager clientManager) {
            this.clientManager = clientManager;
        }
        
        @Override
        public void run() {
            long currentTime = System.currentTimeMillis();
            // Iterate over all client ids.
            for (String each : clientManager.allClientId()) {
                ConnectionBasedClient client = (ConnectionBasedClient) clientManager.getClient(each);
                // Check whether the client has expired.
                if (null != client && client.isExpire(currentTime)) {
                    // Expel the expired client.
                    clientManager.clientDisconnected(each);
                }
            }
        }
    }

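A client counts as expired when it is not a native client (isNative() returns false) and the time since its last renewal exceeds the client expiration time configured in ClientConfig.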

    @Override
    public boolean isExpire(long currentTime) {
        return !isNative() && currentTime - getLastRenewTime() > ClientConfig.getInstance().getClientExpiredTime();
    }
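
The expiry rule is a strict comparison, so a client whose age equals the limit exactly is not yet expired. The sketch below restates the predicate with explicit parameters to make that boundary visible. `ExpirySketch` and `isExpired` are hypothetical names, and the expiration time passed in is whatever the caller chooses; this article does not state the configured default.

```java
/** Hypothetical sketch (not Nacos code) of the expiry predicate with explicit inputs. */
final class ExpirySketch {
    /**
     * A client is expired only if it is not native AND strictly more than
     * expiredTimeMillis has passed since its last renewal.
     */
    static boolean isExpired(boolean isNative, long lastRenewMillis, long nowMillis, long expiredTimeMillis) {
        return !isNative && nowMillis - lastRenewMillis > expiredTimeMillis;
    }
}
```

Native clients are never expired by this rule, regardless of how long ago they were renewed.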

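Client Reconnection

Every 5 s the client probes whether a long-lived connection that has been idle for more than 5 s is still alive. When the connection is broken, the client reconnects to the server on its own initiative.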

      clientEventExecutor.submit(() -> {
            while (true) {
                try {
                    // Exit the loop once the client has been shut down.
                    if (isShutdown()) {
                        break;
                    }
                    // Take the context that was previously offered to reconnectionSignal; the poll times out after 5 s.
                    ReconnectContext reconnectContext = reconnectionSignal
                            .poll(keepAliveTime, TimeUnit.MILLISECONDS);
                    if (reconnectContext == null) {
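                        // Run a health check only if at least keepAliveTime (5 s) has passed since the last activity.
                        if (System.currentTimeMillis() - lastActiveTimeStamp >= keepAliveTime) {
                            // Send a HealthCheckRequest; on success the server simply returns a Response.
                            boolean isHealthy = healthCheck();
                            // Unhealthy case.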
                            if (!isHealthy) {
                                if (currentConnection == null) {
                                    continue;
                                }
                                LoggerUtils.printIfInfoEnabled(LOGGER,
                                        "[{}] Server healthy check fail, currentConnection = {}", name,
                                        currentConnection.getConnectionId());
                                
                                RpcClientStatus rpcClientStatus = RpcClient.this.rpcClientStatus.get();
                                // If the client has been shut down, stop the reconnect task.
                                if (RpcClientStatus.SHUTDOWN.equals(rpcClientStatus)) {
                                    break;
                                }
                                // Mark the client status as UNHEALTHY.
                                boolean statusFLowSuccess = RpcClient.this.rpcClientStatus
                                        .compareAndSet(rpcClientStatus, RpcClientStatus.UNHEALTHY);
                                if (statusFLowSuccess) {
                                    reconnectContext = new ReconnectContext(null, false);
                                } else {
                                    continue;
                                }
                                
                            } else {
                                // Still alive: refresh the last-active timestamp.
                                lastActiveTimeStamp = System.currentTimeMillis();
                                continue;
                            }
                        } else {
                            continue;
                        }
                        
                    }
                    
                    if (reconnectContext.serverInfo != null) {
                        // clear recommend server if server is not in server list.
                        boolean serverExist = false;
                        // Iterate over the configured server list.
                        for (String server : getServerListFactory().getServerList()) {
                            ServerInfo serverInfo = resolveServerInfo(server);
                            if (serverInfo.getServerIp().equals(reconnectContext.serverInfo.getServerIp())) {
                                serverExist = true;
                                // Use the port of the matching server-list entry for the recommended server.
                                reconnectContext.serverInfo.serverPort = serverInfo.serverPort;
                                break;
                            }
                        }
                        // If the recommended server is not in the server list, clear serverInfo.
                        if (!serverExist) {
                            LoggerUtils.printIfInfoEnabled(LOGGER,
                                    "[{}] Recommend server is not in server list, ignore recommend server {}", name,
                                    reconnectContext.serverInfo.getAddress());
                            
                            reconnectContext.serverInfo = null;
                            
                        }
                    }
                    // Reconnect.
                    reconnect(reconnectContext.serverInfo, reconnectContext.onRequestFail);
                } catch (Throwable throwable) {
                    // Do nothing
                }
            }
        });