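The Spring Cloud Netflix Eureka source-code series consists of six parts:
Spring Cloud Netflix-Eureka (1): Service Registration and Discovery
Spring Cloud Netflix-Eureka (2): How Registry Information Is Stored
Spring Cloud Netflix-Eureka (3): Self-Preservation Mechanism
Spring Cloud Netflix-Eureka (4): Heartbeat Renewal Mechanism
Spring Cloud Netflix-Eureka (5): Multi-Level Cache Mechanism
Spring Cloud Netflix-Eureka (6): Cluster Data Synchronization
Spring Cloud Netflix-Eureka: Heartbeat Renewal Mechanism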
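1. What Is Heartbeat Renewal
In a distributed system, nodes on different hosts need to detect each other's state; for example, a master node needs to detect whether its slave nodes have failed. To check that a peer is still alive, a node sends a fixed message (a heartbeat packet) at a fixed interval, and the peer replies with a fixed message. If no reply arrives for a long time, the connection to the peer is dropped. Because the message is sent at a regular interval, much like a heart beating, the mechanism is called heartbeat renewal.
Generally the client should be the one that sends heartbeats to the server, because having the server send heartbeats to every client hurts server performance.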
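2. Eureka Heartbeat Renewal
Eureka Server uses heartbeat renewal to check the health of each service provider and, from that, to decide whether a service instance has become unavailable.
In practice, this decision involves two pieces of logic:
- Eureka Client sends heartbeats periodically, and the server receives them.
- Eureka Server periodically checks the health of the service providers, which is how the server acts on the heartbeats.
Eureka's heartbeat renewal mechanism: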
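- On startup, the client starts a heartbeat task that sends a heartbeat request to the server every 30s (configurable via eureka.instance.lease-renewal-interval-in-seconds).
- The server keeps the time of the last heartbeat for each instance and updates it whenever a heartbeat arrives from the client.
- On startup, the server starts a scheduled task that runs every 60s (configurable via eureka.server.eviction-interval-timer-in-ms) and checks whether each instance's last heartbeat is more than 90s old (configurable via eureka.instance.lease-expiration-duration-in-seconds); if it is, the instance is evicted.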
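The three steps above can be condensed into a toy model. This is only a sketch: HeartbeatModel and its methods are hypothetical names, time is passed in explicitly instead of being read from the clock, and none of Eureka's real classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the heartbeat flow; these are not Eureka's real classes. */
public class HeartbeatModel {
    // Default lease duration quoted in the text above (90s)
    static final long LEASE_DURATION_MS = 90_000;

    // instanceId -> time of the last heartbeat received by the server
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    /** Server side: record a heartbeat from an instance. */
    public void renew(String instanceId, long nowMs) {
        lastHeartbeat.put(instanceId, nowMs);
    }

    /** Server side: remove and return instances silent for longer than the lease duration. */
    public List<String> evict(long nowMs) {
        List<String> evicted = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > LEASE_DURATION_MS) {
                evicted.add(e.getKey());
            }
        }
        for (String id : evicted) {
            lastHeartbeat.remove(id);
        }
        return evicted;
    }

    public static void main(String[] args) {
        HeartbeatModel server = new HeartbeatModel();
        server.renew("a", 0);
        server.renew("b", 0);
        // "a" keeps sending a heartbeat every 30s, "b" goes silent
        server.renew("a", 30_000);
        server.renew("a", 60_000);
        System.out.println(server.evict(60_000));  // prints []
        server.renew("a", 90_000);
        server.renew("a", 120_000);
        System.out.println(server.evict(120_000)); // prints [b]
    }
}
```

In this model the eviction check at 60s finds nothing, while the check at 120s removes only the instance that stopped sending heartbeats.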
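2.1 Eureka Client Sends Heartbeats Periodically
During startup, the client initializes a scheduled task that sends a heartbeat to the server every 30s.
2.1.1 DiscoveryClient.initScheduledTasks()
The earlier article Spring Cloud Netflix-Eureka (1): Service Registration and Discovery covered DiscoveryClient.initScheduledTasks(), which initializes the scheduled tasks responsible for heartbeats and instance data updates. The relevant code is shown below.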
@Singleton
public class DiscoveryClient implements EurekaClient {
    /**
     * Initializes the scheduled tasks responsible for heartbeats and instance data updates
     */
    private void initScheduledTasks() {
        // Other code omitted...
        // Defaults to 30 seconds
        int renewalIntervalInSecs = instanceInfo.getLeaseInfo().getRenewalIntervalInSecs();
        // Exponential back-off bound (default 10): after timeouts, the delay
        // can grow to at most this multiple of the base interval
        int expBackOffBound = clientConfig.getHeartbeatExecutorExponentialBackOffBound();
        logger.info("Starting heartbeat executor: " + "renew interval is: {}", renewalIntervalInSecs);
        // Heartbeat timer
        // Start the heartbeat task
        heartbeatTask = new TimedSupervisorTask(
                "heartbeat",
                scheduler,
                heartbeatExecutor,
                renewalIntervalInSecs,
                TimeUnit.SECONDS,
                expBackOffBound,
                new HeartbeatThread() // Runnable that performs the heartbeat renewal
        );
        scheduler.schedule(
                heartbeatTask,
                renewalIntervalInSecs, TimeUnit.SECONDS);
        // Other code omitted...
    }
}
2.1.2 TimedSupervisorTask.run()
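TimedSupervisorTask.run() is the scheduled task through which the client issues heartbeat renewals; it submits the heartbeat task and then reschedules itself after every run.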
public class TimedSupervisorTask extends TimerTask {
    @Override
    public void run() {
        Future<?> future = null;
        try {
            // Submit the task so the heartbeat is sent asynchronously
            future = executor.submit(task);
            threadPoolLevelGauge.set((long) executor.getActiveCount());
            // Block until the task finishes or times out
            future.get(timeoutMillis, TimeUnit.MILLISECONDS); // block until done or timeout
            // Success: reset the delay to the base timeout
            delay.set(timeoutMillis);
            threadPoolLevelGauge.set((long) executor.getActiveCount());
            successCounter.increment();
        } catch (TimeoutException e) {
            logger.warn("task supervisor timed out", e);
            timeoutCounter.increment();
            // Read the current delay
            long currentDelay = delay.get();
            // On a timeout, double the current delay, capped at maxDelay
            long newDelay = Math.min(maxDelay, currentDelay * 2);
            // Store the new delay
            delay.compareAndSet(currentDelay, newDelay);
        } catch (RejectedExecutionException e) {
            if (executor.isShutdown() || scheduler.isShutdown()) {
                logger.warn("task supervisor shutting down, reject the task", e);
            } else {
                logger.warn("task supervisor rejected the task", e);
            }
            rejectedCounter.increment();
        } catch (Throwable e) {
            if (executor.isShutdown() || scheduler.isShutdown()) {
                logger.warn("task supervisor shutting down, can't accept the task");
            } else {
                logger.warn("task supervisor threw an exception", e);
            }
            throwableCounter.increment();
        } finally {
            if (future != null) {
                future.cancel(true);
            }
            if (!scheduler.isShutdown()) {
                // Reschedule this task so that it runs in a loop
                scheduler.schedule(this, delay.get(), TimeUnit.MILLISECONDS);
            }
        }
    }
}
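The back-off in the TimeoutException branch can be traced with a small standalone sketch. BackoffDelay and afterTimeout are hypothetical names, and the sketch assumes that maxDelay equals the base timeout multiplied by expBackOffBound (30s x 10 with the defaults mentioned earlier); that relation is not shown in the excerpt above.

```java
/** Standalone sketch of the delay doubling used after a heartbeat timeout. */
public class BackoffDelay {
    /** Delay to use after a timed-out run: double it, capped at maxDelayMs. */
    public static long afterTimeout(long currentDelayMs, long maxDelayMs) {
        return Math.min(maxDelayMs, currentDelayMs * 2);
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000;                       // base interval, 30s by default
        int expBackOffBound = 10;                      // default bound
        long maxDelayMs = timeoutMs * expBackOffBound; // assumption: cap = timeout * bound
        long delay = timeoutMs;
        for (int i = 1; i <= 5; i++) {
            delay = afterTimeout(delay, maxDelayMs);
            System.out.println("timeout #" + i + " -> next delay " + delay + " ms");
        }
        // A successful run resets the delay to the base timeout
        delay = timeoutMs;
        System.out.println("success -> next delay " + delay + " ms");
    }
}
```

Under these assumptions, five consecutive timeouts give delays of 60000, 120000, 240000, 300000 and 300000 ms, and the first success brings the delay back to 30000 ms.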
2.1.3 HeartbeatThread
HeartbeatThread is an inner class of DiscoveryClient that implements the Runnable interface.
@Singleton
public class DiscoveryClient implements EurekaClient {
    private class HeartbeatThread implements Runnable {
        public void run() {
            // Send the heartbeat
            if (renew()) {
                // Record the time of the last successful heartbeat
                lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
            }
        }
    }
    /**
     * Sends a heartbeat
     */
    boolean renew() {
        EurekaHttpResponse<InstanceInfo> httpResponse;
        try {
            /**
             * instanceInfo.getAppName(): name of the current service
             * instanceInfo.getId(): id of the current instance
             * instanceInfo: the heartbeat payload
             */
            httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
            logger.debug(PREFIX + "{} - Heartbeat status: {}", appPathIdentifier, httpResponse.getStatusCode());
            // 404: the server does not know this instance, so register again
            if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
                REREGISTER_COUNTER.increment();
                logger.info(PREFIX + "{} - Re-registering apps/{}", appPathIdentifier, instanceInfo.getAppName());
                long timestamp = instanceInfo.setIsDirtyWithTime();
                boolean success = register();
                if (success) {
                    instanceInfo.unsetIsDirty(timestamp);
                }
                return success;
            }
            return httpResponse.getStatusCode() == Status.OK.getStatusCode();
        } catch (Throwable e) {
            logger.error(PREFIX + "{} - was unable to send heartbeat!", appPathIdentifier, e);
            return false;
        }
    }
}
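The decision logic in renew() comes down to three outcomes: 200 means success, 404 triggers a re-registration, and anything else (including an exception) is a failed heartbeat. The sketch below isolates that logic; RenewFlow and its two functional parameters are hypothetical stand-ins for the HTTP call and register(), not part of the Eureka client API. Unlike the original, it only catches RuntimeException.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

/** Sketch of the heartbeat outcome handling; names are illustrative only. */
public class RenewFlow {
    /**
     * @param sendHeartbeat hypothetical: performs the heartbeat call and returns the HTTP status code
     * @param register      hypothetical: performs a registration and reports whether it succeeded
     */
    public static boolean renew(IntSupplier sendHeartbeat, BooleanSupplier register) {
        try {
            int status = sendHeartbeat.getAsInt();
            if (status == 404) {
                // The server no longer knows this instance: register again
                return register.getAsBoolean();
            }
            return status == 200;
        } catch (RuntimeException e) {
            // A transport failure counts as a failed heartbeat
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(renew(() -> 200, () -> false)); // prints true
        System.out.println(renew(() -> 404, () -> true));  // prints true
        System.out.println(renew(() -> 500, () -> true));  // prints false
    }
}
```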
2.1.4 AbstractJerseyEurekaHttpClient.sendHeartBeat
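DiscoveryClient.renew() performs the renewal by sending a PUT request to the Eureka Server path "apps/" + appName + '/' + id.
When the server receives the heartbeat request, it uses appName and id to look up the instance in ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry and updates the time at which the last heartbeat was received.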
public abstract class AbstractJerseyEurekaHttpClient implements EurekaHttpClient {
    /**
     * appName: name of the current service
     * id: id of the current instance
     * info: the heartbeat payload
     */
    @Override
    public EurekaHttpResponse<InstanceInfo> sendHeartBeat(String appName, String id, InstanceInfo info, InstanceStatus overriddenStatus) {
        String urlPath = "apps/" + appName + '/' + id;
        ClientResponse response = null;
        try {
            WebResource webResource = jerseyClient.resource(serviceUrl)
                    .path(urlPath)
                    .queryParam("status", info.getStatus().toString())
                    .queryParam("lastDirtyTimestamp", info.getLastDirtyTimestamp().toString());
            if (overriddenStatus != null) {
                webResource = webResource.queryParam("overriddenstatus", overriddenStatus.name());
            }
            Builder requestBuilder = webResource.getRequestBuilder();
            addExtraHeaders(requestBuilder);
            // The heartbeat is an HTTP PUT
            response = requestBuilder.put(ClientResponse.class);
            EurekaHttpResponseBuilder<InstanceInfo> eurekaResponseBuilder = anEurekaHttpResponse(response.getStatus(), InstanceInfo.class).headers(headersOf(response));
            if (response.hasEntity() &&
                    !HTML.equals(response.getType().getSubtype())) { //don't try and deserialize random html errors from the server
                eurekaResponseBuilder.entity(response.getEntity(InstanceInfo.class));
            }
            return eurekaResponseBuilder.build();
        } finally {
            if (logger.isDebugEnabled()) {
                logger.debug("Jersey HTTP PUT {}/{}; statusCode={}", serviceUrl, urlPath, response == null ? "N/A" : response.getStatus());
            }
            if (response != null) {
                response.close();
            }
        }
    }
}
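To make the shape of the request concrete, the sketch below assembles the relative URL that sendHeartBeat targets, using the same path and query parameter names as the code above. HeartbeatUrl is a hypothetical helper, the sample service name and instance id are made up, and no URL encoding is applied, which a real client would need.

```java
/** Sketch: builds the relative heartbeat URL; illustrative only, no URL encoding. */
public class HeartbeatUrl {
    public static String build(String appName, String id, String status,
                               long lastDirtyTimestamp, String overriddenStatus) {
        StringBuilder sb = new StringBuilder("apps/")
                .append(appName).append('/').append(id)
                .append("?status=").append(status)
                .append("&lastDirtyTimestamp=").append(lastDirtyTimestamp);
        // overriddenstatus is only sent when an override is present
        if (overriddenStatus != null) {
            sb.append("&overriddenstatus=").append(overriddenStatus);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration
        System.out.println(build("ORDER-SERVICE", "host1:order-service:8080", "UP", 1700000000000L, null));
        // prints apps/ORDER-SERVICE/host1:order-service:8080?status=UP&lastDirtyTimestamp=1700000000000
    }
}
```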
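2.2 Eureka Server Handles an Incoming Heartbeat
The client renews its lease through the "apps/" + appName + '/' + id endpoint exposed by Eureka Server. This endpoint is implemented by the renewLease() method of the InstanceResource class in the com.netflix.eureka.resources package.
When the server receives the heartbeat request, it uses appName and id to look up the instance in ConcurrentHashMap<String, Map<String, Lease<InstanceInfo>>> registry and updates the time at which the last heartbeat was received.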
2.2.1 InstanceResource.renewLease()
Handles the heartbeat renewal request.
@Produces({"application/xml", "application/json"})
public class InstanceResource {
    @PUT
    public Response renewLease(
            @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
            @QueryParam("overriddenstatus") String overriddenStatus,
            @QueryParam("status") String status,
            @QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
        boolean isFromReplicaNode = "true".equals(isReplication);
        // Core logic: renew the lease
        boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);
        // Not found in the registry, immediately ask for a register
        if (!isSuccess) {// Renewal failed: return 404 so that the client registers again
            logger.warn("Not Found (Renew): {} - {}", app.getName(), id);
            return Response.status(Status.NOT_FOUND).build();
        }
        // Check if we need to sync based on dirty time stamp, the client
        // instance might have changed some value
        // Compare the client's lastDirtyTimestamp with the server's copy;
        // if they are out of sync, the client has to register again
        Response response;
        if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
            response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
            // Store the overridden status since the validation found out the node that replicates wins
            if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
                    && (overriddenStatus != null)
                    && !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
                    && isFromReplicaNode) {
                registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
            }
        } else {
            // Renewal succeeded: return 200
            response = Response.ok().build();
        }
        logger.debug("Found (Renew): {} - {}; reply status={}", app.getName(), id, response.getStatus());
        return response;
    }
}
2.2.2 InstanceRegistry.renew(final String appName, final String serverId, boolean isReplication)
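The implementation of InstanceRegistry.renew(final String appName, final String serverId, boolean isReplication) is shown below. It has two main steps:
- Find the instance that matches the current request in the list of registered services.
- Publish an EurekaInstanceRenewedEvent. Eureka Server itself does not handle this event, so we can listen for it to do things of our own, such as monitoring.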
public class InstanceRegistry extends PeerAwareInstanceRegistryImpl
        implements ApplicationContextAware {
    @Override
    public boolean renew(final String appName, final String serverId,
            boolean isReplication) {
        log("renew " + appName + " serverId " + serverId + ", isReplication {}"
                + isReplication);
        // Fetch all registered applications
        List<Application> applications = getSortedApplications();
        for (Application input : applications) {
            if (input.getName().equals(appName)) {
                // Find the instance that sent this heartbeat
                InstanceInfo instance = null;
                for (InstanceInfo info : input.getInstances()) {
                    if (info.getId().equals(serverId)) {
                        instance = info;
                        break;
                    }
                }
                // Publish an EurekaInstanceRenewedEvent
                // Eureka Server does not handle this event; we can listen for it, for example for monitoring
                publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId, instance, isReplication));
                break;
            }
        }
        // Perform the actual lease renewal
        return super.renew(appName, serverId, isReplication);
    }
}
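As an illustration of the monitoring idea, the sketch below counts renewals per application. RenewalMonitor is a hypothetical class written in plain Java so that it runs on its own. In a Spring application, the assumption is that a listener method receiving EurekaInstanceRenewedEvent would forward the event's application name to onRenew; that wiring is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: per-application renewal counter that an event listener could feed. */
public class RenewalMonitor {
    private final Map<String, LongAdder> renewalsByApp = new ConcurrentHashMap<>();

    /** Call once per renewal event, passing the application name carried by the event. */
    public void onRenew(String appName) {
        renewalsByApp.computeIfAbsent(appName, k -> new LongAdder()).increment();
    }

    /** Number of renewals seen so far for the given application. */
    public long count(String appName) {
        LongAdder adder = renewalsByApp.get(appName);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        RenewalMonitor monitor = new RenewalMonitor();
        monitor.onRenew("ORDER-SERVICE");
        monitor.onRenew("ORDER-SERVICE");
        monitor.onRenew("USER-SERVICE");
        System.out.println(monitor.count("ORDER-SERVICE")); // prints 2
    }
}
```

LongAdder is used because heartbeats from many instances may arrive concurrently and the counter is written far more often than it is read.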
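It then calls the parent class's renewal method, PeerAwareInstanceRegistryImpl.renew(final String appName, final String id, final boolean isReplication). If the renewal succeeds, the heartbeat is replicated to the other nodes in the cluster.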
@Singleton
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {
    public boolean renew(final String appName, final String id, final boolean isReplication) {
        if (super.renew(appName, id, isReplication)) {
            // Replicate the heartbeat to the other cluster nodes
            replicateToPeers(Action.Heartbeat, appName, id, null, null, isReplication);
            return true;
        }
        return false;
    }
}
2.2.3 AbstractInstanceRegistry.renew(String appName, String id, boolean isReplication)
That method in turn calls the parent class's renewal method, AbstractInstanceRegistry.renew(String appName, String id, boolean isReplication).
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
    public boolean renew(String appName, String id, boolean isReplication) {
        RENEW.increment(isReplication);
        // Look up the instances registered under appName
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToRenew = null;
        if (gMap != null) {
            // Look up the lease of the current instance by id
            leaseToRenew = gMap.get(id);
        }
        if (leaseToRenew == null) {// The instance is not registered: return false
            RENEW_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
            return false;
        } else {// The instance is registered
            InstanceInfo instanceInfo = leaseToRenew.getHolder();
            if (instanceInfo != null) {
                // touchASGCache(instanceInfo.getASGName());
                // Resolve the instance's effective (overridden) status
                InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                        instanceInfo, leaseToRenew, isReplication);
                if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {// Status unknown: return false
                    logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
                            + "; re-register required", instanceInfo.getId());
                    RENEW_NOT_FOUND.increment(isReplication);
                    return false;
                }
                if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                    logger.info(
                            "The instance status {} is different from overridden instance status {} for instance {}. "
                                    + "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
                            overriddenInstanceStatus.name(),
                            instanceInfo.getId());
                    instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
                }
            }
            // Count this renewal towards the renewals-per-minute statistic
            renewsLastMin.increment();
            leaseToRenew.renew();
            return true;
        }
    }
}
2.2.4 Lease.renew()
public class Lease<T> {
    /**
     * Updates the last renewal timestamp.
     * Note that the stored value is the current time plus the lease duration.
     */
    public void renew() {
        lastUpdateTimestamp = System.currentTimeMillis() + duration;
    }
}
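Because renew() stores the current time plus duration, the point at which a lease expires depends on how the expiry check is written. The sketch below assumes the check compares the current time against lastUpdateTimestamp + duration + additionalLeaseMs, which is my reading of Eureka's Lease.isExpired and is not shown in this article; under that assumption a renewed lease with a 90s duration survives for about 180s. LeaseSketch is a hypothetical class that takes the time as a parameter so the arithmetic can be checked deterministically.

```java
/** Sketch of lease renewal and expiry arithmetic; not the real Lease class. */
public class LeaseSketch {
    private final long durationMs;
    private long lastUpdateTimestamp;

    public LeaseSketch(long durationMs, long nowMs) {
        this.durationMs = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    /** Mirrors Lease.renew(): stores now + duration. */
    public void renew(long nowMs) {
        lastUpdateTimestamp = nowMs + durationMs;
    }

    /** Assumed expiry check: duration is added a second time here. */
    public boolean isExpired(long nowMs, long additionalLeaseMs) {
        return nowMs > lastUpdateTimestamp + durationMs + additionalLeaseMs;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90_000, 0);
        lease.renew(0);
        System.out.println(lease.isExpired(100_000, 0));     // prints false
        System.out.println(lease.isExpired(180_001, 0));     // prints true
        System.out.println(lease.isExpired(180_001, 5_000)); // prints false
    }
}
```

The last line shows the role of additionalLeaseMs: a compensation time of 5s postpones the expiry by the same amount.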
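2.3 Eureka Server Periodic Check
During startup, Eureka Server initializes the scheduled task that checks heartbeats.
As described in Spring Cloud Netflix-Eureka (3): Self-Preservation Mechanism, the self-preservation check is also performed inside this scheduled task.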
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
    protected void postInit() {
        // Start the timer that tracks the number of renewals per minute;
        // the counter is reset to 0 every 60s
        renewsLastMin.start();
        if (evictionTaskRef.get() != null) {
            evictionTaskRef.get().cancel();
        }
        // Start the EvictionTask timer task, which runs every 60s
        evictionTaskRef.set(new EvictionTask());
        evictionTimer.schedule(evictionTaskRef.get(),
                serverConfig.getEvictionIntervalTimerInMs(),// initial delay, 60 seconds by default
                serverConfig.getEvictionIntervalTimerInMs());// period, 60 seconds by default
    }
}
2.3.1 EvictionTask
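EvictionTask mainly does two things: it checks whether self-preservation is in effect, and it periodically checks when each client last sent a heartbeat.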
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
    class EvictionTask extends TimerTask {
        @Override
        public void run() {
            try {
                /**
                 * Get the compensation time in milliseconds.
                 * Compensation time is the difference between the actual time elapsed since the
                 * previous run of this task and the configured execution interval.
                 * It is useful when changes in time (for example clock skew or GC) make the
                 * eviction task run later than the configured cycle would suggest.
                 */
                long compensationTimeMs = getCompensationTimeMs();
                logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
                evict(compensationTimeMs);
            } catch (Throwable e) {
                logger.error("Could not run the evict task", e);
            }
        }
    }
    // evict is a method of AbstractInstanceRegistry
    public void evict(long additionalLeaseMs) {
        logger.debug("Running the evict task");
        // If self-preservation is in effect, lease expiration is disabled:
        // return immediately without evicting anything
        if (!isLeaseExpirationEnabled()) {
            logger.debug("DS: lease expiration is currently disabled.");
            return;
        }
        // We collect first all expired items, to evict them in random order. For large eviction sets,
        // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
        // the impact should be evenly distributed across all applications.
        // Walk the registry and collect the expired leases into expiredLeases for eviction
        List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
        for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
            Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
            if (leaseMap != null) {
                for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                    Lease<InstanceInfo> lease = leaseEntry.getValue();
                    // Check whether the lease has expired; by default a lease expires
                    // when no heartbeat has been received for 90s
                    if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                        expiredLeases.add(lease);
                    }
                }
            }
        }
        // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
        // triggering self-preservation. Without that we would wipe out full registry.
        // Number of registered instances
        int registrySize = (int) getLocalRegistrySize();
        // Number of instances that must remain, so that evictions happen gradually
        // instead of wiping the registry before self-preservation kicks in
        int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
        // Maximum number of leases that may be evicted in this run
        int evictionLimit = registrySize - registrySizeThreshold;
        // Take the smaller of the expired count and the eviction limit
        int toEvict = Math.min(expiredLeases.size(), evictionLimit);
        if (toEvict > 0) {// Evict toEvict leases picked at random from expiredLeases
            logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
            Random random = new Random(System.currentTimeMillis());
            for (int i = 0; i < toEvict; i++) {
                // Pick a random item (Knuth shuffle algorithm)
                int next = i + random.nextInt(expiredLeases.size() - i);
                Collections.swap(expiredLeases, i, next);
                // Take the randomly chosen expired lease
                Lease<InstanceInfo> lease = expiredLeases.get(i);
                String appName = lease.getHolder().getAppName();
                String id = lease.getHolder().getId();
                // Increment the expired counter
                EXPIRED.increment();
                logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
                // Take the instance out of service
                internalCancel(appName, id, false);
            }
        }
    }
}
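The eviction limit arithmetic in evict() is easier to follow with numbers. The sketch below reproduces the same calculation and the same partial shuffle outside of Eureka. EvictionLimit, toEvict and pickRandom are hypothetical names, and the value 0.85 is used as the renewal percent threshold, which I believe is the default but which this article does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of the eviction limit calculation and random selection. */
public class EvictionLimit {
    /** How many expired leases may be evicted in one run. */
    public static int toEvict(int registrySize, double renewalPercentThreshold, int expiredCount) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        int evictionLimit = registrySize - registrySizeThreshold;
        return Math.min(expiredCount, evictionLimit);
    }

    /** Picks `count` distinct items at random using a partial Knuth shuffle on a copy. */
    public static <T> List<T> pickRandom(List<T> expired, int count, Random random) {
        List<T> copy = new ArrayList<>(expired);
        for (int i = 0; i < count; i++) {
            int next = i + random.nextInt(copy.size() - i);
            Collections.swap(copy, i, next);
        }
        return new ArrayList<>(copy.subList(0, count));
    }

    public static void main(String[] args) {
        // 100 registered instances, threshold 0.85 (assumed default), 40 expired leases
        System.out.println(toEvict(100, 0.85, 40)); // prints 15
        // Only 3 expired leases: all of them can be evicted
        System.out.println(toEvict(100, 0.85, 3));  // prints 3
    }
}
```

With 100 instances and 40 expired leases, only 15 are evicted in one run; the remaining 25 stay in the registry until later runs, or until self-preservation disables expiration.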