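“Without accumulating small steps, one cannot travel a thousand miles.”
This article looks at two important core mechanisms in Eureka: heartbeat renewal and instance removal.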
Heartbeat renewal
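A eureka client sends a heartbeat to the server at a fixed interval so that the eureka server knows the instance is still alive and can keep serving requests. In short, that is all there is to it.
The heartbeat is driven by a scheduled thread pool. As before, for anything client-related, DiscoveryClient is the class to look at. After reading through the Eureka source you will find that its nearly 2,000 lines of code cover almost all of the client-side functionality.
In initScheduledTasks() the heartbeat-related code is easy to spot: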
// Heartbeat timer
heartbeatTask = new TimedSupervisorTask(
"heartbeat",
scheduler,
heartbeatExecutor,
renewalIntervalInSecs,
TimeUnit.SECONDS,
expBackOffBound,
new HeartbeatThread()
);
scheduler.schedule(
heartbeatTask,
// renewalIntervalInSecs defaults to 30:
// public static final int DEFAULT_LEASE_RENEWAL_INTERVAL = 30;
renewalIntervalInSecs, TimeUnit.SECONDS);
A HeartbeatThread Runnable is handed to the scheduled thread pool, and its logic runs every 30s (the default) to send the heartbeat:
private class HeartbeatThread implements Runnable {
public void run() {
if (renew()) {
lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
}
}
}
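The expBackOffBound argument passed to TimedSupervisorTask above hints that the interval is not always a flat 30s. The sketch below shows one plausible way such a bounded exponential back-off can be computed. It rests on my own assumptions rather than on the Eureka source quoted here: the delay doubles after a timed-out run, is capped at the base interval times the bound, and resets after a successful run. The class name, the method name and the bound of 10 are hypothetical.

```java
public class HeartbeatBackoffSketch {

    /**
     * Returns the delay before the next heartbeat attempt.
     * Assumption: a timed-out run doubles the delay, capped at
     * baseDelayMs * expBackOffBound; a successful run resets it.
     */
    static long nextDelayMs(long currentDelayMs, long baseDelayMs,
                            int expBackOffBound, boolean timedOut) {
        if (!timedOut) {
            return baseDelayMs;
        }
        long maxDelayMs = baseDelayMs * expBackOffBound;
        return Math.min(maxDelayMs, currentDelayMs * 2);
    }

    public static void main(String[] args) {
        long base = 30_000L; // 30s, the default renewal interval
        int bound = 10;      // hypothetical back-off bound
        long delay = base;
        // Simulate five consecutive timeouts, then one success.
        for (int i = 1; i <= 5; i++) {
            delay = nextDelayMs(delay, base, bound, true);
            System.out.println("after timeout " + i + ": " + delay + " ms");
        }
        delay = nextDelayMs(delay, base, bound, false);
        System.out.println("after success: " + delay + " ms");
    }
}
```

Keeping the delay computation in a pure function makes the policy easy to test without waiting on a real scheduler.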
The renew() method goes through EurekaHttpClient's sendHeartBeat() method:
/**
* Renew with the eureka service by making the appropriate REST call
*/
boolean renew() {
EurekaHttpResponse<InstanceInfo> httpResponse;
try {
// send the heartbeat
httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
logger.debug(PREFIX + "{} - Heartbeat status: {}", appPathIdentifier, httpResponse.getStatusCode());
if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
REREGISTER_COUNTER.increment();
logger.info(PREFIX + "{} - Re-registering apps/{}", appPathIdentifier, instanceInfo.getAppName());
long timestamp = instanceInfo.setIsDirtyWithTime();
boolean success = register();
if (success) {
instanceInfo.unsetIsDirty(timestamp);
}
return success;
}
return httpResponse.getStatusCode() == Status.OK.getStatusCode();
} catch (Throwable e) {
logger.error(PREFIX + "{} - was unable to send heartbeat!", appPathIdentifier, e);
return false;
}
}
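The branching in renew() boils down to three outcomes: 200 means the lease was renewed, 404 means the server no longer knows the instance so the client registers again, and anything else (including an exception) counts as a failed heartbeat. Here is a minimal sketch of that decision with the HTTP call and the registration stubbed out as functional interfaces. The class and parameter names are hypothetical and it does not use the Eureka API.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

public class RenewDecisionSketch {
    static final int OK = 200;
    static final int NOT_FOUND = 404;

    /**
     * @param sendHeartbeat stub returning the HTTP status of the heartbeat call
     * @param register      stub returning whether re-registration succeeded
     * @return true if the instance is known to the server after this call
     */
    static boolean renew(IntSupplier sendHeartbeat, BooleanSupplier register) {
        try {
            int status = sendHeartbeat.getAsInt();
            if (status == NOT_FOUND) {
                // The server has no lease for this instance: register again.
                return register.getAsBoolean();
            }
            return status == OK;
        } catch (RuntimeException e) {
            // Any transport failure counts as a failed heartbeat.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(renew(() -> 200, () -> false)); // renewed
        System.out.println(renew(() -> 404, () -> true));  // re-registered
        System.out.println(renew(() -> 500, () -> true));  // failed
    }
}
```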
sendHeartBeat() assembles the request parameters and sends a PUT request to a URL such as http://localhost:8080/v2/apps/ServiceA/i-000000-1:
@Override
public EurekaHttpResponse<InstanceInfo> sendHeartBeat(String appName, String id, InstanceInfo info, InstanceStatus overriddenStatus) {
String urlPath = "apps/" + appName + '/' + id;
Response response = null;
try {
WebTarget webResource = jerseyClient.target(serviceUrl)
.path(urlPath)
.queryParam("status", info.getStatus().toString())
.queryParam("lastDirtyTimestamp", info.getLastDirtyTimestamp().toString());
if (overriddenStatus != null) {
webResource = webResource.queryParam("overriddenstatus", overriddenStatus.name());
}
Builder requestBuilder = webResource.request();
addExtraProperties(requestBuilder);
addExtraHeaders(requestBuilder);
requestBuilder.accept(MediaType.APPLICATION_JSON_TYPE);
// .put(...) shows that heartbeat renewal is sent as a PUT request
response = requestBuilder.put(Entity.entity("{}", MediaType.APPLICATION_JSON_TYPE)); // Jersey2 refuses to handle PUT with no body
EurekaHttpResponseBuilder<InstanceInfo> eurekaResponseBuilder = anEurekaHttpResponse(response.getStatus(), InstanceInfo.class).headers(headersOf(response));
if (response.hasEntity()) {
eurekaResponseBuilder.entity(response.readEntity(InstanceInfo.class));
}
return eurekaResponseBuilder.build();
... ...
}
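To make the shape of the request concrete, the sketch below builds an equivalent PUT with the JDK's own java.net.http.HttpRequest (Java 11+) instead of Jersey, and only inspects it without sending anything. The class name and the helper method are hypothetical. The path layout and query parameter names are taken from the snippet above, and values are not URL-encoded for brevity.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class HeartbeatRequestSketch {

    /** Builds (but does not send) a heartbeat-style PUT request. */
    static HttpRequest buildHeartbeat(String serviceUrl, String appName, String id,
                                      String status, long lastDirtyTimestamp) {
        String url = serviceUrl + "apps/" + appName + "/" + id
                + "?status=" + status
                + "&lastDirtyTimestamp=" + lastDirtyTimestamp;
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                // An empty JSON object as body, mirroring the "{}" entity above.
                .PUT(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildHeartbeat(
                "http://localhost:8080/v2/", "ServiceA", "i-000000-1", "UP", 1L);
        System.out.println(request.method() + " " + request.uri());
    }
}
```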
On the server side, the resource that handles heartbeats is InstanceResource. Path matching leads to its renewLease() method:
@PUT
public Response renewLease(
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
@QueryParam("overriddenstatus") String overriddenStatus,
@QueryParam("status") String status,
@QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
boolean isFromReplicaNode = "true".equals(isReplication);
boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);
// Not found in the registry, immediately ask for a register
if (!isSuccess) {
logger.warn("Not Found (Renew): {} - {}", app.getName(), id);
return Response.status(Status.NOT_FOUND).build();
}
// Check if we need to sync based on dirty time stamp, the client
// instance might have changed some value
Response response;
if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
// Store the overridden status since the validation found out the node that replicates wins
if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
&& (overriddenStatus != null)
&& !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
&& isFromReplicaNode) {
registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
}
} else {
response = Response.ok().build();
}
logger.debug("Found (Renew): {} - {}; reply status={}", app.getName(), id, response.getStatus());
return response;
}
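So on the server side, heartbeat renewal is handled by the registry's renew(...) method. Since this is registration-related functionality, having the registry handle it is reasonable.
Take note of PeerAwareInstanceRegistry: like DiscoveryClient mentioned earlier, it is a core class, and many of the server's core mechanisms live in this registry class. It pays to focus on the core.
Its renew() calls the parent class's renew method: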
public boolean renew(final String appName, final String id, final boolean isReplication) {
if (super.renew(appName, id, isReplication)) {
replicateToPeers(Action.Heartbeat, appName, id, null, null, isReplication);
return true;
}
return false;
}
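The core logic of the parent method is to look up a Lease<InstanceInfo> by application name and instance id and then call renew() on it.
Out of this big chunk of code, the only line that really matters is leaseToRenew.renew(); near the end: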
public boolean renew(String appName, String id, boolean isReplication) {
RENEW.increment(isReplication);
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToRenew = null;
if (gMap != null) {
leaseToRenew = gMap.get(id);
}
if (leaseToRenew == null) {
RENEW_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
return false;
} else {
InstanceInfo instanceInfo = leaseToRenew.getHolder();
if (instanceInfo != null) {
// touchASGCache(instanceInfo.getASGName());
InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
instanceInfo, leaseToRenew, isReplication);
if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
+ "; re-register required", instanceInfo.getId());
RENEW_NOT_FOUND.increment(isReplication);
return false;
}
if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
logger.info(
"The instance status {} is different from overridden instance status {} for instance {}. "
+ "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
overriddenInstanceStatus.name(),
instanceInfo.getId());
instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);
}
}
renewsLastMin.increment();
leaseToRenew.renew();
return true;
}
}
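Calling renew() on the lease updates the lastUpdateTimestamp field.
To sum up, heartbeat renewal simply refreshes the instance's last-update timestamp. Note that the code below sets it to the current timestamp plus duration, not to the current timestamp alone.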
public void renew() {
// duration is the lease duration, 90s by default
//public static final int DEFAULT_DURATION_IN_SECS = 90;
lastUpdateTimestamp = System.currentTimeMillis() + duration;
}
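Putting the last two snippets together, the server-side renewal path is a two-level map lookup followed by a timestamp update. Here is a stripped-down sketch of that structure. The class and its methods are hypothetical stand-ins, the clock is passed in explicitly so the behaviour is easy to check, and the status-override handling from the real method is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeaseRegistrySketch {

    /** Simplified lease: only the fields needed for renewal. */
    static final class Lease {
        final long durationMs;
        volatile long lastUpdateTimestamp;

        Lease(long nowMs, long durationMs) {
            this.durationMs = durationMs;
            this.lastUpdateTimestamp = nowMs; // at registration: plain "now"
        }

        void renew(long nowMs) {
            // Mirrors the snippet above: now + duration.
            lastUpdateTimestamp = nowMs + durationMs;
        }
    }

    // appName -> (instanceId -> lease)
    private final Map<String, Map<String, Lease>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id, long nowMs, long durationMs) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new Lease(nowMs, durationMs));
    }

    /** Returns false when no lease exists, which the resource maps to 404. */
    boolean renew(String appName, String id, long nowMs) {
        Map<String, Lease> leases = registry.get(appName);
        Lease lease = (leases == null) ? null : leases.get(id);
        if (lease == null) {
            return false;
        }
        lease.renew(nowMs);
        return true;
    }

    long lastUpdate(String appName, String id) {
        return registry.get(appName).get(id).lastUpdateTimestamp;
    }

    public static void main(String[] args) {
        LeaseRegistrySketch r = new LeaseRegistrySketch();
        r.register("ServiceA", "i-000000-1", 1_000L, 90_000L);
        System.out.println(r.renew("ServiceA", "i-000000-1", 31_000L));
        System.out.println(r.lastUpdate("ServiceA", "i-000000-1"));
        System.out.println(r.renew("ServiceB", "unknown", 31_000L));
    }
}
```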
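At registration time, the timestamp is simply the current time (the snippet below sets the similarly named lastUpdatedTimestamp field on the registrant):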
registrant.setLastUpdatedTimestamp();
public void setLastUpdatedTimestamp() {
this.lastUpdatedTimestamp = System.currentTimeMillis();
}
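In a distributed system, the heartbeat mechanism matters a great deal: it lets a central coordinating service monitor whether all the worker services are still alive. That is all a heartbeat mechanism is: every time a heartbeat arrives, update the most recent timestamp.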
Instance removal
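Suppose a registry and many services are running in production, and each service sends a heartbeat at regular intervals. When a service needs to be stopped or restarted, it first shuts down, and at that point you need to call eurekaClient's shutdown() yourself to take the instance offline. So eurekaClient's shutdown() method is where we start.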
For example, if your eureka client starts together with a web container, there is a ContextListener with a contextDestroyed() method; calling the eureka client's shutdown() method there is enough.
As usual, the method in question is DiscoveryClient's shutdown():
/**
* Shuts down Eureka Client. Also sends a deregistration request to the
* eureka server.
*/
@PreDestroy
@Override
public synchronized void shutdown() {
if (isShutdown.compareAndSet(false, true)) {
logger.info("Shutting down DiscoveryClient ...");
if (statusChangeListener != null && applicationInfoManager != null) {
applicationInfoManager.unregisterStatusChangeListener(statusChangeListener.getId());
}
cancelScheduledTasks();
// If APPINFO was registered
if (applicationInfoManager != null
&& clientConfig.shouldRegisterWithEureka()
&& clientConfig.shouldUnregisterOnShutdown()) {
applicationInfoManager.setInstanceStatus(InstanceStatus.DOWN);
unregister();
}
if (eurekaTransport != null) {
eurekaTransport.shutdown();
}
heartbeatStalenessMonitor.shutdown();
registryStalenessMonitor.shutdown();
Monitors.unregisterObject(this);
logger.info("Completed shut down of DiscoveryClient");
}
}
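One detail worth isolating in shutdown() is the isShutdown.compareAndSet(false, true) guard: however many times shutdown() is called, the body (and therefore the deregistration) runs only once. Here is a minimal sketch of that idiom. The class is hypothetical and a counter stands in for the real unregister() call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownOnceSketch {
    private final AtomicBoolean isShutdown = new AtomicBoolean(false);
    // Counts how often the (stubbed) deregistration actually ran.
    final AtomicInteger unregisterCalls = new AtomicInteger();

    void shutdown() {
        // Only the first caller flips false -> true and enters the block.
        if (isShutdown.compareAndSet(false, true)) {
            unregisterCalls.incrementAndGet(); // stand-in for unregister()
        }
    }

    public static void main(String[] args) {
        ShutdownOnceSketch client = new ShutdownOnceSketch();
        client.shutdown();
        client.shutdown(); // second call is a no-op
        System.out.println("unregister calls: " + client.unregisterCalls.get());
    }
}
```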
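shutdown() then calls unregister() to cancel the instance's registration.
This goes through EurekaHttpClient's cancel() method, which sends a DELETE request to a URL such as http://localhost:8080/v2/apps/ServiceA/i-00000-1: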
void unregister() {
// It can be null if shouldRegisterWithEureka == false
if(eurekaTransport != null && eurekaTransport.registrationClient != null) {
try {
logger.info("Unregistering ...");
EurekaHttpResponse<Void> httpResponse = eurekaTransport.registrationClient.cancel(instanceInfo.getAppName(), instanceInfo.getId());
logger.info(PREFIX + "{} - deregister status: {}", appPathIdentifier, httpResponse.getStatusCode());
} catch (Exception e) {
logger.error(PREFIX + "{} - de-registration failed{}", appPathIdentifier, e.getMessage(), e);
}
}
}
@Override
public EurekaHttpResponse<Void> cancel(String appName, String id) {
String urlPath = "apps/" + appName + '/' + id;
Response response = null;
try {
Builder resourceBuilder = jerseyClient.target(serviceUrl).path(urlPath).request();
addExtraProperties(resourceBuilder);
addExtraHeaders(resourceBuilder);
// send the DELETE request
response = resourceBuilder.delete();
return anEurekaHttpResponse(response.getStatus()).headers(headersOf(response)).build();
} finally {
if (logger.isDebugEnabled()) {
logger.debug("Jersey2 HTTP DELETE {}/{}; statusCode={}", serviceUrl, urlPath, response == null ? "N/A" : response.getStatus());
}
if (response != null) {
response.close();
}
}
}
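By the path-matching rules, the request ends up on the server side in the cancelLease() method of the InstanceResource component, which handles the deregistration request: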
@DELETE
public Response cancelLease(
@HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
try {
boolean isSuccess = registry.cancel(app.getName(), id,
"true".equals(isReplication));
if (isSuccess) {
logger.debug("Found (Cancel): {} - {}", app.getName(), id);
return Response.ok().build();
} else {
logger.info("Not Found (Cancel): {} - {}", app.getName(), id);
return Response.status(Status.NOT_FOUND).build();
}
} catch (Throwable e) {
logger.error("Error (cancel): {} - {}", app.getName(), id, e);
return Response.serverError().build();
}
}
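This leads into the registry's cancel method.
It first cancels the registered instance locally, then calls replicateToPeers to propagate the change to the other nodes in the cluster so that they remove the instance as well: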
@Override
public boolean cancel(final String appName, final String id,
final boolean isReplication) {
if (super.cancel(appName, id, isReplication)) {
replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
return true;
}
return false;
}
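The deregistration logic is fairly simple:
(1) Remove the instance from the map-based registry on the eureka server: gMap.remove(id).
(2) Put the removed instance into the recentCanceledQueue, which, as the name suggests, is a queue of recently cancelled instances.
(3) Call leaseToCancel.cancel(), which sets evictionTimestamp to the current timestamp.
(4) Put the removed instance into the recentlyChangedQueue, so that the next incremental (delta) fetch by a client picks it up; the client then cleans up locally and no longer routes calls to the offline instance.
(5) Finally, call invalidateCache() to invalidate the data in the read-write (RW) cache, since the instance has been removed.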
protected boolean internalCancel(String appName, String id, boolean isReplication) {
read.lock();
try {
CANCEL.increment(isReplication);
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToCancel = null;
if (gMap != null) {
// remove the instance from the map
leaseToCancel = gMap.remove(id);
}
recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
if (instanceStatus != null) {
logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
}
if (leaseToCancel == null) {
CANCEL_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
return false;
} else {
leaseToCancel.cancel();
InstanceInfo instanceInfo = leaseToCancel.getHolder();
String vip = null;
String svip = null;
if (instanceInfo != null) {
instanceInfo.setActionType(ActionType.DELETED);
recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
instanceInfo.setLastUpdatedTimestamp();
vip = instanceInfo.getVIPAddress();
svip = instanceInfo.getSecureVipAddress();
}
invalidateCache(appName, vip, svip);
logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
}
} finally {
read.unlock();
}
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to cancel it, reduce the number of clients to send renews.
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
updateRenewsPerMinThreshold();
}
}
return true;
}
/**
* Cancels the lease by updating the eviction time.
*/
public void cancel() {
if (evictionTimestamp <= 0) {
evictionTimestamp = System.currentTimeMillis();
}
}
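The cache-invalidation code is shown separately below. It ends up clearing the cached data for this service; judging from the code, it also invalidates the ALL_APPS and ALL_APPS_DELTA entries, so a large part of the RW cache is dropped. If you are interested, set a breakpoint and take a look.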
@Override
public void invalidate(String appName, @Nullable String vipAddress, @Nullable String secureVipAddress) {
for (Key.KeyType type : Key.KeyType.values()) {
for (Version v : Version.values()) {
invalidate(
new Key(Key.EntityType.Application, appName, type, v, EurekaAccept.full),
new Key(Key.EntityType.Application, appName, type, v, EurekaAccept.compact),
new Key(Key.EntityType.Application, ALL_APPS, type, v, EurekaAccept.full),
new Key(Key.EntityType.Application, ALL_APPS, type, v, EurekaAccept.compact),
new Key(Key.EntityType.Application, ALL_APPS_DELTA, type, v, EurekaAccept.full),
new Key(Key.EntityType.Application, ALL_APPS_DELTA, type, v, EurekaAccept.compact)
);
if (null != vipAddress) {
invalidate(new Key(Key.EntityType.VIP, vipAddress, type, v, EurekaAccept.full));
}
if (null != secureVipAddress) {
invalidate(new Key(Key.EntityType.SVIP, secureVipAddress, type, v, EurekaAccept.full));
}
}
}
}
public void invalidate(Key... keys) {
for (Key key : keys) {
logger.debug("Invalidating the response cache key : {} {} {} {}, {}",
key.getEntityType(), key.getName(), key.getVersion(), key.getType(), key.getEurekaAccept());
readWriteCacheMap.invalidate(key);
Collection<Key> keysWithRegions = regionSpecificKeys.get(key);
if (null != keysWithRegions && !keysWithRegions.isEmpty()) {
for (Key keysWithRegion : keysWithRegions) {
logger.debug("Invalidating the response cache key : {} {} {} {} {}",
key.getEntityType(), key.getName(), key.getVersion(), key.getType(), key.getEurekaAccept());
readWriteCacheMap.invalidate(keysWithRegion);
}
}
}
}
The key call is readWriteCacheMap.invalidate(key);
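To see which entries a cancel actually touches, here is a toy version of the invalidation with a plain map standing in for the real cache and plain strings standing in for Key objects (both are simplifications of mine, as is the class name). It removes the entry for the cancelled application plus the two aggregate entries, and leaves other applications alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheInvalidationSketch {
    static final String ALL_APPS = "ALL_APPS";
    static final String ALL_APPS_DELTA = "ALL_APPS_DELTA";

    // Stand-in for the read-write cache: key -> serialized payload.
    final Map<String, String> readWriteCache = new ConcurrentHashMap<>();

    /** Drops the per-app entry and both aggregate entries. */
    void invalidate(String appName) {
        for (String key : new String[] {appName, ALL_APPS, ALL_APPS_DELTA}) {
            readWriteCache.remove(key);
        }
    }

    public static void main(String[] args) {
        CacheInvalidationSketch cache = new CacheInvalidationSketch();
        cache.readWriteCache.put("ServiceA", "...");
        cache.readWriteCache.put("ServiceB", "...");
        cache.readWriteCache.put(ALL_APPS, "...");
        cache.readWriteCache.put(ALL_APPS_DELTA, "...");
        cache.invalidate("ServiceA");
        System.out.println(cache.readWriteCache.keySet());
    }
}
```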
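The next time a eureka client fetches the registry delta, it finds nothing in readOnlyCacheMap, then finds nothing in readWriteCacheMap either, so the delta is built from the registry, and at that point the record in the recentlyChangedQueue above is returned.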
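On the client side, consuming such a delta amounts to replaying the recorded changes against the local copy of the registry. The sketch below illustrates the idea with a set of instance ids as the local registry. The Change type, the class and the method names are hypothetical; only the DELETED action type comes from the code shown earlier.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeltaApplySketch {
    enum ActionType { ADDED, MODIFIED, DELETED }

    /** One entry of a delta response: which instance changed, and how. */
    static final class Change {
        final String instanceId;
        final ActionType action;

        Change(String instanceId, ActionType action) {
            this.instanceId = instanceId;
            this.action = action;
        }
    }

    /** Replays the delta against the client's local view of the registry. */
    static void applyDelta(Set<String> localInstances, List<Change> delta) {
        for (Change change : delta) {
            if (change.action == ActionType.DELETED) {
                localInstances.remove(change.instanceId);
            } else {
                localInstances.add(change.instanceId);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> local = new HashSet<>();
        local.add("i-000000-1");
        local.add("i-000000-2");
        List<Change> delta = new ArrayList<>();
        delta.add(new Change("i-000000-1", ActionType.DELETED));
        applyDelta(local, delta);
        System.out.println(local); // the cancelled instance is gone
    }
}
```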
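In fact, after the eureka server takes the instance offline locally, it also propagates the deregistration to the other server nodes in the cluster, telling them to take the instance offline too. The next time a client fetches the registry, no matter which server it pulls from, the deregistered instance is gone, which keeps the cluster's data consistent.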
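This code eventually calls HttpReplicationClient.cancel() to notify the other nodes in the cluster to perform the operation. Trace through it if you are interested; I will not go into it here.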
@Override
public boolean cancel(final String appName, final String id,
final boolean isReplication) {
if (super.cancel(appName, id, isReplication)) {
// propagate the instance's deregistration to the other nodes in the cluster
replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
return true;
}
return false;
}
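The isReplication flag threaded through cancel() is what keeps this propagation from bouncing between nodes forever. The sketch below shows the idea with in-memory nodes. It assumes (this is my reading, not something shown in the code above) that a node only forwards a cancel when the request did not itself come from a peer. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PeerReplicationSketch {
    final Set<String> instances = new HashSet<>();
    final List<PeerReplicationSketch> peers = new ArrayList<>();
    int replicationsSent = 0;

    /** Cancels locally, then forwards to peers unless this call is itself a replication. */
    boolean cancel(String id, boolean isReplication) {
        if (!instances.remove(id)) {
            return false; // nothing to cancel, nothing to replicate
        }
        if (!isReplication) {
            for (PeerReplicationSketch peer : peers) {
                replicationsSent++;
                peer.cancel(id, true); // mark as replication so the peer stops here
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PeerReplicationSketch a = new PeerReplicationSketch();
        PeerReplicationSketch b = new PeerReplicationSketch();
        a.peers.add(b);
        b.peers.add(a);
        a.instances.add("i-000000-1");
        b.instances.add("i-000000-1");

        a.cancel("i-000000-1", false); // a client request arrives at node a
        System.out.println("b still has instance: " + b.instances.contains("i-000000-1"));
        System.out.println("replications sent by b: " + b.replicationsSent);
    }
}
```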