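Eureka Source Code Deep Dive (8): Heartbeat Renewal and Instance Removal

"Without accumulating small steps, one cannot travel a thousand li."

This article looks at two core mechanisms in Eureka: heartbeat renewal and instance removal.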

Heartbeat Renewal

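At a fixed interval, each eureka client sends a heartbeat to the server so that the eureka server knows the instance is still alive and can keep serving requests. In short, that is all there is to it.

The heartbeat is driven by a scheduled thread pool. As before, for anything client-related, the DiscoveryClient class is the place to look.

After reading through the eureka source you will find that DiscoveryClient, at nearly 2000 lines, covers almost all of the client-side functionality.

In initScheduledTasks() the heartbeat-related code is easy to spot: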

// Heartbeat timer
heartbeatTask = new TimedSupervisorTask(
    "heartbeat",
    scheduler,
    heartbeatExecutor,
    renewalIntervalInSecs,
    TimeUnit.SECONDS,
    expBackOffBound,
    new HeartbeatThread()
);
scheduler.schedule(
    heartbeatTask,
    //public static final int DEFAULT_LEASE_RENEWAL_INTERVAL = 30;
    // renewalIntervalInSecs defaults to 30
    renewalIntervalInSecs, TimeUnit.SECONDS);

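A HeartbeatThread runnable is wrapped in a TimedSupervisorTask and handed to the scheduler, so the HeartbeatThread logic runs every 30s by default and sends a heartbeat: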

private class HeartbeatThread implements Runnable {

    public void run() {
        if (renew()) {
            lastSuccessfulHeartbeatTimestamp = System.currentTimeMillis();
        }
    }
}
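
To make the scheduling pattern concrete, here is a minimal sketch. It assumes a much simplified stand-in for TimedSupervisorTask: the class HeartbeatLoopSketch and its methods are hypothetical, and timeout handling and exponential back-off are left out. The idea it illustrates is that a one-shot schedule(...) call becomes periodic because the task schedules itself again after each run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for TimedSupervisorTask (no timeout, no back-off).
class HeartbeatLoopSketch implements Runnable {
    private final ScheduledExecutorService scheduler;
    private final Runnable heartbeat;
    private final long intervalMillis;

    HeartbeatLoopSketch(ScheduledExecutorService scheduler, Runnable heartbeat, long intervalMillis) {
        this.scheduler = scheduler;
        this.heartbeat = heartbeat;
        this.intervalMillis = intervalMillis;
    }

    // One-shot schedule; periodicity comes from run() rescheduling itself.
    void start() {
        scheduler.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        try {
            heartbeat.run();
        } finally {
            if (!scheduler.isShutdown()) {
                scheduler.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }
    }

    // Returns true if `beats` heartbeats were observed before the timeout.
    static boolean demo(int beats, long intervalMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(beats);
        new HeartbeatLoopSketch(scheduler, latch::countDown, intervalMillis).start();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

Calling HeartbeatLoopSketch.demo(3, 10, 5000) waits for three heartbeats at a 10 ms interval; in the real client the interval would be renewalIntervalInSecs.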

renew() goes through EurekaHttpClient's sendHeartBeat() method:

 /**
   * Renew with the eureka service by making the appropriate REST call
   */
boolean renew() {
    EurekaHttpResponse<InstanceInfo> httpResponse;
    try {
        // Send the heartbeat
        httpResponse = eurekaTransport.registrationClient.sendHeartBeat(instanceInfo.getAppName(), instanceInfo.getId(), instanceInfo, null);
        logger.debug(PREFIX + "{} - Heartbeat status: {}", appPathIdentifier, httpResponse.getStatusCode());
        if (httpResponse.getStatusCode() == Status.NOT_FOUND.getStatusCode()) {
            REREGISTER_COUNTER.increment();
            logger.info(PREFIX + "{} - Re-registering apps/{}", appPathIdentifier, instanceInfo.getAppName());
            long timestamp = instanceInfo.setIsDirtyWithTime();
            boolean success = register();
            if (success) {
                instanceInfo.unsetIsDirty(timestamp);
            }
            return success;
        }
        return httpResponse.getStatusCode() == Status.OK.getStatusCode();
    } catch (Throwable e) {
        logger.error(PREFIX + "{} - was unable to send heartbeat!", appPathIdentifier, e);
        return false;
    }
}
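
The branching in renew() is the interesting part: a 404 means the server no longer knows the instance, so the client registers again, and only a 200 counts as a successful heartbeat. The sketch below mirrors just that control flow. RenewFlowSketch and its parameters are hypothetical names, and the HTTP call and the register call are passed in as plain functions.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

// Hypothetical helper that mirrors only the branching of renew(); not Eureka API.
class RenewFlowSketch {
    static boolean renew(IntSupplier sendHeartbeat, BooleanSupplier register) {
        try {
            int status = sendHeartbeat.getAsInt();
            if (status == 404) {
                // The server does not know this instance: register again.
                return register.getAsBoolean();
            }
            // Anything other than 200 is treated as a failed heartbeat.
            return status == 200;
        } catch (RuntimeException e) {
            // A transport failure also counts as a failed heartbeat.
            return false;
        }
    }
}
```

For example, RenewFlowSketch.renew(() -> 404, () -> true) succeeds through re-registration, while a 500 response or a thrown exception yields false.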

sendHeartBeat() assembles the request parameters and sends a PUT request to a URL such as http://localhost:8080/v2/apps/ServiceA/i-000000-1:

@Override
public EurekaHttpResponse<InstanceInfo> sendHeartBeat(String appName, String id, InstanceInfo info, InstanceStatus overriddenStatus) {
    String urlPath = "apps/" + appName + '/' + id;
    Response response = null;
    try {
        WebTarget webResource = jerseyClient.target(serviceUrl)
            .path(urlPath)
            .queryParam("status", info.getStatus().toString())
            .queryParam("lastDirtyTimestamp", info.getLastDirtyTimestamp().toString());
        if (overriddenStatus != null) {
            webResource = webResource.queryParam("overriddenstatus", overriddenStatus.name());
        }
        Builder requestBuilder = webResource.request();
        addExtraProperties(requestBuilder);
        addExtraHeaders(requestBuilder);
        requestBuilder.accept(MediaType.APPLICATION_JSON_TYPE);
        // .put(...) shows that heartbeat renewal is sent as a PUT request
        response = requestBuilder.put(Entity.entity("{}", MediaType.APPLICATION_JSON_TYPE)); // Jersey2 refuses to handle PUT with no body
        EurekaHttpResponseBuilder<InstanceInfo> eurekaResponseBuilder = anEurekaHttpResponse(response.getStatus(), InstanceInfo.class).headers(headersOf(response));
        if (response.hasEntity()) {
            eurekaResponseBuilder.entity(response.readEntity(InstanceInfo.class));
        }
        return eurekaResponseBuilder.build();
        ... ...
    }
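
As a rough illustration of what ends up on the wire, the sketch below assembles the same path and query parameters as a plain string. HeartbeatUrlSketch is a hypothetical helper, and it assumes all values are already URL-safe (no encoding is done), which the real Jersey client handles for you.

```java
// Hypothetical helper; assumes appName, id and the status values are URL-safe.
class HeartbeatUrlSketch {
    static String heartbeatUrl(String serviceUrl, String appName, String id,
                               String status, long lastDirtyTimestamp, String overriddenStatus) {
        StringBuilder url = new StringBuilder(serviceUrl);
        if (!serviceUrl.endsWith("/")) {
            url.append('/');
        }
        // Same path layout as in sendHeartBeat(): apps/{appName}/{id}
        url.append("apps/").append(appName).append('/').append(id)
           .append("?status=").append(status)
           .append("&lastDirtyTimestamp=").append(lastDirtyTimestamp);
        // The overridden status is only sent when one is present.
        if (overriddenStatus != null) {
            url.append("&overriddenstatus=").append(overriddenStatus);
        }
        return url.toString();
    }
}
```

With serviceUrl http://localhost:8080/v2/, app ServiceA and id i-000000-1, this produces the apps/ServiceA/i-000000-1 path followed by the status and lastDirtyTimestamp query parameters.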

On the server side, the resource that handles heartbeats is InstanceResource.

Matching on the request path leads to the renewLease() method:

@PUT
public Response renewLease(
    @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication,
    @QueryParam("overriddenstatus") String overriddenStatus,
    @QueryParam("status") String status,
    @QueryParam("lastDirtyTimestamp") String lastDirtyTimestamp) {
    boolean isFromReplicaNode = "true".equals(isReplication);
    boolean isSuccess = registry.renew(app.getName(), id, isFromReplicaNode);

    // Not found in the registry, immediately ask for a register
    if (!isSuccess) {
        logger.warn("Not Found (Renew): {} - {}", app.getName(), id);
        return Response.status(Status.NOT_FOUND).build();
    }
    // Check if we need to sync based on dirty time stamp, the client
    // instance might have changed some value
    Response response;
    if (lastDirtyTimestamp != null && serverConfig.shouldSyncWhenTimestampDiffers()) {
        response = this.validateDirtyTimestamp(Long.valueOf(lastDirtyTimestamp), isFromReplicaNode);
        // Store the overridden status since the validation found out the node that replicates wins
        if (response.getStatus() == Response.Status.NOT_FOUND.getStatusCode()
            && (overriddenStatus != null)
            && !(InstanceStatus.UNKNOWN.name().equals(overriddenStatus))
            && isFromReplicaNode) {
            registry.storeOverriddenStatusIfRequired(app.getAppName(), id, InstanceStatus.valueOf(overriddenStatus));
        }
    } else {
        response = Response.ok().build();
    }
    logger.debug("Found (Renew): {} - {}; reply status={}", app.getName(), id, response.getStatus());
    return response;
}

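The server handles heartbeat renewal through the registry's renew(...) method. Renewal is part of service registration, so it makes sense for the registry to own it.

Pay attention to PeerAwareInstanceRegistry here. Like DiscoveryClient on the client side, it is a core class: many of the server's core mechanisms live in this registry class, so it is worth keeping in focus.

Its renew method first calls the parent class's renew method: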

public boolean renew(final String appName, final String id, final boolean isReplication) {
    if (super.renew(appName, id, isReplication)) {
        replicateToPeers(Action.Heartbeat, appName, id, null, null, isReplication);
        return true;
    }
    return false;
}

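The core logic of the parent method is to look up a Lease<InstanceInfo> by app name and instance id, and then call renew() on it.

In this large chunk of code, the line that matters is near the bottom: leaseToRenew.renew();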

public boolean renew(String appName, String id, boolean isReplication) {
    RENEW.increment(isReplication);
    Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
    Lease<InstanceInfo> leaseToRenew = null;
    if (gMap != null) {
        leaseToRenew = gMap.get(id);
    }
    if (leaseToRenew == null) {
        RENEW_NOT_FOUND.increment(isReplication);
        logger.warn("DS: Registry: lease doesn't exist, registering resource: {} - {}", appName, id);
        return false;
    } else {
        InstanceInfo instanceInfo = leaseToRenew.getHolder();
        if (instanceInfo != null) {
            // touchASGCache(instanceInfo.getASGName());
            InstanceStatus overriddenInstanceStatus = this.getOverriddenInstanceStatus(
                instanceInfo, leaseToRenew, isReplication);
            if (overriddenInstanceStatus == InstanceStatus.UNKNOWN) {
                logger.info("Instance status UNKNOWN possibly due to deleted override for instance {}"
                            + "; re-register required", instanceInfo.getId());
                RENEW_NOT_FOUND.increment(isReplication);
                return false;
            }
            if (!instanceInfo.getStatus().equals(overriddenInstanceStatus)) {
                logger.info(
                    "The instance status {} is different from overridden instance status {} for instance {}. "
                    + "Hence setting the status to overridden status", instanceInfo.getStatus().name(),
                    overriddenInstanceStatus.name(),
                    instanceInfo.getId());
                instanceInfo.setStatusWithoutDirty(overriddenInstanceStatus);

            }
        }
        renewsLastMin.increment();
        leaseToRenew.renew();
        return true;
    }
}
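
Stripped of metrics and status overrides, the lookup is a two-level map: app name to instance id to lease. The sketch below models only that part, with a plain timestamp standing in for the Lease object. RegistryLookupSketch is a hypothetical class, not the real registry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the registry lookup; a Long timestamp stands in for Lease<InstanceInfo>.
class RegistryLookupSketch {
    // app name -> (instance id -> last renewal time in millis)
    private final Map<String, Map<String, Long>> registry = new ConcurrentHashMap<>();

    void register(String appName, String id, long nowMillis) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>()).put(id, nowMillis);
    }

    // Returns false when the lease is unknown; the resource layer turns that into a 404.
    boolean renew(String appName, String id, long nowMillis) {
        Map<String, Long> instances = registry.get(appName);
        if (instances == null || !instances.containsKey(id)) {
            return false;
        }
        instances.put(id, nowMillis);
        return true;
    }

    // Returns null when the instance is not registered.
    Long lastRenewal(String appName, String id) {
        Map<String, Long> instances = registry.get(appName);
        return instances == null ? null : instances.get(id);
    }
}
```

A renew for an unregistered instance returns false, which is exactly the case that makes the client re-register.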

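Calling renew() simply updates the lastUpdateTimestamp field.

To sum up, heartbeat renewal just refreshes the instance's last-update timestamp based on the current time. Note that the code sets it to the current time plus duration, not to the current time alone.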

public void renew() {
    // duration is the lease duration, 90s by default
    //public static final int DEFAULT_DURATION_IN_SECS = 90;
    lastUpdateTimestamp = System.currentTimeMillis() + duration;

}
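
The timestamp arithmetic is easy to misread, so here is a minimal sketch of it. LeaseTimingSketch is a hypothetical class, and its isExpired() check (now > lastUpdateTimestamp + duration) is an assumption modeled on my reading of Eureka's Lease class rather than code quoted in this article. Under that assumption, because renew() already adds duration, a renewed lease only expires roughly 2 x duration after the last heartbeat.

```java
// Hypothetical model of the lease timing; time is passed in so the behavior is deterministic.
class LeaseTimingSketch {
    private final long durationMillis;
    private long lastUpdateTimestamp;

    LeaseTimingSketch(long nowMillis, long durationMillis) {
        this.durationMillis = durationMillis;
        // At creation the timestamp is just "now".
        this.lastUpdateTimestamp = nowMillis;
    }

    // Mirrors renew() above: "now" plus the lease duration.
    void renew(long nowMillis) {
        lastUpdateTimestamp = nowMillis + durationMillis;
    }

    // Assumption: the expiry check adds the duration once more.
    boolean isExpired(long nowMillis) {
        return nowMillis > lastUpdateTimestamp + durationMillis;
    }
}
```

With a 90 s duration, a lease that was never renewed is considered expired just after 90 s, while one renewed at time 0 stays valid until just after 180 s.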

At registration time, by contrast, the last-updated timestamp is simply the current time:

registrant.setLastUpdatedTimestamp();
public void setLastUpdatedTimestamp() {
    this.lastUpdatedTimestamp = System.currentTimeMillis();
}

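In a distributed system a heartbeat mechanism is important: it lets a central coordinating service monitor whether all the worker services are still alive. That is all a heartbeat mechanism is: every time a heartbeat arrives, update the most recent timestamp.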

[Figure: Eureka heartbeat mechanism]

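Instance Removal

Suppose a registry and many services are running in production, and every service sends a heartbeat at a fixed interval. When a service needs to be stopped or restarted, it is shut down first, and at that point you need to call eurekaClient's shutdown() yourself to take the service instance offline. So the shutdown() method of eurekaClient is where we start.

For example, if the eureka client starts together with a web container, a ContextListener has a contextDestroyed() method, and calling the eureka client's shutdown() from there is enough.

As usual, the method being called is DiscoveryClient's shutdown():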

/**
   * Shuts down Eureka Client. Also sends a deregistration request to the
   * eureka server.
   */
@PreDestroy
@Override
public synchronized void shutdown() {
    if (isShutdown.compareAndSet(false, true)) {
        logger.info("Shutting down DiscoveryClient ...");

        if (statusChangeListener != null && applicationInfoManager != null) {
            applicationInfoManager.unregisterStatusChangeListener(statusChangeListener.getId());
        }

        cancelScheduledTasks();

        // If APPINFO was registered
        if (applicationInfoManager != null
            && clientConfig.shouldRegisterWithEureka()
            && clientConfig.shouldUnregisterOnShutdown()) {
            applicationInfoManager.setInstanceStatus(InstanceStatus.DOWN);
            unregister();
        }

        if (eurekaTransport != null) {
            eurekaTransport.shutdown();
        }

        heartbeatStalenessMonitor.shutdown();
        registryStalenessMonitor.shutdown();

        Monitors.unregisterObject(this);

        logger.info("Completed shut down of DiscoveryClient");
    }
}
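
One detail worth noticing in shutdown() is the isShutdown.compareAndSet(false, true) guard, which makes the method safe to call more than once. A minimal sketch of that guard, with a counter standing in for the real cleanup work (ShutdownOnceSketch and its members are hypothetical names):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the run-once guard used by shutdown().
class ShutdownOnceSketch {
    private final AtomicBoolean isShutdown = new AtomicBoolean(false);
    private final AtomicInteger unregisterCalls = new AtomicInteger();

    void shutdown() {
        // Only the first caller flips false -> true and performs the cleanup.
        if (isShutdown.compareAndSet(false, true)) {
            // Stands in for cancelScheduledTasks(), unregister(), and the rest.
            unregisterCalls.incrementAndGet();
        }
    }

    int unregisterCalls() {
        return unregisterCalls.get();
    }
}
```

Calling shutdown() twice on the same object performs the cleanup only once, so a container hook and an explicit call cannot send two deregistration requests.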

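shutdown() then calls unregister() to cancel the instance's registration.

unregister() calls EurekaHttpClient's cancel() method, which sends a DELETE request to a URL such as http://localhost:8080/v2/apps/ServiceA/i-00000-1: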

void unregister() {
    // It can be null if shouldRegisterWithEureka == false
    if(eurekaTransport != null && eurekaTransport.registrationClient != null) {
        try {
            logger.info("Unregistering ...");
            EurekaHttpResponse<Void> httpResponse = eurekaTransport.registrationClient.cancel(instanceInfo.getAppName(), instanceInfo.getId());
            logger.info(PREFIX + "{} - deregister  status: {}", appPathIdentifier, httpResponse.getStatusCode());
        } catch (Exception e) {
            logger.error(PREFIX + "{} - de-registration failed{}", appPathIdentifier, e.getMessage(), e);
        }
    }
}
@Override
public EurekaHttpResponse<Void> cancel(String appName, String id) {
    String urlPath = "apps/" + appName + '/' + id;
    Response response = null;
    try {
        Builder resourceBuilder = jerseyClient.target(serviceUrl).path(urlPath).request();
        addExtraProperties(resourceBuilder);
        addExtraHeaders(resourceBuilder);
        // Send the DELETE request
        response = resourceBuilder.delete();
        return anEurekaHttpResponse(response.getStatus()).headers(headersOf(response)).build();
    } finally {
        if (logger.isDebugEnabled()) {
            logger.debug("Jersey2 HTTP DELETE {}/{}; statusCode={}", serviceUrl, urlPath, response == null ? "N/A" : response.getStatus());
        }
        if (response != null) {
            response.close();
        }
    }
}

By path matching, the request ends up on the server in InstanceResource's cancelLease() method, which handles the deregistration request:

@DELETE
public Response cancelLease(
    @HeaderParam(PeerEurekaNode.HEADER_REPLICATION) String isReplication) {
    try {
        boolean isSuccess = registry.cancel(app.getName(), id,
                                            "true".equals(isReplication));

        if (isSuccess) {
            logger.debug("Found (Cancel): {} - {}", app.getName(), id);
            return Response.ok().build();
        } else {
            logger.info("Not Found (Cancel): {} - {}", app.getName(), id);
            return Response.status(Status.NOT_FOUND).build();
        }
    } catch (Throwable e) {
        logger.error("Error (cancel): {} - {}", app.getName(), id, e);
        return Response.serverError().build();
    }

}

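This leads into the registry's cancel method.

It first cancels the instance locally, removing it from the registry, and then calls replicateToPeers to propagate the cancellation to the other nodes in the cluster so that they remove the instance too.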

@Override
public boolean cancel(final String appName, final String id,
                      final boolean isReplication) {
    if (super.cancel(appName, id, isReplication)) {
        replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);

        return true;
    }
    return false;
}

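The deregistration logic, found in internalCancel(), is fairly simple:

(1) Remove the instance from the map-based registry on the eureka server: gMap.remove(id)

(2) Add the removed instance to recentCanceledQueue, which, as the name suggests, is a queue of recently canceled instances

(3) Call leaseToCancel.cancel(), which sets evictionTimestamp to the current timestamp

(4) Add the removed instance to recentlyChangedQueue, the queue of recent changes, so that the next delta fetch by a client picks it up; the client then removes the instance locally and stops routing calls to it

(5) Finally, call invalidateCache() to invalidate the related entries in the read-write cache, because the instance has been removed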

protected boolean internalCancel(String appName, String id, boolean isReplication) {
    read.lock();
    try {
        CANCEL.increment(isReplication);
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToCancel = null;
        if (gMap != null) {
            // Remove the service instance from the map
            leaseToCancel = gMap.remove(id);
        }
        recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
        InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
        if (instanceStatus != null) {
            logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
        }
        if (leaseToCancel == null) {
            CANCEL_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
            return false;
        } else {
            leaseToCancel.cancel();
            InstanceInfo instanceInfo = leaseToCancel.getHolder();
            String vip = null;
            String svip = null;
            if (instanceInfo != null) {
                instanceInfo.setActionType(ActionType.DELETED);
                recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
                instanceInfo.setLastUpdatedTimestamp();
                vip = instanceInfo.getVIPAddress();
                svip = instanceInfo.getSecureVipAddress();
            }
            invalidateCache(appName, vip, svip);
            logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
        }
    } finally {
        read.unlock();
    }

    synchronized (lock) {
        if (this.expectedNumberOfClientsSendingRenews > 0) {
            // Since the client wants to cancel it, reduce the number of clients to send renews.
            this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
            updateRenewsPerMinThreshold();
        }
    }

    return true;
}
/**
 * Cancels the lease by updating the eviction time.
 */
public void cancel() {
    if (evictionTimestamp <= 0) {
        evictionTimestamp = System.currentTimeMillis();
    }
}
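
The five steps can be condensed into a small single-threaded sketch. CancelFlowSketch is a hypothetical model: strings stand in for leases and queue items, a set of app names stands in for the response cache, and locking and replication are omitted.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical, single-threaded model of the cancel steps; no locking, no replication.
class CancelFlowSketch {
    final Map<String, Set<String>> registry = new HashMap<>();   // app name -> instance ids
    final Queue<String> recentlyCanceled = new ArrayDeque<>();
    final Queue<String> recentlyChanged = new ArrayDeque<>();
    final Set<String> cachedApps = new HashSet<>();               // stand-in for the response cache

    void register(String app, String id) {
        registry.computeIfAbsent(app, k -> new HashSet<>()).add(id);
        cachedApps.add(app);
    }

    boolean cancel(String app, String id, long nowMillis) {
        Set<String> instances = registry.get(app);
        // (1) remove the instance from the registry map
        boolean removed = instances != null && instances.remove(id);
        // (2) record the cancel attempt, found or not, as the original does
        recentlyCanceled.add(app + "(" + id + ")");
        if (!removed) {
            return false;
        }
        // (3) the eviction timestamp is the time of the cancel
        long evictionTimestamp = nowMillis;
        // (4) publish a DELETED change for delta fetches
        recentlyChanged.add("DELETED " + app + "/" + id + " @" + evictionTimestamp);
        // (5) invalidate the cached view of this app
        cachedApps.remove(app);
        return true;
    }
}
```

A second cancel for the same instance returns false, matching the NOT_FOUND branch in cancelLease().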

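The cache invalidation code is shown separately below. It invalidates the cached entries related to this service. Judging from the code, that means the keys for this app, ALL_APPS, ALL_APPS_DELTA, and the VIP and secure VIP addresses, rather than the entire read-write cache. Set a breakpoint if you want to confirm.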

@Override
public void invalidate(String appName, @Nullable String vipAddress, @Nullable String secureVipAddress) {
    for (Key.KeyType type : Key.KeyType.values()) {
        for (Version v : Version.values()) {
            invalidate(
                new Key(Key.EntityType.Application, appName, type, v, EurekaAccept.full),
                new Key(Key.EntityType.Application, appName, type, v, EurekaAccept.compact),
                new Key(Key.EntityType.Application, ALL_APPS, type, v, EurekaAccept.full),
                new Key(Key.EntityType.Application, ALL_APPS, type, v, EurekaAccept.compact),
                new Key(Key.EntityType.Application, ALL_APPS_DELTA, type, v, EurekaAccept.full),
                new Key(Key.EntityType.Application, ALL_APPS_DELTA, type, v, EurekaAccept.compact)
            );
            if (null != vipAddress) {
                invalidate(new Key(Key.EntityType.VIP, vipAddress, type, v, EurekaAccept.full));
            }
            if (null != secureVipAddress) {
                invalidate(new Key(Key.EntityType.SVIP, secureVipAddress, type, v, EurekaAccept.full));
            }
        }
    }
}
public void invalidate(Key... keys) {
    for (Key key : keys) {
        logger.debug("Invalidating the response cache key : {} {} {} {}, {}",
                     key.getEntityType(), key.getName(), key.getVersion(), key.getType(), key.getEurekaAccept());

        readWriteCacheMap.invalidate(key);
        Collection<Key> keysWithRegions = regionSpecificKeys.get(key);
        if (null != keysWithRegions && !keysWithRegions.isEmpty()) {
            for (Key keysWithRegion : keysWithRegions) {
                logger.debug("Invalidating the response cache key : {} {} {} {} {}",
                             key.getEntityType(), key.getName(), key.getVersion(), key.getType(), key.getEurekaAccept());
                readWriteCacheMap.invalidate(keysWithRegion);
            }
        }
    }
}

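The next time a eureka client does a delta fetch of the registry, the lookup goes to readOnlyCacheMap first. Note that the code above only invalidates readWriteCacheMap, so this applies once the entry is also gone from, or refreshed in, the read-only cache. The lookup then misses in readOnlyCacheMap, misses in readWriteCacheMap as well, and loads the delta from the registry, at which point the records in recentlyChangedQueue mentioned above are returned.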
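
The two-level lookup can be sketched as follows. TwoTierCacheSketch is a hypothetical model, and it assumes the read-only level is refreshed from the read-write level by a periodic sync step (represented here by an explicit syncReadOnly() call), so a client may briefly see stale data after an invalidation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model of a read-only cache in front of a read-write cache.
// Assumes the loader never returns null.
class TwoTierCacheSketch {
    private final Map<String, String> readOnly = new ConcurrentHashMap<>();
    private final Map<String, String> readWrite = new ConcurrentHashMap<>();
    private final Function<String, String> loader;   // stands in for reading the registry

    TwoTierCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        String value = readOnly.get(key);
        if (value != null) {
            return value;
        }
        // Miss in the read-only level: fall through to read-write, loading if needed.
        value = readWrite.computeIfAbsent(key, loader);
        readOnly.put(key, value);
        return value;
    }

    // Like invalidate(Key...) above, this touches only the read-write level.
    void invalidate(String key) {
        readWrite.remove(key);
    }

    // Stand-in for the periodic task that refreshes the read-only level.
    void syncReadOnly() {
        for (String key : readOnly.keySet()) {
            readOnly.put(key, readWrite.computeIfAbsent(key, loader));
        }
    }
}
```

In this model a get() right after invalidate() still returns the old value, and the new value only becomes visible after syncReadOnly() has run.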

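After the eureka server takes the instance offline locally, it also propagates the deregistration to the other server nodes in the cluster so that they take the instance offline as well. The next time a client fetches the registry, whichever server it fetches from, the deregistered instance is gone, which keeps the data across the cluster consistent.
This code eventually calls HttpReplicationClient.cancel() to notify the other nodes in the cluster. Follow it through if you are interested; I will not go into it here.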

@Override
public boolean cancel(final String appName, final String id,
                       final boolean isReplication) {
     if (super.cancel(appName, id, isReplication)) {
         // Propagate the instance deregistration to the other nodes in the cluster
         replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);

         return true;
     }
     return false;
 }
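
The isReplication flag is what keeps this from looping: a node that receives a replicated cancel applies it locally but does not forward it again. A minimal sketch of that rule, using a hypothetical PeerReplicationSketch class with in-memory peers instead of HTTP calls:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of cancel replication between peer nodes.
class PeerReplicationSketch {
    final Set<String> instances = new HashSet<>();
    final List<PeerReplicationSketch> peers = new ArrayList<>();
    int cancelRequests = 0;

    boolean cancel(String id, boolean isReplication) {
        cancelRequests++;
        // Local cancel first; nothing to replicate if the instance is unknown here.
        if (!instances.remove(id)) {
            return false;
        }
        // Only a cancel that came from a client is forwarded; replicated ones stop here.
        if (!isReplication) {
            for (PeerReplicationSketch peer : peers) {
                peer.cancel(id, true);
            }
        }
        return true;
    }
}
```

With two nodes that list each other as peers, a client cancel on one node results in exactly one cancel request on each node.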