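Eureka Core Source Code Analysis (2): Eureka Server Self-Preservation, Expired-Instance Eviction, and Eureka Server Cluster Replication

This article walks through the core source code behind Eureka Server's self-preservation mechanism, expired-instance eviction, and cluster replication, based on Eureka 1.9.8.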

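IV. Eureka Server Self-Preservation Mechanism

1. What is the self-preservation mechanism

When a Eureka Server node loses too many clients within a short period (for example, because a network partition prevents service instances from reaching the Eureka Server), the node enters self-preservation mode (eureka.server.enable-self-preservation=true; enabled by default). Once in this mode, the Eureka Server protects the information in the service registry and stops removing entries from it (that is, it does not deregister any microservice). After the network failure is resolved, the node leaves self-preservation mode automatically.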

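2. How self-preservation is implemented

1) Trigger condition

Renews threshold: the minimum number of lease renewals per minute that the Eureka Server expects to receive from client instances

Renews (last min): the total number of renewals the Eureka Server received from client instances in the last minute

Self-preservation is triggered when, measured over a one-minute window, Renews (last min) does not exceed Renews threshold.

2) Formula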
public abstract class AbstractInstanceRegistry implements InstanceRegistry {

    protected void updateRenewsPerMinThreshold() {
        this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
                * (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
                * serverConfig.getRenewalPercentThreshold());
    }
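  • numberOfRenewsPerMinThreshold is the Renews threshold shown on the dashboard
  • expectedNumberOfClientsSendingRenews is the number of clients expected to send renewals (in practice, the total number of service instances)
  • getExpectedClientRenewalIntervalSeconds() returns the client renewal interval in seconds (default: 30 s)
  • getRenewalPercentThreshold() returns the self-preservation renewal percentage threshold (default: 0.85, i.e. 85%)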

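It follows that:

  • Renews threshold = number of service instances * (60 / renewal interval) * renewal percentage threshold, truncated to an integer
  • Renews (last min) = number of service instances * (60 / renewal interval) when every instance renews on schedule; the value on the dashboard is the count the server actually received
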
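As a worked example of the two formulas above, here is a minimal sketch. The class and method names are hypothetical; only the arithmetic mirrors updateRenewsPerMinThreshold(), and the inputs (4 instances, 30 s interval, factor 0.85) are illustrative values, not measurements.

```java
// Hypothetical helper class; only the arithmetic follows updateRenewsPerMinThreshold().
class RenewThresholdSketch {

    // expectedClients: number of instances expected to send renewals
    // renewalIntervalSeconds: client renewal interval (30 by default)
    // renewalPercentThreshold: self-preservation factor (0.85 by default)
    static int renewsPerMinThreshold(int expectedClients,
                                     int renewalIntervalSeconds,
                                     double renewalPercentThreshold) {
        // The cast truncates toward zero, exactly like the (int) cast in the original method.
        return (int) (expectedClients
                * (60.0 / renewalIntervalSeconds)
                * renewalPercentThreshold);
    }

    // Renewals per minute if every instance renews on schedule.
    static int expectedRenewsPerMin(int expectedClients, int renewalIntervalSeconds) {
        return (int) (expectedClients * (60.0 / renewalIntervalSeconds));
    }

    public static void main(String[] args) {
        System.out.println("expected renews/min: " + expectedRenewsPerMin(4, 30));
        System.out.println("renews threshold:    " + renewsPerMinThreshold(4, 30, 0.85));
    }
}
```

With 4 instances and the default settings, the server would expect 8 renewals per minute and the threshold would be 6 (6.8 truncated), so losing the heartbeats of a single instance for a full minute is already enough to reach the threshold.
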
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {

    @Override
    public boolean isLeaseExpirationEnabled() {
        if (!isSelfPreservationModeEnabled()) {
            // The self preservation mode is disabled, hence allowing the instances to expire.
            return true;
        }
        return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
    }

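isLeaseExpirationEnabled() is called during eviction on the Eureka Server to decide whether expired leases may be cleaned up. If self-preservation mode is disabled, eviction is always allowed. If it is enabled, eviction is allowed only when the renewal threshold is greater than 0 and the number of renewals in the last minute is greater than the threshold; when the renewals in the last minute are less than or equal to the threshold, nothing is evicted.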
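The branches of isLeaseExpirationEnabled() can be checked case by case with a small stand-alone sketch. The class name and the parameter-passing style are assumptions made for illustration; in Eureka the values come from fields and configuration rather than arguments.

```java
// Stand-alone restatement of the decision; names other than the method name are hypothetical.
class LeaseExpirationSketch {

    static boolean isLeaseExpirationEnabled(boolean selfPreservationEnabled,
                                            int renewsPerMinThreshold,
                                            long renewsInLastMin) {
        if (!selfPreservationEnabled) {
            // Self-preservation is switched off: expired leases may always be evicted.
            return true;
        }
        // Eviction is allowed only while renewals stay strictly above the threshold.
        return renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold;
    }

    public static void main(String[] args) {
        System.out.println(isLeaseExpirationEnabled(true, 17, 20));  // healthy: eviction allowed
        System.out.println(isLeaseExpirationEnabled(true, 17, 17));  // at threshold: protected
        System.out.println(isLeaseExpirationEnabled(false, 17, 0));  // feature disabled
    }
}
```

Note the strict comparison: a renewal count exactly equal to the threshold already blocks eviction.
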

3) When Renews threshold is updated
(1) Instance registration
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
  
    public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
        // ... other code omitted
                synchronized (lock) {
                    if (this.expectedNumberOfClientsSendingRenews > 0) {
                        // Since the client wants to register it, increase the number of clients sending renews
                        this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
                        updateRenewsPerMinThreshold();
                    }
                }
                logger.debug("No previous lease information found; it is new registration");
            }
        // ... other code omitted
    }

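When a new application instance registers (and expectedNumberOfClientsSendingRenews is already greater than 0), expectedNumberOfClientsSendingRenews is incremented and updateRenewsPerMinThreshold() is called to recompute Renews threshold.

(2) Instance cancellation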
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {

    public boolean cancel(final String appName, final String id,
                          final boolean isReplication) {
        // ... other code omitted
            synchronized (lock) {
                if (this.expectedNumberOfClientsSendingRenews > 0) {
                    // Since the client wants to cancel it, reduce the number of clients to send renews
                    this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
                    updateRenewsPerMinThreshold();
                }
            }
        // ... other code omitted
    }

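When an application instance cancels its registration, expectedNumberOfClientsSendingRenews is decremented and updateRenewsPerMinThreshold() is called to recompute Renews threshold.

(3) Periodic reset (every 15 minutes by default)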
public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {

    private void scheduleRenewalThresholdUpdateTask() {
        timer.schedule(new TimerTask() {
                           @Override
                           public void run() {
                               updateRenewalThreshold();
                           }
                       }, serverConfig.getRenewalThresholdUpdateIntervalMs(),
                serverConfig.getRenewalThresholdUpdateIntervalMs());
    }
  
    private void updateRenewalThreshold() {
        try {
            // Count the registered application instances
            Applications apps = eurekaClient.getApplications();
            int count = 0;
            for (Application app : apps.getRegisteredApplications()) {
                for (InstanceInfo instance : app.getInstances()) {
                    if (this.isRegisterable(instance)) {
                        ++count;
                    }
                }
            }
            synchronized (lock) {
                // Update threshold only if the threshold is greater than the
                // current expected threshold or if self preservation is disabled.
                // Recompute expectedNumberOfClientsSendingRenews and numberOfRenewsPerMinThreshold
                if ((count) > (serverConfig.getRenewalPercentThreshold() * expectedNumberOfClientsSendingRenews)
                        || (!this.isSelfPreservationModeEnabled())) {
                    this.expectedNumberOfClientsSendingRenews = count;
                    updateRenewsPerMinThreshold();
                }
            }
            logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
        } catch (Throwable e) {
            logger.error("Cannot update renewal threshold", e);
        }
    }  
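
The guard inside updateRenewalThreshold() means the periodic task does not lower the expected client count when the observed instance count has dropped sharply, which is the situation self-preservation is meant to survive. A minimal sketch of that condition follows; the class and method names are hypothetical and the numbers are illustrative.

```java
// Hypothetical helper that isolates the condition used by the periodic reset.
class ThresholdResetSketch {

    // count: instances currently visible; expected: current expectedNumberOfClientsSendingRenews
    static boolean shouldReset(int count, int expected,
                               double renewalPercentThreshold,
                               boolean selfPreservationEnabled) {
        return count > renewalPercentThreshold * expected || !selfPreservationEnabled;
    }

    public static void main(String[] args) {
        // 90 of 100 instances are still visible: the expected count is reset to 90.
        System.out.println(shouldReset(90, 100, 0.85, true));
        // Only 80 of 100 are visible: the old expectation is kept.
        System.out.println(shouldReset(80, 100, 0.85, true));
        // With self-preservation disabled the count is always reset.
        System.out.println(shouldReset(80, 100, 0.85, false));
    }
}
```
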

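V. Expired-Instance Eviction

The core flow of expired-instance eviction is shown in the figure below:

[Figure: core flow of expired-instance eviction]

1. Why eviction is needed

Normally, an application instance proactively sends a cancel request to the Eureka Server when it goes offline. In practice, however, the instance may crash or the network may fail, so the cancel request never gets through.

To handle this, Eureka Clients extend their leases with heartbeats, and the Eureka Server cleans up leases that have timed out.

2. EvictionTask

com.netflix.eureka.registry.AbstractInstanceRegistry.EvictionTask is the task that cleans up expired leases. When the Eureka Server starts, it initializes EvictionTask and schedules it to run periodically, as follows: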

public abstract class AbstractInstanceRegistry implements InstanceRegistry {

    protected void postInit() {
        renewsLastMin.start();
        if (evictionTaskRef.get() != null) {
            evictionTaskRef.get().cancel();
        }
        // Initialize the expired-lease cleanup task
        evictionTaskRef.set(new EvictionTask());
        evictionTimer.schedule(evictionTaskRef.get(),
                serverConfig.getEvictionIntervalTimerInMs(),
                serverConfig.getEvictionIntervalTimerInMs());
    }

eureka.evictionIntervalTimerInMs sets how often the expired-lease cleanup task runs; the default is 1 minute (60,000 ms).

EvictionTask is implemented as follows:

public abstract class AbstractInstanceRegistry implements InstanceRegistry {

    class EvictionTask extends TimerTask {

        private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);

        @Override
        public void run() {
            try {
                // Compensation time in ms (current time - last execution time - eviction interval)
                long compensationTimeMs = getCompensationTimeMs();
                logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
                // Evict the expired leases
                evict(compensationTimeMs);
            } catch (Throwable e) {
                logger.error("Could not run the evict task", e);
            }
        }
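
The excerpt does not show getCompensationTimeMs(). Based on the comment in run() (current time minus last execution time minus eviction interval), it can be sketched as below. This is a reconstruction under that description, not a verbatim copy of the Eureka source; the class name is hypothetical and the current time is passed in as a parameter so the behaviour can be checked deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-alone version of the compensation-time calculation.
class CompensationTimeSketch {

    private final AtomicLong lastExecutionNanosRef = new AtomicLong(0L);
    private final long evictionIntervalMs;

    CompensationTimeSketch(long evictionIntervalMs) {
        this.evictionIntervalMs = evictionIntervalMs;
    }

    // currNanos is injected here; a real task would read a monotonic clock instead.
    long compensationTimeMs(long currNanos) {
        long lastNanos = lastExecutionNanosRef.getAndSet(currNanos);
        if (lastNanos == 0L) {
            return 0L; // first run: nothing to compensate for
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(currNanos - lastNanos);
        // Only the delay beyond the configured interval counts as compensation.
        return Math.max(elapsedMs - evictionIntervalMs, 0L);
    }

    public static void main(String[] args) {
        CompensationTimeSketch task = new CompensationTimeSketch(60_000L);
        long start = TimeUnit.SECONDS.toNanos(1);
        System.out.println(task.compensationTimeMs(start));
        System.out.println(task.compensationTimeMs(start + TimeUnit.SECONDS.toNanos(61)));
    }
}
```

The result is passed to evict() as additionalLeaseMs, so when the task itself runs late (for example after a long GC pause), every lease is given that much extra time before it counts as expired.
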

3. Eviction logic

EvictionTask calls AbstractInstanceRegistry's evict(long additionalLeaseMs) method to clean up expired leases. The implementation is as follows:

public abstract class AbstractInstanceRegistry implements InstanceRegistry {

    public void evict(long additionalLeaseMs) {
        logger.debug("Running the evict task");

        if (!isLeaseExpirationEnabled()) {
            logger.debug("DS: lease expiration is currently disabled.");
            return;
        }

        // Collect all expired leases
        // We collect first all expired items, to evict them in random order. For large eviction sets,
        // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
        // the impact should be evenly distributed across all applications.
        List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
        for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
            Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
            if (leaseMap != null) {
                for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                    Lease<InstanceInfo> lease = leaseEntry.getValue();
                    // 1)
                    if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                        expiredLeases.add(lease);
                    }
                }
            }
        }

        // Compute the maximum number of leases that may be evicted
        // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
        // triggering self-preservation. Without that we would wipe out full registry.
        int registrySize = (int) getLocalRegistrySize();
        int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
        int evictionLimit = registrySize - registrySizeThreshold;

        // Compute the number of leases to evict in this run
        int toEvict = Math.min(expiredLeases.size(), evictionLimit);
        if (toEvict > 0) {
            logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);

            // Evict them one by one
            Random random = new Random(System.currentTimeMillis());
            for (int i = 0; i < toEvict; i++) {
                // Pick a random item (Knuth shuffle algorithm)
                int next = i + random.nextInt(expiredLeases.size() - i);
                Collections.swap(expiredLeases, i, next);
                Lease<InstanceInfo> lease = expiredLeases.get(i);

                String appName = lease.getHolder().getAppName();
                String id = lease.getHolder().getId();
                EXPIRED.increment();
                logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
                // Cancel the expired lease
                internalCancel(appName, id, false);
            }
        }
    }
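
Two details of evict() are easy to miss: the per-run cap on evictions and the random selection of which expired leases go first. The sketch below isolates both. The class and method names are hypothetical, plain strings stand in for Lease objects, and the input list is copied so the caller's list is left untouched (the original shuffles expiredLeases in place).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch of the eviction cap and the partial shuffle.
class EvictionLimitSketch {

    // At most (1 - renewalPercentThreshold) of the registry may be evicted per run.
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    // Partial Fisher-Yates (Knuth) shuffle: only the first toEvict slots are randomized.
    static <T> List<T> pickRandom(List<T> expired, int toEvict, Random random) {
        List<T> pool = new ArrayList<>(expired);
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, next);
        }
        return new ArrayList<>(pool.subList(0, toEvict));
    }

    public static void main(String[] args) {
        List<String> expired = Arrays.asList("a", "b", "c", "d", "e");
        int limit = evictionLimit(10, 0.85);            // 10 - (int) 8.5 = 2
        int toEvict = Math.min(expired.size(), limit);
        System.out.println(pickRandom(expired, toEvict, new Random()));
    }
}
```

With 10 registered instances and 5 expired leases, only 2 are evicted in this run; the rest wait for the next run, which gives self-preservation a chance to engage before whole applications are wiped out.
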

In evict(), the line marked 1) calls Lease's isExpired(long additionalLeaseMs) method to determine whether the lease has expired:

public class Lease<T> {

    public boolean isExpired(long additionalLeaseMs) {
        return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
    }
  
    public void renew() {
        lastUpdateTimestamp = System.currentTimeMillis() + duration;
    }  

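Ignoring the additionalLeaseMs parameter, a lease actually expires one duration later than intended. The reason is that renew() incorrectly sets lastUpdateTimestamp = System.currentTimeMillis() + duration, whereas the correct assignment would be lastUpdateTimestamp = System.currentTimeMillis(). Because isExpired() adds duration again, the effective timeout becomes 2 * duration.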
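The doubling can be seen with concrete numbers. The following sketch is a simplified model of Lease, not the real class: it omits evictionTimestamp and additionalLeaseMs, takes the current time as a parameter instead of reading the system clock, and adds a hypothetical fixedRenew flag to compare the two assignments. The 90-second duration is used as an illustrative value.

```java
// Simplified, hypothetical model of the lease timing; not the real Lease class.
class LeaseSketch {

    private final long durationMs;
    private final boolean fixedRenew;   // true = the assignment the article calls correct
    private long lastUpdateTimestamp;

    LeaseSketch(long durationMs, boolean fixedRenew) {
        this.durationMs = durationMs;
        this.fixedRenew = fixedRenew;
    }

    void renew(long nowMs) {
        // The shipped code adds the duration here, and isExpired() adds it again.
        lastUpdateTimestamp = fixedRenew ? nowMs : nowMs + durationMs;
    }

    boolean isExpired(long nowMs) {
        return nowMs > lastUpdateTimestamp + durationMs;
    }

    public static void main(String[] args) {
        LeaseSketch asShipped = new LeaseSketch(90_000L, false);
        LeaseSketch corrected = new LeaseSketch(90_000L, true);
        asShipped.renew(0L);
        corrected.renew(0L);
        // 100 seconds after the last heartbeat:
        System.out.println("as shipped expired: " + asShipped.isExpired(100_000L));
        System.out.println("corrected expired:  " + corrected.isExpired(100_000L));
    }
}
```

In this model, an instance that stops sending heartbeats is kept for 180 seconds after its last renewal instead of 90.
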

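VI. Eureka Server Cluster Replication

1. Overview

The architecture of the Eureka Server cluster, service providers, and service consumers is shown below:

[Figure: architecture of the Eureka Server cluster, service providers, and service consumers]

  • All nodes in a Eureka Server cluster play the same role and are fully equal peers
  • A Eureka Client can send register, renew, and cancel operations to any Eureka Server node; that node replicates the operation to the other Eureka Server nodes to reach eventual consistency
  • When a service consumer starts, its Eureka Client sends a REST request to a Eureka Server node to fetch the list of registered services and caches it; the client then refreshes the cached list periodically
  • After fetching the service list, the consumer can look up by service name the instances that provide the service, together with their metadata. Ribbon calls them in round-robin order by default, which provides client-side load balancing

2. Fetching the initial registry

When a Eureka Server starts, it calls PeerAwareInstanceRegistryImpl's syncUp() method to fetch the initial registry information from one of the Eureka Server nodes in the cluster: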

public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {

    @Override
    public int syncUp() {
        // Copy entire entry from neighboring DS node
        int count = 0;

        for (int i = 0; ((i < serverConfig.getRegistrySyncRetries()) && (count == 0)); i++) {
            // Between retries, sleep for a while
            if (i > 0) {
                try {
                    Thread.sleep(serverConfig.getRegistrySyncRetryWaitMs());
                } catch (InterruptedException e) {
                    logger.warn("Interrupted during registry transfer..");
                    break;
                }
            }
            // Fetch the initial registry information
            Applications apps = eurekaClient.getApplications();
            for (Application app : apps.getRegisteredApplications()) {
                for (InstanceInfo instance : app.getInstances()) {
                    try {
                        if (isRegisterable(instance)) {
                            register(instance, instance.getLeaseInfo().getDurationInSecs(), true);
                            count++;
                        }
                    } catch (Throwable t) {
                        logger.error("During DS init copy", t);
                    }
                }
            }
        }
        return count;
    }

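3. Replicating registry changes

When a Eureka Server receives a register, renew, cancel, or similar operation from a Eureka Client, it replicates the operation to the other nodes in the cluster in batches at a fixed interval (500 ms by default).

1) Sending replication from a Eureka Server

Taking registration as an example, the code is as follows: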

public class PeerAwareInstanceRegistryImpl extends AbstractInstanceRegistry implements PeerAwareInstanceRegistry {
  
    @Override
    public void register(final InstanceInfo info, final boolean isReplication) {
        // Lease duration
        int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
        if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
            leaseDuration = info.getLeaseInfo().getDurationInSecs();
        }
        // Register the application instance
        super.register(info, leaseDuration, isReplication);
        // Replicate to the peer Eureka Servers
        replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
    }
  
    private void replicateToPeers(Action action, String appName, String id,
                                  InstanceInfo info /* optional */,
                                  InstanceStatus newStatus /* optional */, boolean isReplication) {
        Stopwatch tracer = action.getTimer().start();
        try {
            if (isReplication) {
                numberOfReplicationsLastMin.increment();
            }
            // 1) The request came from another Eureka Server, or the cluster is empty
            // If it is a replication already, do not replicate again as this will create a poison replication
            if (peerEurekaNodes == Collections.EMPTY_LIST || isReplication) {
                return;
            }

            // Iterate over every peer node and call replicateInstanceActionsToPeers
            for (final PeerEurekaNode node : peerEurekaNodes.getPeerEurekaNodes()) {
                // If the url represents this host, do not replicate to yourself.
                if (peerEurekaNodes.isThisMyUrl(node.getServiceUrl())) {
                    continue;
                }
                replicateInstanceActionsToPeers(action, appName, id, info, newStatus, node);
            }
        } finally {
            tracer.stop();
        }
    }  
  
    private void replicateInstanceActionsToPeers(Action action, String appName,
                                                 String id, InstanceInfo info, InstanceStatus newStatus,
                                                 PeerEurekaNode node) {
        try {
            InstanceInfo infoFromRegistry = null;
            CurrentRequestVersion.set(Version.V2);
            // Call the PeerEurekaNode method that matches the action type
            switch (action) {
                case Cancel:
                    node.cancel(appName, id);
                    break;
                case Heartbeat:
                    InstanceStatus overriddenStatus = overriddenInstanceStatusMap.get(id);
                    infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                    node.heartbeat(appName, id, infoFromRegistry, overriddenStatus, false);
                    break;
                case Register:
                    node.register(info);
                    break;
                case StatusUpdate:
                    infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                    node.statusUpdate(appName, id, newStatus, infoFromRegistry);
                    break;
                case DeleteStatusOverride:
                    infoFromRegistry = getInstanceByAppAndId(appName, id, false);
                    node.deleteStatusOverride(appName, id, infoFromRegistry);
                    break;
            }
        } catch (Throwable t) {
            logger.error("Cannot replicate information to {} for action {}", node.getServiceUrl(), action.name(), t);
        }
    }  

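The line marked 1) checks isReplication, whose value comes from the x-netflix-discovery-replication request header. A registration request from a Eureka Client has isReplication set to false, so the Eureka Server node that receives it replicates the registration to the other Eureka Server nodes. Those replication requests carry isReplication set to true, which tells the receiving node that the registration was copied from another Eureka Server node; it is therefore not forwarded any further, which avoids an endless replication loop.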
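To illustrate why the flag is enough to stop the loop, here is a minimal in-memory simulation. Everything in it (the Node class, the cluster() helper, the call counter) is hypothetical and replaces HTTP calls with direct method calls; it only models the isReplication check and the skip-self check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of peer replication; no networking involved.
class ReplicationSketch {

    static class Node {
        final String name;
        final List<Node> peers = new ArrayList<>();
        final Set<String> registry = new HashSet<>();
        int registerCalls = 0;

        Node(String name) {
            this.name = name;
        }

        void register(String instanceId, boolean isReplication) {
            registerCalls++;
            registry.add(instanceId);
            if (isReplication) {
                return; // copied from a peer: store it, but do not forward it again
            }
            for (Node peer : peers) {
                if (peer == this) {
                    continue; // never replicate to yourself
                }
                peer.register(instanceId, true);
            }
        }
    }

    // Builds a cluster in which every node lists all nodes (itself included) as peers.
    static List<Node> cluster(String... names) {
        List<Node> nodes = new ArrayList<>();
        for (String name : names) {
            nodes.add(new Node(name));
        }
        for (Node node : nodes) {
            node.peers.addAll(nodes);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> nodes = cluster("a", "b", "c");
        nodes.get(0).register("APP/i-1", false); // a client registers with node a
        for (Node node : nodes) {
            System.out.println(node.name + " calls=" + node.registerCalls
                    + " has=" + node.registry.contains("APP/i-1"));
        }
    }
}
```

Each node ends up with the instance after exactly one register call. Without the isReplication check, nodes b and c would forward the registration back and the calls would never stop.
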

public class PeerEurekaNode {
  
    public void register(final InstanceInfo info) throws Exception {
        long expiryTime = System.currentTimeMillis() + getLeaseRenewalOf(info);
        batchingDispatcher.process(
                // Build the task ID; the same replication action for the same instance uses the same task ID
                taskId("register", info),
                // The task that registers the instance on the peer
                new InstanceReplicationTask(targetHost, Action.Register, info, null, true) {
                    public EurekaHttpResponse<Void> execute() {
                        return replicationClient.register(info);
                    }
                },
                expiryTime
        );
    }
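
Because the same action on the same instance always yields the same task ID, a dispatcher that keys pending work by ID can keep only the newest task when several arrive within one batching window. The sketch below shows that idea with a plain map. It is an assumption-laden illustration: the ID format (requestType#appName/instanceId), the class, and its methods are hypothetical and are not taken from the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pending-task buffer keyed by task ID; payloads are plain strings here.
class TaskIdSketch {

    // Assumed ID format for illustration only.
    static String taskId(String requestType, String appName, String instanceId) {
        return requestType + '#' + appName + '/' + instanceId;
    }

    private final Map<String, String> pending = new LinkedHashMap<>();
    private int overridden = 0;

    void submit(String id, String payload) {
        // A newer task with the same ID replaces the older one that was still waiting.
        if (pending.put(id, payload) != null) {
            overridden++;
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int overriddenCount() {
        return overridden;
    }

    String payloadOf(String id) {
        return pending.get(id);
    }

    public static void main(String[] args) {
        TaskIdSketch queue = new TaskIdSketch();
        queue.submit(taskId("heartbeat", "APP", "i-1"), "heartbeat v1");
        queue.submit(taskId("heartbeat", "APP", "i-1"), "heartbeat v2");
        queue.submit(taskId("register", "APP", "i-2"), "register");
        System.out.println(queue.pendingCount() + " pending, "
                + queue.overriddenCount() + " overridden");
    }
}
```
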
2) Receiving replication on a Eureka Server
@Path("/{version}/peerreplication")
@Produces({"application/xml", "application/json"})
public class PeerReplicationResource {
  
    @Path("batch")
    @POST
    public Response batchReplication(ReplicationList replicationList) {
        try {
            ReplicationListResponse batchResponse = new ReplicationListResponse();
            // Process each replication task in turn and merge the results into ReplicationListResponse
            for (ReplicationInstance instanceInfo : replicationList.getReplicationList()) {
                try {
                    batchResponse.addResponse(dispatch(instanceInfo));
                } catch (Exception e) {
                    batchResponse.addResponse(new ReplicationInstanceResponse(Status.INTERNAL_SERVER_ERROR.getStatusCode(), null));
                    logger.error("{} request processing failed for batch item {}/{}",
                            instanceInfo.getAction(), instanceInfo.getAppName(), instanceInfo.getId(), e);
                }
            }
            return Response.ok(batchResponse).build();
        } catch (Throwable e) {
            logger.error("Cannot execute batch Request", e);
            return Response.status(Status.INTERNAL_SERVER_ERROR).build();
        }
    }
  
    private ReplicationInstanceResponse dispatch(ReplicationInstance instanceInfo) {
        ApplicationResource applicationResource = createApplicationResource(instanceInfo);
        InstanceResource resource = createInstanceResource(instanceInfo, applicationResource);

        String lastDirtyTimestamp = toString(instanceInfo.getLastDirtyTimestamp());
        String overriddenStatus = toString(instanceInfo.getOverriddenStatus());
        String instanceStatus = toString(instanceInfo.getStatus());

        Builder singleResponseBuilder = new Builder();
        switch (instanceInfo.getAction()) {
            case Register:
                singleResponseBuilder = handleRegister(instanceInfo, applicationResource);
                break;
            case Heartbeat:
                singleResponseBuilder = handleHeartbeat(serverConfig, resource, lastDirtyTimestamp, overriddenStatus, instanceStatus);
                break;
            case Cancel:
                singleResponseBuilder = handleCancel(resource);
                break;
            case StatusUpdate:
                singleResponseBuilder = handleStatusUpdate(instanceInfo, resource);
                break;
            case DeleteStatusOverride:
                singleResponseBuilder = handleDeleteStatusOverride(instanceInfo, resource);
                break;
        }
        return singleResponseBuilder.build();
    }  

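dispatch() hands each individual replication task to the corresponding Resource for processing. This is the same logic as the Resource that handles requests from Eureka Clients, except that isReplication is always true.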

