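- Why use a service registry
In the earlier Ribbon and OpenFeign examples, we maintained the service provider address list by hand:
# Address list of the providers for the specified service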
order-service.ribbon.listOfServers=\
localhost:8080,localhost:8082
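This causes two main problems:
- Callers do not dynamically detect when a service instance comes online or goes offline (crashes)
- Maintaining the address list is hard work for every service caller
A service registry exists to solve these problems. The diagram below sketches how it works:
In the diagram, the order service calls the member service; the call also involves refreshing the service list (IRule -> serviceList) and Ribbon client-side load balancing.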
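To use the Spring Cloud service registry component, you must create a standalone project. Create the Eureka project with the Spring Initializr plugin in IDEA:
Select the Eureka Server component:
Then choose where to store the project. After clicking Finish, change the Spring Boot version to 2.3.0.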
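1. Single-node Eureka server
Add the configuration for a single-registry Eureka server:
2. Enable service registration
Start the service registry and open it in a browser:
2. Two-node Eureka server
Create another service registry project, spring-cloud-eureka-server2.
Its configuration is as follows:
Change the configuration of spring-cloud-eureka-server1 to:
Then start both registries and open http://localhost:9090/ again; two registry instances are now listed.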
3. Eureka client
From the registry's point of view, every other service is a client. A service acting as a client needs the spring-cloud-starter-netflix-eureka-client dependency:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>
Add the following to the configuration files of both order-service and user-service:
eureka.client.service-url.defaultZone=http://localhost:9090/eureka,http://localhost:9091/eureka
The manually specified Ribbon server list in user-service can now be removed:
#order-service.ribbon.listOfServers=localhost:8080,localhost:8082
Start the two Eureka Servers, two order-service instances, and one user-service instance. The registry now holds 5 registered instances in total:
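3. Eureka self-preservation mechanism
While running, Eureka Server tracks the proportion of expected heartbeats that it actually receives. If that proportion falls below 85% within 15 minutes, Eureka Server assumes there is a network fault between itself and the client instances, and it protects those instances so that their leases do not expire and they are not evicted. The goal is to avoid evicting healthy services during network instability or a network partition (the service itself is fine, but Eureka Server cannot see its heartbeats because of the network). Self-preservation lets the Eureka Server cluster run more robustly and stably.
Once self-preservation is active, the following happens:
- Eureka Server no longer removes from the registry the expired services that should have been evicted because no heartbeat was received for a long time
- Eureka Server still accepts registration and query requests for new services, but does not replicate them to other nodes, which keeps the current node available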
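3.1 Eureka self-preservation demo
# How long Eureka Server waits when the sync from its peers fails (default 5 minutes); during this time it does not serve registry information to clients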
eureka.server.wait-time-in-ms-when-sync-empty=10000
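For the demo, change this wait time on eureka server1 to 10s and then start Eureka Server. After waiting 10s, the warning message above appears, which indicates that self-preservation has been activated.
3.1 How Eureka self-preservation works
Eureka's self-preservation mechanism is built around the two variables below, which are defined in the class com.netflix.eureka.registry.AbstractInstanceRegistry:
// Minimum number of renewals per minute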
protected volatile int numberOfRenewsPerMinThreshold;
// Expected number of clients sending renewals; depends on the number of service instances registered with Eureka Server
protected volatile int expectedNumberOfClientsSendingRenews;
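numberOfRenewsPerMinThreshold is the minimum number of renewals per minute, that is, the threshold for the total number of client instance renewals that Eureka Server expects to receive each minute.
If the number of renewals received falls below this threshold, self-preservation is triggered. The variable is assigned in the following code: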
protected void updateRenewsPerMinThreshold() {
this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
* (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
* serverConfig.getRenewalPercentThreshold());
}
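- getExpectedClientRenewalIntervalSeconds() is the client renewal interval, 30s by default
- getRenewalPercentThreshold() is the self-preservation renewal percentage factor, 0.85 by default, meaning the renewals received per minute must exceed 85% of the expected number
Therefore the self-preservation threshold = number of instances * renewals per instance per minute (60s / client renewal interval) * self-preservation percentage factor,
so for the 5 instances registered above, 5 * 2 * 0.85 = 8.5, which truncates to 8.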
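The arithmetic above can be checked with a small standalone sketch. The class and method names here (RenewThresholdSketch, threshold) are hypothetical and not part of Eureka; the body only mirrors the expression in updateRenewsPerMinThreshold(), with the server configuration values passed in as plain parameters.

```java
// Hypothetical helper, not part of Eureka: mirrors the arithmetic of updateRenewsPerMinThreshold().
final class RenewThresholdSketch {

    /**
     * @param expectedClients        instances expected to send renewals
     * @param renewalIntervalSeconds client renewal interval in seconds (30 by default)
     * @param renewalPercent         self-preservation factor (0.85 by default)
     */
    static int threshold(int expectedClients, int renewalIntervalSeconds, double renewalPercent) {
        // (60.0 / interval) is the number of renewals one instance sends per minute.
        // The cast truncates toward zero, as in the original method.
        return (int) (expectedClients * (60.0 / renewalIntervalSeconds) * renewalPercent);
    }

    public static void main(String[] args) {
        // 5 * 2 * 0.85 = 8.5, truncated to 8
        System.out.println(threshold(5, 30, 0.85));
        // A lower factor lowers the threshold: 5 * 2 * 0.5 = 5
        System.out.println(threshold(5, 30, 0.5));
    }
}
```

Because the result is truncated rather than rounded, the effective threshold is always at or slightly below the exact product.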
These two variables change dynamically; four places update them:
- Eureka Server initialization
The EurekaBootstrap class has an initEurekaServerContext method:
protected void initEurekaServerContext() throws Exception {
EurekaServerConfig eurekaServerConfig = new DefaultEurekaServerConfig();
…………………………
// Copy registry from neighboring eureka node
int registryCount = registry.syncUp();
registry.openForTraffic(applicationInfoManager, registryCount);
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#openForTraffic:
public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
// Renewals happen every 30 seconds and for a minute it should be a factor of 2.
// Initialize the expected number of renewing clients
this.expectedNumberOfClientsSendingRenews = count;
// Update the minimum number of renewals per minute
updateRenewsPerMinThreshold();
logger.info("Got {} instances from neighboring DS node", count);
logger.info("Renew threshold is: {}", numberOfRenewsPerMinThreshold);
this.startupTime = System.currentTimeMillis();
if (count > 0) {
this.peerInstancesTransferEmptyOnStartup = false;
}
DataCenterInfo.Name selfName = applicationInfoManager.getInfo().getDataCenterInfo().getName();
boolean isAws = Name.Amazon == selfName;
if (isAws && serverConfig.shouldPrimeAwsReplicaConnections()) {
logger.info("Priming AWS connections for all replicas..");
primeAwsReplicas(applicationInfoManager);
}
logger.info("Changing status to UP");
applicationInfoManager.setInstanceStatus(InstanceStatus.UP);
super.postInit();
}
- Service registration
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#register
When a new service provider registers with Eureka Server, the number of renewing clients must increase, so the register method handles this:
public void register(final InstanceInfo info, final boolean isReplication) {
int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
super.register(info, leaseDuration, isReplication);
replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
It delegates to the parent class method com.netflix.eureka.registry.AbstractInstanceRegistry#register:
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
//....
// The lease does not exist and hence it is a new registration
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to register it, increase the number of clients sending renews
this.expectedNumberOfClientsSendingRenews =
this.expectedNumberOfClientsSendingRenews + 1;
updateRenewsPerMinThreshold();
}
}
}
- Service cancellation (going offline)
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#cancel -> com.netflix.eureka.registry.AbstractInstanceRegistry#cancel ->
com.netflix.eureka.registry.AbstractInstanceRegistry#internalCancel
PeerAwareInstanceRegistryImpl#cancel:
public boolean cancel(final String appName, final String id,
final boolean isReplication) {
if (super.cancel(appName, id, isReplication)) {
replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
return true;
}
return false;
}
AbstractInstanceRegistry#internalCancel:
protected boolean internalCancel(String appName, String id, boolean isReplication) {
try {
read.lock();
CANCEL.increment(isReplication);
Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
Lease<InstanceInfo> leaseToCancel = null;
if (gMap != null) {
leaseToCancel = gMap.remove(id);
}
recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
if (instanceStatus != null) {
logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
}
if (leaseToCancel == null) {
CANCEL_NOT_FOUND.increment(isReplication);
logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
return false;
} else {
leaseToCancel.cancel();
InstanceInfo instanceInfo = leaseToCancel.getHolder();
String vip = null;
String svip = null;
if (instanceInfo != null) {
instanceInfo.setActionType(ActionType.DELETED);
recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
instanceInfo.setLastUpdatedTimestamp();
vip = instanceInfo.getVIPAddress();
svip = instanceInfo.getSecureVipAddress();
}
invalidateCache(appName, vip, svip);
logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
}
} finally {
read.unlock();
}
// After a service goes offline, one fewer client needs to send renewals, so the count is adjusted here
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to cancel it, reduce the number of clients to send renews.
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
updateRenewsPerMinThreshold();
}
}
return true;
}
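The effect of registration and cancellation on the two variables can be modelled in isolation. The following is a simplified sketch, not Eureka code: RenewalCounterSketch and its members are hypothetical names, and the 30s interval and 0.85 factor are hard-coded on the assumption that the defaults described earlier apply.

```java
// Simplified, hypothetical model of how register and cancel adjust the two variables.
final class RenewalCounterSketch {
    private final Object lock = new Object();
    private int expectedClients;       // stands in for expectedNumberOfClientsSendingRenews
    private int renewsPerMinThreshold; // stands in for numberOfRenewsPerMinThreshold

    RenewalCounterSketch(int initialClients) {
        this.expectedClients = initialClients;
        updateThreshold();
    }

    private void updateThreshold() {
        // Assumed defaults: 30s renewal interval, 0.85 percentage factor.
        renewsPerMinThreshold = (int) (expectedClients * (60.0 / 30) * 0.85);
    }

    void register() {
        synchronized (lock) {
            if (expectedClients > 0) { // same guard as in the register excerpt
                expectedClients++;
                updateThreshold();
            }
        }
    }

    void cancel() {
        synchronized (lock) {
            if (expectedClients > 0) { // same guard as in the internalCancel excerpt
                expectedClients--;
                updateThreshold();
            }
        }
    }

    int expectedClients() {
        synchronized (lock) {
            return expectedClients;
        }
    }

    int threshold() {
        synchronized (lock) {
            return renewsPerMinThreshold;
        }
    }

    public static void main(String[] args) {
        RenewalCounterSketch counter = new RenewalCounterSketch(5);
        System.out.println(counter.threshold()); // 5 instances
        counter.register();
        System.out.println(counter.threshold()); // 6 instances
        counter.cancel();
        counter.cancel();
        System.out.println(counter.threshold()); // 4 instances
    }
}
```

Note that, because of the guard, a counter that starts at 0 is not incremented by register in this model, which matches the condition shown in the excerpts.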
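- Scheduled task
It runs every 15 minutes (the default of getRenewalThresholdUpdateIntervalMs()) and recalculates the threshold on which the 85% heartbeat check is based. It is initialized and invoked as follows:
The initialize method of com.netflix.eureka.DefaultEurekaServerContext is annotated with @PostConstruct,
so it is called as soon as Spring has finished constructing the bean: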
@PostConstruct
@Override
public void initialize() {
logger.info("Initializing ...");
peerEurekaNodes.start();
try {
registry.init(peerEurekaNodes);
} catch (Exception e) {
throw new RuntimeException(e);
}
logger.info("Initialized");
}
registry.init(peerEurekaNodes); is implemented in com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#init
The task itself is scheduled in:
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#scheduleRenewalThresholdUpdateTask
private void scheduleRenewalThresholdUpdateTask() {
timer.schedule(new TimerTask() {
@Override
public void run() {
updateRenewalThreshold();
}
}, serverConfig.getRenewalThresholdUpdateIntervalMs(),
serverConfig.getRenewalThresholdUpdateIntervalMs());
}
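The scheduling pattern used here is plain java.util.Timer with the same value for the initial delay and the period, so the first run happens only after one full interval. A minimal sketch of that pattern follows; ThresholdTimerSketch, schedule, and runsAtLeast are hypothetical names, and a short interval stands in for the 15-minute default.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the fixed-delay scheduling pattern shown above.
final class ThresholdTimerSketch {

    /** Schedules the update with initial delay == period and returns the Timer so it can be cancelled. */
    static Timer schedule(Runnable update, long intervalMs) {
        Timer timer = new Timer("threshold-update-sketch", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                update.run();
            }
        }, intervalMs, intervalMs);
        return timer;
    }

    /** Returns true if the update ran at least the given number of times before the timeout. */
    static boolean runsAtLeast(int times, long intervalMs, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(times);
        Timer timer = schedule(latch::countDown, intervalMs);
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran three times: " + runsAtLeast(3, 50, 5000));
    }
}
```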
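Whether the trigger is Eureka initialization, service registration and cancellation, or the scheduled task, the update always ends in updateRenewsPerMinThreshold(). The scheduled task reaches it through the following logic: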
private void updateRenewalThreshold() {
try {
Applications apps = eurekaClient.getApplications();
int count = 0;
for (Application app : apps.getRegisteredApplications()) {
for (InstanceInfo instance : app.getInstances()) {
if (this.isRegisterable(instance)) {
++count;
}
}
}
synchronized (lock) {
// Update threshold only if the threshold is greater than the
// current expected threshold or if self preservation is disabled.
if ((count) > (serverConfig.getRenewalPercentThreshold() * expectedNumberOfClientsSendingRenews)
|| (!this.isSelfPreservationModeEnabled())) {
this.expectedNumberOfClientsSendingRenews = count;
updateRenewsPerMinThreshold();
}
}
logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
} catch (Throwable e) {
logger.error("Cannot update renewal threshold", e);
}
}
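The condition inside the synchronized block decides whether the expected client count is replaced by the freshly counted one. A reduced sketch of just that decision is given here; ThresholdRefreshSketch and refresh are hypothetical names, and the counted instances are passed in directly instead of being read from the Eureka client.

```java
// Hypothetical reduction of the guard inside updateRenewalThreshold().
final class ThresholdRefreshSketch {

    /** Returns the expected client count after one run of the scheduled task. */
    static int refresh(int expectedClients, int countedInstances,
                       double renewalPercent, boolean selfPreservationEnabled) {
        boolean enoughInstances = countedInstances > renewalPercent * expectedClients;
        if (enoughInstances || !selfPreservationEnabled) {
            return countedInstances; // accept the new count
        }
        return expectedClients; // keep the previous expectation
    }

    public static void main(String[] args) {
        System.out.println(refresh(10, 9, 0.85, true));  // 9 > 8.5, count accepted
        System.out.println(refresh(10, 5, 0.85, true));  // 5 <= 8.5, old value kept
        System.out.println(refresh(10, 5, 0.85, false)); // self-preservation off, count accepted
    }
}
```

One plausible reading of this guard is that a sudden drop in visible instances, such as during a network partition, should not lower the threshold while self-preservation is enabled.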
3.2 Triggering Eureka self-preservation
com.netflix.eureka.registry.AbstractInstanceRegistry#postInit starts an EvictionTask, which checks whether self-preservation needs to be activated:
protected void postInit() {
renewsLastMin.start();
if (evictionTaskRef.get() != null) {
evictionTaskRef.get().cancel();
}
evictionTaskRef.set(new EvictionTask());
evictionTimer.schedule(evictionTaskRef.get(),
serverConfig.getEvictionIntervalTimerInMs(),
serverConfig.getEvictionIntervalTimerInMs());
}
EvictionTask is the task that is ultimately executed:
/* visible for testing */ class EvictionTask extends TimerTask {
private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);
@Override
public void run() {
try {
long compensationTimeMs = getCompensationTimeMs();
logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
evict(compensationTimeMs);
} catch (Throwable e) {
logger.error("Could not run the evict task", e);
}
}
// ... (rest of the class omitted)
}
It calls the evict(compensationTimeMs) method:
public void evict(long additionalLeaseMs) {
logger.debug("Running the evict task");
// Check whether self-preservation is active; if so, return immediately without evicting anything
if (!isLeaseExpirationEnabled()) {
logger.debug("DS: lease expiration is currently disabled.");
return;
}
}
isLeaseExpirationEnabled is implemented in
com.netflix.eureka.registry.PeerAwareInstanceRegistryImpl#isLeaseExpirationEnabled
@Override
public boolean isLeaseExpirationEnabled() {
if (!isSelfPreservationModeEnabled()) {
// The self preservation mode is disabled, hence allowing the instances to expire.
return true;
}
return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}
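The decision made by isLeaseExpirationEnabled can be tried out with concrete numbers. This is a standalone sketch under the assumption that the three inputs are supplied by the caller; LeaseExpirationSketch and leaseExpirationEnabled are hypothetical names, and the boolean expression is copied from the method above.

```java
// Hypothetical standalone version of the isLeaseExpirationEnabled() decision.
final class LeaseExpirationSketch {

    /** Returns true when expired leases may be evicted, false when self-preservation blocks eviction. */
    static boolean leaseExpirationEnabled(boolean selfPreservationEnabled,
                                          int renewsPerMinThreshold,
                                          long renewsInLastMin) {
        if (!selfPreservationEnabled) {
            // Self-preservation is switched off, so instances are always allowed to expire.
            return true;
        }
        return renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold;
    }

    public static void main(String[] args) {
        System.out.println(leaseExpirationEnabled(true, 8, 10)); // renewals above the threshold
        System.out.println(leaseExpirationEnabled(true, 8, 3));  // renewals below the threshold
        System.out.println(leaseExpirationEnabled(false, 8, 3)); // self-preservation disabled
    }
}
```

The comparison is strict, so a renewal count exactly equal to the threshold still keeps eviction disabled.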
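4 Disabling Eureka self-preservation
Set eureka.server.enable-self-preservation=false to turn self-preservation off.
Alternatively, lower the minimum renewals-per-minute threshold so that self-preservation is less likely to activate: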
eureka.server.renewal-percent-threshold=0.5
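With this change, for 5 instances, the self-preservation threshold = number of instances * renewals per instance per minute (60s / client renewal interval) * self-preservation percentage factor
= 5 * 2 * 0.5 = 5. Roughly one heartbeat per instance per minute is then enough for Eureka Server to consider the client instances healthy (strictly, the total must be greater than 5, because the check uses >).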