一 .Eureka基本使用
常用的注册中心解决方案
常用注册中心 | 推送方式 | 是否支持持久化存储 | CAP特性 | |||
---|---|---|---|---|---|---|
Eureka | Pull | 缓存在内存中,不支持 | AP | |||
Consul | long polling | |||||
Zookeeper | push | |||||
Etcd | long polling | |||||
Nacos | long polling | 支持持久化存储到Mysql | ||||
redis |
- Eureka 非持久化存储 、 ap(高可用模型)、 集群节点的角色是相等的、
- Consul raft (redis-sentinel / nacos 选举 ) long polling
- Zookeeper zab协议(paxos) push
- Etcd (raft) long polling
- Nacos (raft) long polling
- redis
推送发送
- push / pull
存储
- 是否持久化
高可用机制
- 集群特性 -> 选举特性、 一致性问题、
cap特性
- cp 、ap
api的提供形式
- http协议、 netty通信
CAP的特点
C 一致性(consistency 强一致性)
A 可用性()
P 分区容错性 (集群节点/跨区域的高可用)
CP / AP
强一致性
高可用性
使用Eureka
1.对于Eureka-Service端,在pom文件中添加spring-cloud-starter-netflix-eureka-server依赖
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
</dependency>
2.在springboot启动类上添加@EnableEurekaServer注解
@EnableEurekaServer
@SpringBootApplication
public class SpringCloudEurekaServerApplication {
public static void main(String[] args) {
SpringApplication.run(SpringCloudEurekaServerApplication.class, args);
}
}
3.在application.properties配置文件中配置eureka注册的地址
spring.application.name=spring-cloud-eureka-server
server.port=9090
#指向服务注册中心的地址
eureka.client.service-url.defaultZone=http://localhost:9091/eureka
# 定时任务轮询的时间,默认是15分钟
eureka.server.wait-time-in-ms-when-sync-empty=10000
# Eureka Server在运行期间会去统计心跳失败的比例在15分钟之内是否低于85%
eureka.server.renewal-percent-threshold=0.5
# 关闭自我保护机制
eureka.server.enable-self-preservation=false
二 .Eureka自我保护机制
Eureka Server在运行期间会去统计心跳失败的比例在15分钟之内是否低于85% , 如果低于85%, Eureka Server会认为当前实例的客户端与自己的心跳连接出现了网络故障,那么Eureka Server会把这 些实例保护起来,让这些实例不会过期导致实例剔除。
这样做的目的是为了减少网络不稳定或者网络分区的情况下,Eureka Server将健康服务剔除下线的问 题。 使用自我保护机制可以使得Eureka 集群更加健壮和稳定的运行。
进入自我保护状态后,会出现以下几种情况
- Eureka Server不再从注册列表中移除因为长时间没有收到心跳而应该剔除的过期服务
- Eureka Server仍然能够接受新服务的注册和查询请求,但是不会被同步到其他节点上,保证当前节点依然可用。
自我保护机制的开启
为了演示效果,我们通过如下配置,将判定时间改为10s,接着启动Eureka Server,等待10s之 后,就会出现以上提示信息,表示自我保护被激活了。
#设置 eureka server同步失败的等待时间默认15分
#在这期间,它不向客户端提供服务注册信息
eureka.server.wait-time-in-ms-when-sync-empty=10000
重要变量
在Eureka的自我保护机制中,有两个很重要的变量,Eureka的自我保护机制,都是围绕这两个变量来 实现的,在AbstractInstanceRegistry这个类中定义的
protected volatile int numberOfRenewsPerMinThreshold; //每分钟最小续约数量
protected volatile int expectedNumberOfClientsSendingRenews; //预期每分钟收到续约的 客户端数量,取决于注册到eureka server上的服务数量
numberOfRenewsPerMinThreshold 表示每分钟的最小续约数量,它表示什么意思呢?就是Eureka Server期望每分钟收到客户端实例续约的总数的阈值。如果小于这个阈值,就会触发自我保护机制。
它是在一下代码中赋值的:
protected void updateRenewsPerMinThreshold() {
this.numberOfRenewsPerMinThreshold = (int) (this.expectedNumberOfClientsSendingRenews
* (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds())
* serverConfig.getRenewalPercentThreshold());
}
//自我保护阀值 = 服务总数 * 每分钟续约数(60S/客户端续约间隔) * 自我保护续约百分比阀值因子
numberOfRenewsPerMinThreshold = expectedNumberOfClientsSendingRenews * (60.0 / serverConfig.getExpectedClientRenewalIntervalSeconds()) * serverConfig.getRenewalPercentThreshold()
-
expectedNumberOfClientsSendingRenews:实际注册服务的总数量,可以从eureka界面查看。
-
getExpectedClientRenewalIntervalSeconds,客户端的续约间隔,默认为30s ,相当于一分钟要发送两次心跳。
-
getRenewalPercentThreshold,自我保护续约百分比阈值因子,默认0.85。 也就是说每分钟的续 约数量要大于85%
expectedNumberOfClientsSendingRenews 变量的动态更新
Eureka-Server的初始化 (服务初始化动态更新)
在EurekaBootstrap这个类中,有一个 initEurekaServerContext 方法
protected void initEurekaServerContext() throws Exception {
EurekaServerConfig eurekaServerConfig = new DefaultEurekaServerConfig();
。。。。。。。。。。
ApplicationInfoManager applicationInfoManager = null;
if (eurekaClient == null) {
EurekaInstanceConfig instanceConfig = isCloud(ConfigurationManager.getDeploymentContext())
? new CloudInstanceConfig()
: new MyDataCenterInstanceConfig();
applicationInfoManager = new ApplicationInfoManager(
instanceConfig, new EurekaConfigBasedInstanceInfoProvider(instanceConfig).get());
EurekaClientConfig eurekaClientConfig = new DefaultEurekaClientConfig();
eurekaClient = new DiscoveryClient(applicationInfoManager, eurekaClientConfig);
} else {
applicationInfoManager = eurekaClient.getApplicationInfoManager();
}
PeerAwareInstanceRegistry registry;
if (isAws(applicationInfoManager.getInfo())) {
registry = new AwsInstanceRegistry(
eurekaServerConfig,
eurekaClient.getEurekaClientConfig(),
serverCodecs,
eurekaClient
);
awsBinder = new AwsBinderDelegate(eurekaServerConfig, eurekaClient.getEurekaClientConfig(), registry, applicationInfoManager);
awsBinder.start();
} else {
registry = new PeerAwareInstanceRegistryImpl(
eurekaServerConfig,
eurekaClient.getEurekaClientConfig(),
serverCodecs,
eurekaClient
);
}
。。。。。。。。。。
// Copy registry from neighboring eureka node
int registryCount = registry.syncUp();
//启动的时候注册服务的应用
registry.openForTraffic(applicationInfoManager, registryCount);
// Register all monitoring statistics.
EurekaMonitors.registerAllStats();
}
PeerAwareInstanceRegistryImpl的openForTraffic方法
public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
// Renewals happen every 30 seconds and for a minute it should be a factor of 2.
this.expectedNumberOfClientsSendingRenews = count;
updateRenewsPerMinThreshold();
logger.info("Got {} instances from neighboring DS node", count);
logger.info("Renew threshold is: {}", numberOfRenewsPerMinThreshold);
this.startupTime = System.currentTimeMillis();
if (count > 0) {
this.peerInstancesTransferEmptyOnStartup = false;
}
DataCenterInfo.Name selfName = applicationInfoManager.getInfo().getDataCenterInfo().getName();
boolean isAws = Name.Amazon == selfName;
if (isAws && serverConfig.shouldPrimeAwsReplicaConnections()) {
logger.info("Priming AWS connections for all replicas..");
primeAwsReplicas(applicationInfoManager);
}
logger.info("Changing status to UP");
applicationInfoManager.setInstanceStatus(InstanceStatus.UP);
super.postInit();
}
PeerAwareInstanceRegistryImpl.cancel (服务下线动态更新)
当服务提供者主动下线时,表示这个时候Eureka-Server要剔除这个服务提供者的地址,同时也代表这 这个心跳续约的阈值要发生变化。所以在 PeerAwareInstanceRegistryImpl.cancel 中可以看到数据 的更新
调用路径 PeerAwareInstanceRegistryImpl.cancel -> AbstractInstanceRegistry.cancel>internalCancel
PeerAwareInstanceRegistryImpl.cancel
@Override
public boolean cancel(final String appName, final String id,
final boolean isReplication) {
if (super.cancel(appName, id, isReplication)) {
replicateToPeers(Action.Cancel, appName, id, null, null, isReplication);
return true;
}
return false;
}
AbstractInstanceRegistry.cancel
@Override
public boolean cancel(String appName, String id, boolean isReplication) {
return internalCancel(appName, id, isReplication);
}
AbstractInstanceRegistry.internalCancel
protected boolean internalCancel(String appName, String id, boolean isReplication) {
。。。。。。。。
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to cancel it, reduce the number of clients to send renews.
//服务下线,更新实际注册服务的总数量-1
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews - 1;
updateRenewsPerMinThreshold();
}
}
return true;
}
PeerAwareInstanceRegistryImpl.register (服务上下动态感知)
当有新的服务提供者注册到eureka-server上时,需要增加续约的客户端数量,所以在register方法中会 进行处理
调用的顺序:register ->super.register(AbstractInstanceRegistry)
PeerAwareInstanceRegistryImpl.register
@Override
public void register(final InstanceInfo info, final boolean isReplication) {
int leaseDuration = Lease.DEFAULT_DURATION_IN_SECS;
if (info.getLeaseInfo() != null && info.getLeaseInfo().getDurationInSecs() > 0) {
leaseDuration = info.getLeaseInfo().getDurationInSecs();
}
super.register(info, leaseDuration, isReplication);
replicateToPeers(Action.Register, info.getAppName(), info.getId(), info, null, isReplication);
}
AbstractInstanceRegistry.register
public void register(InstanceInfo registrant, int leaseDuration, boolean isReplication) {
try {
read.lock();
Map<String, Lease<InstanceInfo>> gMap = registry.get(registrant.getAppName());
REGISTER.increment(isReplication);
if (gMap == null) {
final ConcurrentHashMap<String, Lease<InstanceInfo>> gNewMap = new ConcurrentHashMap<String, Lease<InstanceInfo>>();
gMap = registry.putIfAbsent(registrant.getAppName(), gNewMap);
if (gMap == null) {
gMap = gNewMap;
}
}
Lease<InstanceInfo> existingLease = gMap.get(registrant.getId());
// Retain the last dirty timestamp without overwriting it, if there is already a lease
if (existingLease != null && (existingLease.getHolder() != null)) {
.........
} else {
// The lease does not exist and hence it is a new registration
synchronized (lock) {
if (this.expectedNumberOfClientsSendingRenews > 0) {
// Since the client wants to register it, increase the number of clients sending renews
this.expectedNumberOfClientsSendingRenews = this.expectedNumberOfClientsSendingRenews + 1;
updateRenewsPerMinThreshold();
}
}
logger.debug("No previous lease information found; it is new registration");
}
..............
}
PeerAwareInstanceRegistryImpl.scheduleRenewalThreshold UpdateTask(每15分钟更新一次)
15分钟运行一次,判断在15分钟之内心跳失败比例是否低于85%。在
DefaultEurekaServerContext 》@PostConstruct修饰的initialize()方法》init()
DefaultEurekaServerContext.initialize
@PostConstruct
@Override
public void initialize() {
logger.info("Initializing ...");
peerEurekaNodes.start();
try {
registry.init(peerEurekaNodes);
} catch (Exception e) {
throw new RuntimeException(e);
}
logger.info("Initialized");
}
PeerAwareInstanceRegistryImpl.init()
@Override
public void init(PeerEurekaNodes peerEurekaNodes) throws Exception {
this.numberOfReplicationsLastMin.start();
this.peerEurekaNodes = peerEurekaNodes;
initializedResponseCache();
//定时任务,每15分钟检查一次,
scheduleRenewalThresholdUpdateTask();
initRemoteRegionRegistry();
try {
Monitors.registerObject(this);
} catch (Throwable e) {
logger.warn("Cannot register the JMX monitor for the InstanceRegistry :", e);
}
}
PeerAwareInstanceRegistryImpl.scheduleRenewalThresholdUpdateTask()
/**
* Schedule the task that updates <em>renewal threshold</em> periodically.
* The renewal threshold would be used to determine if the renewals drop
* dramatically because of network partition and to protect expiring too
* many instances at a time.
*
*/
private void scheduleRenewalThresholdUpdateTask() {
timer.schedule(new TimerTask() {
@Override
public void run() {
updateRenewalThreshold();
}
}, serverConfig.getRenewalThresholdUpdateIntervalMs(),
serverConfig.getRenewalThresholdUpdateIntervalMs());
}
private void updateRenewalThreshold() {
try {
Applications apps = eurekaClient.getApplications();
int count = 0;
for (Application app : apps.getRegisteredApplications()) {
for (InstanceInfo instance : app.getInstances()) {
if (this.isRegisterable(instance)) {
++count;
}
}
}
synchronized (lock) {
// Update threshold only if the threshold is greater than the
// current expected threshold or if self preservation is disabled.
//检查自我保护机制的条件,并且更新期望的服务续约数量。
if ((count) > (serverConfig.getRenewalPercentThreshold() * expectedNumberOfClientsSendingRenews)
|| (!this.isSelfPreservationModeEnabled())) {
this.expectedNumberOfClientsSendingRenews = count;
updateRenewsPerMinThreshold();
}
}
logger.info("Current renewal threshold is : {}", numberOfRenewsPerMinThreshold);
} catch (Throwable e) {
logger.error("Cannot update renewal threshold", e);
}
}
自我保护机制触发任务
在AbstractInstanceRegistry的postInit方法中,会开启一个EvictionTask的任务,这个任务用来检测是 否需要开启自我保护机制。默认是60s
protected void postInit() {
renewsLastMin.start();
if (evictionTaskRef.get() != null) {
evictionTaskRef.get().cancel();
}
//EvictionTask为要执行的任务
evictionTaskRef.set(new EvictionTask());
evictionTimer.schedule(evictionTaskRef.get(),
serverConfig.getEvictionIntervalTimerInMs(),
serverConfig.getEvictionIntervalTimerInMs());
}
其中,EvictionTask表示最终执行的任务
//EvictionTask内部类
class EvictionTask extends TimerTask {
private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);
@Override
public void run() {
try {
long compensationTimeMs = getCompensationTimeMs();
logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
evict(compensationTimeMs);
} catch (Throwable e) {
logger.error("Could not run the evict task", e);
}
}
evict
public void evict(long additionalLeaseMs) {
logger.debug("Running the evict task");
//是否需要开启自我保护机制,如果需要,那么直接RETURE, 不需要继续往下执行了
if (!isLeaseExpirationEnabled()) {
logger.debug("DS: lease expiration is currently disabled.");
return;
}
//这下面主要是做服务自动下线的操作的
// We collect first all expired items, to evict them in random order. For large eviction sets,
// if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
// the impact should be evenly distributed across all applications.
List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
if (leaseMap != null) {
for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
Lease<InstanceInfo> lease = leaseEntry.getValue();
if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
expiredLeases.add(lease);
}
}
}
}
// To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
// triggering self-preservation. Without that we would wipe out full registry.
int registrySize = (int) getLocalRegistrySize();
int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
int evictionLimit = registrySize - registrySizeThreshold;
int toEvict = Math.min(expiredLeases.size(), evictionLimit);
if (toEvict > 0) {
logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
Random random = new Random(System.currentTimeMillis());
for (int i = 0; i < toEvict; i++) {
// Pick a random item (Knuth shuffle algorithm)
int next = i + random.nextInt(expiredLeases.size() - i);
Collections.swap(expiredLeases, i, next);
Lease<InstanceInfo> lease = expiredLeases.get(i);
String appName = lease.getHolder().getAppName();
String id = lease.getHolder().getId();
EXPIRED.increment();
logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
internalCancel(appName, id, false);
}
}
}
isLeaseExpirationEnabled
- 是否开启了自我保护机制,如果没有,则跳过,默认是开启
- 计算是否需要开启自我保护,判断最后一分钟收到的续约数量是否大于 numberOfRenewsPerMinThreshold
@Override
public boolean isLeaseExpirationEnabled() {
if (!isSelfPreservationModeEnabled()) {
// The self preservation mode is disabled, hence allowing the instances to expire.
return true;
}
return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}