一,nacos核心功能点
服务注册: Nacos Client会通过发送REST请求的方式向Nacos Server注册自己的服务,提供自身的元 数据,比如ip地址、端口等信息。Nacos Server接收到注册请求后,就会把这些元数据信息存储在一个双层的内存Map中。
服务心跳:在服务注册后,Nacos Client会维护一个定时心跳来持续通知Nacos Server,说明服务一直处于可用状态,防止被剔除。默认5s发送一-次心跳。
服务同步: Nacos Server集群之间会互相同步服务实例,用来保证服务信息的一致性。
服务发现:服务消费者(Nacos Client)在调用服务提供者的服务时,会发送一个REST请求给Nacos Server,获取上面注册的服务清单,并且缓存在Nacos Client本地,同时会在Nacos Client本地开启一个定时任务定时拉取服务端最新的注册表信息更新到本地缓存
服务健康检查: Nacos Server会开启一个定时任务用来检查注册服务实例的健康情况,对于超过15s没有收到客户端心跳的实例会将它的healthy属性置为false(客户端服务发现时不会发现),如果某个实例超过30秒没有收到心跳,直接剔除该实例(被剔除的实例如果恢复发送心跳则会重新注册)
二,源码分析
1,切入点
1)入口
客户端想要在nacos服务器注册,需要引入依赖:
<dependency>
<groupId>com.alibaba.cloud</groupId>
<artifactId>spring-cloud-alibaba-nacos-discovery</artifactId>
<version>2.1.0.RELEASE</version>
</dependency>
这是一个springboot项目,所以点开这个依赖包下的spring.factories,里面导入了一个自动配置类 NacosDiscoverAutoConfiguration
@Configuration
@EnableConfigurationProperties
@ConditionalOnNacosDiscoveryEnabled
@ConditionalOnProperty(
value = {"spring.cloud.service-registry.auto-registration.enabled"},
matchIfMissing = true
)
@AutoConfigureAfter({AutoServiceRegistrationConfiguration.class, AutoServiceRegistrationAutoConfiguration.class})
public class NacosDiscoveryAutoConfiguration {
public NacosDiscoveryAutoConfiguration() {
}
@Bean
public NacosServiceRegistry nacosServiceRegistry(NacosDiscoveryProperties nacosDiscoveryProperties) {
return new NacosServiceRegistry(nacosDiscoveryProperties);
}
@Bean
@ConditionalOnBean({AutoServiceRegistrationProperties.class})
public NacosRegistration nacosRegistration(NacosDiscoveryProperties nacosDiscoveryProperties, ApplicationContext context) {
return new NacosRegistration(nacosDiscoveryProperties, context);
}
/**
*
*/
@Bean
@ConditionalOnBean({AutoServiceRegistrationProperties.class})
public NacosAutoServiceRegistration nacosAutoServiceRegistration(NacosServiceRegistry registry, AutoServiceRegistrationProperties autoServiceRegistrationProperties, NacosRegistration registration) {
return new NacosAutoServiceRegistration(registry, autoServiceRegistrationProperties, registration);
}
}
分析这个类可以发现,上面两个注入到容器中的组件实际上是用在了第三个组件 NacosAutoServiceRegistration
2)NacosAutoServiceRegistration
接下来分析这个组件,看他的类的继承关系:
public class NacosAutoServiceRegistration extends AbstractAutoServiceRegistration<Registration>
public abstract class AbstractAutoServiceRegistration<R extends Registration> implements AutoServiceRegistration, ApplicationContextAware, ApplicationListener<WebServerInitializedEvent
它实现了 ApplicationEvent接口,监听了web容器启动事件
3)onApplicationEvent()
public void onApplicationEvent(WebServerInitializedEvent event) {
this.bind(event);
}
4)bind()
public void bind(WebServerInitializedEvent event) {
ApplicationContext context = event.getApplicationContext();
if (!(context instanceof ConfigurableWebServerApplicationContext) || !"management".equals(((ConfigurableWebServerApplicationContext)context).getServerNamespace())) {
this.port.compareAndSet(0, event.getWebServer().getPort());
this.start();
}
}
5)start()
public void start() {
if (!this.isEnabled()) {
if (logger.isDebugEnabled()) {
logger.debug("Discovery Lifecycle disabled. Not starting");
}
} else {
if (!this.running.get()) {
this.context.publishEvent(new InstancePreRegisteredEvent(this, this.getRegistration()));
//切入点
this.register();
if (this.shouldRegisterManagement()) {
this.registerManagement();
}
this.context.publishEvent(new InstanceRegisteredEvent(this, this.getConfiguration()));
this.running.compareAndSet(false, true);
}
}
}
在**start()**里面调用了 register(),以这个方法为切入点进行分析
2,客户端注册逻辑
1)register()
protected void register() {
this.serviceRegistry.register(this.getRegistration());
}
public void register(Registration registration) {
if (StringUtils.isEmpty(registration.getServiceId())) {
log.warn("No service to register for nacos client...");
} else {
//服务名称
String serviceId = registration.getServiceId();
//封装一个节点实例
Instance instance = this.getNacosInstanceFromRegistration(registration);
try {
//真正的注册逻辑
this.namingService.registerInstance(serviceId, instance);
log.info("nacos registry, {} {}:{} register finished", new Object[]{serviceId, instance.getIp(), instance.getPort()});
} catch (Exception var5) {
log.error("nacos registry, {} register failed...{},", new Object[]{serviceId, registration.toString(), var5});
}
}
}
这个方法其实就是封装了一个节点实例 Instance,然后利用 **registerInstance()**实现注册的逻辑。
2)registerInstance()
//这个方法就是nacos-client包里面的
public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
if (instance.isEphemeral()) {
//配置信息封装
BeatInfo beatInfo = new BeatInfo();
beatInfo.setServiceName(NamingUtils.getGroupedName(serviceName, groupName));
beatInfo.setIp(instance.getIp());
beatInfo.setPort(instance.getPort());
beatInfo.setCluster(instance.getClusterName());
beatInfo.setWeight(instance.getWeight());
beatInfo.setMetadata(instance.getMetadata());
beatInfo.setScheduled(false);
long instanceInterval = instance.getInstanceHeartBeatInterval();
beatInfo.setPeriod(instanceInterval == 0L ? DEFAULT_HEART_BEAT_INTERVAL : instanceInterval);
//心跳
this.beatReactor.addBeatInfo(NamingUtils.getGroupedName(serviceName, groupName), beatInfo);
}
//注册服务
this.serverProxy.registerService(NamingUtils.getGroupedName(serviceName, groupName), groupName, instance);
}
这里面有两个方法,一个是发送心跳的方法 addBeatInfo(),一个是注册服务的方法 registerService()。分别看一下这两个逻辑
①发送心跳
//心跳
public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
LogUtils.NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
this.dom2Beat.put(this.buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort()), beatInfo);
//定时任务发送心跳
this.executorService.schedule(new BeatReactor.BeatTask(beatInfo), 0L, TimeUnit.MILLISECONDS);
MetricsMonitor.getDom2BeatSizeMonitor().set((double)this.dom2Beat.size());
}
这个方法利用异步线程池定时发送心跳任务,看一下发送的是什么任务。
//定时任务发送心跳
class BeatTask implements Runnable {
BeatInfo beatInfo;
public BeatTask(BeatInfo beatInfo) {
this.beatInfo = beatInfo;
}
public void run() {
if (!this.beatInfo.isStopped()) {
long result = BeatReactor.this.serverProxy.sendBeat(this.beatInfo);
long nextTime = result > 0L ? result : this.beatInfo.getPeriod();
BeatReactor.this.executorService.schedule(BeatReactor.this.new BeatTask(this.beatInfo), nextTime, TimeUnit.MILLISECONDS);
}
}
}
核心方法就是 sendBeat()
//真正发送心跳的方法
public long sendBeat(BeatInfo beatInfo) {
try {
if (LogUtils.NAMING_LOGGER.isDebugEnabled()) {
LogUtils.NAMING_LOGGER.debug("[BEAT] {} sending beat to server: {}", this.namespaceId, beatInfo.toString());
}
Map<String, String> params = new HashMap(4);
params.put("beat", JSON.toJSONString(beatInfo));
params.put("namespaceId", this.namespaceId);
params.put("serviceName", beatInfo.getServiceName());
//发送心跳的核心
String result = this.reqAPI(UtilAndComs.NACOS_URL_BASE + "/instance/beat", params, (String)"PUT");
JSONObject jsonObject = JSON.parseObject(result);
if (jsonObject != null) {
return jsonObject.getLong("clientBeatInterval");
}
} catch (Exception var5) {
LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: " + JSON.toJSONString(beatInfo), var5);
}
return 0L;
}
原来底层就是用了一个 httpclient往一个固定的url发送请求。
②服务注册
回到刚才的 registerInstance(),来看服务注册的逻辑 registerService()。
//注册服务
public void registerService(String serviceName, String groupName, Instance instance) throws NacosException {
LogUtils.NAMING_LOGGER.info("[REGISTER-SERVICE] {} registering service {} with instance: {}", new Object[]{this.namespaceId, serviceName, instance});
Map<String, String> params = new HashMap(9);
params.put("namespaceId", this.namespaceId);
params.put("serviceName", serviceName);
params.put("groupName", groupName);
params.put("clusterName", instance.getClusterName());
params.put("ip", instance.getIp());
params.put("port", String.valueOf(instance.getPort()));
params.put("weight", String.valueOf(instance.getWeight()));
params.put("enable", String.valueOf(instance.isEnabled()));
params.put("healthy", String.valueOf(instance.isHealthy()));
params.put("ephemeral", String.valueOf(instance.isEphemeral()));
params.put("metadata", JSON.toJSONString(instance.getMetadata()));
//核心逻辑
this.reqAPI(UtilAndComs.NACOS_URL_INSTANCE, params, (String)"POST");
}
前面都是参数封装,核心逻辑就是reqAPI(),看到这里,发现其实服务注册也是客户端通过HTTP请求往服务端一个固定的URL发送请求。
public String reqAPI(String api, Map<String, String> params, List<String> servers, String method) {
params.put("namespaceId", this.getNamespaceId());
if (CollectionUtils.isEmpty(servers) && StringUtils.isEmpty(this.nacosDomain)) {
throw new IllegalArgumentException("no server available");
} else {
Exception exception = new Exception();
if (servers != null && !servers.isEmpty()) {
Random random = new Random(System.currentTimeMillis());
int index = random.nextInt(servers.size());
for(int i = 0; i < servers.size(); ++i) {
String server = (String)servers.get(index);
try {
// 调用HttpClient给服务端发送请求
return this.callServer(api, params, server, method);
} catch (NacosException var11) {
exception = var11;
LogUtils.NAMING_LOGGER.error("request {} failed.", server, var11);
} catch (Exception var12) {
exception = var12;
LogUtils.NAMING_LOGGER.error("request {} failed.", server, var12);
}
index = (index + 1) % servers.size();
}
throw new IllegalStateException("failed to req API:" + api + " after all servers(" + servers + ") tried: " + ((Exception)exception).getMessage());
} else {
int i = 0;
while(i < 3) {
try {
return this.callServer(api, params, this.nacosDomain);
} catch (Exception var13) {
exception = var13;
LogUtils.NAMING_LOGGER.error("[NA] req api:" + api + " failed, server(" + this.nacosDomain, var13);
++i;
}
}
throw new IllegalStateException("failed to req API:/api/" + api + " after all servers(" + servers + ") tried: " + ((Exception)exception).getMessage());
}
}
}
至此,客户端源码分析完毕。
3,服务端注册逻辑
nacos源码里面的naming模块就是对应的服务注册模块
打开里面的controller包,根据客户端的源码,可以定位到InstanceController里面POST的方法
@CanDistro
@PostMapping
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public String register(HttpServletRequest request) throws Exception {
final String namespaceId = WebUtils
.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
NamingUtils.checkServiceNameFormat(serviceName);
final Instance instance = parseInstance(request);
//由此处进入service层的业务逻辑
serviceManager.registerInstance(namespaceId, serviceName, instance);
return "ok";
}
1)registerInstance()
public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
//创建空的service --健康检查
createEmptyService(namespaceId, serviceName, instance.isEphemeral());
Service service = getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.INVALID_PARAM,
"service not found, namespace: " + namespaceId + ", service: " + serviceName);
}
//添加实例
addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
}
这个方法里面主要完成了两个逻辑 健康检查 createEmptyService() 添加实例 addInstance()。
①健康检查
public void createEmptyService(String namespaceId, String serviceName, boolean local) throws NacosException {
createServiceIfAbsent(namespaceId, serviceName, local, null);
}
public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster)
throws NacosException {
//可以通过这里研究分区结构
Service service = getService(namespaceId, serviceName);
if (service == null) {
Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
service = new Service();
service.setName(serviceName);
service.setNamespaceId(namespaceId);
service.setGroupName(NamingUtils.getGroupName(serviceName));
// now validate the service. if failed, exception will be thrown
service.setLastModifiedMillis(System.currentTimeMillis());
service.recalculateChecksum();
if (cluster != null) {
cluster.setService(service);
service.getClusterMap().put(cluster.getName(), cluster);
}
service.validate();
//初始化服务,放到了内存map,此处放的是空service
putServiceAndInit(service);
if (!local) {
//添加或者替换
addOrReplaceService(service);
}
}
}
**createServiceIfAbsent()**这个方法获取到所有的service实例,然后通过 **putServiceAndInit()**初始化服务,放到了内存map,此处放的是空service。
a.getService()
public Service getService(String namespaceId, String serviceName) {
if (serviceMap.get(namespaceId) == null) {
return null;
}
return chooseServiceMap(namespaceId).get(serviceName);
}
看一下这个serviceMap ,nacos分区
/**
* Map(namespace, Map(group::serviceName, Service)).
*/
private final Map<String, Map<String, Service>> serviceMap = new ConcurrentHashMap<>();
b.putServiceAndInit()
private void putServiceAndInit(Service service) throws NacosException {
//放到map
putService(service);
service = getService(service.getNamespaceId(), service.getName());
//初始化 --服务心跳
service.init();
consistencyService
.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
consistencyService
.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJson());
}
在这个方法里,先将实例放到内存实例列表 ,然后进行心跳检查 init() 。
public void init() {
//健康检查
HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
entry.getValue().setService(this);
entry.getValue().init();
}
}
他利用了定时任务不停的进行心跳检查 scheduleCheck()
public static void scheduleCheck(ClientBeatCheckTask task) {
//执行任务ClientBeatCheckTask --run()
futureMap.computeIfAbsent(task.taskKey(),
k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
}
看这个 ClientBeatCheckTask的 run()
@Override
public void run() {
try {
if (!getDistroMapper().responsible(service.getName())) {
return;
}
if (!getSwitchDomain().isHealthCheckEnabled()) {
return;
}
List<Instance> instances = service.allIPs(true);
// first set health status of instances:
for (Instance instance : instances) {
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
if (!instance.isMarked()) {
if (instance.isHealthy()) {
instance.setHealthy(false);
getPushService().serviceChanged(service);
ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
}
}
}
}
if (!getGlobalConfig().isExpireInstance()) {
return;
}
// then remove obsolete instances:
for (Instance instance : instances) {
if (instance.isMarked()) {
continue;
}
if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
// delete instance
deleteIp(instance);
}
}
} catch (Exception e) {
Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
}
}
这个方法的主要逻辑其实就是先获取到所有的服务实例列表,然后根据上次发心跳的时间判断这个服务是不是健康实例,如果不是将健康状态设置为false,然后广播出去,在判断这个服务是不是down了,如果down了,移除掉这个实例。
②服务注册
public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
throws NacosException {
String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
//获取实例列表
Service service = getService(namespaceId, serviceName);
synchronized (service) {
//添加并返回对应service的所有实例
List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
Instances instances = new Instances();
instances.setInstanceList(instanceList);
//将service的所有实例放入到内存注册表
consistencyService.put(key, instances);
}
}
来看 addIpAddresses()
private List<Instance> addIpAddresses(Service service, boolean ephemeral, Instance... ips) throws NacosException {
return updateIpAddresses(service, UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD, ephemeral, ips);
}
public List<Instance> updateIpAddresses(Service service, String action, boolean ephemeral, Instance... ips)
throws NacosException {
Datum datum = consistencyService
.get(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), ephemeral));
List<Instance> currentIPs = service.allIPs(ephemeral);
Map<String, Instance> currentInstances = new HashMap<>(currentIPs.size());
Set<String> currentInstanceIds = Sets.newHashSet();
for (Instance instance : currentIPs) {
currentInstances.put(instance.toIpAddr(), instance);
currentInstanceIds.add(instance.getInstanceId());
}
Map<String, Instance> instanceMap;
if (datum != null && null != datum.value) {
instanceMap = setValid(((Instances) datum.value).getInstanceList(), currentInstances);
} else {
instanceMap = new HashMap<>(ips.length);
}
for (Instance instance : ips) {
if (!service.getClusterMap().containsKey(instance.getClusterName())) {
Cluster cluster = new Cluster(instance.getClusterName(), service);
cluster.init();
service.getClusterMap().put(instance.getClusterName(), cluster);
}
if (UtilsAndCommons.UPDATE_INSTANCE_ACTION_REMOVE.equals(action)) {
instanceMap.remove(instance.getDatumKey());
} else {
Instance oldInstance = instanceMap.get(instance.getDatumKey());
if (oldInstance != null) {
instance.setInstanceId(oldInstance.getInstanceId());
} else {
instance.setInstanceId(instance.generateInstanceId(currentInstanceIds));
}
instanceMap.put(instance.getDatumKey(), instance);
}
}
if (instanceMap.size() <= 0 && UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD.equals(action)) {
throw new IllegalArgumentException(
"ip list can not be empty, service: " + service.getName() + ", ip list: " + JacksonUtils
.toJson(instanceMap.values()));
}
return new ArrayList<>(instanceMap.values());
}
这个方法的主要逻辑就是根据前面拼装好的key来获取服务实例列表,因为这个服务实例在添加的时候,可能内存列表里面已经存在该服务的多个实例,所以要先尝试获取,获取之后在添加。
③集群同步
回头看服务注册的**addInstance()**里面的逻辑
因为现在我们操作的是内存map都是临时节点,所以我们来看 **consistencyService.put(key, instances)**时,选择的实现类是 DistroConsistencyServiceImpl。
@Override
public void put(String key, Record value) throws NacosException {
//放到内存-集群同步
onPut(key, value);
distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE,
globalConfig.getTaskDispatchPeriod() / 2);
}
public void onPut(String key, Record value) {
if (KeyBuilder.matchEphemeralInstanceListKey(key)) {
Datum<Instances> datum = new Datum<>();
datum.value = (Instances) value;
datum.key = key;
datum.timestamp.incrementAndGet();
dataStore.put(key, datum);
}
if (!listeners.containsKey(key)) {
return;
}
//同步给其他节点
notifier.addTask(key, DataOperation.CHANGE);
}
public void addTask(String datumKey, DataOperation action) {
if (services.containsKey(datumKey) && action == DataOperation.CHANGE) {
return;
}
if (action == DataOperation.CHANGE) {
services.put(datumKey, StringUtils.EMPTY);
}
//将刚刚的key放入到队列
tasks.offer(Pair.with(datumKey, action));
}
这里的逻辑大概是将这个新增的服务添加到阻塞队列里面,也就是同步到其他nacos服务端节点,做集群同步。
那么其他集群节点是什么时候从队列获取数据的呢?
回头看 DistroConsistencyServiceImpl,这个类里面有一个init()
/**
* 调用run()
* 很多开源框架为了提升操作性能会大量使用这种异步任务操作,
* 这些操作本身并不需要写入之后立即成功,
* 用这种方式对提升操作性能有很大帮助
*/
@PostConstruct
public void init() {
GlobalExecutor.submitDistroNotifyTask(notifier);
}
看他提交的是一个什么任务 Notifier ,这个类实现了Runnable接口,所以直接来看 run()。
@Override
public void run() {
Loggers.DISTRO.info("distro notifier started");
for (; ; ) {
try {
//从阻塞队列拿数据
Pair<String, DataOperation> pair = tasks.take();
//处理
handle(pair);
} catch (Throwable e) {
Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
}
}
}
private void handle(Pair<String, DataOperation> pair) {
try {
String datumKey = pair.getValue0();
DataOperation action = pair.getValue1();
services.remove(datumKey);
int count = 0;
if (!listeners.containsKey(datumKey)) {
return;
}
for (RecordListener listener : listeners.get(datumKey)) {
count++;
try {
if (action == DataOperation.CHANGE) {//更新
//入口方法
listener.onChange(datumKey, dataStore.get(datumKey).value);
continue;
}
if (action == DataOperation.DELETE) {//删除
listener.onDelete(datumKey);
continue;
}
} catch (Throwable e) {
}
}
if (Loggers.DISTRO.isDebugEnabled()) {
}
} catch (Throwable e) {
}
}
}
这个类实际上通过一对条件判断,调用了 OnChange()。
@Override
public void onChange(String key, Instances value) throws Exception {
for (Instance instance : value.getInstanceList()) {
if (instance == null) {
throw new RuntimeException("got null instance " + key);
}
if (instance.getWeight() > 10000.0D) {
instance.setWeight(10000.0D);
}
if (instance.getWeight() < 0.01D && instance.getWeight() > 0.0D) {
instance.setWeight(0.01D);
}
}
//更新IP到内存结构
updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key));
recalculateChecksum();
}
/**
* nacos这个更新注册表内存方法里,为了防止读写并发冲突,
* 大量的运用了CopyOnWrite思想防止并发读写冲突,
* 具体做法就 是把原内存结构复制- -份,操作完最后再合并回真正的注册表内存里去。
*
* Eureka防止读写并发冲突用的方法是注册表的多级缓存结构,
* 只读缓存,读写缓存,内存注册表,各级缓存之间定时同步,
* 客户端感知的及时性不如nacos
* @param instances
* @param ephemeral
*/
public void updateIPs(Collection<Instance> instances, boolean ephemeral) {
Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size());
for (String clusterName : clusterMap.keySet()) {
ipMap.put(clusterName, new ArrayList<>());
}
for (Instance instance : instances) {
try {
if (instance == null) {
Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null");
continue;
}
if (StringUtils.isEmpty(instance.getClusterName())) {
instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME);
}
if (!clusterMap.containsKey(instance.getClusterName())) {
Cluster cluster = new Cluster(instance.getClusterName(), this);
cluster.init();
getClusterMap().put(instance.getClusterName(), cluster);
}
List<Instance> clusterIPs = ipMap.get(instance.getClusterName());
if (clusterIPs == null) {
clusterIPs = new LinkedList<>();
ipMap.put(instance.getClusterName(), clusterIPs);
}
clusterIPs.add(instance);
} catch (Exception e) {
}
}
for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) {
//make every ip mine
List<Instance> entryIPs = entry.getValue();
//真正更新注册表的逻辑
clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral);
}
setLastModifiedMillis(System.currentTimeMillis());
getPushService().serviceChanged(this);
StringBuilder stringBuilder = new StringBuilder();
for (Instance instance : allIPs()) {
stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(",");
}
}
nacos这个更新注册表内存方法里,为了防止读写并发冲突,大量的运用了CopyOnWrite思想防止并发读写冲突,具体做法就 是把原内存结构复制- -份,操作完最后再合并回真正的注册表内存里去。
Eureka防止读写并发冲突用的方法是注册表的多级缓存结构,只读缓存,读写缓存,内存注册表,各级缓存之间定时同步,客户端感知的及时性不如nacos 。
4,服务端接收心跳逻辑
InstanceController
如何定位到这个类里面的方法的?在分析客户端代码的时候,我们知道,实际上心跳检查是通过一次http请求到服务端做发送心跳,因此在客户端我们就可以获取到请求路径,从而在服务端定位到具体controller方法
@CanDistro
@PutMapping("/beat")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public ObjectNode beat(HttpServletRequest request) throws Exception {
ObjectNode result = JacksonUtils.createEmptyJsonNode();
result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, switchDomain.getClientBeatInterval());
String beat = WebUtils.optional(request, "beat", StringUtils.EMPTY);
RsInfo clientBeat = null;
if (StringUtils.isNotBlank(beat)) {
clientBeat = JacksonUtils.toObj(beat, RsInfo.class);
}
String clusterName = WebUtils
.optional(request, CommonParams.CLUSTER_NAME, UtilsAndCommons.DEFAULT_CLUSTER_NAME);
String ip = WebUtils.optional(request, "ip", StringUtils.EMPTY);
int port = Integer.parseInt(WebUtils.optional(request, "port", "0"));
if (clientBeat != null) {
if (StringUtils.isNotBlank(clientBeat.getCluster())) {
clusterName = clientBeat.getCluster();
} else {
// fix #2533
clientBeat.setCluster(clusterName);
}
ip = clientBeat.getIp();
port = clientBeat.getPort();
}
String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
NamingUtils.checkServiceNameFormat(serviceName);
Loggers.SRV_LOG.debug("[CLIENT-BEAT] full arguments: beat: {}, serviceName: {}", clientBeat, serviceName);
Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
if (instance == null) {
if (clientBeat == null) {
result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
return result;
}
Loggers.SRV_LOG.warn("[CLIENT-BEAT] The instance has been removed for health mechanism, "
+ "perform data compensation operations, beat: {}, serviceName: {}", clientBeat, serviceName);
instance = new Instance();
instance.setPort(clientBeat.getPort());
instance.setIp(clientBeat.getIp());
instance.setWeight(clientBeat.getWeight());
instance.setMetadata(clientBeat.getMetadata());
instance.setClusterName(clusterName);
instance.setServiceName(serviceName);
instance.setInstanceId(instance.getInstanceId());
instance.setEphemeral(clientBeat.isEphemeral());
serviceManager.registerInstance(namespaceId, serviceName, instance);
}
Service service = serviceManager.getService(namespaceId, serviceName);
if (service == null) {
throw new NacosException(NacosException.SERVER_ERROR,
"service not found: " + serviceName + "@" + namespaceId);
}
if (clientBeat == null) {
clientBeat = new RsInfo();
clientBeat.setIp(ip);
clientBeat.setPort(port);
clientBeat.setCluster(clusterName);
}
service.processClientBeat(clientBeat);
result.put(CommonParams.CODE, NamingResponseCode.OK);
if (instance.containsMetadata(PreservedMetadataKeys.HEART_BEAT_INTERVAL)) {
result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, instance.getInstanceHeartBeatInterval());
}
result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
return result;
}
这个方法实际上就是根据客户端发来的数据拼装key,然后从服务端获取实例注册列表,修改实例健康状态和最后接收心跳时间。
小细节:这里为什么又注册了一次?
因为可能网络不通或者nacos服务端down机,导致服务实例列表没有了。
5,客户端服务发现逻辑
NacosNamingService
public List<Instance> getAllInstances(String serviceName, String groupName, List<String> clusters, boolean subscribe) throws NacosException {
ServiceInfo serviceInfo;
if (subscribe) {
serviceInfo = this.hostReactor.getServiceInfo(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ","));
} else {
serviceInfo = this.hostReactor.getServiceInfoDirectlyFromServer(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ","));
}
List list;
return (List)(serviceInfo != null && !CollectionUtils.isEmpty(list = serviceInfo.getHosts()) ? list : new ArrayList());
}
1)getServiceInfo()
public ServiceInfo getServiceInfo(String serviceName, String clusters) {
LogUtils.NAMING_LOGGER.debug("failover-mode: " + this.failoverReactor.isFailoverSwitch());
String key = ServiceInfo.getKey(serviceName, clusters);
if (this.failoverReactor.isFailoverSwitch()) {
return this.failoverReactor.getService(key);
} else {
ServiceInfo serviceObj = this.getServiceInfo0(serviceName, clusters);
if (null == serviceObj) {
serviceObj = new ServiceInfo(serviceName, clusters);
this.serviceInfoMap.put(serviceObj.getKey(), serviceObj);
this.updatingMap.put(serviceName, new Object());
this.updateServiceNow(serviceName, clusters);
this.updatingMap.remove(serviceName);
} else if (this.updatingMap.containsKey(serviceName)) {
synchronized(serviceObj) {
try {
serviceObj.wait(5000L);
} catch (InterruptedException var8) {
LogUtils.NAMING_LOGGER.error("[getServiceInfo] serviceName:" + serviceName + ", clusters:" + clusters, var8);
}
}
}
this.scheduleUpdateIfAbsent(serviceName, clusters);
return (ServiceInfo)this.serviceInfoMap.get(serviceObj.getKey());
}
}
①getService()
public ServiceInfo getService(String key) {
ServiceInfo serviceInfo = (ServiceInfo)this.serviceMap.get(key);
if (serviceInfo == null) {
serviceInfo = new ServiceInfo();
serviceInfo.setName(key);
}
return serviceInfo;
}
private Map<String, ServiceInfo> serviceMap = new ConcurrentHashMap();
这是在服务本地缓存的一份服务列表 ,getService()是直接从本地缓存列表拿数据。
2)updateServiceNow()
public void updateServiceNow(String serviceName, String clusters) {
ServiceInfo oldService = this.getServiceInfo0(serviceName, clusters);
boolean var15 = false;
label121: {
try {
var15 = true;
String result = this.serverProxy.queryList(serviceName, clusters, this.pushReceiver.getUDPPort(), false);
if (StringUtils.isNotEmpty(result)) {
this.processServiceJSON(result);
var15 = false;
} else {
var15 = false;
}
break label121;
} catch (Exception var19) {
LogUtils.NAMING_LOGGER.error("[NA] failed to update serviceName: " + serviceName, var19);
var15 = false;
} finally {
if (var15) {
if (oldService != null) {
synchronized(oldService) {
oldService.notifyAll();
}
}
}
}
if (oldService != null) {
synchronized(oldService) {
oldService.notifyAll();
}
}
return;
}
if (oldService != null) {
synchronized(oldService) {
oldService.notifyAll();
}
}
}
①serverProxy.queryList()
public String queryList(String serviceName, String clusters, int udpPort, boolean healthyOnly) throws NacosException {
Map<String, String> params = new HashMap(8);
params.put("namespaceId", this.namespaceId);
params.put("serviceName", serviceName);
params.put("clusters", clusters);
params.put("udpPort", String.valueOf(udpPort));
params.put("clientIP", NetUtils.localIP());
params.put("healthyOnly", String.valueOf(healthyOnly));
return this.reqAPI(UtilAndComs.NACOS_URL_BASE + "/instance/list", params, (String)"GET");
}
由此可见看到,他是通过httpclient远程调用来从服务端拉取实例列表的,并且会缓存一份在本地。
6,服务端服务发现逻辑
@GetMapping("/list")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.READ)
public ObjectNode list(HttpServletRequest request) throws Exception {
String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
NamingUtils.checkServiceNameFormat(serviceName);
String agent = WebUtils.getUserAgent(request);
String clusters = WebUtils.optional(request, "clusters", StringUtils.EMPTY);
String clientIP = WebUtils.optional(request, "clientIP", StringUtils.EMPTY);
int udpPort = Integer.parseInt(WebUtils.optional(request, "udpPort", "0"));
String env = WebUtils.optional(request, "env", StringUtils.EMPTY);
boolean isCheck = Boolean.parseBoolean(WebUtils.optional(request, "isCheck", "false"));
String app = WebUtils.optional(request, "app", StringUtils.EMPTY);
String tenant = WebUtils.optional(request, "tid", StringUtils.EMPTY);
boolean healthyOnly = Boolean.parseBoolean(WebUtils.optional(request, "healthyOnly", "false"));
return doSrvIpxt(namespaceId, serviceName, agent, clusters, clientIP, udpPort, env, isCheck, app, tenant,
healthyOnly);
}
前面都是参数封装,只有最后一行 doSrvIpxt() 才是真正的逻辑
srvedIPs = service.srvIPs(Arrays.asList(StringUtils.split(clusters, ",")));
public List<Instance> srvIPs(List<String> clusters) {
if (CollectionUtils.isEmpty(clusters)) {
clusters = new ArrayList<>();
clusters.addAll(clusterMap.keySet());
}
return allIPs(clusters);
}
public List<Instance> allIPs(List<String> clusters) {
List<Instance> result = new ArrayList<>();
for (String cluster : clusters) {
Cluster clusterObj = clusterMap.get(cluster);
if (clusterObj == null) {
continue;
}
result.addAll(clusterObj.allIPs());
}
return result;
}
服务端的逻辑就是获取到实例的注册列表,其中大量使用了写时复制技术,提高响应效率。