业务系统本身配置很多,一般都会引入配置中心,把配置文件都放到配置中心上,方便统一管理,且能做到配置热更新。如果配置中心服务不可用,会造成业务系统拉取不到配置,严重的话会造成业务系统服务不可用。因此,配置中心的高可用至关重要。
nacos高可用流程
nacos除了服务端高可用设计之外,本身自带的客户端sdk同样也加入了高可用流程设计。
首先,考虑最极端情况,nacos服务端所有节点全部宕机不可用。为了应对这种情况,nacos sdk会优先读取本地机器上的配置文件。一般情况下,这个文件都是不存在的,只有在nacos服务不可用时,可把配置文件放置在固定目录下,这样业务系统依然能正常加载配置信息。
其次,nacos向服务端拉取配置时,会采用轮询重试机制,如果服务端某个节点不可用,会自动去下一个节点拉取配置。这样,即使nacos集群有多个节点不可用,也不会影响客户端正常拉取配置。
最后,nacos每次从服务端拉取到配置后,都会将此次配置信息作为一个快照,存储到本地文件中。这样,业务系统下次加载配置时,如果nacos服务不可用,且本地没有预先设置应急的配置文件,则会从上次快照文件中加载配置。
源码解析
高可用整体流程:
private String getConfigInner(String tenant, String dataId, String group, long timeoutMs) throws NacosException {
group = null2defaultGroup(group);
ParamUtils.checkKeyParam(dataId, group);
ConfigResponse cr = new ConfigResponse();
cr.setDataId(dataId);
cr.setTenant(tenant);
cr.setGroup(group);
// 优先使用本地配置
String content = LocalConfigInfoProcessor.getFailover(agent.getName(), dataId, group, tenant);
if (content != null) {
LOGGER.warn("[{}] [get-config] get failover ok, dataId={}, group={}, tenant={}, config={}", agent.getName(),
dataId, group, tenant, ContentUtils.truncateContent(content));
cr.setContent(content);
configFilterChainManager.doFilter(null, cr);
content = cr.getContent();
return content;
}
// 去服务端拉取配置
try {
String[] ct = worker.getServerConfig(dataId, group, tenant, timeoutMs);
cr.setContent(ct[0]);
configFilterChainManager.doFilter(null, cr);
content = cr.getContent();
return content;
} catch (NacosException ioe) {
if (NacosException.NO_RIGHT == ioe.getErrCode()) {
throw ioe;
}
LOGGER.warn("[{}] [get-config] get from server error, dataId={}, group={}, tenant={}, msg={}",
agent.getName(), dataId, group, tenant, ioe.toString());
}
//服务端异常的话,读取上一次快照信息
LOGGER.warn("[{}] [get-config] get snapshot ok, dataId={}, group={}, tenant={}, config={}", agent.getName(),
dataId, group, tenant, ContentUtils.truncateContent(content));
content = LocalConfigInfoProcessor.getSnapshot(agent.getName(), dataId, group, tenant);
cr.setContent(content);
configFilterChainManager.doFilter(null, cr);
content = cr.getContent();
return content;
}
getServerConfig()方法代码如下,去服务端拉取请求,并缓存到本地文件:
public String[] getServerConfig(String dataId, String group, String tenant, long readTimeout)
throws NacosException {
String[] ct = new String[2];
if (StringUtils.isBlank(group)) {
group = Constants.DEFAULT_GROUP;
}
HttpRestResult<String> result = null;
try {
Map<String, String> params = new HashMap<String, String>(3);
if (StringUtils.isBlank(tenant)) {
params.put("dataId", dataId);
params.put("group", group);
} else {
params.put("dataId", dataId);
params.put("group", group);
params.put("tenant", tenant);
}
//调用get请求
result = agent.httpGet(Constants.CONFIG_CONTROLLER_PATH, null, params, agent.getEncode(), readTimeout);
} catch (Exception ex) {
String message = String
.format("[%s] [sub-server] get server config exception, dataId=%s, group=%s, tenant=%s",
agent.getName(), dataId, group, tenant);
LOGGER.error(message, ex);
throw new NacosException(NacosException.SERVER_ERROR, ex);
}
//针对不同的错误码做处理
switch (result.getCode()) {
//错误码200,则请求成功,缓存到本地文件中
case HttpURLConnection.HTTP_OK:
LocalConfigInfoProcessor.saveSnapshot(agent.getName(), dataId, group, tenant, result.getData());
ct[0] = result.getData();
if (result.getHeader().getValue(CONFIG_TYPE) != null) {
ct[1] = result.getHeader().getValue(CONFIG_TYPE);
} else {
ct[1] = ConfigType.TEXT.getType();
}
return ct;
//404,没找到,则顺便清空本地配置
case HttpURLConnection.HTTP_NOT_FOUND:
LocalConfigInfoProcessor.saveSnapshot(agent.getName(), dataId, group, tenant, null);
return ct;
case HttpURLConnection.HTTP_CONFLICT: {
LOGGER.error(
"[{}] [sub-server-error] get server config being modified concurrently, dataId={}, group={}, "
+ "tenant={}", agent.getName(), dataId, group, tenant);
throw new NacosException(NacosException.CONFLICT,
"data being modified, dataId=" + dataId + ",group=" + group + ",tenant=" + tenant);
}
case HttpURLConnection.HTTP_FORBIDDEN: {
LOGGER.error("[{}] [sub-server-error] no right, dataId={}, group={}, tenant={}", agent.getName(),
dataId, group, tenant);
throw new NacosException(result.getCode(), result.getMessage());
}
default: {
LOGGER.error("[{}] [sub-server-error] dataId={}, group={}, tenant={}, code={}", agent.getName(),
dataId, group, tenant, result.getCode());
throw new NacosException(result.getCode(),
"http error, code=" + result.getCode() + ",dataId=" + dataId + ",group=" + group + ",tenant="
+ tenant);
}
}
}
httpGet方法实现了轮询重试机制:
public HttpRestResult<String> httpGet(String path, Map<String, String> headers, Map<String, String> paramValues,
String encode, long readTimeoutMs) throws Exception {
final long endTime = System.currentTimeMillis() + readTimeoutMs;
injectSecurityInfo(paramValues);
String currentServerAddr = serverListMgr.getCurrentServerAddr();
int maxRetry = this.maxRetry;
HttpClientConfig httpConfig = HttpClientConfig.builder()
.setReadTimeOutMillis(Long.valueOf(readTimeoutMs).intValue())
.setConTimeOutMillis(ConfigHttpClientManager.getInstance().getConnectTimeoutOrDefault(100)).build();
//在超时时间以内进行重试
do {
try {
Header newHeaders = getSpasHeaders(paramValues, encode);
if (headers != null) {
newHeaders.addAll(headers);
}
HttpRestResult<String> result = NACOS_RESTTEMPLATE
.get(getUrl(currentServerAddr, path), httpConfig, newHeaders, paramValues, String.class);
if (isFail(result)) {
LOGGER.error("[NACOS ConnectException] currentServerAddr: {}, httpCode: {}",
serverListMgr.getCurrentServerAddr(), result.getCode());
} else {
//更新当前可用的服务端地址
serverListMgr.updateCurrentServerAddr(currentServerAddr);
return result;
}
//如果是连接超时或者socket超时,则catch异常进行重试,其他异常直接抛出了
} catch (ConnectException connectException) {
LOGGER.error("[NACOS ConnectException httpGet] currentServerAddr:{}, err : {}",
serverListMgr.getCurrentServerAddr(), connectException.getMessage());
} catch (SocketTimeoutException socketTimeoutException) {
LOGGER.error("[NACOS SocketTimeoutException httpGet] currentServerAddr:{}, err : {}",
serverListMgr.getCurrentServerAddr(), socketTimeoutException.getMessage());
} catch (Exception ex) {
LOGGER.error("[NACOS Exception httpGet] currentServerAddr: " + serverListMgr.getCurrentServerAddr(),
ex);
throw ex;
}
//轮询下一个服务端地址
if (serverListMgr.getIterator().hasNext()) {
currentServerAddr = serverListMgr.getIterator().next();
} else {
//如果轮询完了,则从头重新开始,最多3轮
maxRetry--;
if (maxRetry < 0) {
throw new ConnectException(
"[NACOS HTTP-GET] The maximum number of tolerable server reconnection errors has been reached");
}
serverListMgr.refreshCurrentServerAddr();
}
} while (System.currentTimeMillis() <= endTime);
LOGGER.error("no available server");
throw new ConnectException("no available server");
}