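1. Recap
Picking up where we left off. Yesterday's article, [Soul Source Code Reading] 12. HTTP Long-Polling Sync Between soul-admin and soul-bootstrap (Part 1), hit a pitfall in section 4.2: the gateway would not start unless ZooKeeper was running. I couldn't figure it out and eventually gave up. I shut down both soul-admin and soul-bootstrap, dropped the soul database, then restarted soul-admin followed by soul-bootstrap, and the project started normally. Nothing beats a good reset. I did back up that database, though. Once I understand the different scenarios better I'll dig it out and look into it again. For now, let's continue our source-code reading.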
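2. The long-polling task in soul-bootstrap
This continues from section 3.2.2 of yesterday's article. Yesterday we only analyzed fetching all of the config data. After that step, a separate thread is created for each soul-admin server to run an HttpLongPollingTask.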
// Start HTTP long polling: one thread per soul-admin server listens for changes
this.serverList.forEach(server -> this.executor.execute(new HttpLongPollingTask(server)));
Let's see what this HTTP long-polling task actually does. Without further ado, here is the code:
// HttpSyncDataService.java
class HttpLongPollingTask implements Runnable {
    private String server;
    // Number of attempts per round, 3 by default
    private final int retryTimes = 3;

    HttpLongPollingTask(final String server) {
        this.server = server;
    }

    @Override
    public void run() {
        while (RUNNING.get()) {
            for (int time = 1; time <= retryTimes; time++) {
                try {
                    // The actual polling logic is wrapped in doLongPolling
                    doLongPolling(server);
                } catch (Exception e) {
                    // print warning log.
                    if (time < retryTimes) {
                        log.warn("Long polling failed, tried {} times, {} times left, will be suspended for a while! {}",
                                time, retryTimes - time, e.getMessage());
                        ThreadUtils.sleep(TimeUnit.SECONDS, 5);
                        continue;
                    }
                    // print error, then suspended for a while.
                    log.error("Long polling failed, try again after 5 minutes!", e);
                    ThreadUtils.sleep(TimeUnit.MINUTES, 5);
                }
            }
        }
        log.warn("Stop http long polling.");
    }
}
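To make the retry policy easier to see in isolation, here is a minimal, self-contained sketch of one pass through the for loop above. It is not Soul code: the class RetrySketch and the interfaces Poll and Sleeper are hypothetical, and ThreadUtils.sleep is replaced by an injectable Sleeper that only records the delay, so the behavior can be checked without actually waiting 5 minutes.

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySketch {
    // Attempts per round, like retryTimes in HttpLongPollingTask.
    public static final int RETRY_TIMES = 3;

    /** Stands in for doLongPolling(server); may throw like the real call. */
    public interface Poll {
        void call() throws Exception;
    }

    /** Hypothetical sleep hook: records the delay instead of actually sleeping. */
    public interface Sleeper {
        void sleep(String delay);
    }

    /**
     * One pass of the for loop inside run(): a short pause after each failed attempt
     * except the last, and a long pause once the last attempt has failed too.
     * As in the original, a successful call does not break out of the loop.
     */
    public static void oneRound(Poll poll, Sleeper sleeper) {
        for (int time = 1; time <= RETRY_TIMES; time++) {
            try {
                poll.call();
            } catch (Exception e) {
                if (time < RETRY_TIMES) {
                    sleeper.sleep("5s");   // ThreadUtils.sleep(TimeUnit.SECONDS, 5)
                    continue;
                }
                sleeper.sleep("5min");     // ThreadUtils.sleep(TimeUnit.MINUTES, 5)
            }
        }
    }

    public static void main(String[] args) {
        List<String> sleeps = new ArrayList<>();
        // A poll that always fails, e.g. soul-admin is unreachable.
        oneRound(() -> { throw new IllegalStateException("admin down"); }, sleeps::add);
        System.out.println(sleeps); // [5s, 5s, 5min]
    }
}
```

So while soul-admin is down, each round costs about 10 seconds of short pauses plus a 5-minute pause; a single transient failure only costs 5 seconds.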
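The core method is run. Its first line is a while loop conditioned on RUNNING.get(). Looking up where this flag is defined and where it changes: on close, the flag is cleared so the tasks stop, and the thread pool is shut down as well: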
public class HttpSyncDataService implements SyncDataService, AutoCloseable {
    private static final AtomicBoolean RUNNING = new AtomicBoolean(false);

    @Override
    public void close() throws Exception {
        RUNNING.set(false);
        if (executor != null) {
            executor.shutdownNow();
            // help gc
            executor = null;
        }
    }
    ...
}
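The start/stop lifecycle (a shared AtomicBoolean checked by every loop, plus shutdownNow on close) can be sketched with plain java.util.concurrent. This is an illustrative model, not the real HttpSyncDataService: the class PollingLifecycleSketch and its start method are hypothetical. RUNNING starts out false in the original, so this sketch assumes a start step that flips it to true before the per-server tasks are submitted; where the real class does that lies in the part elided by `...`.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingLifecycleSketch implements AutoCloseable {
    // Shared on/off switch checked by every polling loop (plays the role of RUNNING).
    private final AtomicBoolean running = new AtomicBoolean(false);
    // Counts loop iterations so a test can see that polling really stopped.
    private final AtomicInteger iterations = new AtomicInteger();
    private ExecutorService executor;

    /** Hypothetical start step: flip the flag once, then submit one task per server. */
    public void start(List<String> servers) {
        if (!running.compareAndSet(false, true)) {
            return; // already started
        }
        executor = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        servers.forEach(server -> executor.execute(() -> pollLoop(server)));
    }

    private void pollLoop(String server) {
        while (running.get()) {
            iterations.incrementAndGet(); // stands in for doLongPolling(server)
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdownNow() interrupts sleeping tasks
                return;
            }
        }
    }

    @Override
    public void close() throws InterruptedException {
        running.set(false);             // loops exit at their next check
        if (executor != null) {
            executor.shutdownNow();     // interrupt tasks that are sleeping or blocked
            executor.awaitTermination(1, TimeUnit.SECONDS);
            executor = null;            // same "help gc" step as the original
        }
    }

    public boolean isRunning() {
        return running.get();
    }

    public int iterationCount() {
        return iterations.get();
    }

    public boolean executorReleased() {
        return executor == null;
    }

    public static void main(String[] args) throws InterruptedException {
        PollingLifecycleSketch sync = new PollingLifecycleSketch();
        sync.start(List.of("http://localhost:9095", "http://localhost:9096"));
        Thread.sleep(50);
        sync.close();
        System.out.println("running after close: " + sync.isRunning()); // false
    }
}
```

Clearing the flag alone would leave a task stuck in a 5-minute sleep; shutdownNow interrupts it, which is why the two steps go together.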
The actual long-polling logic:
// HttpSyncDataService.java
@SuppressWarnings("unchecked")
private void doLongPolling(final String server) {
    MultiValueMap<String, String> params = new LinkedMultiValueMap<>(8);
    for (ConfigGroupEnum group : ConfigGroupEnum.values()) {
        // Get this group's config data from the local cache
        ConfigData<?> cacheConfig = factory.cacheConfigData(group);
        // Join MD5 and last-modified time as "md5,lastModifyTime"
        String value = String.join(",", cacheConfig.getMd5(), String.valueOf(cacheConfig.getLastModifyTime()));
        params.put(group.name(), Lists.newArrayList(value));
    }
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
    // Send params as the request body
    HttpEntity httpEntity = new HttpEntity(params, headers);
    // Build the endpoint URL, e.g. http://localhost:9095/configs/listener
    String listenerUrl = server + "/configs/listener";
    log.debug("request listener configs: [{}]", listenerUrl);
    JsonArray groupJson = null;
    try {
        // POST to the endpoint via RestTemplate
        String json = this.httpClient.postForEntity(listenerUrl, httpEntity, String.class).getBody();
        log.debug("listener result: [{}]", json);
        groupJson = GSON.fromJson(json, JsonObject.class).getAsJsonArray("data");
    } catch (RestClientException e) {
        String message = String.format("listener configs fail, server:[%s], %s", server, e.getMessage());
        throw new SoulException(message, e);
    }
    if (groupJson != null) {
        // fetch group configuration async.
        ConfigGroupEnum[] changedGroups = GSON.fromJson(groupJson, ConfigGroupEnum[].class);
        if (ArrayUtils.isNotEmpty(changedGroups)) {
            log.info("Group config changed: {}", Arrays.toString(changedGroups));
            // If the response lists changed groups, actively fetch their data
            this.doFetchGroupConfig(server, changedGroups);
        }
    }
}
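As a rough illustration of what the for loop above builds, here is a minimal sketch in plain Java without Spring. The class ListenerParamsSketch, the CachedConfig holder, and the sample MD5 and timestamp values are all hypothetical. toFormBody shows approximately how such a map looks once encoded as an application/x-www-form-urlencoded body; the exact bytes RestTemplate writes may differ slightly.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ListenerParamsSketch {

    /** Hypothetical stand-in for a cached ConfigData: only the two fields the listener needs. */
    public static final class CachedConfig {
        final String md5;
        final long lastModifyTime;

        public CachedConfig(String md5, long lastModifyTime) {
            this.md5 = md5;
            this.lastModifyTime = lastModifyTime;
        }
    }

    /** One form field per group: name -> "md5,lastModifyTime", as in doLongPolling. */
    public static Map<String, String> buildParams(Map<String, CachedConfig> cacheByGroup) {
        Map<String, String> params = new LinkedHashMap<>();
        cacheByGroup.forEach((group, cfg) ->
                params.put(group, String.join(",", cfg.md5, String.valueOf(cfg.lastModifyTime))));
        return params;
    }

    /** Encodes the map as a form body; the comma becomes %2C. */
    public static String toFormBody(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        params.forEach((k, v) -> body.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, CachedConfig> cache = new LinkedHashMap<>();
        cache.put("PLUGIN", new CachedConfig("abc123", 1611560000000L)); // hypothetical values
        cache.put("RULE", new CachedConfig("def456", 1611560000001L));
        System.out.println(toFormBody(buildParams(cache)));
        // PLUGIN=abc123%2C1611560000000&RULE=def456%2C1611560000001
    }
}
```

On the admin side, request.getParameter(group.name()) decodes this back to "md5,lastModifyTime", which compareChangedGroup then splits on the comma.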
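When the for loop finishes, params contains the data shown in the figure below:
After the POST request is sent to soul-admin, the breakpoint gets no response for a long time; only after quite a while does the following success message come back. Something seems off about this endpoint. Let's finish this method first and then analyze the endpoint right away.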
{"code":200,"message":"success","data":[]}
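If the response lists changed groups, the client actively pulls data through the /configs/fetch endpoint covered earlier, but only for the changed groups rather than all 5 types: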
// HttpSyncDataService.java
private void doFetchGroupConfig(final String server, final ConfigGroupEnum... groups) {
    StringBuilder params = new StringBuilder();
    for (ConfigGroupEnum groupKey : groups) {
        params.append("groupKeys").append("=").append(groupKey.name()).append("&");
    }
    String url = server + "/configs/fetch?" + StringUtils.removeEnd(params.toString(), "&");
    log.info("request configs: [{}]", url);
    String json = null;
    try {
        json = this.httpClient.getForObject(url, String.class);
    } catch (RestClientException e) {
        String message = String.format("fetch config fail from server[%s], %s", url, e.getMessage());
        log.warn(message);
        throw new SoulException(message, e);
    }
    // update local cache
    // (analyzed yesterday; it eventually calls the template method dataRefresh.refresh(data))
    boolean updated = this.updateCacheWithJson(json);
    if (updated) {
        log.info("get latest configs: [{}]", json);
        return;
    }
    // not updated. it is likely that the current config server has not been updated yet. wait a moment.
    log.info("The config of the server[{}] has not been updated or is out of date. Wait for 30s to listen for changes again.", server);
    // Sleep for 30 seconds
    ThreadUtils.sleep(TimeUnit.SECONDS, 30);
}
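The fetch URL repeats the groupKeys parameter once per changed group. A minimal sketch of the same string building, using StringJoiner in place of StringBuilder plus StringUtils.removeEnd (the class FetchUrlSketch is hypothetical; the output format follows the code above):

```java
import java.util.List;
import java.util.StringJoiner;

public class FetchUrlSketch {
    /** Builds server + "/configs/fetch?groupKeys=A&groupKeys=B", like doFetchGroupConfig. */
    public static String fetchUrl(String server, List<String> changedGroups) {
        StringJoiner query = new StringJoiner("&"); // no trailing "&" to strip afterwards
        for (String group : changedGroups) {
            query.add("groupKeys=" + group);        // one repeated parameter per changed group
        }
        return server + "/configs/fetch?" + query;
    }

    public static void main(String[] args) {
        System.out.println(fetchUrl("http://localhost:9095", List.of("PLUGIN", "RULE")));
        // http://localhost:9095/configs/fetch?groupKeys=PLUGIN&groupKeys=RULE
    }
}
```

The enum names are plain ASCII identifiers, so no URL encoding is needed here.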
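That completes the analysis of the long-polling task on the soul-bootstrap side.
Earlier we saw an endpoint call that seemed to block, so let's go to the soul-admin side and see what this endpoint is up to.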
3. The /configs/listener endpoint in soul-admin
Searching for "/listener" leads to ConfigController:
@ConditionalOnBean(HttpLongPollingDataChangedListener.class)
@RestController
@RequestMapping("/configs")
@Slf4j
public class ConfigController {
    ...
    /**
     * Listener.
     *
     * @param request the request
     * @param response the response
     */
    @PostMapping(value = "/listener")
    public void listener(final HttpServletRequest request, final HttpServletResponse response) {
        longPollingListener.doLongPolling(request, response);
    }
    ...
}
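The Javadoc of the method below makes two points clear:
1. If the config data has changed, the changed groups are returned immediately.
2. Otherwise, the client's request thread is blocked until any data changes or the specified timeout is reached.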
// HttpLongPollingDataChangedListener.java
/**
 * If the configuration data changes, the group information for the change is immediately responded.
 * Otherwise, the client's request thread is blocked until any data changes or the specified timeout is reached.
 *
 * @param request the request
 * @param response the response
 */
public void doLongPolling(final HttpServletRequest request, final HttpServletResponse response) {
    // compare group md5
    List<ConfigGroupEnum> changedGroup = compareChangedGroup(request);
    String clientIp = getRemoteIp(request);
    // response immediately.
    // Data has changed, so respond right away
    if (CollectionUtils.isNotEmpty(changedGroup)) {
        this.generateResponse(response, changedGroup);
        log.info("send response with the changed group, ip={}, group={}", clientIp, changedGroup);
        return;
    }
    // listen for configuration changed.
    final AsyncContext asyncContext = request.startAsync();
    // AsyncContext.settimeout() does not timeout properly, so you have to control it yourself
    asyncContext.setTimeout(0L);
    // block client's thread.
    scheduler.execute(new LongPollingClient(asyncContext, clientIp, HttpConstants.SERVER_MAX_HOLD_TIMEOUT));
}
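The "respond now or hold" split in doLongPolling can be modeled without a servlet container. In this sketch, which is an analogy rather than the Servlet API, a CompletableFuture stands in for the response kept open by request.startAsync(): handle() returns at once either way, but in the hold case the future stays unfinished until publish() completes it, much as asyncContext.complete() eventually does. The class HoldOrRespondSketch and its methods are hypothetical, and timeouts are deliberately left out.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

public class HoldOrRespondSketch {
    // Held "responses" waiting for a change; stands in for the listener's client queue.
    public static final Queue<CompletableFuture<List<String>>> PARKED = new ConcurrentLinkedQueue<>();

    /**
     * Same shape as doLongPolling: answer at once if something changed, otherwise
     * hand back an unfinished future and return, freeing the calling thread.
     */
    public static CompletableFuture<List<String>> handle(List<String> changedGroups) {
        if (!changedGroups.isEmpty()) {
            return CompletableFuture.completedFuture(changedGroups); // respond immediately
        }
        CompletableFuture<List<String>> pending = new CompletableFuture<>();
        PARKED.add(pending); // like request.startAsync(): the response stays open
        return pending;
    }

    /** Called when data changes: complete every held response. */
    public static void publish(List<String> changedGroups) {
        CompletableFuture<List<String>> f;
        while ((f = PARKED.poll()) != null) {
            f.complete(changedGroups);
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(List.of("PLUGIN")).isDone()); // true: answered immediately
        CompletableFuture<List<String>> held = handle(List.of());
        System.out.println(held.isDone());                       // false: held open
        publish(List.of("RULE"));
        System.out.println(held.join());                         // [RULE]
    }
}
```

The point of the analogy: "blocking" here does not mean a thread waits; handle() has already returned, and only the HTTP response is left open until someone completes it.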
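3.1 Checking whether the data has changed
Here the data sent by the client is compared with the data in admin's current cache to find the groups that have changed. The logic is as follows: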
// HttpLongPollingDataChangedListener.java
private List<ConfigGroupEnum> compareChangedGroup(final HttpServletRequest request) {
    List<ConfigGroupEnum> changedGroup = new ArrayList<>(ConfigGroupEnum.values().length);
    for (ConfigGroupEnum group : ConfigGroupEnum.values()) {
        // md5,lastModifyTime
        // Parse the value sent in the request body
        String[] params = StringUtils.split(request.getParameter(group.name()), ',');
        if (params == null || params.length != 2) {
            throw new SoulException("group param invalid:" + request.getParameter(group.name()));
        }
        String clientMd5 = params[0];
        long clientModifyTime = NumberUtils.toLong(params[1]);
        ConfigDataCache serverCache = CACHE.get(group.name());
        // do check.
        if (this.checkCacheDelayAndUpdate(serverCache, clientMd5, clientModifyTime)) {
            changedGroup.add(group);
        }
    }
    return changedGroup;
}

/**
 * check whether the client needs to update the cache.
 * @param serverCache the admin local cache
 * @param clientMd5 the client md5 value
 * @param clientModifyTime the client last modify time
 * @return true: the client needs to be updated, false: not need.
 */
private boolean checkCacheDelayAndUpdate(final ConfigDataCache serverCache, final String clientMd5, final long clientModifyTime) {
    // is the same, doesn't need to be updated
    // Same MD5 value: nothing changed, no update needed
    if (StringUtils.equals(clientMd5, serverCache.getMd5())) {
        return false;
    }
    // if the md5 value is different, it is necessary to compare lastModifyTime.
    // From here on the MD5 values differ, so lastModifyTime decides which side is stale
    long lastModifyTime = serverCache.getLastModifyTime();
    if (lastModifyTime >= clientModifyTime) {
        // the client's config is out of date.
        return true;
    }
    // the lastModifyTime before client, then the local cache needs to be updated.
    // Considering the concurrency problem, admin must lock,
    // otherwise it may cause the request from soul-web to update the cache concurrently, causing excessive db pressure
    boolean locked = false;
    try {
        locked = LOCK.tryLock(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return true;
    }
    if (locked) {
        try {
            ConfigDataCache latest = CACHE.get(serverCache.getGroup());
            if (latest != serverCache) {
                // the cache of admin was updated. if the md5 value is the same, there's no need to update.
                return !StringUtils.equals(clientMd5, latest.getMd5());
            }
            // load cache from db.
            // Reload data from the database to refresh admin's local cache
            this.refreshLocalCache();
            latest = CACHE.get(serverCache.getGroup());
            return !StringUtils.equals(clientMd5, latest.getMd5());
        } finally {
            LOCK.unlock();
        }
    }
    // not locked, the client need to be updated.
    return true;
}
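The decision at the top of checkCacheDelayAndUpdate is a three-way comparison. The sketch below isolates it as a pure function so each branch can be exercised directly; the class CacheCheckSketch and the Verdict names are hypothetical, and the locking and database reload are deliberately left out (they only happen in the third case).

```java
import java.util.Objects;

public class CacheCheckSketch {
    /** Hypothetical names for the three outcomes of the comparison. */
    public enum Verdict { UP_TO_DATE, CLIENT_STALE, ADMIN_MAY_BE_STALE }

    /**
     * The MD5/lastModifyTime comparison from checkCacheDelayAndUpdate,
     * without the lock and the database reload.
     */
    public static Verdict compare(String serverMd5, long serverModifyTime,
                                  String clientMd5, long clientModifyTime) {
        if (Objects.equals(serverMd5, clientMd5)) {
            return Verdict.UP_TO_DATE;         // same content: nothing to send
        }
        if (serverModifyTime >= clientModifyTime) {
            return Verdict.CLIENT_STALE;       // admin's copy is at least as new: client must refetch
        }
        // The client's timestamp is newer than admin's cache, so admin's cache may lag behind.
        // The real method reloads from the DB under LOCK and compares the MD5 again.
        return Verdict.ADMIN_MAY_BE_STALE;
    }

    public static void main(String[] args) {
        System.out.println(compare("a1", 100L, "a1", 50L));  // UP_TO_DATE
        System.out.println(compare("b2", 100L, "a1", 50L));  // CLIENT_STALE
        System.out.println(compare("b2", 100L, "a1", 200L)); // ADMIN_MAY_BE_STALE
    }
}
```

Only the third case touches the lock, which keeps many gateways polling at once from all triggering database reloads.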
3.2 Blocking and listening for changes
// HttpLongPollingDataChangedListener.java
/**
 * If you exceed {@link HttpConstants#SERVER_MAX_HOLD_TIMEOUT} and still have no data change,
 * empty data is returned. If the data changes within this time frame, the DataChangeTask
 * cancels the timed task and responds to the changed group data.
 */
class LongPollingClient implements Runnable {
    /**
     * The Async context.
     */
    private final AsyncContext asyncContext;
    /**
     * The Ip.
     */
    private final String ip;
    /**
     * The Timeout time.
     */
    private final long timeoutTime;
    /**
     * The Async timeout future.
     */
    private Future<?> asyncTimeoutFuture;

    /**
     * Instantiates a new Long polling client.
     *
     * @param ac the ac
     * @param ip the ip
     * @param timeoutTime the timeout time
     */
    LongPollingClient(final AsyncContext ac, final String ip, final long timeoutTime) {
        this.asyncContext = ac;
        this.ip = ip;
        this.timeoutTime = timeoutTime;
    }

    @Override
    public void run() {
        this.asyncTimeoutFuture = scheduler.schedule(() -> {
            clients.remove(LongPollingClient.this);
            List<ConfigGroupEnum> changedGroups = compareChangedGroup((HttpServletRequest) asyncContext.getRequest());
            sendResponse(changedGroups);
        }, timeoutTime, TimeUnit.MILLISECONDS);
        clients.add(this);
    }

    /**
     * Send response.
     *
     * @param changedGroups the changed groups
     */
    void sendResponse(final List<ConfigGroupEnum> changedGroups) {
        // cancel scheduler
        if (null != asyncTimeoutFuture) {
            asyncTimeoutFuture.cancel(false);
        }
        generateResponse((HttpServletResponse) asyncContext.getResponse(), changedGroups);
        asyncContext.complete();
    }
}

/**
 * Send response datagram.
 *
 * @param response the response
 * @param changedGroups the changed groups
 */
private void generateResponse(final HttpServletResponse response, final List<ConfigGroupEnum> changedGroups) {
    try {
        response.setHeader("Pragma", "no-cache");
        response.setDateHeader("Expires", 0);
        response.setHeader("Cache-Control", "no-cache,no-store");
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.setStatus(HttpServletResponse.SC_OK);
        response.getWriter().println(GsonUtils.getInstance().toJson(SoulAdminResult.success(SoulResultMessage.SUCCESS, changedGroups)));
    } catch (IOException ex) {
        log.error("Sending response failed.", ex);
    }
}
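The race inside LongPollingClient (a scheduled timeout versus a change notification, whichever comes first answers the request) can be reproduced with ScheduledExecutorService alone. In this sketch, which models the idea rather than reproducing Soul's classes, a CompletableFuture again replaces the AsyncContext; the class LongPollingClientSketch, its nested Client, and onDataChanged (standing in for what the DataChangeTask mentioned in the Javadoc does) are hypothetical. The timeout path answers with an empty list instead of re-running compareChangedGroup.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LongPollingClientSketch {
    // Daemon thread so the sketch never keeps the JVM alive.
    static final ScheduledExecutorService SCHEDULER = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "long-polling-sketch");
        t.setDaemon(true);
        return t;
    });
    // Held clients waiting for a change (plays the role of `clients`).
    public static final Queue<Client> CLIENTS = new ConcurrentLinkedQueue<>();

    /** Hypothetical stand-in for LongPollingClient; `response` replaces the AsyncContext. */
    public static final class Client implements Runnable {
        public final CompletableFuture<List<String>> response = new CompletableFuture<>();
        private final long timeoutMillis;
        private volatile Future<?> timeoutFuture;

        public Client(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public void run() {
            // Arm the timeout first, then become visible to change notifications.
            timeoutFuture = SCHEDULER.schedule(() -> {
                CLIENTS.remove(this);
                sendResponse(List.of()); // nothing changed within the hold time: empty answer
            }, timeoutMillis, TimeUnit.MILLISECONDS);
            CLIENTS.add(this);
        }

        void sendResponse(List<String> changedGroups) {
            Future<?> f = timeoutFuture;
            if (f != null) {
                f.cancel(false); // drop a pending timeout; harmless if we are the timeout task
            }
            response.complete(changedGroups); // like generateResponse + asyncContext.complete()
        }

        public boolean timeoutCancelled() {
            Future<?> f = timeoutFuture;
            return f != null && f.isCancelled();
        }
    }

    /** Roughly what a data-change notification does: answer and drop every held client. */
    public static void onDataChanged(List<String> changedGroups) {
        Client c;
        while ((c = CLIENTS.poll()) != null) {
            c.sendResponse(changedGroups);
        }
    }

    public static void main(String[] args) throws Exception {
        Client quiet = new Client(100);
        quiet.run();
        System.out.println(quiet.response.get(1, TimeUnit.SECONDS)); // [] after about 100 ms

        Client lucky = new Client(60_000);
        lucky.run();
        onDataChanged(List.of("PLUGIN"));
        System.out.println(lucky.response.get(1, TimeUnit.SECONDS)); // [PLUGIN]
        System.out.println(lucky.timeoutCancelled());                // true
    }
}
```

In this model no thread sits waiting during the hold. What keeps the request pending is the open response plus a scheduled task, which matches the behavior observed from soul-bootstrap earlier: a long pause, then `{"code":200,"message":"success","data":[]}`.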
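I'm still a bit fuzzy on exactly how the request gets held here. I'll study how more experienced folks have analyzed it and fill this part in later.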