2021-11-29
2021SC@SDUSC
11-29-DolphinScheduler(10)
WorkerServer
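Finally we analyze WorkerServer, a class at the same level as the master. Following the same approach as in the master analysis, we start with the stop method.
The code is not shown here; only the stop logic is summarized.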
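1. Call Stopper.stop to set the global flag, which ends the "infinite" loops of all threads.
2. Sleep for 3 seconds.
3. Stop the worker heartbeat: heartbeatWorkerService.shutdownNow
4. Stop the worker task thread pool: ThreadPoolExecutors.getInstance().shutdown
5. Stop the killExecutor thread pool: killExecutorService.shutdownNow
6. Stop the fetchTask thread pool: fetchTaskExecutorService.shutdownNow
7. Stop the ZooKeeper client: zkWorkerClient.close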
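The shutdown order above can be sketched with plain java.util.concurrent classes. This is only an illustration, not the DolphinScheduler implementation: the class WorkerStopSketch, its fields, and the steps list are hypothetical names, and the ZooKeeper client is replaced by a placeholder step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the worker's stop logic; not DolphinScheduler code.
class WorkerStopSketch {
    // Plays the role of the global flag that Stopper.stop sets.
    final AtomicBoolean stopped = new AtomicBoolean(false);

    final ExecutorService heartbeat = Executors.newSingleThreadExecutor();
    final ExecutorService taskPool = Executors.newFixedThreadPool(2);
    final ExecutorService killPool = Executors.newSingleThreadExecutor();
    final ExecutorService fetchPool = Executors.newSingleThreadExecutor();

    // Records the order in which the steps ran, for inspection only.
    final List<String> steps = new ArrayList<>();

    // graceMillis stands in for the 3-second sleep described above.
    void stop(long graceMillis) {
        // Step 1: flip the flag once; a repeated call returns immediately.
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        steps.add("flag");

        // Step 2: give loops that poll the flag time to finish their iteration.
        try {
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        steps.add("sleep");

        // Steps 3 to 6: shutdownNow interrupts running tasks, while shutdown
        // only refuses new tasks and lets submitted ones complete.
        heartbeat.shutdownNow();
        steps.add("heartbeat");
        taskPool.shutdown();
        steps.add("taskPool");
        killPool.shutdownNow();
        steps.add("killPool");
        fetchPool.shutdownNow();
        steps.add("fetchPool");

        // Step 7: placeholder for closing the ZooKeeper client.
        steps.add("zkClose");
    }
}
```

Note that only the task pool uses shutdown, so tasks already submitted to it are allowed to finish, while the other pools are interrupted right away.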
heartBeatThread essentially reports the worker's current resource usage.
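As a rough idea of what such a heartbeat might look like, the sketch below collects a few JVM and OS figures and hands them to a callback at a fixed interval. The class HeartbeatSketch, the choice of fields, and the comma-separated layout are all assumptions made for illustration; the real heartBeatThread may report different values in a different format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical heartbeat reporter; field layout is invented for this sketch.
class HeartbeatSketch {
    // Builds "processors,loadAverage,usedHeapMb,timestamp".
    static String snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();
        long usedHeapMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        return String.join(",",
                String.valueOf(os.getAvailableProcessors()),
                // Negative when the platform cannot provide a load average.
                String.valueOf(os.getSystemLoadAverage()),
                String.valueOf(usedHeapMb),
                String.valueOf(System.currentTimeMillis()));
    }

    // Sends a snapshot to the sink every periodMillis until the returned
    // executor is shut down (for example with shutdownNow).
    static ScheduledExecutorService start(Consumer<String> sink, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> sink.accept(snapshot()),
                0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In this sketch the sink would be whatever writes the value to the worker's registration node; stopping the heartbeat corresponds to calling shutdownNow on the returned executor.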
ZKWorkerClient
Lastly, we look at the logic of ZKWorkerClient, which is closely tied to worker fault tolerance.
private ZKWorkerClient(){
    init();
}
/**
 * init
 */
private void init(){
    // init system znode
    this.initSystemZNode();
    // monitor worker
    this.listenerWorker();
    // register worker
    this.registWorker();
}
Initialization simply calls initSystemZNode, listenerWorker, and registWorker in that order.
protected void initSystemZNode(){
    try {
        createNodePath(getMasterZNodeParentPath());
        createNodePath(getWorkerZNodeParentPath());
        createNodePath(getDeadZNodeParentPath());
    } catch (Exception e) {
        logger.error("init system znode failed : " + e.getMessage(),e);
    }
}
private void createNodePath(String zNodeParentPath) throws Exception {
    if(null == zkClient.checkExists().forPath(zNodeParentPath)){
        zkClient.create().creatingParentContainersIfNeeded()
                .withMode(CreateMode.PERSISTENT).forPath(zNodeParentPath);
    }
}
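Judging from the initSystemZNode source and the three path getters it calls,
it simply creates three persistent nodes in ZooKeeper, one after another, skipping any that already exist.
Notably, worker initialization also creates the master-related node.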
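The "create only if absent, creating parents as needed" behaviour of createNodePath can be imitated without ZooKeeper. The sketch below uses an in-memory set as a stand-in for the znode tree; the class ZNodeInitSketch and the example paths are hypothetical and are not the paths DolphinScheduler actually uses.

```java
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for a znode tree; illustration only.
class ZNodeInitSketch {
    final Set<String> nodes = new TreeSet<>();

    // Returns true if the node was created, false if it already existed.
    boolean createNodePath(String path) {
        if (nodes.contains(path)) {
            return false; // mirrors the checkExists guard
        }
        // Create every missing ancestor first, e.g. "/ds" for "/ds/masters".
        int idx = path.indexOf('/', 1);
        while (idx > 0) {
            nodes.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        nodes.add(path);
        return true;
    }

    // Example paths are made up for this sketch.
    void initSystemZNode() {
        createNodePath("/ds/masters");
        createNodePath("/ds/workers");
        createNodePath("/ds/dead-servers");
    }
}
```

Because the check makes the call idempotent, it does no harm that both master and worker run the same initialization: whichever starts first creates the nodes and the other finds them already present.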
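Next we analyze registWorker first; in my view, the worker node should be registered first and the listener started afterwards.
registWorker calls registerServer(ZKNodeType.WORKER) to register the current node.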
public String registerServer(ZKNodeType zkNodeType) throws Exception {
    String registerPath = null;
    String host = OSUtils.getHost();
    if(checkZKNodeExists(host, zkNodeType)){
        logger.error("register failure , {} server already started on host : {}" ,
                zkNodeType.toString(), host);
        return registerPath;
    }
    registerPath = createZNodePath(zkNodeType);
    // handle dead server
    handleDeadServer(registerPath, zkNodeType, Constants.DELETE_ZK_OP);
    return registerPath;
}
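registerServer first checks whether a node for the current host already exists. If it does, the method logs an error and returns null; otherwise it creates the node. It then calls handleDeadServer with DELETE_ZK_OP, which looks up the dead-node record and deletes it from ZooKeeper.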
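The control flow of registerServer can be summarized with an in-memory sketch. The class RegisterSketch, the two sets, and the path format are assumptions for illustration; the real method works against ZooKeeper through checkZKNodeExists, createZNodePath, and handleDeadServer.

```java
import java.util.HashSet;
import java.util.Set;

// In-memory imitation of the register flow; not the real ZooKeeper logic.
class RegisterSketch {
    final Set<String> live = new HashSet<>(); // registered servers
    final Set<String> dead = new HashSet<>(); // dead-server records

    // Returns the registered path, or null if the host is already registered.
    String registerServer(String type, String host) {
        String key = type + "_" + host;
        if (live.contains(key)) {
            // Same outcome as the "already started on host" branch.
            return null;
        }
        live.add(key);
        // A freshly registered server should no longer be listed as dead.
        dead.remove(key);
        // The path format is invented for this sketch.
        return "/" + type + "/" + host;
    }
}
```

A caller has to treat a null return as "registration refused", since the method signals a duplicate by return value rather than by exception.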
private void listenerWorker(){
    workerPathChildrenCache = new PathChildrenCache(zkClient, getZNodeParentPath(ZKNodeType.WORKER), true, defaultThreadFactory);
    try {
        workerPathChildrenCache.start();
        workerPathChildrenCache.getListenable().addListener(new PathChildrenCacheListener() {
            @Override
            public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) throws Exception {
                switch (event.getType()) {
                    case CHILD_ADDED:
                        logger.info("node added : {}" ,event.getData().getPath());
                        break;
                    case CHILD_REMOVED:
                        String path = event.getData().getPath();
                        //find myself dead
                        String serverHost = getHostByEventDataPath(path);
                        if(checkServerSelfDead(serverHost, ZKNodeType.WORKER)){
                            return;
                        }
                        break;
                    case CHILD_UPDATED:
                        break;
                    default:
                        break;
                }
            }
        });
    }catch (Exception e){
        logger.error("monitor worker failed : " + e.getMessage(),e);
    }
}
listenerWorker listens for CHILD_REMOVED events under the worker path and then calls checkServerSelfDead. A worker does nothing when some other worker node is removed.
protected boolean checkServerSelfDead(String serverHost, ZKNodeType zkNodeType) {
    if (serverHost.equals(OSUtils.getHost())) {
        logger.error("{} server({}) of myself dead , stopping...",
                zkNodeType.toString(), serverHost);
        stoppable.stop(String.format(" {} server {} of myself dead , stopping...",
                zkNodeType.toString(), serverHost));
        return true;
    }
    return false;
}
checkServerSelfDead checks whether the removed node belongs to the current host. If it does, it calls stoppable.stop (stoppable is set in the WorkerServer.run function):
zkWorkerClient.setStoppable(this);
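In short, listenerWorker watches for the current node timing out and being deleted by ZooKeeper; once that happens, it calls the stop method and the worker exits.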
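The self-check on removal can be reduced to a few lines. In the sketch below, the class SelfDeadSketch, the nested Stoppable interface, and the assumed node-name format "host_sequence" are illustrative only; the real code obtains the host through getHostByEventDataPath and OSUtils.getHost.

```java
// Hypothetical reduction of the CHILD_REMOVED handling; illustration only.
class SelfDeadSketch {
    // Stand-in for the callback that WorkerServer registers via setStoppable.
    interface Stoppable {
        void stop(String cause);
    }

    final String localHost;
    final Stoppable stoppable;

    SelfDeadSketch(String localHost, Stoppable stoppable) {
        this.localHost = localHost;
        this.stoppable = stoppable;
    }

    // Assumes node names look like "<host>_<sequence>" under a parent path.
    static String hostOf(String path) {
        int start = path.lastIndexOf('/') + 1;
        int end = path.lastIndexOf('_');
        return end > start ? path.substring(start, end) : path.substring(start);
    }

    // Returns true when the removed node was this server's own node.
    boolean onChildRemoved(String path) {
        String host = hostOf(path);
        if (host.equals(localHost)) {
            stoppable.stop("worker server " + host + " of myself dead, stopping...");
            return true;
        }
        return false; // another worker went away; nothing to do here
    }
}
```

For example, a worker on 10.0.0.1 ignores the removal of "/workers/10.0.0.2_0000000007" but stops itself when "/workers/10.0.0.1_0000000003" disappears.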