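Background
In multi-executor mode, the Azkaban web server picks a suitable executor for each flow based on the executors' current state. However, when many scheduled flows fire at the same moment, for example on the hour, several consecutive flows can be dispatched to the same executor while the other executors receive none.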
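Code analysis
When the Azkaban web server starts, it initializes a service named executorInfoRefresherService, which refreshes the executor state held in the web server's memory under certain conditions. Part of the code is shown below.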
// init executorInfoRefresherService
private ExecutorService createExecutorInfoRefresherService() {
  return Executors.newFixedThreadPool(this.azkProps.getInt(
      ConfigurationKeys.EXECUTORINFO_REFRESH_MAX_THREADS, 5));
}

// Refresh Executor status for all the active executors in this executorManager
private void refreshExecutors() {
  final List<Pair<Executor, Future<ExecutorInfo>>> futures =
      new ArrayList<>();
  for (final Executor executor : this.activeExecutors.getAll()) {
    // execute each executorInfo refresh task to fetch
    final Future<ExecutorInfo> fetchExecutionInfo =
        this.executorInfoRefresherService.submit(
            () -> this.apiGateway.callForJsonType(executor.getHost(),
                executor.getPort(), "/serverStatistics", null, ExecutorInfo.class));
    futures.add(new Pair<>(executor,
        fetchExecutionInfo));
  }
  ......
}
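To make the effect of the pool size concrete, here is a minimal, self-contained sketch of the same fan-out pattern: one task per executor is submitted to a fixed thread pool and the caller then waits on every Future. It is not Azkaban code. The class RefreshFanOutSketch, the method fetchStats (a stand-in for the HTTP call to /serverStatistics) and the simulated latency are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class RefreshFanOutSketch {

  // Hypothetical stand-in for the HTTP request to one executor.
  // It only sleeps to simulate network latency, then returns the executor id.
  static int fetchStats(final int executorId, final long latencyMs)
      throws InterruptedException {
    Thread.sleep(latencyMs);
    return executorId;
  }

  // Submits one fetch per executor to a fixed pool, then blocks until all finish.
  static List<Integer> refreshAll(final int executors, final int poolSize,
      final long latencyMs) throws Exception {
    final ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      final List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < executors; i++) {
        final int id = i;
        final Callable<Integer> task = () -> fetchStats(id, latencyMs);
        futures.add(pool.submit(task));
      }
      final List<Integer> results = new ArrayList<>();
      for (final Future<Integer> future : futures) {
        // The caller is blocked here until every executor has answered.
        results.add(future.get(5, TimeUnit.SECONDS));
      }
      return results;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(final String[] args) throws Exception {
    // Compare a pool of 5 threads with a pool of 10 for 10 simulated executors.
    for (final int poolSize : new int[] {5, 10}) {
      final long start = System.nanoTime();
      refreshAll(10, poolSize, 100L);
      final long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println("poolSize=" + poolSize + " elapsedMs=" + elapsedMs);
    }
  }
}
```

With more executors than pool threads, the requests run in batches, so the time the caller spends blocked grows with the number of batches. That is the cost a larger pool is meant to reduce.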
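When a flow is submitted to the web server for execution, the web server picks the best executor based on the configured comparators, their weights, and the executors' state. The executor state itself is refreshed, by sending HTTP requests to the executors, only when certain conditions are met, as the code below shows.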
// if we have dispatched more than maxContinuousFlowProcessed or
// it has been more than activeExecutorsRefreshWindow millisec since we
// refreshed
if (currentTime - lastExecutorRefreshTime > activeExecutorsRefreshWindow
    || currentContinuousFlowProcessed >= maxContinuousFlowProcessed) {
  // Refresh executorInfo for all active Executors
  refreshExecutors();
  lastExecutorRefreshTime = currentTime;
  currentContinuousFlowProcessed = 0;
}

// get the value of activeExecutorsRefreshWindow and maxContinuousFlowProcessed
activeExecutorsRefreshWindow =
    this.azkProps.getLong(Constants.ConfigurationKeys.ACTIVE_EXECUTOR_REFRESH_IN_MS, 50000);
maxContinuousFlowProcessed =
    this.azkProps.getInt(Constants.ConfigurationKeys.ACTIVE_EXECUTOR_REFRESH_IN_NUM_FLOW, 5);

// the corresponding configuration keys
public static final String ACTIVE_EXECUTOR_REFRESH_IN_MS =
    "azkaban.activeexecutor.refresh.milisecinterval";
public static final String ACTIVE_EXECUTOR_REFRESH_IN_NUM_FLOW =
    "azkaban.activeexecutor.refresh.flowinterval";
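The refresh condition can be tried out in isolation. The sketch below reproduces only the if-condition from the excerpt and counts how often it would trigger for a given sequence of dispatch times. The class RefreshGateSketch and the method countRefreshes are hypothetical, and the sketch assumes that the flow counter is incremented once after each dispatch and that the last refresh happened at time 0.

```java
class RefreshGateSketch {

  // Counts how many refreshes the condition from the excerpt would trigger.
  // dispatchTimes holds the time, in milliseconds, at which each flow is dispatched.
  static int countRefreshes(final long[] dispatchTimes, final long refreshWindowMs,
      final int maxContinuousFlows) {
    long lastRefreshTime = 0L; // assumption: the last refresh happened at time 0
    int continuousFlows = 0;
    int refreshes = 0;
    for (final long currentTime : dispatchTimes) {
      if (currentTime - lastRefreshTime > refreshWindowMs
          || continuousFlows >= maxContinuousFlows) {
        refreshes++;
        lastRefreshTime = currentTime;
        continuousFlows = 0;
      }
      // Assumption: the counter goes up by one after each dispatched flow.
      continuousFlows++;
    }
    return refreshes;
  }

  public static void main(final String[] args) {
    final long[] burst = new long[10]; // ten flows, all dispatched at time 0
    System.out.println(countRefreshes(burst, 50000L, 5)); // prints 1
    System.out.println(countRefreshes(burst, 50000L, 3)); // prints 3
    // Two flows at time 0, then one flow 60 seconds later: the time window triggers.
    System.out.println(countRefreshes(new long[] {0L, 0L, 60000L}, 50000L, 5)); // prints 1
  }
}
```

In a burst where all flows arrive at nearly the same time, the time window never elapses, so only the flow counter can trigger a refresh. With the default of 5, ten flows see a single refresh.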
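From the code above, we can see why several consecutive flows land on the same executor on the hour: by default, the interval between executor state refreshes is too long (50000 ms) and the number of flows dispatched between two refreshes is too high (5). Between refreshes the web server keeps selecting against the same cached state, so it keeps choosing the same executor. We can therefore lower both parameters. Lowering them means refresh requests are sent to the executors more often, and to keep that from making the web server hang for too long, we can also increase the thread count of the pool built by createExecutorInfoRefresherService, so that refresh requests go out to more executors at the same time.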
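The following toy simulation illustrates the effect of the flow interval on a burst. It is a deliberately simplified model, not Azkaban's selection logic: it assumes the web server always picks the executor with the lowest cached flow count (ties go to the lowest index) and that the cached counts change only on a refresh. The class BurstDispatchSketch and the method dispatch are hypothetical.

```java
import java.util.Arrays;

class BurstDispatchSketch {

  // Dispatches a burst of flows and returns how many flows each executor received.
  static int[] dispatch(final int executors, final int flows,
      final int refreshEveryFlows) {
    final int[] actual = new int[executors]; // real number of flows per executor
    final int[] cached = new int[executors]; // the web server's last known view
    int sinceRefresh = 0;
    for (int flow = 0; flow < flows; flow++) {
      if (sinceRefresh >= refreshEveryFlows) {
        // Refresh: copy the real state into the cached view.
        System.arraycopy(actual, 0, cached, 0, executors);
        sinceRefresh = 0;
      }
      // Assumed selection rule: lowest cached load wins, ties go to the lowest index.
      int best = 0;
      for (int e = 1; e < executors; e++) {
        if (cached[e] < cached[best]) {
          best = e;
        }
      }
      actual[best]++;
      sinceRefresh++;
    }
    return actual;
  }

  public static void main(final String[] args) {
    // Six flows arriving at once on three executors.
    System.out.println(Arrays.toString(dispatch(3, 6, 5))); // prints [5, 1, 0]
    System.out.println(Arrays.toString(dispatch(3, 6, 3))); // prints [3, 3, 0]
    System.out.println(Arrays.toString(dispatch(3, 6, 1))); // prints [2, 2, 2]
  }
}
```

In this model a smaller flow interval shortens the run of flows that pile onto one executor, at the price of more refresh requests.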
Parameter settings
# default is 50000
azkaban.activeexecutor.refresh.milisecinterval=10000
# default is 5
azkaban.activeexecutor.refresh.flowinterval=3
# default is 5
azkaban.executorinfo.refresh.maxThreads=10
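One caveat on comment placement: the Java properties format only treats a line as a comment when # is the first non-blank character, so a # placed after a value becomes part of the value. The sketch below shows this with java.util.Properties. That Azkaban parses its configuration this way is an assumption here, and the class InlineCommentSketch is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class InlineCommentSketch {

  // Parses the given properties text and returns the raw value stored for the key.
  static String loadValue(final String text, final String key) throws IOException {
    final Properties props = new Properties();
    props.load(new StringReader(text));
    return props.getProperty(key);
  }

  public static void main(final String[] args) throws IOException {
    final String key = "azkaban.activeexecutor.refresh.flowinterval";

    // Comment on the same line: it ends up inside the value.
    final String inline = key + "=3 # default is 5\n";
    System.out.println("[" + loadValue(inline, key) + "]"); // prints [3 # default is 5]

    // Comment on its own line: the value is clean.
    final String separate = "# default is 5\n" + key + "=3\n";
    System.out.println("[" + loadValue(separate, key) + "]"); // prints [3]
  }
}
```

A value such as "3 # default is 5" cannot be parsed as an integer, so it is safer to keep each comment on a line of its own.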