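This article analyzes the Multi Executor mode based on the Azkaban 3.79.0 code base.
When azkaban.poll.model is set to false (the default), executor management and flow dispatching are handled by the ExecutorManager class; when it is set to true, the ExecutionController class is used instead. ExecutorManager is currently the more mature option for production use, so the rest of this article analyzes how ExecutorManager selects an executor.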
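As a rough illustration of this switch, the sketch below maps the property value to the name of the class that takes over. It is not how Azkaban wires its classes internally (that code is not shown in this article); PollModelSketch and managerFor are hypothetical names, and java.util.Properties is used only to keep the example self-contained.

```java
import java.util.Properties;

/** Illustrative only: models the azkaban.poll.model switch described above. */
final class PollModelSketch {

  static final String AZKABAN_POLL_MODEL = "azkaban.poll.model";

  /** Returns the name of the class that manages executors for the given properties. */
  static String managerFor(final Properties props) {
    // A missing key is treated as "false", matching the default stated in the text.
    final boolean pollModel =
        Boolean.parseBoolean(props.getProperty(AZKABAN_POLL_MODEL, "false"));
    return pollModel ? "ExecutionController" : "ExecutorManager";
  }

  public static void main(final String[] args) {
    final Properties props = new Properties();
    System.out.println(managerFor(props));
    props.setProperty(AZKABAN_POLL_MODEL, "true");
    System.out.println(managerFor(props));
  }
}
```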
1. The selectExecutorAndDispatchFlow method
In ExecutorManager, executor selection and flow dispatching are implemented mainly by the method private void selectExecutorAndDispatchFlow(final ExecutionReference reference, final ExecutableFlow exflow).
/* process flow with a snapshot of available Executors */
private void selectExecutorAndDispatchFlow(final ExecutionReference reference,
    final ExecutableFlow exflow)
    throws ExecutorManagerException {
  final Set<Executor> remainingExecutors = new HashSet<>(
      ExecutorManager.this.activeExecutors.getAll());
  Throwable lastError;
  synchronized (exflow) {
    do {
      final Executor selectedExecutor = selectExecutor(exflow, remainingExecutors);
      if (selectedExecutor == null) {
        ExecutorManager.this.commonMetrics.markDispatchFail();
        handleNoExecutorSelectedCase(reference, exflow);
        // RE-QUEUED - exit
        return;
      } else {
        try {
          dispatch(reference, exflow, selectedExecutor);
          ExecutorManager.this.commonMetrics.markDispatchSuccess();
          // SUCCESS - exit
          return;
        } catch (final ExecutorManagerException e) {
          lastError = e;
          logFailedDispatchAttempt(reference, exflow, selectedExecutor, e);
          ExecutorManager.this.commonMetrics.markDispatchFail();
          reference.setNumErrors(reference.getNumErrors() + 1);
          // FAILED ATTEMPT - try other executors except selectedExecutor
          updateRemainingExecutorsAndSleep(remainingExecutors, selectedExecutor);
        }
      }
    } while (reference.getNumErrors() < this.maxDispatchingErrors);
    // GAVE UP DISPATCHING
    final String message = "Failed to dispatch queued execution " + exflow.getId() + " because "
        + "reached " + ConfigurationKeys.MAX_DISPATCHING_ERRORS_PERMITTED
        + " (tried " + reference.getNumErrors() + " executors)";
    ExecutorManager.logger.error(message);
    ExecutorManager.this.executionFinalizer.finalizeFlow(exflow, message, lastError);
  }
}
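The selectExecutorAndDispatchFlow method does the following:
- Calls ExecutorManager.this.activeExecutors.getAll() to take a snapshot of all active executors currently held in memory;
- Locks the current executable flow (synchronized (exflow)) so that no other thread can operate on it at the same time;
- Calls selectExecutor to pick the most suitable executor for the flow from the remaining active executors;
- If no executor is selected (selectedExecutor == null), marks the dispatch as failed and calls handleNoExecutorSelectedCase, which puts the flow back into the waiting queue so that it can be dispatched later, once an active executor is available;
- If a suitable executor is found, calls dispatch to send the flow to it and marks the dispatch as successful;
- If dispatch throws an ExecutorManagerException, marks the dispatch as failed, increments numErrors by 1, and calls updateRemainingExecutorsAndSleep, which removes the failed executor from the remaining set so that the next attempt picks from the others; if the remaining set becomes empty, it is refilled from ExecutorManager.this.activeExecutors.getAll() (the same in-memory view of active executors used in the first step) and the thread sleeps for a while before dispatching is retried;
- Once the flow's error count reaches the configured limit (ConfigurationKeys.MAX_DISPATCHING_ERRORS_PERMITTED), the flow is finalized as failed.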
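The retry behavior in the list above can be modeled in a few lines. The following is a simplified sketch under stated assumptions, not Azkaban code: executors are plain integer ids, Dispatcher is a hypothetical stand-in for dispatch(...), "take the smallest remaining id" stands in for selectExecutor(...), and the sleep after the set is refilled is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model of the dispatch retry loop; names are hypothetical. */
final class DispatchLoopSketch {

  /** Hypothetical stand-in for dispatch(): true means the executor accepted the flow. */
  interface Dispatcher {
    boolean tryDispatch(int executorId);
  }

  /**
   * Returns the id of the executor that accepted the flow, or -1 if no executor
   * was available or the error budget was used up. Every attempted id is
   * appended to attempts so the order of tries can be inspected.
   */
  static int dispatchWithRetry(final Set<Integer> activeExecutors, final Dispatcher dispatcher,
      final int maxDispatchingErrors, final List<Integer> attempts) {
    // Work on a snapshot so removing failed executors does not touch the active set.
    final Set<Integer> remaining = new TreeSet<>(activeExecutors);
    int numErrors = 0;
    do {
      if (remaining.isEmpty()) {
        return -1; // "no executor selected": the real code re-queues the flow here
      }
      final int selected = remaining.iterator().next(); // placeholder for selectExecutor(...)
      attempts.add(selected);
      if (dispatcher.tryDispatch(selected)) {
        return selected; // success
      }
      numErrors++;
      remaining.remove(selected); // do not pick the failed executor again right away
      if (remaining.isEmpty()) {
        remaining.addAll(activeExecutors); // the real code also sleeps at this point
      }
    } while (numErrors < maxDispatchingErrors);
    return -1; // gave up: the real code finalizes the flow as failed
  }

  public static void main(final String[] args) {
    final List<Integer> attempts = new ArrayList<>();
    final Set<Integer> active = new TreeSet<>(Arrays.asList(1, 2, 3));
    final int chosen = dispatchWithRetry(active, id -> id == 3, 5, attempts);
    System.out.println("chosen=" + chosen + " attempts=" + attempts);
  }
}
```

Working on a copy of the active set is what allows a failed executor to be skipped on the next attempt without affecting other flows that are dispatched from the same active set.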
2. The selectExecutor method
Next, we look at the selectExecutor method that selectExecutorAndDispatchFlow calls.
/* Choose Executor for exflow among the available executors */
private Executor selectExecutor(final ExecutableFlow exflow,
    final Set<Executor> availableExecutors) {
  Executor choosenExecutor =
      getUserSpecifiedExecutor(exflow.getExecutionOptions(),
          exflow.getExecutionId());
  // If no executor was specified by admin
  if (choosenExecutor == null) {
    ExecutorManager.logger.info("Using dispatcher for execution id :"
        + exflow.getExecutionId());
    final ExecutorSelector selector = new ExecutorSelector(ExecutorManager.this.filterList,
        ExecutorManager.this.comparatorWeightsMap);
    choosenExecutor = selector.getBest(availableExecutors, exflow);
  }
  return choosenExecutor;
}
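The selectExecutor method implements the following logic:
- It first calls getUserSpecifiedExecutor to check whether the user has designated an executor through the useExecutor flow parameter;
- If the user did not specify an executor, or the specified executor id cannot be resolved (getUserSpecifiedExecutor returns null), it builds an ExecutorSelector from the configured filter list and comparator weights and lets it pick the best executor among the available ones, based on each executor's current resource usage.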
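The precedence between the two paths can be sketched as follows. This is an illustrative model, not the Azkaban implementation: executors are integer ids, the Optional argument stands in for the result of getUserSpecifiedExecutor, and the Comparator stands in for the ExecutorSelector ranking. One detail visible in the quoted source is preserved: the user-specified executor is returned without being checked against the available set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Illustrative model of "user override first, selector second"; names are hypothetical. */
final class SelectExecutorSketch {

  /**
   * Returns the user-specified executor id if present; otherwise the highest
   * ranked id among the available ones, or null if none is available.
   */
  static Integer select(final Optional<Integer> userSpecified, final Set<Integer> available,
      final Comparator<Integer> ranking) {
    if (userSpecified.isPresent()) {
      // As in the quoted source, the override is not checked against 'available'.
      return userSpecified.get();
    }
    if (available.isEmpty()) {
      return null; // nothing to choose from
    }
    return Collections.max(available, ranking);
  }

  public static void main(final String[] args) {
    final Set<Integer> available = new HashSet<>(Arrays.asList(1, 2));
    System.out.println(select(Optional.of(7), available, Comparator.naturalOrder()));
    System.out.println(select(Optional.empty(), available, Comparator.naturalOrder()));
  }
}
```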
3. The getUserSpecifiedExecutor method
/* Helper method to fetch overriding Executor, if a valid user has specifed otherwise return null */
private Executor getUserSpecifiedExecutor(final ExecutionOptions options,
    final int executionId) {
  Executor executor = null;
  if (options != null
      && options.getFlowParameters() != null
      && options.getFlowParameters().containsKey(
          ExecutionOptions.USE_EXECUTOR)) {
    try {
      final int executorId =
          Integer.valueOf(options.getFlowParameters().get(
              ExecutionOptions.USE_EXECUTOR));
      executor = fetchExecutor(executorId);
      if (executor == null) {
        ExecutorManager.logger
            .warn(String
                .format(
                    "User specified executor id: %d for execution id: %d is not active, Looking up db.",
                    executorId, executionId));
        executor = ExecutorManager.this.executorLoader.fetchExecutor(executorId);
        if (executor == null) {
          ExecutorManager.logger
              .warn(String
                  .format(
                      "User specified executor id: %d for execution id: %d is missing from db. Defaulting to availableExecutors",
                      executorId, executionId));
        }
      }
    } catch (final ExecutorManagerException ex) {
      ExecutorManager.logger.error("Failed to fetch user specified executor for exec_id = "
          + executionId, ex);
    }
  }
  return executor;
}
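The getUserSpecifiedExecutor method implements the logic that lets a user pin a flow to a specific executor:
It first checks whether the flow parameters supplied by the user contain the useExecutor parameter. If they do, it looks up the given executor id in the in-memory executor list (fetchExecutor) and, if the executor is found there, uses it to run the flow. If it is not found in memory, the method queries the executor table in the database (executorLoader.fetchExecutor); if the executor exists there, it is used to run the flow, and if it is still not found, the method returns null.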
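The two-level lookup can be sketched with plain maps. This is an illustrative model under stated assumptions: the two Map arguments are hypothetical stand-ins for the in-memory executor list and the database executor table, and executors are represented by their host names. Note that the quoted source only catches ExecutorManagerException, so a non-numeric useExecutor value would make Integer.valueOf throw NumberFormatException out of the method; returning null in that case is a choice made by this sketch, not Azkaban behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the memory-then-database lookup; names are hypothetical. */
final class UserSpecifiedExecutorSketch {

  static final String USE_EXECUTOR = "useExecutor";

  /** Returns the executor chosen via useExecutor, or null if none can be resolved. */
  static String lookup(final Map<String, String> flowParameters,
      final Map<Integer, String> activeExecutors,
      final Map<Integer, String> executorTable) {
    if (flowParameters == null || !flowParameters.containsKey(USE_EXECUTOR)) {
      return null; // the user did not ask for a specific executor
    }
    final int executorId;
    try {
      executorId = Integer.parseInt(flowParameters.get(USE_EXECUTOR));
    } catch (final NumberFormatException e) {
      return null; // sketch-only choice: treat a malformed id as "not specified"
    }
    String executor = activeExecutors.get(executorId); // 1) in-memory lookup
    if (executor == null) {
      executor = executorTable.get(executorId); // 2) fall back to the database table
    }
    return executor; // still null if the id is unknown in both places
  }

  public static void main(final String[] args) {
    final Map<Integer, String> active = new HashMap<>();
    active.put(1, "exec-host-1");
    final Map<Integer, String> table = new HashMap<>(active);
    table.put(2, "exec-host-2");
    final Map<String, String> params = new HashMap<>();
    params.put(USE_EXECUTOR, "2");
    System.out.println(lookup(params, active, table));
  }
}
```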
4. The getBest method
public K getBest(final Collection<K> candidateList, final V dispatchingObject) {
  // shortcut if the candidateList is empty.
  if (null == candidateList || candidateList.size() == 0) {
    logger.error("failed to getNext candidate as the passed candidateList is null or empty.");
    return null;
  }
  logger.debug("start candidate selection logic.");
  logger.debug(String.format("candidate count before filtering: %s", candidateList.size()));
  // to keep the input untouched, we will form up a new list based off the filtering result.
  Collection<K> filteredList = new ArrayList<>();
  if (null != this.filter) {
    for (final K candidateInfo : candidateList) {
      if (this.filter.filterTarget(candidateInfo, dispatchingObject)) {
        filteredList.add(candidateInfo);
      }
    }
  } else {
    filteredList = candidateList;
    logger.debug("skipping the candidate filtering as the filter object is not specifed.");
  }
  logger.debug(String.format("candidate count after filtering: %s", filteredList.size()));
  if (filteredList.size() == 0) {
    logger.debug("failed to select candidate as the filtered candidate list is empty.");
    return null;
  }
  if (null == this.comparator) {
    logger.debug(
        "candidate comparator is not specified, default hash code comparator class will be used.");
  }
  // final work - find the best candidate from the filtered list.
  final K executor = Collections.max(filteredList, this.comparator);
  logger.debug(String.format("candidate selected %s",
      null == executor ? "(null)" : executor.toString()));
  return executor;
}
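The getBest method first keeps only the candidates accepted by the filter (if one is configured), and then uses the configured comparator, which compares the executors' current load, to choose the best executor from the filtered list via Collections.max. We will not go into further detail here.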
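The filter-then-max pattern can be reproduced in a small standalone example. This is a sketch under stated assumptions rather than Azkaban's ExecutorSelector: Candidate and its two load fields are hypothetical, a BiPredicate stands in for the filter object, and the comparator is a single criterion instead of a weighted combination. One point about the standard library is worth noting: when Collections.max is given a null comparator, it falls back to the elements' natural ordering, which requires them to implement Comparable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;

/** Illustrative filter-then-max selection; all names are hypothetical. */
final class GetBestSketch {

  /** Hypothetical executor snapshot: an id plus two load figures. */
  static final class Candidate {
    final int id;
    final double freeMemoryPercent;
    final int runningFlows;

    Candidate(final int id, final double freeMemoryPercent, final int runningFlows) {
      this.id = id;
      this.freeMemoryPercent = freeMemoryPercent;
      this.runningFlows = runningFlows;
    }
  }

  /** Returns the best candidate that passes the filter, or null if there is none. */
  static <K, V> K getBest(final Collection<K> candidates, final V target,
      final BiPredicate<K, V> filter, final Comparator<K> comparator) {
    if (candidates == null || candidates.isEmpty()) {
      return null;
    }
    // Build a new list so the caller's collection is left untouched.
    final List<K> filtered = new ArrayList<>();
    for (final K candidate : candidates) {
      if (filter == null || filter.test(candidate, target)) {
        filtered.add(candidate);
      }
    }
    if (filtered.isEmpty()) {
      return null;
    }
    // The "greatest" element according to the comparator is the winner.
    return Collections.max(filtered, comparator);
  }

  public static void main(final String[] args) {
    final List<Candidate> executors = Arrays.asList(
        new Candidate(1, 20.0, 1), new Candidate(2, 80.0, 5), new Candidate(3, 60.0, 0));
    final BiPredicate<Candidate, String> enoughMemory = (c, flow) -> c.freeMemoryPercent >= 50;
    // Fewer running flows ranks higher, so the value is negated.
    final Comparator<Candidate> leastBusy = Comparator.comparingInt(c -> -c.runningFlows);
    final Candidate best = getBest(executors, "example-flow", enoughMemory, leastBusy);
    System.out.println(best == null ? "none" : "executor " + best.id);
  }
}
```

Separating the hard constraints (the filter) from the ranking (the comparator) means an overloaded executor is never chosen, regardless of how well it scores on the ranking criteria.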