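To study the scheduler, we first need to know who its clients are, that is, who asks to be scheduled. Requests come from two sources:
(1) At job submission, when a container is requested to run the AppMaster.
(2) While the job is running, when the AppMaster requests the containers needed to run map and reduce tasks, Spark tasks, and so on.
This section analyzes the source code for the first kind of request: the whole process of requesting the container that the AppMaster needs. Where that request originates:
The container for the AppMaster is requested when the RMAppAttempt object is in the SUBMITTED state and receives an ATTEMPT_ADDED event, which runs the ScheduleTransition transition:
The concrete implementation of RMAppAttempt is RMAppAttemptImpl. The state-machine transition code for this request is: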
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
    RMAppAttemptEvent event) {
  ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
  if (!subCtx.getUnmanagedAM()) {
    // Need reset #containers before create new attempt, because this request
    // will be passed to scheduler, and scheduler will deduct the number after
    // AM container allocated
    // Currently, following fields are all hard coded,
    // TODO: change these fields when we want to support
    // priority or multiple containers AM container allocation.
    for (ResourceRequest amReq : appAttempt.amReqs) {
      amReq.setNumContainers(1);
      amReq.setPriority(AM_CONTAINER_PRIORITY);
    }
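Once the container count and priority are set, the container request is made. getUnmanagedAM() tells whether the AM is unmanaged. If it returns false, this is a normal AM whose container is requested through the RM. If it returns true, the user started the AM themselves (for example from the command line), so no container is requested from the RM for it.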
    // AM resource has been checked when submission
    Allocation amContainerAllocation =
        appAttempt.scheduler.allocate(
            appAttempt.applicationAttemptId,
            appAttempt.amReqs,
            EMPTY_CONTAINER_RELEASE_LIST,
            amBlacklist.getAdditions(),
            amBlacklist.getRemovals());
    if (amContainerAllocation != null
        && amContainerAllocation.getContainers() != null) {
      assert (amContainerAllocation.getContainers().size() == 0);
    }
    return RMAppAttemptState.SCHEDULED;
  } else {
    // save state and then go to LAUNCHED state
    appAttempt.storeAttempt();
    return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
  }
}
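Which allocate() runs depends on the scheduler that has been configured; here it is FairScheduler's allocate(). If this call pulls in no containers for the RMAppAttempt (amContainerAllocation.getContainers().size() == 0), the attempt stays in RMAppAttemptState.SCHEDULED. Inside appAttempt.scheduler.allocate(), any container that is pulled in fires an RMContainerEventType.ACQUIRED event on its RMContainer, and the RMAppAttempt state machine is in turn driven forward by the CONTAINER_ALLOCATED event (an RMAppAttemptEventType, not a state). The state machines push each other along.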
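To make the branching concrete, here is a minimal toy model of the transition just described. It is only a sketch, not Hadoop's RMAppAttemptImpl: the class AttemptStateSketch, its handle() method, the containersPulled parameter, and the CONTAINER_GRANTED state are all hypothetical names made up for illustration. The state and event names SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING, ATTEMPT_ADDED and CONTAINER_ALLOCATED are borrowed from the text.

```java
// Toy model only. NOT Hadoop code; every class and method name is hypothetical.
final class AttemptStateSketch {
    // CONTAINER_GRANTED is a made-up placeholder for "whatever comes after SCHEDULED".
    enum State { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING, CONTAINER_GRANTED }
    enum Event { ATTEMPT_ADDED, CONTAINER_ALLOCATED }

    private State state = State.SUBMITTED;
    private final boolean unmanagedAM;

    AttemptStateSketch(boolean unmanagedAM) {
        this.unmanagedAM = unmanagedAM;
    }

    State getState() {
        return state;
    }

    // Multi-arc transition: the same (state, event) pair can end in different states.
    // containersPulled stands in for the size of the list an allocate() call returned.
    State handle(Event event, int containersPulled) {
        switch (state) {
            case SUBMITTED:
                if (event == Event.ATTEMPT_ADDED) {
                    // Unmanaged AM: no container is requested from the scheduler.
                    state = unmanagedAM
                        ? State.LAUNCHED_UNMANAGED_SAVING
                        : State.SCHEDULED;
                }
                break;
            case SCHEDULED:
                // The attempt waits in SCHEDULED until a container is actually pulled.
                if (event == Event.CONTAINER_ALLOCATED && containersPulled > 0) {
                    state = State.CONTAINER_GRANTED;
                }
                break;
            default:
                break;
        }
        return state;
    }

    public static void main(String[] args) {
        AttemptStateSketch attempt = new AttemptStateSketch(false);
        System.out.println(attempt.handle(Event.ATTEMPT_ADDED, 0));
        System.out.println(attempt.handle(Event.CONTAINER_ALLOCATED, 0));
        System.out.println(attempt.handle(Event.CONTAINER_ALLOCATED, 1));
    }
}
```

The point of the sketch is that the first request never yields a container by itself. The attempt simply parks in SCHEDULED and moves on only when a later event arrives together with a non-empty result.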
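The allocate() function does little more than pull the RMContainer objects out of the list of newly allocated containers, create NMTokens and related objects for those containers, and package everything into an Allocation object.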
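The "pull and package" idea can be sketched with a small self-contained model. This is an illustration under assumptions, not FairScheduler's implementation: PullAllocationSketch, its nested Container and Allocation classes, containerAssigned(), pull(), and the "token-for-" strings are all hypothetical stand-ins (real NMTokens are not plain strings).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only. NOT Hadoop code; every name here is hypothetical.
final class PullAllocationSketch {
    // Stand-in for an allocated container.
    static final class Container {
        final String id;
        final String nodeId;

        Container(String id, String nodeId) {
            this.id = id;
            this.nodeId = nodeId;
        }
    }

    // Stand-in for the packaged result handed back to the caller.
    static final class Allocation {
        final List<Container> containers;
        final List<String> nodeTokens;

        Allocation(List<Container> containers, List<String> nodeTokens) {
            this.containers = Collections.unmodifiableList(containers);
            this.nodeTokens = Collections.unmodifiableList(nodeTokens);
        }
    }

    private final List<Container> newlyAllocated = new ArrayList<>();

    // Called whenever the scheduling side assigns a container
    // (how and when that happens is outside this sketch).
    synchronized void containerAssigned(Container container) {
        newlyAllocated.add(container);
    }

    // Drain everything assigned since the last call and package it.
    synchronized Allocation pull() {
        List<Container> pulled = new ArrayList<>(newlyAllocated);
        newlyAllocated.clear();
        List<String> tokens = new ArrayList<>();
        for (Container c : pulled) {
            // Placeholder for issuing a per-node token for each container.
            tokens.add("token-for-" + c.nodeId);
        }
        return new Allocation(pulled, tokens);
    }

    public static void main(String[] args) {
        PullAllocationSketch app = new PullAllocationSketch();
        System.out.println(app.pull().containers.size());
        app.containerAssigned(new Container("container_1", "node-a"));
        System.out.println(app.pull().containers.size());
    }
}
```

In this model a pull made before anything has been assigned returns an empty result, which matches the assert on size() == 0 in the transition code shown earlier.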
@Override
public Allocation allocate(ApplicationAttemptId appAttemptId,
    List<ResourceRequest> ask, List<ContainerId> release,
    List<String> blacklistAdditions, List<String> blacklistRemovals) {
  // Make sure this application exists
  FSAppAttempt application = getSchedulerApp(appAttemptId);
  if (application == null) {
    LOG.info("Calling allocate on removed " +
        "or non existant application " + appAttemptId);
    return EMPTY_ALLOCATION;
  }
  // Sanity check
  SchedulerUtils.normalizeRequests(ask, DOMINANT_RESOURCE_CALCULATOR,
      getClusterResource(), minimumAllocation, getMaximumResourceCapability(),
      incrAllocation);
  // Record container allocation start time
  application.recordContainerRequestTime(getClock().getTime());
  // Release containers
  releaseContainers(release, application);
  synchronized (application) {
    if (!ask.isEmpty()) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("allocate: pre-update" +
            " applicationAttemptId=" + appAttemptId +
            " application=" + application.getApplicationId());
      }
      application.showRequests();
      // Update application requests
      application.updateResourceRequests(ask);
      application.showRequests();
    }
    Set<ContainerId> preemptionContainerIds =
        application.getPreemptionContainerIds();
    if (LOG.isDebugEnabled()) {
      LOG.debug(
          "allocate: post-update" + " applicationAttemptId=" + appAttemptId
              + " #ask=" + ask.size() + " reservation= " + application
              .getCurrentReservation());
      LOG.debug("Preempting " + preemptionContainerIds.size()
          + " container(s)");
    }
    if (application.isWaitingForAMContainer(application.getApplicationId())) {
      // Allocate is for AM and update AM blacklist for this
      application.updateAMBlacklist(
          blacklistAdditions, blacklistRemov