Slot Management in the ResourceManager
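The ResourceManager extends FencedRpcEndpoint and therefore acts as an RPC service. Its main internal components are:
- SlotManager, which manages the slot resources reported by all TaskExecutors and handles slot requests
- JobLeaderIdService, which tracks the current leader JobMaster of each job under high availability, together with the high-availability leader election service leaderElectionService
- The heartbeat managers taskManagerHeartbeatManager and jobManagerHeartbeatManager (HeartbeatManagerSenderImpl)
- Metrics services such as MetricRegistry
- The set of all registered TaskExecutors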
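The ResourceManager is then started by calling ResourceManager#start(). In the start callback it obtains the leader election service from HighAvailabilityServices and joins the election. It also starts the JobLeaderIdService, which manages the leader ids of the jobs registered with this ResourceManager.
The preceding section briefly analyzed slot management inside the TaskExecutor, which allocates and manages the slots of a single TaskExecutor. The ResourceManager, by contrast, has to manage the slots of all registered TaskExecutors in one place. Every JobManager obtains resources by requesting them from the ResourceManager, which looks at the current resource usage of the cluster (ResourceManager, TaskExecutor, JobManager) and "forwards" each request to a TaskExecutor so that a slot is allocated there.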
SlotManager
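The SlotManager keeps a global view of how many TaskManagers exist, how many free slots each TaskManager has, and how slots and other resources are being used. When a Flink job is scheduled, tasks are placed according to the slot allocation strategy. Its main functions are:
- For TaskManagers: registration, unregistration, and release on idle timeout. Registration adds slots to the cluster; unregistration and idle release free the resources and hand them back to the resource management cluster.
- For Flink jobs: receiving slot requests, slot releases, and resource reports. When resources are insufficient, the SlotManager parks the request in a waiting queue and notifies the ResourceManager to acquire more resources by starting a new TaskManager. Once that TaskManager registers with the SlotManager, new slots become available and the waiting requests are served in order.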
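The waiting-queue behaviour in the list above can be illustrated with a small Java sketch. It is only a toy model under simplifying assumptions: slots and requests are plain strings, every slot fits every request, and the class WaitingQueueSketch with all of its methods is hypothetical rather than part of Flink.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model: slots and requests are plain strings, not Flink types. */
class WaitingQueueSketch {
    private final Deque<String> freeSlots = new ArrayDeque<>();
    private final Deque<String> waitingRequests = new ArrayDeque<>();
    private final List<String> assignments = new ArrayList<>();
    // Counts how often "more resources" were asked for (stands in for starting a TaskManager).
    private int resourceRequests = 0;

    /** A job asks for one slot; it is served at once or parked in the waiting queue. */
    void requestSlot(String requestId) {
        String slot = freeSlots.poll();
        if (slot != null) {
            assignments.add(requestId + "->" + slot);
        } else {
            waitingRequests.add(requestId);
            resourceRequests++;
        }
    }

    /** A new TaskManager registers; its slots serve the waiting requests in arrival order. */
    void registerTaskManager(String taskManagerId, int numSlots) {
        for (int i = 0; i < numSlots; i++) {
            String slot = taskManagerId + "_" + i;
            String waiting = waitingRequests.poll();
            if (waiting != null) {
                assignments.add(waiting + "->" + slot);
            } else {
                freeSlots.add(slot);
            }
        }
    }

    List<String> assignments() { return assignments; }
    int freeSlotCount() { return freeSlots.size(); }
    int waitingCount() { return waitingRequests.size(); }
    int resourceRequests() { return resourceRequests; }
}
```

In this model, two requests issued before any TaskManager exists both wait; registering a TaskManager with three slots then serves them in order and leaves one slot free for the next request.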
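The ResourceManager delegates slot management to its internal SlotManager component. The SlotManager tracks the state and allocation of every slot on every registered TaskExecutor, and it also keeps all slot requests that are still waiting (pendingSlotRequests). Whenever a new slot is registered or an allocated slot is released, the SlotManager tries to satisfy a waiting slot request. If the available slots are insufficient, the SlotManager calls ResourceActions#allocateResource(ResourceProfile) to ask the ResourceManager for additional slots (for example, by requesting additional containers from Yarn); the ResourceManager may then try to start a new TaskExecutor (as in Yarn mode). In addition, a TaskExecutor that stays idle for too long, or a pending slot request that goes unsatisfied for too long, is handled by a timeout mechanism.
The most important member fields of the SlotManager are listed below; they mainly hold the slot state and the pending slot requests that still await processing: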
class SlotManager {
    /** Map for all registered slots. */
    private final HashMap<SlotID, TaskManagerSlot> slots;
    /** Index of all currently free slots. */
    private final LinkedHashMap<SlotID, TaskManagerSlot> freeSlots;
    /** All currently registered task managers. */
    private final HashMap<InstanceID, TaskManagerRegistration> taskManagerRegistrations;
    /** Map of fulfilled and active allocations for request deduplication purposes. */
    private final HashMap<AllocationID, SlotID> fulfilledSlotRequests;
    /** Map of pending/unfulfilled slot allocation requests. */
    private final HashMap<AllocationID, PendingSlotRequest> pendingSlotRequests;
    // When the ResourceManager runs short of resources, it requests new ones (from Yarn in yarn-cluster mode)
    // through ResourceActions#allocateResource(ResourceProfile). This may start a new TaskManager, or do nothing.
    // The newly requested resources are wrapped as PendingTaskManagerSlot.
    private final HashMap<TaskManagerSlotId, PendingTaskManagerSlot> pendingSlots;
    /** ResourceManager's id. */
    private ResourceManagerId resourceManagerId;
    /** Callbacks for resource (de-)allocations. */
    private ResourceActions resourceActions;
}
Slot Registration and Periodic Heartbeat Reports
When a new TaskManager registers with the RM, it calls ResourceManager#registerTaskExecutor() over RPC. The RM stores the registration information (gateway, connection information, hardware description, and so on) in its ResourceManager#Map<ResourceID, WorkerRegistration<WorkerType>> taskExecutors field and returns a TaskExecutorRegistrationSuccess object (registrationId, resourceManagerResourceId, clusterInformation) to the TM. On receiving this response, the TM invokes its registration-success callback TaskManager#TaskExecutorToResourceManagerConnection#ResourceManagerRegistrationListener#onRegistrationSuccess(), which establishes the TM-to-RM connection and reports the TM's slots to the RM through the RPC call resourceManagerGateway.sendSlotReport().
When the ResourceManager receives the sendSlotReport RPC call from a TaskExecutor, it delegates to its internal SlotManager to register that TaskExecutor's slots.
In addition, the TaskExecutor periodically reports slot status to the ResourceManager through heartbeats; the reportSlotStatus method updates the slot state accordingly.
// SlotManager#registerTaskManager
public void registerTaskManager(final TaskExecutorConnection taskExecutorConnection, SlotReport initialSlotReport) {
    checkInit();
    LOG.debug("Registering TaskManager {} under {} at the SlotManager.", taskExecutorConnection.getResourceID(), taskExecutorConnection.getInstanceID());
    // we identify task managers by their instance id
    if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
        reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
    } else {
        // first register the TaskManager
        ArrayList<SlotID> reportedSlots = new ArrayList<>();
        for (SlotStatus slotStatus : initialSlotReport) {
            reportedSlots.add(slotStatus.getSlotID());
        }
        // record the instance id together with its TaskManagerRegistration (connection, reported slots)
        TaskManagerRegistration taskManagerRegistration = new TaskManagerRegistration(
            taskExecutorConnection,
            reportedSlots);
        taskManagerRegistrations.put(taskExecutorConnection.getInstanceID(), taskManagerRegistration);
        // next register the new slots, one by one
        for (SlotStatus slotStatus : initialSlotReport) {
            registerSlot(
                slotStatus.getSlotID(),
                slotStatus.getAllocationID(),
                slotStatus.getJobID(),
                slotStatus.getResourceProfile(),
                taskExecutorConnection);
        }
    }
}
// ............
// Registration of a single slot
private void registerSlot(
        SlotID slotId,
        AllocationID allocationId,
        JobID jobId,
        ResourceProfile resourceProfile,
        TaskExecutorConnection taskManagerConnection) {
    if (slots.containsKey(slotId)) {
        // remove the old slot first
        removeSlot(slotId);
    }
    // create a TaskManagerSlot and add it to slots
    final TaskManagerSlot slot = createAndRegisterTaskManagerSlot(slotId, resourceProfile, taskManagerConnection);
    final PendingTaskManagerSlot pendingTaskManagerSlot;
    if (allocationId == null) {
        // the slot is not allocated yet: look for a PendingTaskManagerSlot whose resources match exactly
        pendingTaskManagerSlot = findExactlyMatchingPendingTaskManagerSlot(resourceProfile);
    } else {
        // the slot is already allocated
        pendingTaskManagerSlot = null;
    }
    if (pendingTaskManagerSlot == null) {
        // two possibilities: 1. the slot is already allocated, 2. no PendingTaskManagerSlot matches
        updateSlot(slotId, allocationId, jobId);
    } else {
        // the newly registered slot fulfils a PendingTaskManagerSlot; try to hand it to the request assigned to that pending slot
        pendingSlots.remove(pendingTaskManagerSlot.getTaskManagerSlotId());
        final PendingSlotRequest assignedPendingSlotRequest = pendingTaskManagerSlot.getAssignedPendingSlotRequest();
        // the PendingTaskManagerSlot may or may not have an associated PendingSlotRequest
        if (assignedPendingSlotRequest == null) {
            // no associated request: look for a matching pending request, otherwise keep the slot as FREE
            handleFreeSlot(slot);
        } else {
            // an associated request exists and can now be fulfilled: allocate the slot to it
            assignedPendingSlotRequest.unassignPendingTaskManagerSlot();
            allocateSlot(slot, assignedPendingSlotRequest);
        }
    }
}
// ............
private void handleFreeSlot(TaskManagerSlot freeSlot) {
    Preconditions.checkState(freeSlot.getState() == TaskManagerSlot.State.FREE);
    // first check whether a PendingSlotRequest can be fulfilled by this slot
    PendingSlotRequest pendingSlotRequest = findMatchingRequest(freeSlot.getResourceProfile());
    if (null != pendingSlotRequest) {
        // a matching PendingSlotRequest exists: allocate the slot to it
        allocateSlot(freeSlot, pendingSlotRequest);
    } else {
        freeSlots.put(freeSlot.getSlotId(), freeSlot);
    }
}
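The branching inside registerSlot is easy to lose track of, so the following Java sketch reduces it to a pure decision function. It is a simplification for illustration only: RegisterSlotDecisionSketch, its Outcome enum, and the three boolean parameters are hypothetical names, and the outcomes merely label which SlotManager method the excerpt above would call.

```java
/** Hypothetical summary of the branches in registerSlot; not Flink code. */
class RegisterSlotDecisionSketch {
    enum Outcome { UPDATE_SLOT, HANDLE_FREE_SLOT, ALLOCATE_TO_ASSIGNED_REQUEST }

    static Outcome decide(
            boolean hasAllocationId,
            boolean matchesPendingTaskManagerSlot,
            boolean pendingSlotHasAssignedRequest) {
        // A slot that is already allocated never consumes a PendingTaskManagerSlot.
        boolean consumesPendingSlot = !hasAllocationId && matchesPendingTaskManagerSlot;
        if (!consumesPendingSlot) {
            // Either already allocated, or nothing was waiting for a slot of this shape.
            return Outcome.UPDATE_SLOT;
        }
        // The pending slot is consumed; what happens next depends on whether a request was tied to it.
        return pendingSlotHasAssignedRequest
                ? Outcome.ALLOCATE_TO_ASSIGNED_REQUEST
                : Outcome.HANDLE_FREE_SLOT;
    }
}
```

For example, decide(false, true, true) yields ALLOCATE_TO_ASSIGNED_REQUEST, which corresponds to the allocateSlot call in the last branch of registerSlot.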
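Requesting a Slot
ResourceManager#requestSlot delegates to SlotManager#registerSlotRequest(SlotRequest slotRequest) to request a slot. A SlotRequest carries the requesting JobID (the job the slot will be assigned to), an AllocationID, and a ResourceProfile describing the requested resources. The SlotManager wraps the slot request in a PendingSlotRequest, marking it as a request that has not been satisfied yet and is waiting to be processed.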
// ResourceManager#requestSlot()
public CompletableFuture<Acknowledge> requestSlot(
        JobMasterId jobMasterId,
        SlotRequest slotRequest,
        final Time timeout) {
    JobID jobId = slotRequest.getJobId();
    // look up the JobMaster registration (with its RPC gateway proxy) for this job id
    JobManagerRegistration jobManagerRegistration = jobManagerRegistrations.get(jobId);
    if (null != jobManagerRegistration) {
        if (Objects.equals(jobMasterId, jobManagerRegistration.getJobMasterId())) {
            log.info("Request slot with profile {} for job {} with allocation id {}.",
                slotRequest.getResourceProfile(),
                slotRequest.getJobId(),
                slotRequest.getAllocationId());
            try {
                // delegate the slot request to the internal SlotManager
                slotManager.registerSlotRequest(slotRequest);
            } catch (SlotManagerException e) {
                return FutureUtils.completedExceptionally(e);
            }
            return CompletableFuture.completedFuture(Acknowledge.get());
        } else {
            return FutureUtils.completedExceptionally(new ResourceManagerException("The job leader's id " +
                jobManagerRegistration.getJobMasterId() + " does not match the received id " + jobMasterId + '.'));
        }
    } else {
        return FutureUtils.completedExceptionally(new ResourceManagerException("Could not find registered job manager for job " + jobId + '.'));
    }
}
// SlotManager#registerSlotRequest(slotRequest)
public boolean registerSlotRequest(SlotRequest slotRequest) throws SlotManagerException {
    checkInit();
    if (checkDuplicateRequest(slotRequest.getAllocationId())) {
        LOG.debug("Ignoring a duplicate slot request with allocation id {}.", slotRequest.getAllocationId());
        return false;
    } else {
        // wrap the request in a PendingSlotRequest
        PendingSlotRequest pendingSlotRequest = new PendingSlotRequest(slotRequest);
        pendingSlotRequests.put(slotRequest.getAllocationId(), pendingSlotRequest);
        try {
            // run the actual slot allocation logic
            internalRequestSlot(pendingSlotRequest);
        } catch (ResourceManagerException e) {
            // requesting the slot failed --> remove pending slot request
            pendingSlotRequests.remove(slotRequest.getAllocationId());
            throw new SlotManagerException("Could not fulfill slot request " + slotRequest.getAllocationId() + '.', e);
        }
        return true;
    }
}
private void internalRequestSlot(PendingSlotRequest pendingSlotRequest) throws ResourceManagerException {
    final ResourceProfile resourceProfile = pendingSlotRequest.getResourceProfile();
    // first pick a registered FREE slot whose resources (CPU, heap/direct/native/network memory) cover the request
    TaskManagerSlot taskManagerSlot = findMatchingSlot(resourceProfile);
    if (taskManagerSlot != null) {
        // a matching slot was found: try to allocate it to this pendingSlotRequest
        allocateSlot(taskManagerSlot, pendingSlotRequest);
    } else {
        // otherwise pick from the PendingTaskManagerSlots;
        // if none of those matches either, ask the ResourceManager for more resources
        // through the ResourceActions#allocateResource(ResourceProfile) callback
        Optional<PendingTaskManagerSlot> pendingTaskManagerSlotOptional = findFreeMatchingPendingTaskManagerSlot(resourceProfile);
        if (!pendingTaskManagerSlotOptional.isPresent()) {
            // ask the external resource manager (e.g. Yarn) for another container and launch a Flink TaskManager in it via a java command
            pendingTaskManagerSlotOptional = allocateResource(resourceProfile);
        }
        // assign the PendingTaskManagerSlot to this PendingSlotRequest
        pendingTaskManagerSlotOptional.ifPresent(pendingTaskManagerSlot -> assignPendingTaskManagerSlot(pendingSlotRequest, pendingTaskManagerSlot));
    }
}
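The lookup order in internalRequestSlot (free slot first, then a pending TaskManager slot, then a fresh resource request) can be sketched in a few lines of Java. This is a toy model under stated assumptions: a resource profile is reduced to a single memory size in MB, "matching" means the capacity is at least the requested size, and RequestFallbackSketch with all of its members is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy model of the three-step lookup; a profile is just a size in MB. */
class RequestFallbackSketch {
    enum Source { FREE_SLOT, PENDING_TASK_MANAGER_SLOT, NEW_RESOURCE, STAYS_PENDING }

    private final List<Integer> freeSlotsMb = new ArrayList<>();
    private final List<Integer> pendingTaskManagerSlotsMb = new ArrayList<>();
    // Slot size a newly started worker would offer; 0 models a cluster that cannot start workers.
    private final int newWorkerSlotMb;

    RequestFallbackSketch(int newWorkerSlotMb) {
        this.newWorkerSlotMb = newWorkerSlotMb;
    }

    void addFreeSlot(int mb) { freeSlotsMb.add(mb); }
    void addPendingTaskManagerSlot(int mb) { pendingTaskManagerSlotsMb.add(mb); }

    /** Reports where the slot for this request would come from. */
    Source request(int requiredMb) {
        if (takeFirstFitting(freeSlotsMb, requiredMb)) {
            return Source.FREE_SLOT;
        }
        if (takeFirstFitting(pendingTaskManagerSlotsMb, requiredMb)) {
            return Source.PENDING_TASK_MANAGER_SLOT;
        }
        if (newWorkerSlotMb > 0 && newWorkerSlotMb >= requiredMb) {
            return Source.NEW_RESOURCE;
        }
        // Nothing could be promised; the request simply keeps waiting.
        return Source.STAYS_PENDING;
    }

    /** Removes and consumes the first capacity that covers the requirement. */
    private static boolean takeFirstFitting(List<Integer> capacitiesMb, int requiredMb) {
        Iterator<Integer> it = capacitiesMb.iterator();
        while (it.hasNext()) {
            if (it.next() >= requiredMb) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

With one 512 MB free slot and one 2048 MB pending slot, three successive 256 MB requests would be served from the free slot, the pending slot, and a new resource in that order.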
The concrete allocation logic in the SlotManager lives in allocateSlot(taskManagerSlot, pendingSlotRequest). It calls gateway.requestSlot() through the TaskExecutorGateway RPC proxy to ask the corresponding TaskExecutor to allocate the slot. The source of allocateSlot, including that RPC call, is as follows:
// SlotManager#allocateSlot
private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
    Preconditions.checkState(taskManagerSlot.getState() == TaskManagerSlot.State.FREE);
    // obtain the TaskExecutorGateway RPC proxy
    TaskExecutorConnection taskExecutorConnection = taskManagerSlot.getTaskManagerConnection();
    TaskExecutorGateway gateway = taskExecutorConnection.getTaskExecutorGateway();
    final CompletableFuture<Acknowledge> completableFuture = new CompletableFuture<>();
    final AllocationID allocationId = pendingSlotRequest.getAllocationId();
    final SlotID slotId = taskManagerSlot.getSlotId();
    final InstanceID instanceID = taskManagerSlot.getInstanceId();
    // the taskManagerSlot state becomes PENDING
    taskManagerSlot.assignPendingSlotRequest(pendingSlotRequest);
    pendingSlotRequest.setRequestFuture(completableFuture);
    // if a PendingTaskManagerSlot is assigned to this pendingSlotRequest, detach it first
    returnPendingTaskManagerSlotIfAssigned(pendingSlotRequest);
    TaskManagerRegistration taskManagerRegistration = taskManagerRegistrations.get(instanceID);
    if (taskManagerRegistration == null) {
        throw new IllegalStateException("Could not find a registered task manager for instance id " + instanceID + '.');
    }
    taskManagerRegistration.markUsed();
    // RPC call to the task manager: ask the TaskExecutor for the slot
    CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
        slotId,
        pendingSlotRequest.getJobId(),
        allocationId,
        pendingSlotRequest.getTargetAddress(),
        resourceManagerId,
        taskManagerRequestTimeout);
    // the RPC call has completed
    requestFuture.whenComplete(
        (Acknowledge acknowledge, Throwable throwable) -> {
            if (acknowledge != null) {
                completableFuture.complete(acknowledge);
            } else {
                completableFuture.completeExceptionally(throwable);
            }
        });
    // callback for completion of the PendingSlotRequest (either because the RPC above completed or because the request was cancelled)
    completableFuture.whenCompleteAsync(
        (Acknowledge acknowledge, Throwable throwable) -> {
            try {
                if (acknowledge != null) {
                    // success: the request is no longer pending and the slot state moves PENDING -> ALLOCATED
                    updateSlot(slotId, allocationId, pendingSlotRequest.getJobId());
                } else {
                    if (throwable instanceof SlotOccupiedException) {
                        // the slot is already occupied: update its state
                        SlotOccupiedException exception = (SlotOccupiedException) throwable;
                        updateSlot(slotId, exception.getAllocationId(), exception.getJobId());
                    } else {
                        // the request failed: detach the pendingSlotRequest from the TaskManagerSlot
                        removeSlotRequestFromSlot(slotId, allocationId);
                    }
                    if (!(throwable instanceof CancellationException)) {
                        // the slot request failed and will be retried
                        handleFailedSlotRequest(slotId, allocationId, throwable);
                    } else {
                        // cancelled on purpose
                        LOG.debug("Slot allocation request {} has been cancelled.", allocationId, throwable);
                    }
                }
            } catch (Exception e) {
                LOG.error("Error while completing the slot allocation.", e);
            }
        },
        mainThreadExecutor);
}
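The two-future pattern in allocateSlot (an RPC future that feeds a separate, cancellable request future) can be tried out in isolation with the standard CompletableFuture API. The sketch below is a simplified illustration, not the Flink implementation: AllocationCallbackSketch, its SlotState enum, and the retried flag are hypothetical, the acknowledgement is a plain String, and the callbacks run synchronously instead of on a main-thread executor.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of one slot whose allocation is driven by two chained futures. */
class AllocationCallbackSketch {
    enum SlotState { FREE, PENDING, ALLOCATED }

    SlotState state = SlotState.FREE;
    boolean retried = false;

    /** Starts an allocation; rpcFuture stands in for the answer of the TaskExecutor. */
    CompletableFuture<String> allocate(CompletableFuture<String> rpcFuture) {
        state = SlotState.PENDING;
        CompletableFuture<String> requestFuture = new CompletableFuture<>();

        // Forward the RPC result; this has no effect if requestFuture was cancelled earlier.
        rpcFuture.whenComplete((ack, failure) -> {
            if (ack != null) {
                requestFuture.complete(ack);
            } else {
                requestFuture.completeExceptionally(failure);
            }
        });

        // React to the final outcome: RPC success, RPC failure, or cancellation.
        requestFuture.whenComplete((ack, failure) -> {
            if (ack != null) {
                state = SlotState.ALLOCATED;
            } else {
                state = SlotState.FREE;
                // Only genuine failures are retried; a cancellation is intentional.
                if (!(failure instanceof CancellationException)) {
                    retried = true;
                }
            }
        });
        return requestFuture;
    }
}
```

Keeping the request future separate from the RPC future is what makes cancellation possible: cancelling the request future frees the slot immediately, and a late RPC answer is then silently ignored.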
Cancelling a Slot Request
A slot request can be cancelled with ResourceManager#cancelSlotRequest(allocationID). Internally this delegates to SlotManager#unregisterSlotRequest(allocationId), which withdraws the pending request for that allocation id:
// SlotManager#unregisterSlotRequest
public boolean unregisterSlotRequest(AllocationID allocationId) {
    checkInit();
    // remove the request from pendingSlotRequests
    PendingSlotRequest pendingSlotRequest = pendingSlotRequests.remove(allocationId);
    if (null != pendingSlotRequest) {
        LOG.debug("Cancel slot request {}.", allocationId);
        // cancel the request
        cancelPendingSlotRequest(pendingSlotRequest);
        return true;
    } else {
        LOG.debug("No pending slot request with allocation id {} found. Ignoring unregistration request.", allocationId);
        return false;
    }
}
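How registration, deduplication, and cancellation interact can be seen in a small Java sketch built around the two maps pendingSlotRequests and fulfilledSlotRequests. It rests on an assumption that is not shown in the excerpts here, namely that a request counts as a duplicate when its allocation id is present in either map. RequestRegistrySketch and its methods are hypothetical, and ids are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of request bookkeeping keyed by allocation id. */
class RequestRegistrySketch {
    private final Map<String, String> pending = new HashMap<>();   // allocationId -> jobId
    private final Map<String, String> fulfilled = new HashMap<>(); // allocationId -> slotId

    /** Returns false for a duplicate (assumed: id already pending or already fulfilled). */
    boolean register(String allocationId, String jobId) {
        if (pending.containsKey(allocationId) || fulfilled.containsKey(allocationId)) {
            return false;
        }
        pending.put(allocationId, jobId);
        return true;
    }

    /** Marks a pending request as served by the given slot. */
    void fulfil(String allocationId, String slotId) {
        if (pending.remove(allocationId) != null) {
            fulfilled.put(allocationId, slotId);
        }
    }

    /** Cancels a request; only requests that are still pending can be cancelled. */
    boolean unregister(String allocationId) {
        return pending.remove(allocationId) != null;
    }
}
```

As in unregisterSlotRequest, cancelling an unknown or already fulfilled allocation id is not an error; the call just reports false.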
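Timeout Settings
On startup, the ResourceManager starts the election service with leaderElectionService.start(this). When a leader is elected, the service calls back the LeaderContender implementation (this, that is, the ResourceManager itself) through grantLeadership(), which tries to accept leadership in tryAcceptLeadership(); that method in turn starts the SlotManager. When the SlotManager starts, it schedules two timeout checks:
- one detects TaskManagers that have been idle for too long;
- the other detects slot requests that have timed out.
Once a TaskExecutor has been idle for too long, its resources are released through the ResourceActions#releaseResource() callback. If a slot request times out, the PendingSlotRequest is cancelled and the ResourceManager is informed through ResourceActions#notifyAllocationFailure().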
// SlotManager#start
public void start(ResourceManagerId newResourceManagerId, Executor newMainThreadExecutor, ResourceActions newResourceActions) {
    LOG.info("Starting the SlotManager.");
    this.resourceManagerId = Preconditions.checkNotNull(newResourceManagerId);
    mainThreadExecutor = Preconditions.checkNotNull(newMainThreadExecutor);
    resourceActions = Preconditions.checkNotNull(newResourceActions);
    started = true;
    // periodically check whether a TaskExecutor has been idle for too long
    taskManagerTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
        () -> mainThreadExecutor.execute(
            () -> checkTaskManagerTimeouts()),
        0L,
        taskManagerTimeout.toMilliseconds(),
        TimeUnit.MILLISECONDS);
    // periodically check whether a slot request has timed out
    slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
        () -> mainThreadExecutor.execute(
            () -> checkSlotRequestTimeouts()),
        0L,
        slotRequestTimeout.toMilliseconds(),
        TimeUnit.MILLISECONDS);
}
void checkTaskManagerTimeouts() {
    if (!taskManagerRegistrations.isEmpty()) {
        long currentTime = System.currentTimeMillis();
        ArrayList<TaskManagerRegistration> timedOutTaskManagers = new ArrayList<>(taskManagerRegistrations.size());
        // first retrieve the timed out TaskManagers
        for (TaskManagerRegistration taskManagerRegistration : taskManagerRegistrations.values()) {
            if (currentTime - taskManagerRegistration.getIdleSince() >= taskManagerTimeout.toMilliseconds()) {
                timedOutTaskManagers.add(taskManagerRegistration);
            }
        }
        // second we trigger the release resource callback which can decide upon the resource release
        for (TaskManagerRegistration taskManagerRegistration : timedOutTaskManagers) {
            if (waitResultConsumedBeforeRelease) {
                releaseTaskExecutorIfPossible(taskManagerRegistration);
            } else {
                // release on timeout through the ResourceActions#releaseResource() callback
                releaseTaskExecutor(taskManagerRegistration.getInstanceId());
            }
        }
    }
}
private void checkSlotRequestTimeouts() {
    if (!pendingSlotRequests.isEmpty()) {
        long currentTime = System.currentTimeMillis();
        Iterator<Map.Entry<AllocationID, PendingSlotRequest>> slotRequestIterator = pendingSlotRequests.entrySet().iterator();
        while (slotRequestIterator.hasNext()) {
            PendingSlotRequest slotRequest = slotRequestIterator.next().getValue();
            if (currentTime - slotRequest.getCreationTimestamp() >= slotRequestTimeout.toMilliseconds()) {
                slotRequestIterator.remove();
                if (slotRequest.isAssigned()) {
                    // cancel the request
                    cancelPendingSlotRequest(slotRequest);
                }
                // inform the ResourceManager
                resourceActions.notifyAllocationFailure(
                    slotRequest.getJobId(),
                    slotRequest.getAllocationId(),
                    new TimeoutException("The allocation could not be fulfilled in time."));
            }
        }
    }
}
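Both timeout checks boil down to comparing "now minus a start timestamp" against a threshold. The following Java sketch isolates that comparison so it can be exercised without a scheduler. It is an illustration under simplifying assumptions: TimeoutCheckSketch and its two static methods are hypothetical, ids are strings, and timestamps are passed in explicitly instead of being read from System.currentTimeMillis().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical, scheduler-free version of the two timeout checks. */
class TimeoutCheckSketch {

    /** Collects TaskManagers idle for at least timeoutMs; the map itself is left untouched. */
    static List<String> idleTaskManagers(Map<String, Long> idleSinceMs, long nowMs, long timeoutMs) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> entry : idleSinceMs.entrySet()) {
            if (nowMs - entry.getValue() >= timeoutMs) {
                timedOut.add(entry.getKey());
            }
        }
        return timedOut;
    }

    /** Removes expired requests from the map while iterating and returns their ids. */
    static List<String> expireRequests(Map<String, Long> createdAtMs, long nowMs, long timeoutMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = createdAtMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= timeoutMs) {
                expired.add(entry.getKey()); // read the key before removing the entry
                it.remove();                 // safe removal during iteration
            }
        }
        return expired;
    }
}
```

The idle check collects candidates first and acts on them afterwards, whereas the request check removes entries through the iterator; both mirror the structure of the two methods above and avoid modifying a map directly while looping over it.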
ResourceManager
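As the central allocator of slot resources, the ResourceManager manages all slots reported by registered TaskExecutors but delegates the actual bookkeeping to its internal SlotManager. The ResourceManager itself still has to interact with external components, so it exposes the slot management operations as RPC methods to the JobMaster and the TaskExecutor.
RPC interface: the slot-related RPC methods of the ResourceManager are listed below. requestSlot and cancelSlotRequest are called mainly by the JobMaster, while sendSlotReport and notifySlotAvailable are called mainly by the TaskExecutor. After receiving one of these calls, the ResourceManager delegates the actual work to the SlotManager.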
interface ResourceManagerGateway {
    // Sent by the JobMaster to request a slot from the ResourceManager.
    CompletableFuture<Acknowledge> requestSlot(
        JobMasterId jobMasterId,
        SlotRequest slotRequest,
        @RpcTimeout Time timeout);
    // Sent by the JobMaster to cancel a slot allocation request at the ResourceManager.
    void cancelSlotRequest(AllocationID allocationID);
    // Sent by the TaskExecutor to deliver the given {@link SlotReport} to the ResourceManager.
    CompletableFuture<Acknowledge> sendSlotReport(
        ResourceID taskManagerResourceId,
        InstanceID taskManagerRegistrationId,
        SlotReport slotReport,
        @RpcTimeout Time timeout);
    // Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.
    void notifySlotAvailable(
        InstanceID instanceId,
        SlotID slotID,
        AllocationID oldAllocationId);
}
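Dynamic resource management: the ResourceManager can manage TaskExecutor resources dynamically, which lets it integrate with frameworks such as Yarn, Mesos, and Kubernetes. As noted in the discussion of slot requests in the SlotManager:
- If the registered slots cannot satisfy a slot request, the SlotManager notifies Flink's internal ResourceManager through the ResourceActions#allocateResource callback, so that it requests compute resources (containers) from the external resource management framework (Yarn and the like).
- When the SlotManager detects that a TaskExecutor has been idle for too long, it notifies Flink's internal ResourceManager through the ResourceActions#releaseResource callback, so that it returns the compute resources (containers) to the external resource management framework.
With the allocateResource and releaseResource callbacks of ResourceActions, the ResourceManager can acquire and release resources dynamically: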
// ResourceManager#ResourceActionsImpl
private class ResourceActionsImpl implements ResourceActions {
    // Releases resources.
    @Override
    public void releaseResource(InstanceID instanceId, Exception cause) {
        validateRunsInMainThread();
        // call releaseResource of the concrete ResourceManager
        ResourceManager.this.releaseResource(instanceId, cause);
    }
    // Requests new resources; the behaviour depends on the concrete ResourceManager implementation.
    // The returned collection acts as a promise of resources that will be provided soon
    // (in Yarn mode this is requestYarnContainer: request a container and start a TaskManager in it).
    @Override
    public Collection<ResourceProfile> allocateResource(ResourceProfile resourceProfile) {
        validateRunsInMainThread();
        return startNewWorker(resourceProfile);
    }
    @Override
    public void notifyAllocationFailure(JobID jobId, AllocationID allocationId, Exception cause) {
        validateRunsInMainThread();
        JobManagerRegistration jobManagerRegistration = jobManagerRegistrations.get(jobId);
        if (jobManagerRegistration != null) {
            jobManagerRegistration.getJobManagerGateway().notifyAllocationFailure(allocationId, cause);
        }
    }
}
// ResourceManager#releaseResource()
protected void releaseResource(InstanceID instanceId, Exception cause) {
    WorkerType worker = null;
    // TODO: Improve performance by having an index on the instanceId
    for (Map.Entry<ResourceID, WorkerRegistration<WorkerType>> entry : taskExecutors.entrySet()) {
        if (entry.getValue().getInstanceID().equals(instanceId)) {
            worker = entry.getValue().getWorker();
            break;
        }
    }
    if (worker != null) {
        // stop and release the worker, then close the connection to its TaskManager;
        // what stopWorker(worker) does depends on the concrete ResourceManager implementation
        if (stopWorker(worker)) {
            closeTaskManagerConnection(worker.getResourceID(), cause);
        } else {
            log.debug("Worker {} could not be stopped.", worker.getResourceID());
        }
    } else {
        // unregister in order to clean up potential left over state
        slotManager.unregisterTaskManager(instanceId);
    }
}
// Abstract methods of ResourceManager; the concrete subclass performs the actual worker operations
public abstract Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile);
public abstract boolean stopWorker(WorkerType worker);
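The abstract methods startNewWorker(ResourceProfile resourceProfile) and stopWorker(WorkerType worker) of ResourceManager are the key to acquiring and releasing resources dynamically. In standalone mode the set of TaskExecutors is fixed, so starting and releasing them dynamically is not supported. For Flink on Yarn, the YarnResourceManager implementations of these two methods start new containers and release containers that were allocated earlier: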
// YarnResourceManager
public Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile) {
    Preconditions.checkArgument(ResourceProfile.UNKNOWN.equals(resourceProfile), "The YarnResourceManager does not support custom ResourceProfiles yet. It assumes that all containers have the same resources.");
    // Request a container from Yarn. On success, the asynchronous callback onContainersAllocated() builds the
    // ContainerLaunchContext (taskExecutorLaunchContext), which contains the java command that starts the TaskExecutor,
    // and hands it to nodeManagerClient.startContainer() to launch the TaskExecutor process in the container.
    requestYarnContainer();
    return slotsPerWorker;
}
public boolean stopWorker(final YarnWorkerNode workerNode) {
    final Container container = workerNode.getContainer();
    log.info("Stopping container {}.", container.getId());
    try {
        // stop and release the container
        nodeManagerClient.stopContainer(container.getId(), container.getNodeId());
    } catch (final Exception e) {
        log.warn("Error while calling YARN Node Manager to stop container", e);
    }
    resourceManagerClient.releaseAssignedContainer(container.getId());
    workerNodeMap.remove(workerNode.getResourceID());
    return true;
}
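The split between a generic ResourceManager and deployment-specific startNewWorker and stopWorker methods is an instance of the template method pattern, which the following Java sketch illustrates. All names in it (WorkerLifecycleSketch, FixedClusterSketch, ContainerClusterSketch) are hypothetical, slot sizes are plain integers in MB, and the behaviour of the fixed cluster (return nothing, refuse to stop) is an assumption about how a standalone deployment would typically answer, not a quotation of Flink code.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical base class: the generic part only knows these two hooks. */
abstract class WorkerLifecycleSketch {
    /** Returns the slot sizes (MB) the new worker is expected to offer; empty if none is started. */
    abstract Collection<Integer> startNewWorker(int requestedSlotMb);

    /** Returns true if the worker was stopped. */
    abstract boolean stopWorker(String workerId);
}

/** Fixed-size cluster: workers can be neither started nor stopped on demand. */
class FixedClusterSketch extends WorkerLifecycleSketch {
    @Override
    Collection<Integer> startNewWorker(int requestedSlotMb) {
        return Collections.emptyList();
    }

    @Override
    boolean stopWorker(String workerId) {
        return false;
    }
}

/** Container-based cluster: every new worker is a container with identical slots. */
class ContainerClusterSketch extends WorkerLifecycleSketch {
    private final Set<String> runningContainers = new LinkedHashSet<>();
    private final int slotsPerWorker;
    private final int slotMb;
    private int nextId = 1;

    ContainerClusterSketch(int slotsPerWorker, int slotMb) {
        this.slotsPerWorker = slotsPerWorker;
        this.slotMb = slotMb;
    }

    @Override
    Collection<Integer> startNewWorker(int requestedSlotMb) {
        if (requestedSlotMb > slotMb) {
            return Collections.emptyList(); // a uniform container could not serve this request
        }
        runningContainers.add("container-" + nextId++);
        return Collections.nCopies(slotsPerWorker, slotMb);
    }

    @Override
    boolean stopWorker(String workerId) {
        return runningContainers.remove(workerId);
    }

    Set<String> runningContainers() {
        return runningContainers;
    }
}
```

The returned collection plays the role of the "promised" slots mentioned earlier: the caller can register one pending slot per element and match them against slots that show up once the worker has actually started.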