Managing Slot Compute Resources in the Flink JobManager

Slot Management in the JobManager

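        Compared with the TaskExecutor and the ResourceManager, resource management in the JobManager is somewhat more complex, mainly because Flink lets multiple subtasks run in the same slot through SlotSharingGroup and CoLocationGroup constraints. Inside the JobMaster, the SlotPool communicates with the ResourceManager and the TaskExecutors and manages the slots assigned to the current JobMaster; scheduling and resource allocation for the individual subtasks of the job rely mainly on the Scheduler and the SlotSharingManager. The main scheduling and allocation flow is as follows:

Interactions between the JobManager and other components:

  • Scheduler -> SlotPool: the scheduler requests resources from the SlotPool
  • SlotPool -> ResourceManager: if the SlotPool cannot satisfy a request, it asks the RM for a new slot
  • JobMaster -> SlotPool: slots offered by a TaskManager are handed to the SlotPool by the JobMaster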

 

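AllocatedSlot, LogicalSlot, MultiTaskSlot

        First, distinguish AllocatedSlot from LogicalSlot. An AllocatedSlot represents a physical slot on a TaskExecutor, whereas a LogicalSlot represents a logical slot. A task is deployed to one LogicalSlot, but a LogicalSlot does not correspond one-to-one to the physical slot that an AllocatedSlot stands for: because of mechanisms such as slot sharing, several LogicalSlots may be mapped to the same AllocatedSlot.

        AllocatedSlot implements the SlotContext interface. It represents a slot that the JobMaster has obtained from a TaskExecutor, i.e. a share of the slot resources physically allocated on that TaskExecutor: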

class AllocatedSlot implements SlotContext {
   /** The ID under which the slot is allocated. Uniquely identifies the slot. */
   private final AllocationID allocationId;
   /** The location information of the TaskManager to which this slot belongs */
   private final TaskManagerLocation taskManagerLocation;
   /** The resource profile of the slot provides */
   private final ResourceProfile resourceProfile;
   /** RPC gateway to call the TaskManager that holds this slot */
   private final TaskManagerGateway taskManagerGateway;
   /** The number of the slot on the TaskManager to which slot belongs. Purely informational. */
   private final int physicalSlotNumber;
   private final AtomicReference<Payload> payloadReference;
   
   public boolean tryAssignPayload(Payload payload) {
       return payloadReference.compareAndSet(null, payload);
   }

   interface Payload {  // Payload which can be assigned to an {@link AllocatedSlot}.
      void release(Throwable cause); // Releases the payload
   }
}
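
The single-payload rule behind tryAssignPayload can be illustrated with a minimal sketch. PhysicalSlotSketch and PayloadDemo are hypothetical names, and the payload is reduced to a plain string; only the compareAndSet pattern is taken from the excerpt above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for AllocatedSlot; only the payload handling is modelled.
class PhysicalSlotSketch {
    private final AtomicReference<String> payloadReference = new AtomicReference<>(null);

    // Succeeds only if no payload has been assigned yet.
    boolean tryAssignPayload(String payload) {
        return payloadReference.compareAndSet(null, payload);
    }

    String getPayload() {
        return payloadReference.get();
    }
}

class PayloadDemo {
    public static void main(String[] args) {
        PhysicalSlotSketch slot = new PhysicalSlotSketch();
        System.out.println(slot.tryAssignPayload("logical-slot-A")); // first assignment is accepted
        System.out.println(slot.tryAssignPayload("logical-slot-B")); // rejected, the slot is occupied
        System.out.println(slot.getPayload());
    }
}
```

Because the reference is only swapped from null, a physical slot carries at most one payload at a time, which is why sharing needs a payload that can itself hold several tasks.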

The LogicalSlot interface and its implementation SingleLogicalSlot:

public interface LogicalSlot {
	TaskManagerLocation getTaskManagerLocation();
	TaskManagerGateway getTaskManagerGateway();
 
	int getPhysicalSlotNumber();
	AllocationID getAllocationId();
	SlotRequestId getSlotRequestId();
	Locality getLocality();
    @Nullable
	SlotSharingGroupId getSlotSharingGroupId();

	CompletableFuture<?> releaseSlot(@Nullable Throwable cause);
	boolean tryAssignPayload(Payload payload);
	Payload getPayload();

	interface Payload {  // Payload for a logical slot.
       void fail(Throwable cause);
       CompletableFuture<?> getTerminalStateFuture();
	}
}

public class SingleLogicalSlot implements LogicalSlot, AllocatedSlot.Payload {
	private final SlotRequestId slotRequestId;
	private final SlotContext slotContext;

	// null if the logical slot does not belong to a slot sharing group, otherwise non-null
	@Nullable
	private final SlotSharingGroupId slotSharingGroupId;

	// locality of this slot wrt the requested preferred locations
	private final Locality locality;

	// owner of this slot to which it is returned upon release
	private final SlotOwner slotOwner;
	private final CompletableFuture<Void> releaseFuture;
	private volatile State state;

	// LogicalSlot.Payload of this slot
	private volatile Payload payload;
}

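        SingleLogicalSlot implements the AllocatedSlot.Payload interface, so a SingleLogicalSlot can be assigned to an AllocatedSlot as its payload. Similarly, LogicalSlot defines the payload it can carry: the implementation of LogicalSlot.Payload is Execution, i.e. a task that needs to be scheduled and executed.

        LogicalSlot also exposes two IDs, AllocationID and SlotRequestId. The AllocationID identifies the allocation of a physical slot and is always tied to an AllocatedSlot; the SlotRequestId identifies the request for a LogicalSlot made while scheduling a task and is tied to the LogicalSlot.

        To share a slot, several LogicalSlots must be mapped to the same AllocatedSlot. This is done through another implementation of AllocatedSlot.Payload: MultiTaskSlot, an inner class of SlotSharingManager. MultiTaskSlot and SingleTaskSlot share the parent class TaskSlot, and a tree of TaskSlots implements both slot sharing and the mandatory CoLocationGroup constraint. A MultiTaskSlot is an inner node of the tree and can have several children (each either a MultiTaskSlot or a SingleTaskSlot); a SingleTaskSlot is a leaf. The root of the tree is a MultiTaskSlot that is assigned a SlotContext, which represents a physical slot on a TaskExecutor, and every task in the tree runs in that slot. A MultiTaskSlot can hold several children as long as the AbstractIDs that identify them differ (the ID may be a JobVertexID or the ID of a CoLocationGroup).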

TaskSlot

abstract class TaskSlot {
	private final SlotRequestId slotRequestId; // every TaskSlot has an associated slotRequestId

	// all task slots except for the root slots have a group id assigned;
	// the group id identifies a TaskSlot and is either a JobVertexID or the ID of a CoLocationGroup
	@Nullable
	private final AbstractID groupId;

	public boolean contains(AbstractID groupId) {
        return Objects.equals(this.groupId, groupId);
	}

	public abstract void release(Throwable cause);
}

        MultiTaskSlot extends TaskSlot and can have several children. It can be the root node or an inner node. MultiTaskSlot also implements AllocatedSlot.Payload, so it can be assigned to an AllocatedSlot (when it is the root):

public final class MultiTaskSlot extends TaskSlot implements AllocatedSlot.Payload {
	private final Map<AbstractID, TaskSlot> children;

	// the root node has its parent set to null
	@Nullable
	private final MultiTaskSlot parent;

	// underlying allocated slot
	private final CompletableFuture<? extends SlotContext> slotContextFuture;

	// slot request id of the allocated slot
	@Nullable
	private final SlotRequestId allocatedSlotRequestId;
}

        A SingleTaskSlot can only be a leaf. It owns a LogicalSlot, to which a concrete task can later be assigned:

public final class SingleTaskSlot extends TaskSlot {
   private final MultiTaskSlot parent;

   // future containing a LogicalSlot which is completed once the underlying SlotContext future is completed
   private final CompletableFuture<SingleLogicalSlot> singleLogicalSlotFuture;
   
   private SingleTaskSlot(
         SlotRequestId slotRequestId,
         AbstractID groupId,
         MultiTaskSlot parent,
         Locality locality) {
      super(slotRequestId, groupId);
      this.parent = Preconditions.checkNotNull(parent);
      Preconditions.checkNotNull(locality);
      // slotSharingGroupId and slotOwner are fields of the enclosing SlotSharingManager
      singleLogicalSlotFuture = parent.getSlotContextFuture()
         .thenApply(
            (SlotContext slotContext) -> {
               LOG.trace("Fulfill single task slot [{}] with slot [{}].", slotRequestId, slotContext.getAllocationId());
               return new SingleLogicalSlot(  // the LogicalSlot to which a concrete task is assigned
                  slotRequestId,
                  slotContext,
                  slotSharingGroupId,
                  locality,
                  slotOwner);
            });
   }
}

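        With a plain SlotSharingGroup constraint, the tree has a MultiTaskSlot as the root and several SingleTaskSlots as leaves; the leaves represent different tasks and are distinguished by their JobVertexIDs. With the mandatory CoLocationGroup constraint, an additional MultiTaskSlot (identified by the CoLocationGroup ID) is created directly below the root, and the subtasks under the same CoLocationGroup become leaves of this second-level MultiTaskSlot.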
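
The two tree shapes can be sketched with a small stand-alone model. NodeSketch, LeafSketch, MultiSketch and the string group ids are all hypothetical simplifications, not Flink classes; the sketch only assumes that an inner node reports a group id as contained when any descendant carries it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the TaskSlot tree; group ids are plain strings.
abstract class NodeSketch {
    final String groupId; // null for the root node

    NodeSketch(String groupId) {
        this.groupId = groupId;
    }

    boolean contains(String id) {
        return id.equals(groupId);
    }
}

final class LeafSketch extends NodeSketch {
    LeafSketch(String groupId) {
        super(groupId);
    }
}

final class MultiSketch extends NodeSketch {
    final Map<String, NodeSketch> children = new LinkedHashMap<>();

    MultiSketch(String groupId) {
        super(groupId);
    }

    // A group id is contained if this node or any descendant carries it.
    @Override
    boolean contains(String id) {
        if (super.contains(id)) {
            return true;
        }
        for (NodeSketch child : children.values()) {
            if (child.contains(id)) {
                return true;
            }
        }
        return false;
    }

    MultiSketch addMulti(String id) {
        requireAbsent(id);
        MultiSketch child = new MultiSketch(id);
        children.put(id, child);
        return child;
    }

    LeafSketch addLeaf(String id) {
        requireAbsent(id);
        LeafSketch child = new LeafSketch(id);
        children.put(id, child);
        return child;
    }

    private void requireAbsent(String id) {
        if (contains(id)) {
            throw new IllegalStateException("group " + id + " is already in this tree");
        }
    }
}

class TaskSlotTreeDemo {
    public static void main(String[] args) {
        MultiSketch root = new MultiSketch(null);          // root, backed by one physical slot
        root.addLeaf("source");                            // plain slot sharing: leaves under the root
        root.addLeaf("map");
        MultiSketch coLocation = root.addMulti("co-location-1"); // second-level node for a CoLocationGroup
        coLocation.addLeaf("iteration-head");
        coLocation.addLeaf("iteration-tail");
        System.out.println(root.contains("iteration-tail"));
        System.out.println(root.contains("sink"));
    }
}
```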

SlotPool

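        The JobManager uses the SlotPool to request slots from the ResourceManager and to manage all slots assigned to it. The slots meant here are physical AllocatedSlots on TaskExecutors. SlotPoolImpl is the only implementation of the SlotPool interface; its main fields are: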

class SlotPoolImpl implements SlotPool {
	/** The book-keeping of all allocated slots. */
	private final AllocatedSlots allocatedSlots;     // slots of this JobManager that have been handed out to a slot request

	/** The book-keeping of all available slots. */
	private final AvailableSlots availableSlots; 	// slots assigned to this JobManager that carry no payload yet

	/** All pending requests waiting for slots. */
	private final DualKeyMap<SlotRequestId, AllocationID, PendingRequest> pendingRequests; // pending slot requests that have already been sent to the ResourceManager

	/** The requests that are waiting for the resource manager to be connected. */
	private final HashMap<SlotRequestId, PendingRequest> waitingForResourceManager; // pending slot requests not yet sent, because no ResourceManager is connected
}

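        Every slot assigned to the SlotPool is uniquely identified by its AllocationID. getAvailableSlotsInformation returns the currently available slots (those without a payload), and allocateAvailableSlot then assigns the AllocatedSlot with a given AllocationID to the request identified by a given SlotRequestId: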

class SlotPoolImpl implements SlotPool {
	@Override
	public Collection<SlotInfo> getAvailableSlotsInformation() { // list the currently available slots
        return availableSlots.listSlotInfo();
	}

	@Override
	public Optional<PhysicalSlot> allocateAvailableSlot(  // assign the slot with this allocationID to the request with this slotRequestId
        @Nonnull SlotRequestId slotRequestId,
        @Nonnull AllocationID allocationID) {
        componentMainThreadExecutor.assertRunningInMainThread();
        
        AllocatedSlot allocatedSlot = availableSlots.tryRemove(allocationID);   // remove it from availableSlots
        if (allocatedSlot != null) {
            allocatedSlots.add(slotRequestId, allocatedSlot); // record it in the allocated mapping
            return Optional.of(allocatedSlot);
        } else {
            return Optional.empty();
        }
	}
}
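
The move from "available" to "allocated" can be modelled with two plain maps. This is a minimal sketch under simplifying assumptions: SlotBookkeepingSketch is a hypothetical name, and IDs and slots are reduced to strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the two book-keeping structures; all IDs are plain strings.
class SlotBookkeepingSketch {
    private final Map<String, String> availableSlots = new HashMap<>();     // allocationId -> slot description
    private final Map<String, String> allocatedByRequest = new HashMap<>(); // slotRequestId -> allocationId

    void addAvailable(String allocationId, String description) {
        availableSlots.put(allocationId, description);
    }

    // Moves the slot from "available" to "allocated" if it is still free.
    Optional<String> allocateAvailableSlot(String slotRequestId, String allocationId) {
        String slot = availableSlots.remove(allocationId);
        if (slot == null) {
            return Optional.empty();
        }
        allocatedByRequest.put(slotRequestId, allocationId);
        return Optional.of(slot);
    }

    int availableCount() {
        return availableSlots.size();
    }
}

class SlotBookkeepingDemo {
    public static void main(String[] args) {
        SlotBookkeepingSketch pool = new SlotBookkeepingSketch();
        pool.addAvailable("alloc-1", "tm-1 / slot 0");
        System.out.println(pool.allocateAvailableSlot("request-1", "alloc-1").isPresent());
        System.out.println(pool.allocateAvailableSlot("request-2", "alloc-1").isPresent());
    }
}
```

A second request for the same AllocationID gets an empty result, because the slot has already left the available set.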

If no slot is currently available, the SlotPool can be asked to request one from the ResourceManager:

class SlotPoolImpl implements SlotPool {
    public CompletableFuture<PhysicalSlot> requestNewAllocatedSlot(  // request a new slot from the RM
        @Nonnull SlotRequestId slotRequestId,
        @Nonnull ResourceProfile resourceProfile,
        Time timeout) {
       componentMainThreadExecutor.assertRunningInMainThread();
       // create a PendingRequest for this slot request
       final PendingRequest pendingRequest = PendingRequest.createStreamingRequest(slotRequestId, resourceProfile);
       // register request timeout
       // ............
       return requestNewAllocatedSlotInternal(pendingRequest)
          .thenApply((Function.identity()));
    }
    
    private CompletableFuture<AllocatedSlot> requestNewAllocatedSlotInternal(PendingRequest pendingRequest) {
       if (resourceManagerGateway == null) {
          stashRequestWaitingForResourceManager(pendingRequest); 	// not connected to the RM yet: wait for the connection
       } else {
          requestSlotFromResourceManager(resourceManagerGateway, pendingRequest); // already connected to the RM: request the slot
       }
       return pendingRequest.getAllocatedSlotFuture();
    }
    
    // not connected to the RM yet: park the request in waitingForResourceManager;
    // it is sent to the RM as soon as the connection is established
	private void stashRequestWaitingForResourceManager(final PendingRequest pendingRequest) {
       log.info("Cannot serve slot request, no ResourceManager connected. " +
          "Adding as pending request [{}]",  pendingRequest.getSlotRequestId());
       waitingForResourceManager.put(pendingRequest.getSlotRequestId(), pendingRequest);
	}
 
    // already connected to the RM: request the slot from it
     private void requestSlotFromResourceManager(
          final ResourceManagerGateway resourceManagerGateway,
          final PendingRequest pendingRequest) {
       checkNotNull(resourceManagerGateway);
       checkNotNull(pendingRequest);
       log.info("Requesting new slot [{}] and profile {} from resource manager.", pendingRequest.getSlotRequestId(), pendingRequest.getResourceProfile());
    
       final AllocationID allocationId = new AllocationID(); // generate an AllocationID; the slot allocated later is identified by it
       pendingRequests.put(pendingRequest.getSlotRequestId(), allocationId, pendingRequest); // add to the pending requests
    
       pendingRequest.getAllocatedSlotFuture().whenComplete(
          (AllocatedSlot allocatedSlot, Throwable throwable) -> {
             if (throwable != null || !allocationId.equals(allocatedSlot.getAllocationId())) {
                // cancel the slot request if there is a failure or if the pending request has
                // been completed with another allocated slot
                resourceManagerGateway.cancelSlotRequest(allocationId);
             }
          });
       // request the slot from the RM via RPC; how the RM handles resourceManagerGateway.requestSlot was covered earlier
       CompletableFuture<Acknowledge> rmResponse = resourceManagerGateway.requestSlot(
          jobMasterId,
          new SlotRequest(jobId, allocationId, pendingRequest.getResourceProfile(), jobManagerAddress),
          rpcTimeout);
    
       FutureUtils.whenCompleteAsyncIfNotDone(
          rmResponse,
          componentMainThreadExecutor,
          (Acknowledge ignored, Throwable failure) -> {
             // on failure, fail the request future
             if (failure != null) {
                slotRequestToResourceManagerFailed(pendingRequest.getSlotRequestId(), failure);
             }
          });
      }
}
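
The stash-then-send behaviour can be sketched without any RPC. PendingRequestSketch is a hypothetical name, "sending" a request is reduced to appending its ID to a list, and the flush on connect follows the comment in the excerpt that parked requests are sent once the RM connection exists.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: requests are parked until a ResourceManager connection exists.
class PendingRequestSketch {
    private final Set<String> waitingForResourceManager = new LinkedHashSet<>();
    private final List<String> sentToResourceManager = new ArrayList<>();
    private boolean connected;

    void requestNewSlot(String slotRequestId) {
        if (!connected) {
            waitingForResourceManager.add(slotRequestId); // stash until the RM is known
        } else {
            sentToResourceManager.add(slotRequestId);     // forward immediately
        }
    }

    // Called once the RM connection is established; flushes the parked requests in order.
    void connectToResourceManager() {
        connected = true;
        sentToResourceManager.addAll(waitingForResourceManager);
        waitingForResourceManager.clear();
    }

    int waitingCount() {
        return waitingForResourceManager.size();
    }

    int sentCount() {
        return sentToResourceManager.size();
    }
}

class PendingRequestDemo {
    public static void main(String[] args) {
        PendingRequestSketch pool = new PendingRequestSketch();
        pool.requestNewSlot("request-1");
        System.out.println(pool.waitingCount() + " waiting, " + pool.sentCount() + " sent");
        pool.connectToResourceManager();
        System.out.println(pool.waitingCount() + " waiting, " + pool.sentCount() + " sent");
    }
}
```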

         The JobMaster requests slots from the ResourceManager through the SlotPool. When the ResourceManager receives the RPC resourceManagerGateway.requestSlot() from the JobMaster's SlotPool, it delegates the request to its internal SlotManager via slotManager.registerSlotRequest(slotRequest). The SlotManager in turn requests the slot from a TaskExecutor through the RPC taskExecutorGateway.requestSlot(). On receiving the request, the TaskExecutor checks the free resources in its taskSlotTable and allocates a slot. After a successful allocation, the TaskExecutor reports the slots obtained for that jobId to the corresponding JobMaster in TaskExecutor#offerSlotsToJobManager(jobId), which calls jobMasterGateway.offerSlots() over RPC; the JobMaster forwards the offer to its SlotPool, so SlotPool.offerSlots() is eventually invoked:

class JobMaster {
    // RPC entry point: offers slots to the SlotPool and returns the slots that were accepted.
    // Slots that are not accepted can be given to other jobs.
    public CompletableFuture<Collection<SlotOffer>> offerSlots(
          final ResourceID taskManagerId,
          final Collection<SlotOffer> slots,
          final Time timeout) {
       Tuple2<TaskManagerLocation, TaskExecutorGateway> taskManager = registeredTaskManagers.get(taskManagerId);
       if (taskManager == null) {
          return FutureUtils.completedExceptionally(new Exception("Unknown TaskManager " + taskManagerId));
       }
       final TaskManagerLocation taskManagerLocation = taskManager.f0;
       final TaskExecutorGateway taskExecutorGateway = taskManager.f1;
       final RpcTaskManagerGateway rpcTaskManagerGateway = new RpcTaskManagerGateway(taskExecutorGateway, getFencingToken());
       return CompletableFuture.completedFuture(
          // the SlotPool decides for each slot whether to accept it; it returns the collection of accepted slot offers
          slotPool.offerSlots(
             taskManagerLocation,
             rpcTaskManagerGateway,
             slots));
    }
}

class SlotPoolImpl implements SlotPool {
	boolean offerSlot(
      final TaskManagerLocation taskManagerLocation,
      final TaskManagerGateway taskManagerGateway,
      final SlotOffer slotOffer) {
       componentMainThreadExecutor.assertRunningInMainThread();
       // check if this TaskManager is valid
       final ResourceID resourceID = taskManagerLocation.getResourceID();
       final AllocationID allocationID = slotOffer.getAllocationId();
       if (!registeredTaskManagers.contains(resourceID)) {
          log.debug("Received outdated slot offering [{}] from unregistered TaskManager: {}",
                slotOffer.getAllocationId(), taskManagerLocation);
          return false;
       }
    
       // check whether we are already using this slot, i.e. whether its AllocationID is already known to the SlotPool
       AllocatedSlot existingSlot;
       if ((existingSlot = allocatedSlots.get(allocationID)) != null ||
          (existingSlot = availableSlots.get(allocationID)) != null) {
              
          // we need to figure out if this is a repeated offer for the exact same slot,
          // or another offer that comes from a different TaskManager after the ResourceManager
          // re-tried the request
    
          // we write this in terms of comparing slot IDs, because the Slot IDs are the identifiers of
          // the actual slots on the TaskManagers
          // Note: The slotOffer should have the SlotID
          final SlotID existingSlotId = existingSlot.getSlotId();
          final SlotID newSlotId = new SlotID(taskManagerLocation.getResourceID(), slotOffer.getSlotIndex());
    
          if (existingSlotId.equals(newSlotId)) { // the SlotPool has already accepted this slot, so the TaskExecutor sent a repeated offer
             log.info("Received repeated offer for slot [{}]. Ignoring.", allocationID);
             // return true here so that the sender will get a positive acknowledgement to the retry
             // and mark the offering as a success
             return true;
          } else {       // another AllocatedSlot is already associated with this AllocationID, so this offer cannot be accepted
             // the allocation has been fulfilled by another slot, reject the offer so the task executor
             // will offer the slot to the resource manager
             return false;
          }
       }
       // this AllocationID has not been seen before; create an AllocatedSlot object for the newly offered slot
       final AllocatedSlot allocatedSlot = new AllocatedSlot(
          allocationID,
          taskManagerLocation,
          slotOffer.getSlotIndex(),
          slotOffer.getResourceProfile(),
          taskManagerGateway);
    
       // check whether a pending request is associated with this AllocationID
       PendingRequest pendingRequest = pendingRequests.removeKeyB(allocationID);
       if (pendingRequest != null) {
          // we were waiting for this: a pending request is waiting for exactly this slot
          allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);
          // try to complete the waiting request
          if (!pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot)) {
             // we could not complete the pending slot future --> try to fulfill another pending request
             allocatedSlots.remove(pendingRequest.getSlotRequestId());
             tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);
          } else {
             log.debug("Fulfilled slot request [{}] with allocated slot [{}].", pendingRequest.getSlotRequestId(), allocationID);
          }
       }
       else {  // no request is waiting for this slot (it may already have been fulfilled); try to fulfill another pending request
          // we were actually not waiting for this:
          //   - could be that this request had been fulfilled
          //   - we are receiving the slots from TaskManagers after becoming leaders
          tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);
       }
       // we accepted the request in any case. slot will be released after it idled for
       // too long and timed out
       return true;
    }
}
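
The accept/reject decision for an incoming offer can be reduced to a small sketch. OfferSketch is a hypothetical name, the slot ID is modelled as the string "taskManagerId:slotIndex", and the registration check and pending-request handling are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the offer decision; a slot id is "taskManagerId:slotIndex".
class OfferSketch {
    private final Map<String, String> knownSlots = new HashMap<>(); // allocationId -> slot id

    boolean offerSlot(String allocationId, String taskManagerId, int slotIndex) {
        String newSlotId = taskManagerId + ":" + slotIndex;
        String existingSlotId = knownSlots.get(allocationId);
        if (existingSlotId != null) {
            // repeated offer for the same slot: acknowledge it;
            // a different slot for the same allocation: reject it
            return existingSlotId.equals(newSlotId);
        }
        knownSlots.put(allocationId, newSlotId); // first offer for this allocation: accept it
        return true;
    }
}

class OfferDemo {
    public static void main(String[] args) {
        OfferSketch pool = new OfferSketch();
        System.out.println(pool.offerSlot("alloc-1", "tm-1", 0)); // new offer
        System.out.println(pool.offerSlot("alloc-1", "tm-1", 0)); // repeated offer
        System.out.println(pool.offerSlot("alloc-1", "tm-2", 3)); // conflicting offer
    }
}
```

Acknowledging the repeated offer keeps a retrying sender from treating an already accepted slot as a failure.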

Whenever a new AllocatedSlot becomes available, SlotPoolImpl tries to use it to fulfill other pending requests early:

class SlotPoolImpl implements SlotPool {
    private void tryFulfillSlotRequestOrMakeAvailable(AllocatedSlot allocatedSlot) {
       Preconditions.checkState(!allocatedSlot.isUsed(), "Provided slot is still in use.");
       // look for a pending request whose resource profile matches this AllocatedSlot
       final PendingRequest pendingRequest = pollMatchingPendingRequest(allocatedSlot);
       if (pendingRequest != null) {
          log.debug("Fulfilling pending slot request [{}] early with returned slot [{}]",
             pendingRequest.getSlotRequestId(), allocatedSlot.getAllocationId());
    
          allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);  // a matching request exists: assign the AllocatedSlot to it
          pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot);
       } else {
          log.debug("Adding returned slot [{}] to available slots", allocatedSlot.getAllocationId());
          availableSlots.add(allocatedSlot, clock.relativeTimeMillis());         // otherwise the AllocatedSlot becomes available
       }
    }
    
    private PendingRequest pollMatchingPendingRequest(final AllocatedSlot slot) { // find a pending request whose resource profile matches this AllocatedSlot
       final ResourceProfile slotResources = slot.getResourceProfile();
       // try the requests sent to the resource manager first
       for (PendingRequest request : pendingRequests.values()) {
          if (slotResources.isMatching(request.getResourceProfile())) {
             pendingRequests.removeKeyA(request.getSlotRequestId());
             return request;
          }
       }
       // try the requests waiting for a resource manager connection next
       for (PendingRequest request : waitingForResourceManager.values()) {
          if (slotResources.isMatching(request.getResourceProfile())) {
             waitingForResourceManager.remove(request.getSlotRequestId());
             return request;
          }
       }
       // no request pending, or no request matches
       return null;
    }
}
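
The "fulfill early or make available" step can be sketched as follows. MatchingSketch is a hypothetical name, and a resource profile is reduced to a single integer; treating "matching" as "the slot offers at least the requested amount" is an assumption made for this illustration only.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a resource profile is a single number of resource units.
class MatchingSketch {
    private final Map<String, Integer> pendingRequests = new LinkedHashMap<>(); // requestId -> required units
    private final Map<String, Integer> availableSlots = new LinkedHashMap<>();  // allocationId -> offered units

    void addPending(String requestId, int requiredUnits) {
        pendingRequests.put(requestId, requiredUnits);
    }

    // Returns the id of the request that received the slot, or null if the slot became available.
    String tryFulfillOrMakeAvailable(String allocationId, int slotUnits) {
        Iterator<Map.Entry<String, Integer>> it = pendingRequests.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> request = it.next();
            if (slotUnits >= request.getValue()) { // assumed matching rule
                it.remove();
                return request.getKey();
            }
        }
        availableSlots.put(allocationId, slotUnits);
        return null;
    }

    int availableCount() {
        return availableSlots.size();
    }
}

class MatchingDemo {
    public static void main(String[] args) {
        MatchingSketch pool = new MatchingSketch();
        pool.addPending("big-request", 4);
        pool.addPending("small-request", 1);
        System.out.println(pool.tryFulfillOrMakeAvailable("alloc-1", 2)); // only the small request fits
        System.out.println(pool.tryFulfillOrMakeAvailable("alloc-2", 2)); // nothing fits, slot becomes available
    }
}
```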

        When the SlotPool starts, it schedules a periodic task that checks for idle slots; a slot that has been idle for too long is returned to the TaskManager through the RPC taskManagerGateway.freeSlot():

//  class SlotPoolImpl implements SlotPool
public void start(
   @Nonnull JobMasterId jobMasterId,
   @Nonnull String newJobManagerAddress,
   @Nonnull ComponentMainThreadExecutor componentMainThreadExecutor) throws Exception {
       
   this.jobMasterId = jobMasterId;
   this.jobManagerAddress = newJobManagerAddress;
   this.componentMainThreadExecutor = componentMainThreadExecutor;

   scheduleRunAsync(this::checkIdleSlot, idleSlotTimeout);                  // check for idle slots and return them to the TaskManager
   scheduleRunAsync(this::checkBatchSlotTimeout, batchSlotTimeout);

   if (log.isDebugEnabled()) {
      scheduleRunAsync(this::scheduledLogStatus, STATUS_LOG_INTERVAL_MS, TimeUnit.MILLISECONDS);
   }
}

/**
 * Check the available slots, release the slot that is idle for a long time.
 */
protected void checkIdleSlot() {
   // The timestamp in SlotAndTimestamp is relative
   final long currentRelativeTimeMillis = clock.relativeTimeMillis();

   final List<AllocatedSlot> expiredSlots = new ArrayList<>(availableSlots.size());
   for (SlotAndTimestamp slotAndTimestamp : availableSlots.availableSlots.values()) {
      if (currentRelativeTimeMillis - slotAndTimestamp.timestamp > idleSlotTimeout.toMilliseconds()) {
         expiredSlots.add(slotAndTimestamp.slot);
      }
   }

   final FlinkException cause = new FlinkException("Releasing idle slot.");
   
   for (AllocatedSlot expiredSlot : expiredSlots) {
      final AllocationID allocationID = expiredSlot.getAllocationId();
      if (availableSlots.tryRemove(allocationID) != null) {
         log.info("Releasing idle slot [{}].", allocationID);
         final CompletableFuture<Acknowledge> freeSlotFuture = expiredSlot.getTaskManagerGateway().freeSlot(  // return the idle slot to the TaskManager
            allocationID,
            cause,
            rpcTimeout);

         FutureUtils.whenCompleteAsyncIfNotDone(
            freeSlotFuture,
            componentMainThreadExecutor,
            (Acknowledge ignored, Throwable throwable) -> {
               if (throwable != null) {
                  // The slot status will be synced to task manager in next heartbeat.
                  log.debug("Releasing slot [{}] of registered TaskExecutor {} failed. Discarding slot.",
                           allocationID, expiredSlot.getTaskManagerId(), throwable);
               }
            });
      }
   }

   scheduleRunAsync(this::checkIdleSlot, idleSlotTimeout);
}
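
The idle check itself is a collect-then-remove pass over timestamps. IdleSlotSketch is a hypothetical name; the sketch passes the current time in explicitly instead of reading a clock and leaves out the freeSlot RPC and the rescheduling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the idle check; timestamps are relative milliseconds supplied by the caller.
class IdleSlotSketch {
    private final Map<String, Long> availableSince = new HashMap<>(); // allocationId -> time it became available
    private final long idleTimeoutMillis;

    IdleSlotSketch(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    void makeAvailable(String allocationId, long nowMillis) {
        availableSince.put(allocationId, nowMillis);
    }

    // Collects expired slots first and removes them afterwards, so the map is never modified while iterating.
    List<String> releaseIdleSlots(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : availableSince.entrySet()) {
            if (nowMillis - entry.getValue() > idleTimeoutMillis) {
                expired.add(entry.getKey());
            }
        }
        for (String allocationId : expired) {
            availableSince.remove(allocationId);
        }
        return expired;
    }

    int availableCount() {
        return availableSince.size();
    }
}

class IdleSlotDemo {
    public static void main(String[] args) {
        IdleSlotSketch pool = new IdleSlotSketch(50_000L);
        pool.makeAvailable("alloc-old", 0L);
        pool.makeAvailable("alloc-new", 40_000L);
        System.out.println(pool.releaseIdleSlots(60_000L)); // only alloc-old has been idle longer than the timeout
    }
}
```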

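Scheduler and SlotSharingManager

        Within the JobMaster, the SlotPool manages the physical slots (AllocatedSlots) assigned to the current JobMaster. The compute resources needed by each individual task, however, are scheduled and managed in terms of LogicalSlots: different tasks get different LogicalSlots, but the underlying physical slot on the TaskExecutor may be the same. This logic is encapsulated in SlotSharingManager and Scheduler. As mentioned earlier, a tree of TaskSlots implements resource sharing within a SlotSharingGroup as well as the mandatory CoLocationGroup constraint, and building that tree is the job of the SlotSharingManager. Each SlotSharingGroup has its own SlotSharingManager.

Flink has two kinds of sharing groups:

  1. SlotSharingGroup: a soft sharing constraint. Slot sharing looks up a shareable slot by the JobVertexIDs in the group; the only requirement is that the same JobVertexID must not appear twice in one shared slot. Among the slots that meet the resource requirements, those that do not yet contain the JobVertexID are candidates, and one is chosen according to the slot selection strategy; if none qualifies, a new slot is requested.
  2. CoLocationGroup: also called the co-location constraint group, a hard sharing constraint. It is used in iterations, where the tasks of an iteration must share the same slot of one TaskManager. A CoLocationGroup can be viewed as a special case of a SlotSharingGroup.

Exclusive and shared slot allocation is illustrated as follows: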

SlotSharingManager

        Its main fields are shown below. Besides the associated SlotSharingGroupId, the most important ones are the three maps that manage TaskSlots:

class SlotSharingManager {
	private final SlotSharingGroupId slotSharingGroupId;

	/** Actions to release allocated slots after a complete multi task slot hierarchy has been released. */
	private final AllocatedSlotActions allocatedSlotActions;

	/** Owner of the slots to which to return them when they are released from the outside. */
	private final SlotOwner slotOwner;

	private final Map<SlotRequestId, TaskSlot> allTaskSlots; // all TaskSlots: root, inner, and leaf nodes

	/** Root nodes which have not been completed because the allocated slot is still pending. */
	private final Map<SlotRequestId, MultiTaskSlot> unresolvedRootSlots; // root MultiTaskSlots whose underlying physical slot has not been allocated yet

	/** Root nodes which have been completed (the underlying allocated slot has been assigned). */
	// root MultiTaskSlots whose underlying physical slot has been allocated; organized as a two-level map
	// so that they can be looked up by the location of the TaskManager that holds the physical slot
	private final Map<TaskManagerLocation, Map<AllocationID, MultiTaskSlot>> resolvedRootSlots; 
}

To build a new TaskSlot tree, createRootSlot is called to create the root node:

class SlotSharingManager {
    MultiTaskSlot createRootSlot(
          SlotRequestId slotRequestId,
          CompletableFuture<? extends SlotContext> slotContextFuture,
          SlotRequestId allocatedSlotRequestId) {
       final MultiTaskSlot rootMultiTaskSlot = new MultiTaskSlot(
          slotRequestId,
          slotContextFuture,
          allocatedSlotRequestId);
    
       LOG.debug("Create multi task slot [{}] in slot [{}].", slotRequestId, allocatedSlotRequestId);
    
       allTaskSlots.put(slotRequestId, rootMultiTaskSlot);
       unresolvedRootSlots.put(slotRequestId, rootMultiTaskSlot); // start out in unresolvedRootSlots
    
       // add the root node to the set of resolved root nodes once the SlotContext future has
       // been completed and we know the slot's TaskManagerLocation
       slotContextFuture.whenComplete(
          (SlotContext slotContext, Throwable throwable) -> {
             if (slotContext != null) {
                // once the physical slot is allocated, move the root from unresolvedRootSlots to resolvedRootSlots
                final MultiTaskSlot resolvedRootNode = unresolvedRootSlots.remove(slotRequestId);
    
                if (resolvedRootNode != null) {
                   final AllocationID allocationId = slotContext.getAllocationId();
                   LOG.trace("Fulfill multi task slot [{}] with slot [{}].", slotRequestId, allocationId);
    
                   final Map<AllocationID, MultiTaskSlot> innerMap = resolvedRootSlots.computeIfAbsent(
                      slotContext.getTaskManagerLocation(),
                      taskManagerLocation -> new HashMap<>(4));
    
                   MultiTaskSlot previousValue = innerMap.put(allocationId, resolvedRootNode);
                   Preconditions.checkState(previousValue == null);
                }
             } else {
                rootMultiTaskSlot.release(throwable);
             }
          });
       return rootMultiTaskSlot;
    }
}
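
The transition from unresolved to resolved can be sketched with a plain CompletableFuture. SlotContextSketch and RootSlotSketch are hypothetical names, a root node is represented by a string, and on failure the sketch simply drops the root instead of releasing a real slot hierarchy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for SlotContext: just a location and an allocation id.
class SlotContextSketch {
    final String taskManagerLocation;
    final String allocationId;

    SlotContextSketch(String taskManagerLocation, String allocationId) {
        this.taskManagerLocation = taskManagerLocation;
        this.allocationId = allocationId;
    }
}

class RootSlotSketch {
    final Map<String, String> unresolvedRootSlots = new HashMap<>();            // slotRequestId -> root
    final Map<String, Map<String, String>> resolvedRootSlots = new HashMap<>(); // location -> allocationId -> root

    void createRootSlot(String slotRequestId, CompletableFuture<SlotContextSketch> slotContextFuture) {
        unresolvedRootSlots.put(slotRequestId, "root-" + slotRequestId);
        slotContextFuture.whenComplete((slotContext, failure) -> {
            // in both outcomes the root stops being "unresolved"
            String root = unresolvedRootSlots.remove(slotRequestId);
            if (slotContext != null && root != null) {
                resolvedRootSlots
                    .computeIfAbsent(slotContext.taskManagerLocation, location -> new HashMap<>())
                    .put(slotContext.allocationId, root);
            }
        });
    }
}

class RootSlotDemo {
    public static void main(String[] args) {
        RootSlotSketch manager = new RootSlotSketch();
        CompletableFuture<SlotContextSketch> future = new CompletableFuture<>();
        manager.createRootSlot("request-1", future);
        System.out.println(manager.unresolvedRootSlots.keySet()); // still waiting for the physical slot
        future.complete(new SlotContextSketch("tm-1", "alloc-1"));
        System.out.println(manager.resolvedRootSlots);            // now indexed by TaskManager location
    }
}
```

Indexing resolved roots by TaskManager location first makes location-based lookups cheap, which is what the two-level map in the excerpt is organized for.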

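        Different tasks in Flink can share resources as long as they are in the same SlotSharingGroup, with one implicit condition: the tasks must be subtasks of different operators. For example, if a map operator has a parallelism of three, the subtasks map[1] and map[2] cannot land in the same physical slot. Both listResolvedRootSlotInfo and getUnresolvedRootSlot apply a !multiTaskSlot.contains(groupId) check, which ensures that a TaskSlot tree never contains two subtasks of the same operator.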

class SlotSharingManager {
     // list the root MultiTaskSlots that already have a physical slot and do not contain the given groupId
     public Collection<SlotSelectionStrategy.SlotInfoAndResources> listResolvedRootSlotInfo(@Nullable AbstractID groupId) {
       return resolvedRootSlots
          .values()
          .stream()
             .flatMap((Map<AllocationID, MultiTaskSlot> map) -> createValidMultiTaskSlotInfos(map, groupId))
             .map((MultiTaskSlotInfo multiTaskSlotInfo) -> {
                SlotInfo slotInfo = multiTaskSlotInfo.getSlotInfo();
                return new SlotSelectionStrategy.SlotInfoAndResources(
                   slotInfo,
                   slotInfo.getResourceProfile().subtract(multiTaskSlotInfo.getReservedResources()),
                   multiTaskSlotInfo.getTaskExecutorUtilization());
             }).collect(Collectors.toList());
    }
    
    // find the MultiTaskSlot for a SlotInfo (TaskManagerLocation and AllocationID)
    public MultiTaskSlot getResolvedRootSlot(@Nonnull SlotInfo slotInfo) {
       Map<AllocationID, MultiTaskSlot> forLocationEntry = resolvedRootSlots.get(slotInfo.getTaskManagerLocation());
       return forLocationEntry != null ? forLocationEntry.get(slotInfo.getAllocationId()) : null;
    }
    
	/**
	 * Gets an unresolved slot which does not yet contain the given groupId. An unresolved
	 * slot is a slot whose underlying allocated slot has not been allocated yet.
	 *
	 * @param groupId which the returned slot must not contain
	 * @return the unresolved slot or null if there was no root slot with free capacities
	 */
	 // find a root MultiTaskSlot that does not contain the given groupId
    MultiTaskSlot getUnresolvedRootSlot(AbstractID groupId) {
       return unresolvedRootSlots.values().stream()
          .filter(validMultiTaskSlotAndDoesNotContain(groupId))
          .findFirst()
          .orElse(null);
    }
}
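
The effect of the "does not contain groupId" filter on placement can be sketched as follows. SharingSketch is a hypothetical name, a shared slot is reduced to the set of JobVertex IDs it hosts, and the first qualifying slot is taken instead of applying a slot selection strategy or checking resources.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: each shared slot is the set of JobVertex ids already placed in it.
class SharingSketch {
    private final List<Set<String>> sharedSlots = new ArrayList<>();

    // Returns the index of the shared slot that received the subtask.
    int place(String jobVertexId) {
        for (int i = 0; i < sharedSlots.size(); i++) {
            if (!sharedSlots.get(i).contains(jobVertexId)) { // the "does not contain groupId" filter
                sharedSlots.get(i).add(jobVertexId);
                return i;
            }
        }
        Set<String> fresh = new HashSet<>(); // no shareable slot: open a new one
        fresh.add(jobVertexId);
        sharedSlots.add(fresh);
        return sharedSlots.size() - 1;
    }

    int slotCount() {
        return sharedSlots.size();
    }
}

class SharingDemo {
    public static void main(String[] args) {
        SharingSketch group = new SharingSketch();
        group.place("source"); // source[0]
        group.place("source"); // source[1] cannot join source[0]
        group.place("map");    // map[0] shares with source[0]
        group.place("map");    // map[1] shares with source[1]
        System.out.println(group.slotCount());
    }
}
```

With two operators of parallelism two, the sketch ends up with two shared slots, each holding one subtask of each operator.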

LogicalSlot requests made during task scheduling are managed through the Scheduler interface, which extends SlotProvider; its only implementation is SchedulerImpl.

public interface SlotProvider {
	// request a slot; returns a future of a LogicalSlot
	CompletableFuture<LogicalSlot> allocateSlot(
		SlotRequestId slotRequestId,
		ScheduledUnit scheduledUnit,
		SlotProfile slotProfile,
		boolean allowQueuedScheduling,
		Time allocationTimeout);

	void cancelSlotRequest(
		SlotRequestId slotRequestId,
		@Nullable SlotSharingGroupId slotSharingGroupId,
		Throwable cause);
}

public interface Scheduler extends SlotProvider, SlotOwner {
	void start(@Nonnull ComponentMainThreadExecutor mainThreadExecutor);
	boolean requiresPreviousExecutionGraphAllocations();
}

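        SchedulerImpl uses the SlotPool to request physical slots and relies on SlotSharingManager to implement slot sharing. The SlotSelectionStrategy interface selects, from a set of slots, the one that best matches the preferences of a request. The main fields and methods of SchedulerImpl are: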

class SchedulerImpl implements Scheduler {
	private final SlotSelectionStrategy slotSelectionStrategy; // Strategy that selects the best slot for a given slot allocation request.
	private final SlotPool slotPool; // The slot pool from which slots are allocated
	private final Map<SlotSharingGroupId, SlotSharingManager> slotSharingManagers; // Managers for the different slot sharing groups
 
     public CompletableFuture<LogicalSlot> allocateSlot(
    	SlotRequestId slotRequestId,
    	ScheduledUnit scheduledUnit,
    	SlotProfile slotProfile,
    	boolean allowQueuedScheduling,
    	Time allocationTimeout) {
    	log.debug("Received slot request [{}] for task: {}", slotRequestId, scheduledUnit.getTaskToExecute());
    	componentMainThreadExecutor.assertRunningInMainThread();
    	final CompletableFuture<LogicalSlot> allocationResultFuture = new CompletableFuture<>();
    
    	// no SlotSharingGroupId means the task does not allow slot sharing and needs a slot of its own
    	CompletableFuture<LogicalSlot> allocationFuture = scheduledUnit.getSlotSharingGroupId() == null ?
    		allocateSingleSlot(slotRequestId, slotProfile, allowQueuedScheduling, allocationTimeout) :                  // no sharing: request a single slot
    		allocateSharedSlot(slotRequestId, scheduledUnit, slotProfile, allowQueuedScheduling, allocationTimeout);    // slot sharing
      
            allocationFuture.whenComplete((LogicalSlot slot, Throwable failure) -> {
               if (failure != null) {
                  Optional<SharedSlotOversubscribedException> sharedSlotOverAllocatedException =
                        ExceptionUtils.findThrowable(failure, SharedSlotOversubscribedException.class);
                  if (sharedSlotOverAllocatedException.isPresent() &&
                        sharedSlotOverAllocatedException.get().canRetry()) {
            
                     // Retry the allocation
                     internalAllocateSlot(
                           allocationResultFuture,
                           slotRequestId,
                           scheduledUnit,
                           slotProfile,
                           allocationTimeout);
                  } else {
                     cancelSlotRequest(
                           slotRequestId,
                           scheduledUnit.getSlotSharingGroupId(),
                           failure);
                     allocationResultFuture.completeExceptionally(failure);
                  }
               } else {
                  allocationResultFuture.complete(slot);
               }
            });
    	return allocationResultFuture;
    }

    @Override
    public void cancelSlotRequest(
       SlotRequestId slotRequestId,
       @Nullable SlotSharingGroupId slotSharingGroupId,
       Throwable cause) {
       componentMainThreadExecutor.assertRunningInMainThread();
       if (slotSharingGroupId != null) {
          releaseSharedSlot(slotRequestId, slotSharingGroupId, cause);
       } else {
          slotPool.releaseSlot(slotRequestId, cause);
       }
    }

    @Override
    public void returnLogicalSlot(LogicalSlot logicalSlot) {
       SlotRequestId slotRequestId = logicalSlot.getSlotRequestId();
       SlotSharingGroupId slotSharingGroupId = logicalSlot.getSlotSharingGroupId();
       FlinkException cause = new FlinkException("Slot is being returned to the SlotPool.");
       cancelSlotRequest(slotRequestId, slotSharingGroupId, cause);
    }
}

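The logic of these externally exposed methods is fairly clear, so next we look at the internal implementation. If slot sharing is not used, the scheduler simply obtains a PhysicalSlot from the SlotPool and creates a LogicalSlot on top of it: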

class SchedulerImpl {
    private CompletableFuture<LogicalSlot> allocateSingleSlot(
      SlotRequestId slotRequestId,
      SlotProfile slotProfile,
      @Nullable Time allocationTimeout) {
          
       Optional<SlotAndLocality> slotAndLocality = tryAllocateFromAvailable(slotRequestId, slotProfile); // first try the available AllocatedSlots in the SlotPool
       if (slotAndLocality.isPresent()) { // one is available: create a SingleLogicalSlot and assign it as the payload of the AllocatedSlot
          // already successful from available
          try {
             return CompletableFuture.completedFuture(
                completeAllocationByAssigningPayload(slotRequestId, slotAndLocality.get()));
          } catch (FlinkException e) {
             return FutureUtils.completedExceptionally(e);
          }
       } else {
          // we allocate by requesting a new slot
          // nothing is available yet; if queued scheduling is allowed, the SlotPool can be asked to request a new slot from the RM
          return requestNewAllocatedSlot(slotRequestId, slotProfile, allocationTimeout)
             .thenApply((PhysicalSlot allocatedSlot) -> {
                try {
                   return completeAllocationByAssigningPayload(slotRequestId, new SlotAndLocality(allocatedSlot, Locality.UNKNOWN));
                } catch (FlinkException e) {
                   throw new CompletionException(e);
                }
             });
       }
    }
    private Optional<SlotAndLocality> tryAllocateFromAvailable(
       @Nonnull SlotRequestId slotRequestId,
       @Nonnull SlotProfile slotProfile) {
    
       Collection<SlotSelectionStrategy.SlotInfoAndResources> slotInfoList =
             slotPool.getAvailableSlotsInformation()
                   .stream()
                   .map(SlotSelectionStrategy.SlotInfoAndResources::fromSingleSlot)
                   .collect(Collectors.toList());
    
       Optional<SlotSelectionStrategy.SlotInfoAndLocality> selectedAvailableSlot =
          slotSelectionStrategy.selectBestSlotForProfile(slotInfoList, slotProfile);
    
       return selectedAvailableSlot.flatMap(slotInfoAndLocality -> {
          Optional<PhysicalSlot> optionalAllocatedSlot = slotPool.allocateAvailableSlot(
             slotRequestId,
             slotInfoAndLocality.getSlotInfo().getAllocationId());
    
          return optionalAllocatedSlot.map(
             allocatedSlot -> new SlotAndLocality(allocatedSlot, slotInfoAndLocality.getLocality()));
       });
    }
}
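
The "use an available slot first, otherwise request a new one" shape of allocateSingleSlot can be sketched without any Flink dependency. The following is only a toy model under stated assumptions: ToySlotPool, offerSlot and the string-based slots are hypothetical stand-ins invented for illustration, not Flink's SlotPool API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

/** Toy model of the fast path / asynchronous fallback in allocateSingleSlot. Not Flink code. */
public class SingleSlotSketch {

    /** Hypothetical stand-in for SlotPool: free slots plus requests waiting for a slot. */
    static final class ToySlotPool {
        private final Deque<String> available = new ArrayDeque<>();
        private final Deque<CompletableFuture<String>> pending = new ArrayDeque<>();

        /** A slot arrives: serve the oldest waiting request first, otherwise keep it as available. */
        void offerSlot(String slot) {
            CompletableFuture<String> waiting = pending.poll();
            if (waiting != null) {
                waiting.complete(slot);
            } else {
                available.add(slot);
            }
        }

        /** Returns a free slot immediately if there is one. */
        Optional<String> allocateAvailableSlot() {
            return Optional.ofNullable(available.poll());
        }

        /** Registers a request that completes once a new slot is offered. */
        CompletableFuture<String> requestNewAllocatedSlot() {
            CompletableFuture<String> future = new CompletableFuture<>();
            pending.add(future);
            return future;
        }
    }

    /** Same shape as allocateSingleSlot: try what is available, else wait for a new slot. */
    static CompletableFuture<String> allocateSingleSlot(ToySlotPool pool, String requestId) {
        Optional<String> slot = pool.allocateAvailableSlot();
        if (slot.isPresent()) {
            return CompletableFuture.completedFuture(wrap(requestId, slot.get()));
        }
        return pool.requestNewAllocatedSlot().thenApply(s -> wrap(requestId, s));
    }

    /** Stands in for completeAllocationByAssigningPayload: wrap the physical slot in a logical one. */
    private static String wrap(String requestId, String physicalSlot) {
        return "LogicalSlot(" + requestId + " on " + physicalSlot + ")";
    }

    public static void main(String[] args) {
        ToySlotPool pool = new ToySlotPool();
        pool.offerSlot("tm1-slot0");
        CompletableFuture<String> first = allocateSingleSlot(pool, "req-1");
        CompletableFuture<String> second = allocateSingleSlot(pool, "req-2");
        System.out.println(first.isDone() + " " + second.isDone());
        pool.offerSlot("tm2-slot0"); // completes the pending request
        System.out.println(first.join());
        System.out.println(second.join());
    }
}
```

The first request is served synchronously from the available slot, while the second one stays pending until another slot is offered, which is the same distinction the real method makes between tryAllocateFromAvailable and requestNewAllocatedSlot.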

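If slot sharing is required, the hard constraint imposed by a CoLocationGroup must also be taken into account. The core of the procedure is to build a tree of TaskSlots and then create a leaf node in that tree; the leaf wraps the LogicalSlot that the task needs. The code below, with added comments, shows the flow in detail: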

class SchedulerImpl {
    private CompletableFuture<LogicalSlot> allocateSharedSlot(
            SlotRequestId slotRequestId,
            ScheduledUnit scheduledUnit,
            SlotProfile slotProfile,
            boolean allowQueuedScheduling,
            Time allocationTimeout) {
        // each SlotSharingGroup has its own SlotSharingManager
        // allocate slot with slot sharing
        final SlotSharingManager multiTaskSlotManager = slotSharingManagers.computeIfAbsent(
                scheduledUnit.getSlotSharingGroupId(),
                id -> new SlotSharingManager(
                        id,
                        slotPool,
                        this));

        // allocate the MultiTaskSlot
        final SlotSharingManager.MultiTaskSlotLocality multiTaskSlotLocality;
        try {
            if (scheduledUnit.getCoLocationConstraint() != null) {
                // a co-location constraint exists
                multiTaskSlotLocality = allocateCoLocatedMultiTaskSlot(
                        scheduledUnit.getCoLocationConstraint(),
                        multiTaskSlotManager,
                        slotProfile,
                        allowQueuedScheduling,
                        allocationTimeout);
            } else {
                multiTaskSlotLocality = allocateMultiTaskSlot(
                        scheduledUnit.getJobVertexId(),
                        multiTaskSlotManager,
                        slotProfile,
                        allowQueuedScheduling,
                        allocationTimeout);
            }
        } catch (NoResourceAvailableException noResourceException) {
            return FutureUtils.completedExceptionally(noResourceException);
        }

        // sanity check
        Preconditions.checkState(!multiTaskSlotLocality.getMultiTaskSlot().contains(scheduledUnit.getJobVertexId()));

        // create a leaf SingleTaskSlot under the MultiTaskSlot and obtain the SingleLogicalSlot that can be handed to the task
        final SlotSharingManager.SingleTaskSlot leaf = multiTaskSlotLocality.getMultiTaskSlot().allocateSingleTaskSlot(
                slotRequestId,
                scheduledUnit.getJobVertexId(),
                multiTaskSlotLocality.getLocality());
        return leaf.getLogicalSlotFuture();
    }

    private SlotSharingManager.MultiTaskSlotLocality allocateCoLocatedMultiTaskSlot(
            CoLocationConstraint coLocationConstraint,
            SlotSharingManager multiTaskSlotManager,
            SlotProfile slotProfile,
            boolean allowQueuedScheduling,
            Time allocationTimeout) throws NoResourceAvailableException {
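        // the coLocationConstraint is bound to the SlotRequestId of the MultiTaskSlot assigned to it (not the root)
        // this binding only exists once a MultiTaskSlot has been allocated
        // the SlotRequestId can then be used to locate the MultiTaskSlot directly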
        final SlotRequestId coLocationSlotRequestId = coLocationConstraint.getSlotRequestId();

        if (coLocationSlotRequestId != null) {
            // we have a slot assigned --> try to retrieve it
            final SlotSharingManager.TaskSlot taskSlot = multiTaskSlotManager.getTaskSlot(coLocationSlotRequestId);

            if (taskSlot != null) {
                Preconditions.checkState(taskSlot instanceof SlotSharingManager.MultiTaskSlot);
                return SlotSharingManager.MultiTaskSlotLocality.of(((SlotSharingManager.MultiTaskSlot) taskSlot), Locality.LOCAL);
            } else {
                // the slot may have been cancelled in the mean time
                coLocationConstraint.setSlotRequestId(null);
            }
        }

        if (coLocationConstraint.isAssigned()) {
            // refine the preferred locations of the slot profile
            slotProfile = new SlotProfile(
                    slotProfile.getResourceProfile(),
                    Collections.singleton(coLocationConstraint.getLocation()),
                    slotProfile.getPreferredAllocations());
        }

        // allocate a MultiTaskSlot for this coLocationConstraint: first find a suitable root MultiTaskSlot
        // get a new multi task slot
        SlotSharingManager.MultiTaskSlotLocality multiTaskSlotLocality = allocateMultiTaskSlot(
                coLocationConstraint.getGroupId(),
                multiTaskSlotManager,
                slotProfile,
                allowQueuedScheduling,
                allocationTimeout);

        // check whether we fulfill the co-location constraint
        if (coLocationConstraint.isAssigned() && multiTaskSlotLocality.getLocality() != Locality.LOCAL) {
            multiTaskSlotLocality.getMultiTaskSlot().release(
                    new FlinkException("Multi task slot is not local and, thus, does not fulfill the co-location constraint."));

            throw new NoResourceAvailableException("Could not allocate a local multi task slot for the " +
                    "co location constraint " + coLocationConstraint + '.');
        }

        // create a second-level MultiTaskSlot under the root MultiTaskSlot and assign it to this coLocationConstraint
        final SlotRequestId slotRequestId = new SlotRequestId();
        final SlotSharingManager.MultiTaskSlot coLocationSlot =
                multiTaskSlotLocality.getMultiTaskSlot().allocateMultiTaskSlot(
                        slotRequestId,
                        coLocationConstraint.getGroupId());

        // bind slotRequestId to the coLocationConstraint so the MultiTaskSlot can later be located directly through it
        // mark the requested slot as co-located slot for other co-located tasks
        coLocationConstraint.setSlotRequestId(slotRequestId);

        // lock the co-location constraint once we have obtained the allocated slot
        coLocationSlot.getSlotContextFuture().whenComplete(
                (SlotContext slotContext, Throwable throwable) -> {
                    if (throwable == null) {
                        // check whether we are still assigned to the co-location constraint
                        if (Objects.equals(coLocationConstraint.getSlotRequestId(), slotRequestId)) {
                            // lock the location for this coLocationConstraint
                            coLocationConstraint.lockLocation(slotContext.getTaskManagerLocation());
                        } else {
                            log.debug("Failed to lock colocation constraint {} because assigned slot " +
                                            "request {} differs from fulfilled slot request {}.",
                                    coLocationConstraint.getGroupId(),
                                    coLocationConstraint.getSlotRequestId(),
                                    slotRequestId);
                        }
                    } else {
                        log.debug("Failed to lock colocation constraint {} because the slot " +
                                        "allocation for slot request {} failed.",
                                coLocationConstraint.getGroupId(),
                                coLocationConstraint.getSlotRequestId(),
                                throwable);
                    }
                });
        return SlotSharingManager.MultiTaskSlotLocality.of(coLocationSlot, multiTaskSlotLocality.getLocality());
    }

    private SlotSharingManager.MultiTaskSlotLocality allocateMultiTaskSlot(
            AbstractID groupId,
            SlotSharingManager slotSharingManager,
            SlotProfile slotProfile,
            boolean allowQueuedScheduling,
            Time allocationTimeout) throws NoResourceAvailableException {

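        // Find the root MultiTaskSlots that already hold an AllocatedSlot and meet the requirement.
        // "Meet the requirement" means the root MultiTaskSlot does not yet contain the current groupId, so that different tasks with the same groupId (the same JobVertex) are never placed in the same slot.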
        Collection<SlotInfo> resolvedRootSlotsInfo = slotSharingManager.listResolvedRootSlotInfo(groupId);

        // let slotSelectionStrategy pick the best match
        SlotSelectionStrategy.SlotInfoAndLocality bestResolvedRootSlotWithLocality =
                slotSelectionStrategy.selectBestSlotForProfile(resolvedRootSlotsInfo, slotProfile).orElse(null);

        // wrap the MultiTaskSlot together with its Locality
        final SlotSharingManager.MultiTaskSlotLocality multiTaskSlotLocality = bestResolvedRootSlotWithLocality != null ?
                new SlotSharingManager.MultiTaskSlotLocality(
                        slotSharingManager.getResolvedRootSlot(bestResolvedRootSlotWithLocality.getSlotInfo()),
                        bestResolvedRootSlotWithLocality.getLocality()) :
                null;

        // if the AllocatedSlot behind the MultiTaskSlot is on the same TaskManager as the preferred location, choose this MultiTaskSlot
        if (multiTaskSlotLocality != null && multiTaskSlotLocality.getLocality() == Locality.LOCAL) {
            return multiTaskSlotLocality;
        }

        // Two cases remain at this point:
        // 1) multiTaskSlotLocality == null: no suitable root MultiTaskSlot was found
        // 2) multiTaskSlotLocality != null && multiTaskSlotLocality.getLocality() != Locality.LOCAL: the locality preference is not met

        // try to pick one of the unused slots in the SlotPool
        final SlotRequestId allocatedSlotRequestId = new SlotRequestId();
        final SlotRequestId multiTaskSlotRequestId = new SlotRequestId();

        Optional<SlotAndLocality> optionalPoolSlotAndLocality = tryAllocateFromAvailable(allocatedSlotRequestId, slotProfile);

        if (optionalPoolSlotAndLocality.isPresent()) {
            // an unused slot was found in the SlotPool
            SlotAndLocality poolSlotAndLocality = optionalPoolSlotAndLocality.get();
            // use it if the unused AllocatedSlot meets the locality preference, or if the previous step found no usable MultiTaskSlot
            if (poolSlotAndLocality.getLocality() == Locality.LOCAL || bestResolvedRootSlotWithLocality == null) {

                // create a root MultiTaskSlot on top of the newly allocated AllocatedSlot
                final PhysicalSlot allocatedSlot = poolSlotAndLocality.getSlot();
                final SlotSharingManager.MultiTaskSlot multiTaskSlot = slotSharingManager.createRootSlot(
                        multiTaskSlotRequestId,
                        CompletableFuture.completedFuture(poolSlotAndLocality.getSlot()),
                        allocatedSlotRequestId);

                // assign the new root MultiTaskSlot as the payload of the AllocatedSlot
                if (allocatedSlot.tryAssignPayload(multiTaskSlot)) {
                    return SlotSharingManager.MultiTaskSlotLocality.of(multiTaskSlot, poolSlotAndLocality.getLocality());
                } else {
                    multiTaskSlot.release(new FlinkException("Could not assign payload to allocated slot " +
                            allocatedSlot.getAllocationId() + '.'));
                }
            }
        }

        if (multiTaskSlotLocality != null) {
            // neither candidate meets the locality preference, or the SlotPool has no available slot left
            // prefer slot sharing group slots over unused slots
            if (optionalPoolSlotAndLocality.isPresent()) {
                slotPool.releaseSlot(
                        allocatedSlotRequestId,
                        new FlinkException("Locality constraint is not better fulfilled by allocated slot."));
            }
            return multiTaskSlotLocality;
        }

        // at this point 1) slotSharingManager has no suitable root MultiTaskSlot and 2) slotPool has no available slot
        if (allowQueuedScheduling) {
            // first check whether slotSharingManager still has a root MultiTaskSlot whose slot allocation has not completed
            // there is no slot immediately available --> check first for uncompleted slots at the slot sharing group
            SlotSharingManager.MultiTaskSlot multiTaskSlot = slotSharingManager.getUnresolvedRootSlot(groupId);

            if (multiTaskSlot == null) {
                // if not, slotPool has to request a new slot from the RM
                // it seems as if we have to request a new slot from the resource manager, this is always the last resort!!!
                final CompletableFuture<PhysicalSlot> slotAllocationFuture = slotPool.requestNewAllocatedSlot(
                        allocatedSlotRequestId,
                        slotProfile.getResourceProfile(),
                        allocationTimeout);

                // after the request the flow is the same: create a root MultiTaskSlot and make it the payload of the newly allocated AllocatedSlot
                multiTaskSlot = slotSharingManager.createRootSlot(
                        multiTaskSlotRequestId,
                        slotAllocationFuture,
                        allocatedSlotRequestId);

                slotAllocationFuture.whenComplete(
                        (PhysicalSlot allocatedSlot, Throwable throwable) -> {
                            final SlotSharingManager.TaskSlot taskSlot = slotSharingManager.getTaskSlot(multiTaskSlotRequestId);

                            if (taskSlot != null) {
                                // still valid
                                if (!(taskSlot instanceof SlotSharingManager.MultiTaskSlot) || throwable != null) {
                                    taskSlot.release(throwable);
                                } else {
                                    if (!allocatedSlot.tryAssignPayload(((SlotSharingManager.MultiTaskSlot) taskSlot))) {
                                        taskSlot.release(new FlinkException("Could not assign payload to allocated slot " +
                                                allocatedSlot.getAllocationId() + '.'));
                                    }
                                }
                            } else {
                                slotPool.releaseSlot(
                                        allocatedSlotRequestId,
                                        new FlinkException("Could not find task slot with " + multiTaskSlotRequestId + '.'));
                            }
                        });
            }

            return SlotSharingManager.MultiTaskSlotLocality.of(multiTaskSlot, Locality.UNKNOWN);
        }

        throw new NoResourceAvailableException("Could not allocate a shared slot for " + groupId + '.');
    }
}
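
The selection order in allocateMultiTaskSlot is easier to follow in a stripped-down form. The sketch below is only an illustration under stated assumptions: RootSlot, ToySharingManager and the host-name strings are hypothetical, the leaf is recorded directly inside allocate, and the queued path (an unresolved root slot, or a new request to the RM) is replaced by an exception, which roughly corresponds to allowQueuedScheduling being false.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the root-slot selection order used for slot sharing. Not Flink code. */
public class SharedSlotSketch {

    /** Hypothetical stand-in for a root MultiTaskSlot: one physical slot shared by several groups. */
    static final class RootSlot {
        final String taskManager;
        final Set<String> groupIds = new HashSet<>();

        RootSlot(String taskManager) {
            this.taskManager = taskManager;
        }
    }

    /** Toy counterpart of SlotSharingManager; freeSlotHosts stands in for unused SlotPool slots. */
    static final class ToySharingManager {
        final List<RootSlot> roots = new ArrayList<>();
        final List<String> freeSlotHosts = new ArrayList<>();

        RootSlot allocate(String groupId, String preferredHost) {
            // Step 1: existing roots that do not contain this group; a local one wins at once.
            RootSlot fallback = null;
            for (RootSlot root : roots) {
                if (root.groupIds.contains(groupId)) {
                    continue; // two subtasks of the same vertex must not share a slot
                }
                if (root.taskManager.equals(preferredHost)) {
                    return occupy(root, groupId);
                }
                if (fallback == null) {
                    fallback = root;
                }
            }
            // Step 2: an unused slot is taken if it is local, or if step 1 found nothing at all.
            int freeIndex = freeSlotHosts.indexOf(preferredHost);
            if (freeIndex < 0 && fallback == null && !freeSlotHosts.isEmpty()) {
                freeIndex = 0;
            }
            if (freeIndex >= 0) {
                RootSlot created = new RootSlot(freeSlotHosts.remove(freeIndex));
                roots.add(created);
                return occupy(created, groupId);
            }
            // Step 3: prefer a non-local shared root over a non-local unused slot.
            if (fallback != null) {
                return occupy(fallback, groupId);
            }
            // Step 4 (queued request in the real code) is omitted in this sketch.
            throw new IllegalStateException("Could not allocate a shared slot for " + groupId);
        }

        private RootSlot occupy(RootSlot root, String groupId) {
            root.groupIds.add(groupId);
            return root;
        }
    }

    public static void main(String[] args) {
        ToySharingManager manager = new ToySharingManager();
        manager.freeSlotHosts.add("tm1");
        manager.freeSlotHosts.add("tm2");
        RootSlot a = manager.allocate("source", "tm1"); // new root on tm1
        RootSlot b = manager.allocate("map", "tm1");    // shares the tm1 root
        RootSlot c = manager.allocate("source", "tm1"); // tm1 root already has "source"
        System.out.println((a == b) + " " + c.taskManager);
    }
}
```

In this toy run the second "source" subtask cannot reuse the tm1 root, because that root already contains its group, so it ends up on a new root built from the remaining unused slot. The co-location case adds one more tree level under the chosen root, which this sketch does not model.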

 
