Flink Source Code Series No.9: Job Submission, Registering Slots (per-job on YARN)

Chapter 1 Introduction

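Continuing from the previous article, which ended with the TaskManager being started, this article describes how the TaskManager registers its slots with the ResourceManager and then offers them to the JobManager.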

Chapter 2 Detailed Steps

2.1 Starting the TaskExecutor

 org.apache.flink.runtime.taskexecutor.TaskExecutor#startTaskExecutorServices

private void startTaskExecutorServices() throws Exception {
	try {
		// start by connecting to the ResourceManager
		// TODO the TaskManager initiates the connection to the ResourceManager
		resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());

		// tell the task slot table who's responsible for the task slot actions
		taskSlotTable.start(new SlotActionsImpl(), getMainThreadExecutor());

		// start the job leader service
		jobLeaderService.start(getAddress(), getRpcService(), haServices, new JobLeaderListenerImpl());

		fileCache = new FileCache(taskManagerConfiguration.getTmpDirectories(), blobCacheService.getPermanentBlobService());
	} catch (Exception e) {
		handleStartTaskExecutorServicesException(e);
	}
}

2.2 Establishing the connection between TM and RM

org.apache.flink.runtime.leaderretrieval.LeaderRetrievalService#start

Here we look at the ZooKeeperLeaderRetrievalService implementation:

org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#start

@Override
public void start(LeaderRetrievalListener listener) throws Exception {
	Preconditions.checkNotNull(listener, "Listener must not be null.");
	Preconditions.checkState(leaderListener == null, "ZooKeeperLeaderRetrievalService can " +
			"only be started once.");

	LOG.info("Starting ZooKeeperLeaderRetrievalService {}.", retrievalPath);

	synchronized (lock) {
		leaderListener = listener;

		// TODO register the listeners
		client.getUnhandledErrorListenable().addListener(this);
		cache.getListenable().addListener(this);
		cache.start();

		client.getConnectionStateListenable().addListener(connectionStateListener);

		running = true;
	}
}

Eventually org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#retrieveLeaderInformationFromZooKeeper is executed:

private void retrieveLeaderInformationFromZooKeeper() {
	synchronized (lock) {
		if (running) {
			try {
				// ...

				// TODO notify the listener of the leader address
				notifyIfNewLeaderAddress(leaderAddress, leaderSessionID);
			} catch (Exception e) {
				leaderListener.handleError(new Exception("Could not handle node changed event.", e));
				ExceptionUtils.checkInterrupted(e);
			}
		} else {
			LOG.debug("Ignoring node change notification since the service has already been stopped.");
		}
	}
}

org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService#notifyIfNewLeaderAddress

@GuardedBy("lock")
private void notifyIfNewLeaderAddress(String newLeaderAddress, UUID newLeaderSessionID) {
	if (!(Objects.equals(newLeaderAddress, lastLeaderAddress) &&
			Objects.equals(newLeaderSessionID, lastLeaderSessionID))) {
		// ...
		
		// TODO notify the listener of the leader's address
		leaderListener.notifyLeaderAddress(newLeaderAddress, newLeaderSessionID);
	}
}
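The change check above can be illustrated in isolation. The following is a minimal sketch, not Flink's code: LeaderChangeNotifier and its notifications list are hypothetical names, and it assumes that the elided part of the method records the new address and session id before notifying the listener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical stand-in for a leader retrieval service; not a Flink class. */
class LeaderChangeNotifier {
    private String lastLeaderAddress;
    private UUID lastLeaderSessionId;

    // Records every notification that would be forwarded to the listener.
    final List<String> notifications = new ArrayList<>();

    /** Forwards the leader information only if the address or the session id changed. */
    void notifyIfNewLeaderAddress(String newAddress, UUID newSessionId) {
        boolean unchanged = Objects.equals(newAddress, lastLeaderAddress)
                && Objects.equals(newSessionId, lastLeaderSessionId);
        if (unchanged) {
            return; // repeated event for the same leader, nothing to forward
        }
        // Assumption: the last-seen values are remembered before notifying.
        lastLeaderAddress = newAddress;
        lastLeaderSessionId = newSessionId;
        notifications.add(newAddress + "@" + newSessionId);
    }
}
```

Because both the address and the session id take part in the comparison, a leader that is re-elected at the same address but with a new session id still triggers a notification.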

org.apache.flink.runtime.leaderretrieval.LeaderRetrievalListener#notifyLeaderAddress

The implementation that is ultimately invoked lives in TaskExecutor:

org.apache.flink.runtime.taskexecutor.TaskExecutor.ResourceManagerLeaderListener#notifyLeaderAddress

@Override
public void notifyLeaderAddress(final String leaderAddress, final UUID leaderSessionID) {
	// TODO a new RM address has been obtained
	runAsync(
		() -> notifyOfNewResourceManagerLeader(
			leaderAddress,
			ResourceManagerId.fromUuidOrNull(leaderSessionID)));
}

org.apache.flink.runtime.taskexecutor.TaskExecutor#notifyOfNewResourceManagerLeader

private void notifyOfNewResourceManagerLeader(String newLeaderAddress, ResourceManagerId newResourceManagerId) {
	resourceManagerAddress = createResourceManagerAddress(newLeaderAddress, newResourceManagerId);
	// TODO connect to the RM
	reconnectToResourceManager(new FlinkException(String.format("ResourceManager leader changed to new address %s", resourceManagerAddress)));
}

org.apache.flink.runtime.taskexecutor.TaskExecutor#reconnectToResourceManager

private void reconnectToResourceManager(Exception cause) {
	closeResourceManagerConnection(cause);
	startRegistrationTimeout();
	// TODO try to connect to the RM
	tryConnectToResourceManager();
}

org.apache.flink.runtime.taskexecutor.TaskExecutor#tryConnectToResourceManager

private void tryConnectToResourceManager() {
	if (resourceManagerAddress != null) {
		// TODO connect to the RM
		connectToResourceManager();
	}
}

org.apache.flink.runtime.taskexecutor.TaskExecutor#connectToResourceManager

private void connectToResourceManager() {
	assert(resourceManagerAddress != null);
	assert(establishedResourceManagerConnection == null);
	assert(resourceManagerConnection == null);

	log.info("Connecting to ResourceManager {}.", resourceManagerAddress);

	final TaskExecutorRegistration taskExecutorRegistration = new TaskExecutorRegistration(
		getAddress(),
		getResourceID(),
		unresolvedTaskManagerLocation.getDataPort(),
		JMXService.getPort().orElse(-1),
		hardwareDescription,
		memoryConfiguration,
		taskManagerConfiguration.getDefaultSlotResourceProfile(),
		taskManagerConfiguration.getTotalResourceProfile()
	);

	// TODO note: after a successful registration, the onRegistrationSuccess callback in TaskExecutorToResourceManagerConnection is executed
	resourceManagerConnection =
		new TaskExecutorToResourceManagerConnection(
			log,
			getRpcService(),
			taskManagerConfiguration.getRetryingRegistrationConfiguration(),
			resourceManagerAddress.getAddress(),
			resourceManagerAddress.getResourceManagerId(),
			getMainThreadExecutor(),
			new ResourceManagerRegistrationListener(),
			taskExecutorRegistration);
	// TODO start connecting
	resourceManagerConnection.start();
}

start() kicks off the RPC registration with the RM. Once the registration succeeds, the onRegistrationSuccess callback in TaskExecutorToResourceManagerConnection is executed.

org.apache.flink.runtime.taskexecutor.TaskExecutorToResourceManagerConnection#onRegistrationSuccess

@Override
protected void onRegistrationSuccess(TaskExecutorRegistrationSuccess success) {
	log.info("Successful registration at resource manager {} under registration id {}.",
		getTargetAddress(), success.getRegistrationId());

	// TODO after a successful registration
	registrationListener.onRegistrationSuccess(this, success);
}

ResourceManagerRegistrationListener, which implements org.apache.flink.runtime.registration.RegistrationConnectionListener#onRegistrationSuccess, is in fact an inner class of TaskExecutor.

org.apache.flink.runtime.taskexecutor.TaskExecutor.ResourceManagerRegistrationListener#onRegistrationSuccess

@Override
public void onRegistrationSuccess(TaskExecutorToResourceManagerConnection connection, TaskExecutorRegistrationSuccess success) {
	final ResourceID resourceManagerId = success.getResourceManagerId();
	final InstanceID taskExecutorRegistrationId = success.getRegistrationId();
	final ClusterInformation clusterInformation = success.getClusterInformation();
	final ResourceManagerGateway resourceManagerGateway = connection.getTargetGateway();

	runAsync(
		() -> {
			// filter out outdated connections
			//noinspection ObjectEquality
			if (resourceManagerConnection == connection) {
				try {
					// TODO the TM establishes its connection to the RM
					establishResourceManagerConnection(
						resourceManagerGateway,
						resourceManagerId,
						taskExecutorRegistrationId,
						clusterInformation);
				} catch (Throwable t) {
					log.error("Establishing Resource Manager connection in Task Executor failed", t);
				}
			}
		});
}
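The "filter out outdated connections" check above compares object references rather than contents. A minimal sketch of why that matters follows; RmConnection and StaleCallbackFilter are hypothetical names introduced only for this illustration and are not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one TM-to-RM connection attempt; not a Flink class. */
class RmConnection {
    final String targetAddress;

    RmConnection(String targetAddress) {
        this.targetAddress = targetAddress;
    }
}

/** Accepts a success callback only from the connection attempt that is still current. */
class StaleCallbackFilter {
    RmConnection currentConnection;
    final List<String> established = new ArrayList<>();

    void onRegistrationSuccess(RmConnection connection) {
        // Reference comparison on purpose: an older attempt to the same address
        // is a different object and must not be treated as the current one.
        if (currentConnection == connection) {
            established.add(connection.targetAddress);
        }
    }
}
```

In this sketch, a callback that arrives late from an attempt that was already replaced (for example after a leader change) is silently dropped, even if it targets the same address as the current attempt.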

2.3 Registering slots with the RM

 org.apache.flink.runtime.taskexecutor.TaskExecutor#establishResourceManagerConnection

private void establishResourceManagerConnection(
		ResourceManagerGateway resourceManagerGateway,
		ResourceID resourceManagerResourceId,
		InstanceID taskExecutorRegistrationId,
		ClusterInformation clusterInformation) {

	// TODO send the slot report to the RM
	final CompletableFuture<Acknowledge> slotReportResponseFuture = resourceManagerGateway.sendSlotReport(
		getResourceID(),
		taskExecutorRegistrationId,
		taskSlotTable.createSlotReport(getResourceID()),
		taskManagerConfiguration.getTimeout());

	// ...
}

The implementation of org.apache.flink.runtime.resourcemanager.ResourceManagerGateway#sendSlotReport:

org.apache.flink.runtime.resourcemanager.ResourceManager#sendSlotReport

@Override
public CompletableFuture<Acknowledge> sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, Time timeout) {
	final WorkerRegistration<WorkerType> workerTypeWorkerRegistration = taskExecutors.get(taskManagerResourceId);

	if (workerTypeWorkerRegistration.getInstanceID().equals(taskManagerRegistrationId)) {
		// TODO the slotManager inside the RM registers the TM
		if (slotManager.registerTaskManager(workerTypeWorkerRegistration, slotReport)) {
			onWorkerRegistered(workerTypeWorkerRegistration.getWorker());
		}
		return CompletableFuture.completedFuture(Acknowledge.get());
	} else {
		return FutureUtils.completedExceptionally(new ResourceManagerException(String.format("Unknown TaskManager registration id %s.", taskManagerRegistrationId)));
	}
}

The implementation of org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager#registerTaskManager:

org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#registerTaskManager

@Override
public boolean registerTaskManager(final TaskExecutorConnection taskExecutorConnection, SlotReport initialSlotReport) {
	checkInit();

	LOG.debug("Registering TaskManager {} under {} at the SlotManager.", taskExecutorConnection.getResourceID().getStringWithMetadata(), taskExecutorConnection.getInstanceID());

	// we identify task managers by their instance id
	if (taskManagerRegistrations.containsKey(taskExecutorConnection.getInstanceID())) {
		reportSlotStatus(taskExecutorConnection.getInstanceID(), initialSlotReport);
		return false;
	} else {
		// ...
		// next register the new slots
		for (SlotStatus slotStatus : initialSlotReport) {
			// TODO register the slot
			registerSlot(
				slotStatus.getSlotID(),
				slotStatus.getAllocationID(),
				slotStatus.getJobID(),
				slotStatus.getResourceProfile(),
				taskExecutorConnection);
		}

		return true;
	}

}

org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#registerSlot

private void registerSlot(
		SlotID slotId,
		AllocationID allocationId,
		JobID jobId,
		ResourceProfile resourceProfile,
		TaskExecutorConnection taskManagerConnection) {

	// TODO if the slot already exists in slots, remove the old slot by slotId first
	if (slots.containsKey(slotId)) {
		// remove the old slot first
		removeSlot(
			slotId,
			new SlotManagerException(
				String.format(
					"Re-registration of slot %s. This indicates that the TaskExecutor has re-connected.",
					slotId)));
	}

	// TODO create and register the new slot
	final TaskManagerSlot slot = createAndRegisterTaskManagerSlot(slotId, resourceProfile, taskManagerConnection);

	final PendingTaskManagerSlot pendingTaskManagerSlot;

	if (allocationId == null) {
		// TODO look for a pending slot with exactly the same resource profile
		pendingTaskManagerSlot = findExactlyMatchingPendingTaskManagerSlot(resourceProfile);
	} else {
		pendingTaskManagerSlot = null;
	}

	if (pendingTaskManagerSlot == null) {
		// TODO update the slot
		updateSlot(slotId, allocationId, jobId);
	} else {
		pendingSlots.remove(pendingTaskManagerSlot.getTaskManagerSlotId());
		final PendingSlotRequest assignedPendingSlotRequest = pendingTaskManagerSlot.getAssignedPendingSlotRequest();

		// TODO no pending slot request is assigned to the pending slot
		if (assignedPendingSlotRequest == null) {
			// TODO treat it as a free slot
			handleFreeSlot(slot);
		} else {
			// TODO unassign the pending TM slot
			assignedPendingSlotRequest.unassignPendingTaskManagerSlot();
			// TODO allocate the slot
			allocateSlot(slot, assignedPendingSlotRequest);
		}
	}
}

At this point, the TM has finished registering its slots with the RM.
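The branching inside registerSlot can be hard to follow from the excerpt alone, so here is a simplified sketch of the same decision flow. It is not Flink's implementation: SlotRegistrySketch, PendingSlot, the integer cpuCores (standing in for ResourceProfile) and the string states are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical model of the registerSlot decision flow. */
class SlotRegistrySketch {
    /** A slot the RM is waiting for but that no TM has reported yet. */
    static final class PendingSlot {
        final int cpuCores;           // stands in for ResourceProfile
        final String assignedRequest; // null if no slot request waits on it

        PendingSlot(int cpuCores, String assignedRequest) {
            this.cpuCores = cpuCores;
            this.assignedRequest = assignedRequest;
        }
    }

    final Map<String, String> slotStates = new HashMap<>(); // slotId -> state
    final Map<String, PendingSlot> pendingSlots = new LinkedHashMap<>();

    String registerSlot(String slotId, String allocationId, int cpuCores) {
        // Re-registration: drop the stale entry first.
        slotStates.remove(slotId);

        // Only an unallocated slot may be matched against a pending slot.
        String pendingKey = null;
        if (allocationId == null) {
            for (Map.Entry<String, PendingSlot> e : pendingSlots.entrySet()) {
                if (e.getValue().cpuCores == cpuCores) {
                    pendingKey = e.getKey();
                    break;
                }
            }
        }

        String state;
        if (pendingKey == null) {
            // No pending slot matched: take the reported status as-is.
            state = (allocationId == null) ? "FREE" : "ALLOCATED:" + allocationId;
        } else {
            PendingSlot pending = pendingSlots.remove(pendingKey);
            state = (pending.assignedRequest == null)
                    ? "FREE"
                    : "REQUESTED_FOR:" + pending.assignedRequest;
        }
        slotStates.put(slotId, state);
        return state;
    }
}
```

The point of the sketch is the ordering: a reported slot first replaces any stale entry, then either keeps its reported status or takes over the role of a matching pending slot, including any slot request that was already waiting on that pending slot.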

2.4 RM notifies the TM of the slot allocation

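After registering a slot, the RM has to report back to the TM: when the slot is allocated to a pending slot request, the RM tells the TM through the requestSlot RPC.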

org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl#allocateSlot

private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
	// ...

	// TODO look up this instance among all currently registered TMs
	TaskManagerRegistration taskManagerRegistration = taskManagerRegistrations.get(instanceID);

	if (taskManagerRegistration == null) {
		throw new IllegalStateException("Could not find a registered task manager for instance id " +
			instanceID + '.');
	}

	// TODO mark it as used
	taskManagerRegistration.markUsed();

	// RPC call to the task manager
	// TODO notify the TM so that it offers the slot to the JM for running the job
	CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
		slotId,
		pendingSlotRequest.getJobId(),
		allocationId,
		pendingSlotRequest.getResourceProfile(),
		pendingSlotRequest.getTargetAddress(),
		resourceManagerId,
		taskManagerRequestTimeout);

	// ...
}

The implementation of org.apache.flink.runtime.taskexecutor.TaskExecutorGateway#requestSlot:

org.apache.flink.runtime.taskexecutor.TaskExecutor#requestSlot

@Override
public CompletableFuture<Acknowledge> requestSlot(
	final SlotID slotId,
	final JobID jobId,
	final AllocationID allocationId,
	final ResourceProfile resourceProfile,
	final String targetAddress,
	final ResourceManagerId resourceManagerId,
	final Time timeout) {
	// ...
	
	try {
		// TODO following the RM's instruction after a successful allocation, allocate the TM's own slot
		allocateSlot(
			slotId,
			jobId,
			allocationId,
			resourceProfile);
	} catch (SlotAllocationException sae) {
		return FutureUtils.completedExceptionally(sae);
	}
	
	// ...

	if (job.isConnected()) {
		// TODO offer the slot to the JobManager
		offerSlotsToJobManager(jobId);
	}

	return CompletableFuture.completedFuture(Acknowledge.get());
}

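After receiving the request from the RM, the TM first performs the corresponding allocation in its own internal slot bookkeeping, and then offers the slot to the JM.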
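The two-step order described here (allocate locally, then offer) can be sketched as follows. This is a toy model under stated assumptions: TaskExecutorSketch, the integer slot index and the string job ids are hypothetical, and what happens to an allocated slot whose job is not yet connected is outside the excerpt and therefore not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the TM side of requestSlot. */
class TaskExecutorSketch {
    final Map<Integer, String> slotToJob = new HashMap<>(); // slot index -> job id
    final Set<String> connectedJobs = new HashSet<>();      // jobs whose JM is connected
    final List<String> offers = new ArrayList<>();          // offers sent to a JM

    /** Returns false if the slot is already allocated to a different job. */
    boolean requestSlot(int slotIndex, String jobId) {
        // Step 1: allocate the slot in the TM's own bookkeeping.
        String owner = slotToJob.get(slotIndex);
        if (owner != null && !owner.equals(jobId)) {
            return false; // corresponds to the SlotAllocationException path
        }
        slotToJob.put(slotIndex, jobId);

        // Step 2: offer the slot only if the job's JM is already connected.
        if (connectedJobs.contains(jobId)) {
            offers.add(jobId + ":" + slotIndex);
        }
        return true;
    }
}
```

Allocating before offering means the TM never offers a slot it could not reserve for the job itself.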

2.5 TM offers slots to the JM

org.apache.flink.runtime.taskexecutor.TaskExecutor#offerSlotsToJobManager

private void offerSlotsToJobManager(final JobID jobId) {
	jobTable
		.getConnection(jobId)
		.ifPresent(this::internalOfferSlotsToJobManager);
}

org.apache.flink.runtime.taskexecutor.TaskExecutor#internalOfferSlotsToJobManager

private void internalOfferSlotsToJobManager(JobTable.Connection jobManagerConnection) {
	final JobID jobId = jobManagerConnection.getJobId();

	if (taskSlotTable.hasAllocatedSlots(jobId)) {
		// ...

		// TODO connect to the JobMaster (JobManager) and offer the slots
		CompletableFuture<Collection<SlotOffer>> acceptedSlotsFuture = jobMasterGateway.offerSlots(
			getResourceID(),
			reservedSlots,
			taskManagerConfiguration.getTimeout());

		acceptedSlotsFuture.whenCompleteAsync(
			handleAcceptedSlotOffers(jobId, jobMasterGateway, jobMasterId, reservedSlots),
			getMainThreadExecutor());
	} else {
		log.debug("There are no unassigned slots for the job {}.", jobId);
	}
}

From here on, the TM calls the JM over RPC to offer the slots.

The implementation of org.apache.flink.runtime.jobmaster.JobMasterGateway#offerSlots:

org.apache.flink.runtime.jobmaster.JobMaster#offerSlots

@Override
public CompletableFuture<Collection<SlotOffer>> offerSlots(
		final ResourceID taskManagerId,
		final Collection<SlotOffer> slots,
		final Time timeout) {

	// ...
	// TODO offer the slots to the slotPool in the JobManager
	return CompletableFuture.completedFuture(
		slotPool.offerSlots(
			taskManagerLocation,
			rpcTaskManagerGateway,
			slots));
}

The slots are offered to the slotPool inside the JM.

The implementation of org.apache.flink.runtime.jobmaster.slotpool.SlotPool#offerSlots:

org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#offerSlots

@Override
public Collection<SlotOffer> offerSlots(
		TaskManagerLocation taskManagerLocation,
		TaskManagerGateway taskManagerGateway,
		Collection<SlotOffer> offers) {

	ArrayList<SlotOffer> result = new ArrayList<>(offers.size());

	// TODO offer each slot
	for (SlotOffer offer : offers) {
		if (offerSlot(
			taskManagerLocation,
			taskManagerGateway,
			offer)) {

			result.add(offer);
		}
	}

	return result;
}

org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#offerSlot

boolean offerSlot(
		final TaskManagerLocation taskManagerLocation,
		final TaskManagerGateway taskManagerGateway,
		final SlotOffer slotOffer) {

	// ...

	// TODO create the allocated slot
	final AllocatedSlot allocatedSlot = new AllocatedSlot(
		allocationID,
		taskManagerLocation,
		slotOffer.getSlotIndex(),
		slotOffer.getResourceProfile(),
		taskManagerGateway);

	// use the slot to fulfill pending request, in requested order
	// TODO use the slot to fulfill a pending request
	tryFulfillSlotRequestOrMakeAvailable(allocatedSlot);

	// we accepted the request in any case. slot will be released after it idled for
	// too long and timed out
	return true;
}

org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl#tryFulfillSlotRequestOrMakeAvailable

private void tryFulfillSlotRequestOrMakeAvailable(AllocatedSlot allocatedSlot) {
	Preconditions.checkState(!allocatedSlot.isUsed(), "Provided slot is still in use.");

	// TODO find a matching pending request
	final PendingRequest pendingRequest = findMatchingPendingRequest(allocatedSlot);

	if (pendingRequest != null) {
		log.debug("Fulfilling pending slot request [{}] with slot [{}]",
			pendingRequest.getSlotRequestId(), allocatedSlot.getAllocationId());

		// remove the pending request
		removePendingRequest(pendingRequest.getSlotRequestId());

		// TODO record the allocated slot under the pending request's slot request id
		allocatedSlots.add(pendingRequest.getSlotRequestId(), allocatedSlot);
		pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot);

		// this allocation may become orphan once its corresponding request is removed
		// TODO get the AllocationId; it is generated in the JM, registered with the RM, and then passed by the RM to the TM to tell different allocations apart
		final Optional<AllocationID> allocationIdOfRequest = pendingRequest.getAllocationId();

		// the allocation id can be null if the request was fulfilled by a slot directly offered
		// by a reconnected TaskExecutor before the ResourceManager is connected
		if (allocationIdOfRequest.isPresent()) {
			maybeRemapOrphanedAllocation(allocationIdOfRequest.get(), allocatedSlot.getAllocationId());
		}
	} else {
		log.debug("Adding slot [{}] to available slots", allocatedSlot.getAllocationId());
		availableSlots.add(allocatedSlot, clock.relativeTimeMillis());
	}
}

The JM validates and records the slots offered by the TM.
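The "fulfill a pending request or make the slot available" pattern in tryFulfillSlotRequestOrMakeAvailable can be shown with plain CompletableFuture objects. This is a minimal sketch, not the SlotPoolImpl logic: SlotPoolSketch is a hypothetical class, slots are plain strings, and requests are matched in arrival order rather than by resource profile.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, simplified slot pool: requests wait as futures until a slot is offered. */
class SlotPoolSketch {
    final Deque<CompletableFuture<String>> pendingRequests = new ArrayDeque<>();
    final List<String> availableSlots = new ArrayList<>();

    /** Returns a future that completes with a slot id once one is available. */
    CompletableFuture<String> requestSlot() {
        if (!availableSlots.isEmpty()) {
            return CompletableFuture.completedFuture(availableSlots.remove(0));
        }
        CompletableFuture<String> future = new CompletableFuture<>();
        pendingRequests.add(future);
        return future;
    }

    /** Called when a TM offers a slot: serve the oldest waiter, else keep the slot. */
    void offerSlot(String slotId) {
        CompletableFuture<String> pending = pendingRequests.poll();
        if (pending != null) {
            pending.complete(slotId);   // fulfills the waiting request
        } else {
            availableSlots.add(slotId); // nobody is waiting, keep it for later
        }
    }
}
```

Completing the future is what lets the code that asked for a slot continue; in the excerpt above this corresponds to pendingRequest.getAllocatedSlotFuture().complete(allocatedSlot).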

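This completes the whole process of registering and offering slots. Once the slots are registered, the next step is for the JM to submit the job to the TM for execution, which the next article will cover.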
