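Thanks again to Atguigu (尚硅谷)!
The previous post covered submitting the application; this one picks up from there.
The call chains are long, so only the key code of each method is shown and everything else is replaced with "...".
I. Creating and starting the JobManager components: Dispatcher, ResourceManager, JobMaster
The deployJobCluster method from the previous post calls getYarnJobClusterEntrypoint(), which names the entry point of the AM (ApplicationMaster):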
return deployInternal(
clusterSpecification,
"Flink per-job cluster",
getYarnJobClusterEntrypoint(),
jobGraph,
detached);
protected String getYarnJobClusterEntrypoint() {
return YarnJobClusterEntrypoint.class.getName();
}
Open the YarnJobClusterEntrypoint class.
Its main method is the executable entry point of the YARN ApplicationMaster process for a single Flink job.
public static void main(String[] args) {
... // configuration-related code
YarnJobClusterEntrypoint yarnJobClusterEntrypoint = new YarnJobClusterEntrypoint(configuration);
ClusterEntrypoint.runClusterEntrypoint(yarnJobClusterEntrypoint);
}
Follow runClusterEntrypoint to the runCluster method of the ClusterEntrypoint class.
private void runCluster(...) throws Exception {
/*TODO Initialize services: RPC-related*/
initializeServices(configuration, pluginManager);
...
/*TODO Create and start the JobManager components: Dispatcher, ResourceManager, JobMaster*/
clusterComponent = dispatcherResourceManagerComponentFactory.create(...);
...
}
Follow the create method to the DefaultDispatcherResourceManagerComponentFactory class.
This method is the main subject of the analysis.
public DispatcherResourceManagerComponent create(...) throws Exception {
...
/*TODO Create the ResourceManager: the YARN-mode ResourceManager*/
resourceManager = resourceManagerFactory.createResourceManager(...);
/*TODO Create and start the Dispatcher => the Dispatcher creates and starts the JobMaster*/
dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(...);
/*TODO Start the ResourceManager*/
resourceManager.start();
...
}
1. Create the ResourceManager
Follow createResourceManager until the createResourceManager method of the ActiveResourceManagerFactory class, which returns an ActiveResourceManager object. ActiveResourceManager is a subclass of ResourceManager.
public ResourceManager<WorkerType> createResourceManager(...) throws Exception {
return new ActiveResourceManager<>(...);
}
2. Create and start the Dispatcher => the Dispatcher creates and starts the JobMaster
Go back to the DefaultDispatcherResourceManagerComponentFactory class and follow createDispatcherRunner to the DefaultDispatcherGatewayServiceFactory class.
public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(...) {
/*TODO Create the Dispatcher*/
dispatcher = dispatcherFactory.createDispatcher(...);
/*TODO Start the Dispatcher; next, look at onStart()*/
dispatcher.start();
...
}
Go to the dispatcher's onStart method (in the Dispatcher class).
public void onStart() throws Exception {
/*TODO Start the dispatcher services*/
startDispatcherServices();
/*TODO Start the JobMaster*/
startRecoveredJobs();
}
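Several steps in this walkthrough have the form "call start(), then read onStart()". That shape can be sketched with a toy template-method model. MiniEndpoint, MiniDispatcher and LifecycleDemo are hypothetical names used only for illustration, and the sketch calls onStart() directly, whereas Flink routes the call through its RPC layer.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleDemo {
    /** Starts the toy dispatcher and returns the startup steps it recorded. */
    static List<String> run() {
        MiniDispatcher dispatcher = new MiniDispatcher();
        dispatcher.start();
        return dispatcher.steps;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Toy stand-in for an RPC endpoint: start() is final and triggers the onStart() hook. */
abstract class MiniEndpoint {
    public final void start() {
        onStart(); // subclasses put their startup logic here
    }

    protected abstract void onStart();
}

/** Hypothetical dispatcher that records the order of its startup steps. */
class MiniDispatcher extends MiniEndpoint {
    final List<String> steps = new ArrayList<>();

    @Override
    protected void onStart() {
        steps.add("startDispatcherServices");
        steps.add("startRecoveredJobs");
    }
}
```

This is why the walkthrough jumps from a start() call site straight to the onStart() body of the concrete class.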
Step into startRecoveredJobs and keep following the calls until the createJobManagerRunner method.
CompletableFuture<JobManagerRunner> createJobManagerRunner(...) {
/*TODO Create the JobMaster*/
JobManagerRunner runner = jobManagerRunnerFactory.createJobManagerRunner(...);
/*TODO Start the JobMaster*/
runner.start();
}
1) Create the JobMaster
Follow createJobManagerRunner to the createJobMasterService method of the DefaultJobMasterServiceFactory class, which returns a JobMaster object.
public JobMaster createJobMasterService(...) throws Exception {
return new JobMaster(...);
}
2) Start the JobMaster
Follow the start method until the startJobExecution method of the JobMaster class.
private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
...
/*TODO Actually start the JobMaster services*/
startJobMasterServices();
/*TODO Reset and start the scheduler*/
resetAndStartScheduler();
}
private void startJobMasterServices() throws Exception {
/*TODO Start the heartbeat services: TaskManager, ResourceManager*/
startHeartbeatServices();
/*TODO Start the slot pool*/
slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());
...
// job is ready to go, try to establish connection with resource manager
// - activate leader retrieval for the resource manager
// - on notification of the leader, the connection will be established and
// the slot pool will start requesting slots
/*TODO Connect to the ResourceManager; the slot pool then starts requesting resources*/
resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
}
a. Start the heartbeat services
private void startHeartbeatServices() {
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new TaskManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
resourceManagerHeartbeatManager = heartbeatServices.createHeartbeatManager(
resourceId,
new ResourceManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
}
b. Start the slot pool
c. Connect to the ResourceManager; the slot pool starts requesting resources
Follow resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener()); until the start method of the RegisteredRpcConnection class.
public void start() {
...
/*TODO Create the registration object*/
final RetryingRegistration<F, G, S> newRegistration = createNewRegistration();
/*TODO Start registering; once registration succeeds, onRegistrationSuccess() is called*/
newRegistration.startRegistration();
}
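(1) Create the registration object
Follow createNewRegistration to the generateRegistration method in the JobMaster class (inside its inner class ResourceManagerConnection). It returns an anonymous subclass of RetryingRegistration, whose invokeRegistration method issues the actual RPC:
protected RetryingRegistration<ResourceManagerId, ResourceManagerGateway, JobMasterRegistrationSuccess> generateRegistration() {
return new RetryingRegistration<ResourceManagerId, ResourceManagerGateway, JobMasterRegistrationSuccess>(...) {
@Override
protected CompletableFuture<RegistrationResponse> invokeRegistration(...) {
...
return gateway.registerJobManager(
jobMasterId,
jobManagerResourceID,
jobManagerRpcAddress,
jobID,
timeout);
}
};
}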
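(2) Start registering; once registration succeeds, onRegistrationSuccess() is called
① Registration: follow startRegistration into the RetryingRegistration class, then into its register method, which calls invokeRegistration. Here that is the invokeRegistration method of an anonymous inner class in JobMaster.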
protected CompletableFuture<RegistrationResponse> invokeRegistration(
ResourceManagerGateway gateway, ResourceManagerId fencingToken, long timeoutMillis) {
Time timeout = Time.milliseconds(timeoutMillis);
return gateway.registerJobManager(
jobMasterId,
jobManagerResourceID,
jobManagerRpcAddress,
jobID,
timeout);
}
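The register-then-callback flow described above can be sketched as a small retry loop over CompletableFuture. MiniRetryingRegistration, its constructor arguments and the fake gateway are hypothetical, and the sketch retries immediately, while the real RetryingRegistration also deals with timeouts and delays between attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

class RegistrationDemo {
    /** The fake gateway rejects the first two attempts and accepts the third. */
    static String run() {
        int[] attempts = {0};
        Supplier<CompletableFuture<String>> invokeRegistration = () -> {
            attempts[0]++;
            CompletableFuture<String> response = new CompletableFuture<>();
            if (attempts[0] < 3) {
                response.completeExceptionally(new IllegalStateException("not ready"));
            } else {
                response.complete("registered after " + attempts[0] + " attempts");
            }
            return response;
        };
        StringBuilder result = new StringBuilder();
        new MiniRetryingRegistration(invokeRegistration, 5).startRegistration(result::append);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Toy registration: retries the RPC until it succeeds or attempts run out. */
class MiniRetryingRegistration {
    private final Supplier<CompletableFuture<String>> invokeRegistration;
    private final int maxAttempts;

    MiniRetryingRegistration(Supplier<CompletableFuture<String>> invokeRegistration, int maxAttempts) {
        this.invokeRegistration = invokeRegistration;
        this.maxAttempts = maxAttempts;
    }

    void startRegistration(Consumer<String> onRegistrationSuccess) {
        register(1, onRegistrationSuccess);
    }

    private void register(int attempt, Consumer<String> onRegistrationSuccess) {
        invokeRegistration.get().whenComplete((response, failure) -> {
            if (failure == null) {
                onRegistrationSuccess.accept(response); // success callback
            } else if (attempt < maxAttempts) {
                register(attempt + 1, onRegistrationSuccess); // try again
            }
        });
    }
}
```

Running main prints "registered after 3 attempts": the callback fires only once the gateway finally accepts the registration.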
② After registration succeeds, control reaches the onRegistrationSuccess method in the JobMaster class, which then calls establishResourceManagerConnection.
private void establishResourceManagerConnection(final JobMasterRegistrationSuccess success) {
...
/*TODO The slot pool connects to the ResourceManager and requests resources*/
slotPool.connectToResourceManager(resourceManagerGateway);
...
}
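Follow connectToResourceManager into the SlotPoolImpl class, which calls requestSlotFromResourceManager; keep following the calls until the requestSlot method of the ResourceManager class.
public CompletableFuture<Acknowledge> requestSlot(...) {
...
/*TODO The slotManager inside Flink's ResourceManager goes on to request resources from YARN's ResourceManager*/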
slotManager.registerSlotRequest(slotRequest);
...
}
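Next, follow registerSlotRequest to the internalRequestSlot method of the SlotManagerImpl class.
private void internalRequestSlot(...) throws ResourceManagerException {
...
/* If no registered free slot matches the request, fall back to a pending TaskManager slot */
.ifNotPresent(() -> fulfillPendingSlotRequestWithPendingTaskManagerSlot(pendingSlotRequest));
}
// Then step into fulfillPendingSlotRequestWithPendingTaskManagerSlot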
private void fulfillPendingSlotRequestWithPendingTaskManagerSlot(...) throws ResourceManagerException {
...
pendingTaskManagerSlotOptional = allocateResource(resourceProfile);
...
}
private Optional<PendingTaskManagerSlot> allocateResource(ResourceProfile requestedSlotResourceProfile) {
...
if (!resourceActions.allocateResource(defaultWorkerResourceSpec)) {
// resource cannot be allocated
return Optional.empty();
}
...
}
Follow allocateResource to the allocateResource method of ResourceActionsImpl, an inner class of ResourceManager.
public boolean allocateResource(WorkerResourceSpec workerResourceSpec) {
validateRunsInMainThread();
return startNewWorker(workerResourceSpec);
}
Follow startNewWorker until the requestNewWorker method of the ActiveResourceManager class.
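The decision chain traced in this subsection (use a free slot if one exists, otherwise ask for a new worker, otherwise give up) can be sketched as follows. MiniSlotManager, its counters and its string results are hypothetical simplifications: the real SlotManagerImpl matches slots by ResourceProfile and tracks pending TaskManager slots, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

class SlotRequestDemo {
    /** One free slot and one allowed new worker, then three slot requests. */
    static List<String> run() {
        MiniSlotManager slotManager = new MiniSlotManager(1, 1);
        List<String> outcomes = new ArrayList<>();
        for (String requestId : new String[] {"req-1", "req-2", "req-3"}) {
            outcomes.add(slotManager.registerSlotRequest(requestId));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}

/** Toy slot manager that only counts free slots and workers it may still start. */
class MiniSlotManager {
    private int freeSlots;
    private int workersLeft;

    MiniSlotManager(int freeSlots, int workersLeft) {
        this.freeSlots = freeSlots;
        this.workersLeft = workersLeft;
    }

    String registerSlotRequest(String requestId) {
        if (freeSlots > 0) {
            freeSlots--; // a registered free slot can serve the request directly
            return requestId + ": allocated a free slot";
        }
        if (allocateResource()) {
            return requestId + ": pending, new worker requested";
        }
        return requestId + ": unfulfilled, resource cannot be allocated";
    }

    /** Stands in for resourceActions.allocateResource(...) leading to startNewWorker(...). */
    private boolean allocateResource() {
        if (workersLeft == 0) {
            return false;
        }
        workersLeft--;
        return true;
    }
}
```

In this sketch the first request takes the free slot, the second triggers a new worker, and the third cannot be served.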
3. Start the ResourceManager
Go back to the DefaultDispatcherResourceManagerComponentFactory class
and look at the resourceManager.start(); call.
It leads to the onStart method of the ResourceManager class.
public final void onStart() throws Exception {
startResourceManagerServices();
}
Step into startResourceManagerServices.
private void startResourceManagerServices() throws Exception {
...
/*TODO Create the clients for YARN's RM and NM, then initialize and start them*/
initialize();
/*TODO Start the ResourceManager through the leader election service*/
leaderElectionService.start(this);
...
}
1) Create the clients for YARN's RM and NM, then initialize and start them
Follow initialize to the initializeInternal method of the YarnResourceManagerDriver class.
protected void initializeInternal() throws Exception {
/*TODO Create the client for YARN's ResourceManager, then initialize and start it*/
resourceManagerClient = yarnResourceManagerClientFactory.createResourceManagerClient(
yarnHeartbeatIntervalMillis,
yarnContainerEventHandler);
resourceManagerClient.init(yarnConfig);
resourceManagerClient.start();
...
/*TODO Create the client for YARN's NodeManager, then initialize and start it*/
nodeManagerClient = yarnNodeManagerClientFactory.createNodeManagerClient(yarnContainerEventHandler);
nodeManagerClient.init(yarnConfig);
nodeManagerClient.start();
}
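2) Start the ResourceManager
Go back to the startResourceManagerServices method of the ResourceManager class.
Follow start (in leaderElectionService.start(this)) to startServicesOnLeadership in the same class.
private void startServicesOnLeadership() {
/*TODO Start the heartbeat services: TaskManager, JobMaster*/
startHeartbeatServices();
/*TODO Start the slotManager*/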
slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());
...
}
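The jump from leaderElectionService.start(this) to startServicesOnLeadership is a callback: the election service is handed the contender and calls back into it once leadership is granted. A minimal sketch of that shape follows. MiniContender, ImmediateElection and MiniResourceManager are hypothetical names, and the sketch grants leadership immediately, which is an assumption about the simplest non-HA case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class LeaderElectionDemo {
    /** Registers the toy ResourceManager with the election service and returns its log. */
    static List<String> run() {
        MiniResourceManager resourceManager = new MiniResourceManager();
        new ImmediateElection().start(resourceManager);
        return resourceManager.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical contender interface: the election service calls back into it. */
interface MiniContender {
    void grantLeadership(UUID leaderSessionId);
}

/** Toy election service that grants leadership as soon as a contender registers. */
class ImmediateElection {
    void start(MiniContender contender) {
        contender.grantLeadership(UUID.randomUUID());
    }
}

/** Toy ResourceManager: becoming leader is what triggers the service startup. */
class MiniResourceManager implements MiniContender {
    final List<String> log = new ArrayList<>();

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        log.add("grantLeadership");
        startServicesOnLeadership();
    }

    private void startServicesOnLeadership() {
        log.add("startHeartbeatServices");
        log.add("slotManager.start");
    }
}
```

The services start inside the callback, not inside start() itself, which is why following the call in an IDE takes a few hops.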
a. Start the heartbeat services
private void startHeartbeatServices() { // start the heartbeat services
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new TaskManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new JobManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
}
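The ResourceManager creates one heartbeat manager per kind of peer. What such a manager has to track can be sketched roughly as below. MiniHeartbeatMonitor and its explicit-clock API are hypothetical; the sketch polls with a caller-supplied timestamp so that it stays deterministic, whereas Flink's heartbeat managers rely on scheduled timeouts and listener callbacks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeartbeatDemo {
    /** Two peers report at t=0, only the TaskManager reports again at t=40. */
    static List<String> run() {
        MiniHeartbeatMonitor monitor = new MiniHeartbeatMonitor(50);
        monitor.receiveHeartbeat("taskmanager-1", 0);
        monitor.receiveHeartbeat("jobmaster-1", 0);
        monitor.receiveHeartbeat("taskmanager-1", 40);
        return monitor.timedOutTargets(60);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Toy monitor: remembers the last heartbeat time of each peer. */
class MiniHeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();

    MiniHeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void receiveHeartbeat(String targetId, long nowMillis) {
        lastHeartbeat.put(targetId, nowMillis);
    }

    /** Returns the peers whose last heartbeat is older than the timeout. */
    List<String> timedOutTargets(long nowMillis) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                timedOut.add(entry.getKey());
            }
        }
        return timedOut;
    }
}
```

At t=60 only jobmaster-1 is reported, because its last heartbeat is 60 ms old while the TaskManager's is 20 ms old.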
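b. Start the slotManager
II. Starting the TaskManager
Since this deployment runs on YARN, look for the YarnTaskExecutorRunner class.
This class is the executable entry point for running a TaskExecutor in a YARN container.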
public static void main(String[] args) {
EnvironmentInformation.logEnvironmentInfo(LOG, "YARN TaskExecutor runner", args);
SignalHandler.register(LOG);
JvmShutdownSafeguard.installAsShutdownHook(LOG);
runTaskManagerSecurely(args);
}
Follow runTaskManagerSecurely until the start method of the TaskExecutorToServiceAdapter class.
public void start() {
/*TODO Start the TaskExecutor through the RPC service; look for its onStart() method*/
taskExecutor.start();
}
Go to the onStart method of TaskExecutor.
public void onStart() throws Exception {
...
/*TODO Start the TaskExecutor services*/
startTaskExecutorServices();
...
}
Follow startTaskExecutorServices until the start method of the RegisteredRpcConnection class.
// Same flow as for the JobMaster above; both use the abstract class RetryingRegistration
public void start() {
...
/*TODO Create the registration object*/
final RetryingRegistration<F, G, S> newRegistration = createNewRegistration();
/*TODO Start registering; once registration succeeds, onRegistrationSuccess() is called*/
newRegistration.startRegistration();
}
① Registration: follow startRegistration into the RetryingRegistration class, then into its register method, which calls invokeRegistration. Here that is a method of ResourceManagerRegistration, an inner class of TaskExecutorToResourceManagerConnection.
protected CompletableFuture<RegistrationResponse> invokeRegistration(
ResourceManagerGateway resourceManager, ResourceManagerId fencingToken, long timeoutMillis) throws Exception {
Time timeout = Time.milliseconds(timeoutMillis);
return resourceManager.registerTaskExecutor(
taskExecutorRegistration,
timeout);
}
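② After registration succeeds, find the onRegistrationSuccess method in the TaskExecutorToResourceManagerConnection class and keep following the calls, through TaskExecutor's inner class ResourceManagerRegistrationListener, to the establishResourceManagerConnection method of TaskExecutor.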
private void establishResourceManagerConnection(...) {
final CompletableFuture<Acknowledge> slotReportResponseFuture = resourceManagerGateway.sendSlotReport(...);
...
}
Follow sendSlotReport to the ResourceManager class.
public CompletableFuture<Acknowledge> sendSlotReport(...) {
...
slotManager.registerTaskManager(workerTypeWorkerRegistration, slotReport);
...
}
Follow registerTaskManager to the SlotManagerImpl class.
public boolean registerTaskManager(...) {
...
// next register the new slots
for (SlotStatus slotStatus : initialSlotReport) {
registerSlot(
slotStatus.getSlotID(),
slotStatus.getAllocationID(),
slotStatus.getJobID(),
slotStatus.getResourceProfile(),
taskExecutorConnection);
}
...
}
Step into registerSlot.
private void registerSlot(...) {
...
/*TODO Create and register the new slot*/
final TaskManagerSlot slot = createAndRegisterTaskManagerSlot(slotId, resourceProfile, taskManagerConnection);
/*TODO Assign the slot*/
if (assignedPendingSlotRequest == null) {
/*TODO All pending requests are already satisfied, so this slot stays free for now*/
handleFreeSlot(slot);
} else {
/*TODO The slot is to be assigned to a pending request*/
assignedPendingSlotRequest.unassignPendingTaskManagerSlot();
allocateSlot(slot, assignedPendingSlotRequest);
}
}
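The branch in registerSlot can be sketched with a queue of pending requests: a newly reported slot either serves the oldest waiting request or is kept as free. MiniSlotRegistry and its FIFO matching are hypothetical; the real code matches a slot to a pending request by resource profile rather than by arrival order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RegisterSlotDemo {
    /** One request is waiting when a TaskManager reports two slots. */
    static List<String> run() {
        MiniSlotRegistry registry = new MiniSlotRegistry();
        registry.addPendingRequest("req-1");
        List<String> outcomes = new ArrayList<>();
        outcomes.add(registry.registerSlot("slot-0"));
        outcomes.add(registry.registerSlot("slot-1"));
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Toy registry: pending requests wait in a queue, unclaimed slots are kept as free. */
class MiniSlotRegistry {
    private final Deque<String> pendingRequests = new ArrayDeque<>();
    final List<String> freeSlots = new ArrayList<>();

    void addPendingRequest(String requestId) {
        pendingRequests.add(requestId);
    }

    String registerSlot(String slotId) {
        String request = pendingRequests.poll(); // null when nothing is waiting
        if (request == null) {
            freeSlots.add(slotId); // corresponds to handleFreeSlot(slot)
            return slotId + " -> free";
        }
        return slotId + " -> " + request; // corresponds to allocateSlot(slot, request)
    }
}
```

Here slot-0 goes to req-1 and slot-1 stays free for a later request.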
Step into allocateSlot.
private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
...
TaskExecutorGateway gateway = taskExecutorConnection.getTaskExecutorGateway();
// RPC call to the task manager
/*TODO Once the slot is assigned, tell the TM to provide the slot to the JM*/
CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
slotId,
pendingSlotRequest.getJobId(),
allocationId,
pendingSlotRequest.getResourceProfile(),
pendingSlotRequest.getTargetAddress(),
resourceManagerId,
taskManagerRequestTimeout);
...
}
Follow requestSlot to the TaskExecutor class.
public CompletableFuture<Acknowledge> requestSlot(...) {
...
/*TODO Allocate one of its own slots as instructed by the RM*/
allocateSlot(
slotId,
jobId,
allocationId,
resourceProfile);
...
/*TODO Offer the slot to the JobManager*/
offerSlotsToJobManager(jobId);
...
}
a. Allocate its own slot: follow allocateSlot to the TaskSlotTableImpl class.
b. Offer the slot to the JobManager: follow offerSlotsToJobManager until the internalOfferSlotsToJobManager method in the same class.
private void internalOfferSlotsToJobManager(JobTable.Connection jobManagerConnection) {
...
final JobMasterGateway jobMasterGateway = jobManagerConnection.getJobManagerGateway();
...
CompletableFuture<Collection<SlotOffer>> acceptedSlotsFuture = jobMasterGateway.offerSlots(
getResourceID(),
reservedSlots,
taskManagerConfiguration.getTimeout());
...
}
Follow offerSlots to the JobMaster class.
public CompletableFuture<Collection<SlotOffer>> offerSlots(...) {
...
return CompletableFuture.completedFuture(
slotPool.offerSlots(
taskManagerLocation,
rpcTaskManagerGateway,
slots));
}
Then follow offerSlots to the SlotPoolImpl class.
public Collection<SlotOffer> offerSlots(...) {
ArrayList<SlotOffer> result = new ArrayList<>(offers.size());
for (SlotOffer offer : offers) {
if (offerSlot(
taskManagerLocation,
taskManagerGateway,
offer)) {
result.add(offer);
}
}
return result;
}
Step into offerSlot.
/**
* @param taskManagerLocation location from where the offer comes from
* @param taskManagerGateway TaskManager gateway
* @param slotOffer the offered slot
* @return True if we accept the offering
*/
boolean offerSlot(
final TaskManagerLocation taskManagerLocation,
final TaskManagerGateway taskManagerGateway,
final SlotOffer slotOffer) {
...
}
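The offerSlots loop keeps only the offers that offerSlot accepts. A minimal sketch of that filtering follows. MiniSlotPool is hypothetical, and its acceptance rule (accept a new allocation ID or a repeat from the same TaskManager, reject an allocation already held by another TaskManager) is an assumption made for illustration, not a copy of SlotPoolImpl's logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OfferSlotsDemo {
    /** tm-1 offers two slots, then tm-2 offers one that overlaps and one that is new. */
    static List<List<String>> run() {
        MiniSlotPool pool = new MiniSlotPool();
        List<List<String>> accepted = new ArrayList<>();
        accepted.add(pool.offerSlots("tm-1", Arrays.asList("alloc-a", "alloc-b")));
        accepted.add(pool.offerSlots("tm-2", Arrays.asList("alloc-b", "alloc-c")));
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Toy slot pool that remembers which TaskManager owns each allocation ID. */
class MiniSlotPool {
    private final Map<String, String> ownerByAllocation = new HashMap<>();

    /** Returns the subset of offers that were accepted, like the loop in offerSlots. */
    List<String> offerSlots(String taskManagerId, List<String> offers) {
        List<String> result = new ArrayList<>(offers.size());
        for (String offer : offers) {
            if (offerSlot(taskManagerId, offer)) {
                result.add(offer);
            }
        }
        return result;
    }

    /** Assumed rule: accept unless another TaskManager already holds the allocation. */
    boolean offerSlot(String taskManagerId, String allocationId) {
        String owner = ownerByAllocation.putIfAbsent(allocationId, taskManagerId);
        return owner == null || owner.equals(taskManagerId);
    }
}
```

The TaskManager learns from the returned collection which of its offered slots the JobMaster actually took.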