Flink Source Code Analysis (Part 1): RPC Communication and JobManager Startup

Preface

1. Flink RPC in Detail

Flink implements RPC communication with Akka plus Netty. The Akka-based RPC mechanism was already covered in the earlier Spark source-code analysis, so it is not described at length here. The relevant concepts are as follows:

  • The ActorSystem is the component that manages Actor lifecycles; an Actor is the component that actually carries out communication.
  • Every Actor has a MailBox; messages sent to it by other Actors are stored in the MailBox first, which is how asynchronous communication is achieved.
  • Each Actor processes messages single-threadedly, continuously pulling messages from its MailBox and handling them, so blocking calls should not be made while handling an Actor's messages.
  • An Actor can change its own state, receive messages, send messages, and spawn new Actors.
  • Every ActorSystem and every Actor is given a name at startup.
  • If one Actor wants to communicate with another, it must first obtain the target Actor's ActorRef and then send messages through that reference.
  • tell sends an asynchronous message without expecting a response; ask sends an asynchronous message and returns a Future from which the result can be obtained asynchronously (see the sketch below).
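
To make tell and ask concrete, here is a minimal sketch in classic Akka (plain Akka rather than Flink code; the actor, system, and message names are made up for illustration):

import java.util.concurrent.TimeUnit;

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.pattern.Patterns;
import scala.concurrent.Await;
import scala.concurrent.Future;
import scala.concurrent.duration.Duration;

public class AkkaTellAskDemo {

	// An Actor processes its MailBox single-threadedly in the receive handler.
	static class EchoActor extends AbstractActor {
		@Override
		public Receive createReceive() {
			return receiveBuilder()
				// reply to whoever sent the message; for a tell with noSender()
				// the reply is simply dropped
				.match(String.class, msg -> getSender().tell("echo: " + msg, getSelf()))
				.build();
		}
	}

	public static void main(String[] args) throws Exception {
		ActorSystem system = ActorSystem.create("demo-system");                 // every ActorSystem gets a name
		ActorRef echo = system.actorOf(Props.create(EchoActor.class), "echo");  // every Actor gets a name

		// tell: fire-and-forget, no response expected
		echo.tell("hello", ActorRef.noSender());

		// ask: asynchronous message that yields a Future for the response
		Future<Object> reply = Patterns.ask(echo, "hello", 3000L);
		System.out.println(Await.result(reply, Duration.create(3, TimeUnit.SECONDS)));

		system.terminate();
	}
}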

Flink's RPC implementation lives mainly in the org.apache.flink.runtime.rpc package of the flink-runtime module and centers on four APIs:

  • RpcGateway: the routing interface and the ancestor of the whole RPC hierarchy; every other RPC component is a subtype of RpcGateway
  • RpcServer: the glue layer between RpcService and RpcEndpoint
  • RpcEndpoint: the carrier of business logic, a wrapper corresponding to an Actor
  • RpcService: a wrapper corresponding to an ActorSystem

[Figure: RpcEndpoint subclass relationship diagram]

Note: RpcEndpoint has four particularly important subclasses: TaskExecutor, Dispatcher, JobMaster, and ResourceManager. A hedged sketch of how the four RPC abstractions above fit together follows.
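
In this sketch, a gateway interface declares the remotely callable methods and an endpoint implements them. The HelloGateway/HelloEndpoint names are hypothetical; only RpcGateway, RpcEndpoint, and RpcService are Flink's types:

import java.util.concurrent.CompletableFuture;

import org.apache.flink.runtime.rpc.RpcEndpoint;
import org.apache.flink.runtime.rpc.RpcGateway;
import org.apache.flink.runtime.rpc.RpcService;

// Hypothetical gateway: declares the contract that can be invoked remotely.
interface HelloGateway extends RpcGateway {
	CompletableFuture<String> hello(String name);
}

// Hypothetical endpoint: the business-logic carrier. The super() call starts
// the underlying RpcServer, as the RpcEndpoint excerpt below shows.
class HelloEndpoint extends RpcEndpoint implements HelloGateway {

	protected HelloEndpoint(RpcService rpcService) {
		super(rpcService, "hello-endpoint");
	}

	@Override
	public CompletableFuture<String> hello(String name) {
		// runs in the endpoint's single main thread, so it must never block
		return CompletableFuture.completedFuture("hello " + name);
	}
}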

RpcEndpoint

public abstract class RpcEndpoint implements RpcGateway, AutoCloseableAsync {
	// invoked (indirectly, not as a direct call) once the RpcEndpoint has been instantiated successfully
	protected void onStart() throws Exception {}
	
	// invoked once, just before the RpcEndpoint is destroyed
	public final CompletableFuture<Void> internalCallOnStop()
	
	protected RpcEndpoint(final RpcService rpcService, final String endpointId) {
		// start the RPC server
		// 12.1: starts the ResourceManager's RPC server, which receives the reports of the TaskManagers
		this.rpcServer = rpcService.startServer(this);
	}
}

2. Flink Cluster Startup Script Analysis

The Flink cluster startup scripts live in the flink-dist subproject, in the bin directory under flink-bin; the entry script is start-cluster.sh. That script first calls config.sh to read the masters and workers lists; the master configuration lives in conf/masters and the worker configuration in conf/workers.
start-cluster.sh then runs jobmanager.sh and taskmanager.sh to start the JobManager and the TaskManager respectively.
jobmanager.sh and taskmanager.sh in turn call flink-daemon.sh to launch the JVM process. Specifically, the JobManager is started with the argument standalonesession and the implementing class org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint;
the TaskManager is started with the argument taskexecutor and the implementing class org.apache.flink.runtime.taskexecutor.TaskManagerRunner.

3. Flink Master Node (JobManager) Startup Analysis

The JobManager is the master node of a Flink cluster and contains four important components:

  • ResourceManager: Flink's cluster resource manager; there is exactly one, and it handles resource matters such as managing and granting slots.
  • Dispatcher: receives the JobGraphs submitted by users and starts a JobManager for each of them; comparable to the AppMaster role in a YARN cluster.
  • JobManager: drives the execution of one concrete job; multiple JobManagers can run concurrently in one cluster; comparable to the Driver role of a Spark job. In newer versions this component is the JobMaster.
  • WebMonitorEndpoint: maintains many handlers; when a client submits a job to the Flink cluster with flink run, the WebMonitorEndpoint receives it and decides which handler should process the request.

In short, the master node of a Flink cluster runs the ResourceManager and the Dispatcher. When a client submits a job to the cluster, the Dispatcher spins up a JobManager that drives the execution of the job's tasks, and that JobManager requests the resources needed during execution from the ResourceManager.

:在Flink的心跳机制中,和其他集群不一样:
1、ResourceManager发送心跳给从节点TaskManager
2、从节点收到心跳信息之后,返回相应

StandaloneSessionClusterEntrypoint

public class StandaloneSessionClusterEntrypoint extends SessionClusterEntrypoint {

	public static void main(String[] args){
		// 1: register a shutdown hook so the components can be closed before the cluster goes down
		JvmShutdownSafeguard.installAsShutdownHook(LOG);
		
		// 2: parse the Flink configuration file flink-conf.yaml
		Configuration configuration = loadConfiguration(entrypointClusterConfiguration);
		
		// 3: create the StandaloneSessionClusterEntrypoint object
		StandaloneSessionClusterEntrypoint entrypoint = new StandaloneSessionClusterEntrypoint(configuration);

		// 4: this method takes the parent type ClusterEntrypoint; the other deployment modes start through it as well
		ClusterEntrypoint.runClusterEntrypoint(entrypoint);
	}
	
}

ClusterEntrypoint

public abstract class ClusterEntrypoint implements AutoCloseableAsync, FatalErrorHandler {

	public static void runClusterEntrypoint(ClusterEntrypoint clusterEntrypoint){
		// 5: start the master node, i.e. the JobManager
		clusterEntrypoint.startCluster();
	}

	public void startCluster() throws ClusterEntrypointException {
		// 6: the PluginManager manages cluster plugins, which are loaded with dedicated class loaders so they do not interfere with Flink's own dependencies
		PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);

		/*
		7: initialize the file system according to the configuration
		1. local: on the client side, the JobGraph is written out as a JobGraphFile ✓
		2. HDFS: FileSystem (DistributedFileSystem)
		3. HadoopFileSystem: a wrapper enclosing HDFS's FileSystem object ✓
		*/
		configureFileSystems(configuration, pluginManager);
		runCluster(configuration, pluginManager);
	}

	private void runCluster(Configuration configuration, PluginManager pluginManager) throws Exception{
		/*
		8: initialize the services the master node needs
		1. commonRpcService: an Akka-based RpcService implementation; the RPC service starts actors to receive RPC calls from RpcGateways
		2. haServices: provides access to the services required for high availability, such as the registry, distributed counters, and leader election
		3. blobServer: listens for incoming requests and spawns threads to process them
		4. heartbeatServices: provides everything heartbeats need, including the creation of heartbeat receivers and heartbeat senders
		5. metricRegistry: tracks registered metrics and connects MetricGroups to MetricReporters
		6. archivedExecutionGraphStore: stores serializable forms of ExecutionGraphs
		*/
		initializeServices(configuration, pluginManager);

		/*
		9: internally initializes four factory instances
		1. DispatcherRunnerFactory
		2. ResourceManagerFactory
		3. RestEndpointFactory
		4. return value: a DispatcherResourceManagerComponentFactory that holds the three factories above as member variables
		*/
		final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);

		/*
		10: create and start the three key components: Dispatcher, ResourceManager, and WebMonitorEndpoint
		*/
		clusterComponent = dispatcherResourceManagerComponentFactory.create(...);
	}

	protected void initializeServices(Configuration configuration, PluginManager pluginManager) throws Exception {
		/*
		8.1: commonRpcService is in effect an Akka-based ActorSystem, a TCP-based RPC service listening on port 6123
		1. initialize the ActorSystem
		2. start the Actor
		*/
		commonRpcService = AkkaRpcServiceUtils.createRemoteRpcService(...);

		// 8.2: initialize the ioExecutor; by default the thread count is the number of CPU cores * 4
		ioExecutor = Executors.newFixedThreadPool(
							ClusterEntrypointUtils.getPoolSize(configuration),
							new ExecutorThreadFactory("cluster-io"));
		// 8.3: haServices = ZooKeeperHaServices
		haServices = createHaServices(configuration, ioExecutor);

		// 8.4: initialize a BlobServer, which manages the upload of large files such as user job jars and the log files uploaded by TaskManagers
		// Blob stands for Binary Large Object
		blobServer = new BlobServer(configuration, haServices.createBlobStore());
		blobServer.start();

		/*
		8.5: initialize the heartbeat service
		On the master node, the heartbeat services of the other roles are all built on top of heartbeatServices.
		A role that needs heartbeats obtains a HeartbeatImpl from heartbeatServices to carry them out.
		*/
		heartbeatServices = createHeartbeatServices(configuration);

		/*
		8.6: metrics (performance monitoring) services
		1. metricQueryServiceRpcService is also an ActorSystem
		2. used to track the registered metrics
		*/
		metricRegistry = createMetricRegistry(configuration, pluginManager);
		final RpcService metricQueryServiceRpcService = MetricUtils.startRemoteMetricsRpcService(configuration, commonRpcService.getAddress());
		metricRegistry.startQueryService(metricQueryServiceRpcService, null);

		/*
		8.7: archivedExecutionGraphStore stores ExecutionGraphs and has two implementations
		1. MemoryArchivedExecutionGraphStore: an in-memory cache
		2. FileArchivedExecutionGraphStore: persists to the file system while also caching in memory; this is the default
		These services are used later when the DispatcherResourceManagerComponent is created.
		*/
		archivedExecutionGraphStore = createSerializableExecutionGraphStore(configuration, commonRpcService.getScheduledExecutor());
	}

}
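
As an aside on step 8.2, the default sizing of the ioExecutor (CPU cores * 4, per the comment above) amounts to the following plain-JDK sketch; the class name and the demo task are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoExecutorSketch {
	public static void main(String[] args) {
		// assumed default sizing: CPU cores * 4
		int poolSize = Runtime.getRuntime().availableProcessors() * 4;
		ExecutorService ioExecutor = Executors.newFixedThreadPool(
				poolSize,
				runnable -> new Thread(runnable, "cluster-io"));   // named like Flink's "cluster-io" factory
		ioExecutor.submit(() -> System.out.println("io task on " + Thread.currentThread().getName()));
		ioExecutor.shutdown();
	}
}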

DefaultDispatcherResourceManagerComponentFactory

public class DefaultDispatcherResourceManagerComponentFactory implements DispatcherResourceManagerComponentFactory {
	
	@Override
	public DispatcherResourceManagerComponent create(...){
	// 11: create the WebMonitorEndpoint instance; webMonitorEndpoint = DispatcherRestEndpoint
	webMonitorEndpoint = restEndpointFactory.createRestEndpoint(...);
	webMonitorEndpoint.start();	

	// 12: create the StandaloneResourceManager instance
	resourceManager = resourceManagerFactory.createResourceManager(...);
	
	// 13: create and start the Dispatcher; older versions started it with dispatcher.start()
	dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(...);
	resourceManager.start();
	}

}

Supervisor

private static final class Supervisor implements AutoCloseableAsync {

	// 8.1.1: Supervisor is a wrapper around an Actor
	private Supervisor(ActorRef actor, ExecutorService terminationFutureExecutor) {
		this.actor = actor;
		this.terminationFutureExecutor = terminationFutureExecutor;
	}

}

HighAvailabilityServicesUtils

public class HighAvailabilityServicesUtils {

	public static HighAvailabilityServices createHighAvailabilityServices(...){
		// 8.2.1: read the HA mode; configured in flink-conf.yaml as high-availability: zookeeper
		HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
		
		switch (highAvailabilityMode) {
			case ZOOKEEPER:
				// 8.2.2: create the BlobStoreService
				BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);
				// 8.2.3: create ZooKeeperHaServices, which wraps a ZooKeeper client instance implemented with the Curator framework
				return new ZooKeeperHaServices(...);
		}
	}
	
}

RestServerEndpoint

public abstract class RestServerEndpoint implements AutoCloseableAsync {

	public final void start() throws Exception {
		// 11.1: initialize the various handlers, including the JobSubmitHandler
		handlers = initializeHandlers(restAddressFuture);
		
		// 11.2: sort the handlers with the RestHandlerUrlComparator
		Collections.sort(handlers,RestHandlerUrlComparator.INSTANCE);

		// 11.3: start the Netty server
		ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {...}
		...
		// at this point the Netty server of the WebMonitorEndpoint on the master node is up; when a client submits a job it starts the corresponding Netty client
		state = State.RUNNING;
		
		// 11.4: start the WebMonitorEndpoint service
		startInternal();

	}

}
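
For orientation, the Netty server brought up in step 11.3 follows the standard bootstrap pattern sketched below. This is a generic Netty skeleton, not Flink's actual wiring; the port (Flink's default REST port 8081) and thread counts are illustrative:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyServerSketch {
	public static void main(String[] args) throws InterruptedException {
		NioEventLoopGroup bossGroup = new NioEventLoopGroup(1);
		NioEventLoopGroup workerGroup = new NioEventLoopGroup();
		try {
			ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {
				@Override
				protected void initChannel(SocketChannel ch) {
					// in the real RestServerEndpoint, the sorted REST handlers
					// are registered into this channel pipeline
				}
			};
			new ServerBootstrap()
					.group(bossGroup, workerGroup)
					.channel(NioServerSocketChannel.class)
					.childHandler(initializer)
					.bind(8081)            // Flink's default REST port
					.sync()
					.channel()
					.closeFuture()
					.sync();               // block until the server channel closes
		} finally {
			bossGroup.shutdownGracefully();
			workerGroup.shutdownGracefully();
		}
	}
}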

DispatcherRestEndpoint

public class DispatcherRestEndpoint extends WebMonitorEndpoint<DispatcherGateway> {
	
	protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
		// 11.1.1: the parent class WebMonitorEndpoint initializes the bulk of the handlers
		List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = super.initializeHandlers(localAddressFuture);
		
		// 11.1.3: add the JobSubmitHandler, which processes job submissions
		handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));
	}
	
}

WebMonitorEndpoint

public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint implements LeaderContender, JsonArchivist {
	protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
		/*
		11.1.2: initialize an ArrayList container
		ChannelInboundHandler: an inbound handler whose channelRead0() method Netty calls automatically
		Under the hood, channelRead0() ultimately invokes the handler's handleRequest() method
		After a client submits a job, the WebMonitorEndpoint receives it and hands it to the JobSubmitHandler; handleRequest() finally serves the request
		*/
		ArrayList<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = new ArrayList<>(30);
        
        // these handlers back the REST endpoints of the Flink web UI; you can think of each handler as a servlet
        // /jobs/:jobid
		handlers.add(Tuple2.of(JobManagerLogFileHeader.getInstance(), jobManagerLogFileHandler));
		handlers.add(Tuple2.of(JobManagerStdoutFileHeader.getInstance(), jobManagerStdoutFileHandler));
		handlers.add(Tuple2.of(JobManagerCustomLogHeaders.getInstance(), jobManagerCustomLogHandler));
		handlers.add(Tuple2.of(JobManagerLogListHeaders.getInstance(), jobManagerLogListHandler));
		...
	}
	
	public void startInternal() throws Exception {
		/*
		11.4.1: the ZooKeeperLeaderElectionService runs an election; the Dispatcher and the ResourceManager run elections too, which in turn trigger their services to start
		1. if the election is won, leaderElectionService calls isLeader()
		2. if the election is lost, leaderElectionService calls notLeader()
		*/
		leaderElectionService.start(this);
		// 11.4.2: start the periodic cleanup task
		startExecutionGraphCacheCleanupTask();
	}

	private void startExecutionGraphCacheCleanupTask() {
		/*
		11.4.2.1: the method ultimately executed is executionGraphCache.cleanup(), which evicts the ExecutionGraphs of finished jobs:
		cachedExecutionGraphs.values().removeIf((ExecutionGraphEntry entry) -> currentTime >= entry.getTTL());
		*/
		executionGraphCleanupTask = executor.scheduleWithFixedDelay(
			executionGraphCache::cleanup,
			cleanupInterval,
			cleanupInterval,
			TimeUnit.MILLISECONDS);
	}

}
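
The cleanup in 11.4.2.1 is essentially a TTL sweep over a cached map. Below is a self-contained sketch of the same pattern; the Entry type and all names are illustrative stand-ins, not Flink's:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TtlCacheCleanupSketch {

	// illustrative stand-in for Flink's ExecutionGraphEntry
	static final class Entry {
		final long ttlDeadlineMillis;
		Entry(long ttlDeadlineMillis) { this.ttlDeadlineMillis = ttlDeadlineMillis; }
	}

	public static void main(String[] args) {
		Map<String, Entry> cache = new ConcurrentHashMap<>();
		cache.put("job-1", new Entry(System.currentTimeMillis() + 500));

		long cleanupInterval = 1000L;
		ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
		// same shape as executor.scheduleWithFixedDelay(executionGraphCache::cleanup, ...)
		executor.scheduleWithFixedDelay(
				() -> cache.values().removeIf(e -> System.currentTimeMillis() >= e.ttlDeadlineMillis),
				cleanupInterval,
				cleanupInterval,
				TimeUnit.MILLISECONDS);
	}
}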

ZooKeeperLeaderElectionService

public class ZooKeeperLeaderElectionService implements LeaderLatchListener... {

	/*
	How Curator, the ZooKeeper client framework, drives this:
	this class implements LeaderLatchListener,
	so when the election is won, isLeader() is invoked automatically; otherwise notLeader() is invoked
	*/
	public void start(LeaderContender contender) throws Exception {
		leaderContender = contender;
		leaderLatch.addListener(this);
		// 11.4.1.2: run the election
		leaderLatch.start();
	}
	
	public void isLeader() {
		/*
		11.4.1.3: after becoming leader
		leaderElectionService.start(this);
		leaderContender = this = WebMonitorEndpoint
		
		when the other components start, leaderContender = ResourceManager / DefaultDispatcherRunner
		*/
		leaderContender.grantLeadership(issuedLeaderSessionID);
	}
	
}
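
The LeaderLatch mechanics can be tried in isolation with the Curator sketch below; the connection string and latch path are made up, while LeaderLatch and LeaderLatchListener are Curator's real APIs:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
	public static void main(String[] args) throws Exception {
		CuratorFramework client = CuratorFrameworkFactory.newClient(
				"localhost:2181", new ExponentialBackoffRetry(1000, 3));   // assumed ZooKeeper address
		client.start();

		LeaderLatch latch = new LeaderLatch(client, "/demo/leader");       // assumed latch path
		latch.addListener(new LeaderLatchListener() {
			@Override public void isLeader()  { System.out.println("leadership granted"); }
			@Override public void notLeader() { System.out.println("leadership revoked"); }
		});
		latch.start();        // join the election; the listener fires on state changes

		Thread.sleep(10_000); // hold (or lose) leadership for a while
		latch.close();
		client.close();
	}
}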

AkkaRpcService

public class AkkaRpcService implements RpcService {

	public <C extends RpcEndpoint & RpcGateway> RpcServer startServer(C rpcEndpoint){
		// 12.2: obtain the hostname and port
		final String akkaAddress = AkkaUtils.getAkkaURL(actorSystem, actorRef);
		...
		
		// 12.3: define the invocation handler for the interface
		final InvocationHandler akkaInvocationHandler;

		// 12.4: create an RpcServer by means of a dynamic proxy
		RpcServer server = (RpcServer) Proxy.newProxyInstance(...akkaInvocationHandler);
		
	}
}
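
Step 12.4 uses the standard JDK dynamic-proxy mechanism. Stripped of Akka, the pattern looks like the sketch below; the Greeter interface is hypothetical, while Proxy and InvocationHandler are plain JDK APIs:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class DynamicProxySketch {

	// hypothetical interface standing in for an RpcGateway
	interface Greeter {
		String greet(String name);
	}

	public static void main(String[] args) {
		// in AkkaRpcService the handler turns each call into an Akka message;
		// here we just answer locally
		InvocationHandler handler =
				(proxy, method, methodArgs) -> "hello " + methodArgs[0];

		Greeter greeter = (Greeter) Proxy.newProxyInstance(
				Greeter.class.getClassLoader(),
				new Class<?>[]{Greeter.class},
				handler);

		System.out.println(greeter.greet("flink"));   // routed through invoke()
	}
}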

ResourceManager

public abstract class ResourceManager {

	// 12.5: run the onStart() method to bring up the ResourceManager services
	private void startResourceManagerServices() throws Exception {
		// 12.6: run the election; on success, leaderElectionService calls isLeader()
		leaderElectionService.start(this);
	}

	
	public void grantLeadership(final UUID newLeaderSessionID) {
		// 12.7: asynchronously invoke the tryAcceptLeadership(...) method
		acceptLeadershipFuture = clearStateFuture.thenComposeAsync(
		(ignored) -> tryAcceptLeadership(newLeaderSessionID), 
		getUnfencedMainThreadExecutor());
	}

	protected void startServicesOnLeadership() {
		// 12.8: start the heartbeat services
		startHeartbeatServices();

		// 12.13: start the SlotManager
		slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());
	}
	
	private void startHeartbeatServices() {
		// 12.8.1: the service for heartbeats with the TaskManagers, which tracks whether each TaskManager is alive
		taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
		
		// 12.8.2: the service for heartbeats with the JobManagers, the master process each job starts
		jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
	}
}

HeartbeatManagerSenderImpl

public class HeartbeatManagerSenderImpl<I, O> ... implements Runnable {
		
	HeartbeatManagerSenderImpl(...) {
	// 12.9: schedule this instance's run() method for execution
	mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);
	}

	public void run() {
		// 12.10: the flag that controls the loop
		if (!stopped) {
			// 12.11: send the heartbeat request
			requestHeartbeat(heartbeatMonitor);
			
			// 12.12: schedule the next round, forming the loop
			getMainThreadExecutor().schedule(this, heartbeatPeriod, TimeUnit.MILLISECONDS);
		}
	}
	
	/*
	12.11.1: sending heartbeats in detail
	HeartbeatMonitor: manages all heartbeat targets; a worker that returns a heartbeat response is tracked by a HeartbeatMonitor
	heartbeatTarget: a worker started in the cluster, i.e. a TaskExecutor
	*/
	private void requestHeartbeat(HeartbeatMonitor<O> heartbeatMonitor) {
		heartbeatTarget.requestHeartbeat(getOwnResourceID(), payload);
	}


}
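
The control flow in steps 12.9 through 12.12 is a self-rescheduling task rather than a fixed-rate timer: each run schedules the next one, so a slow heartbeat round can never overlap the next. A stripped-down sketch with illustrative names, not Flink's:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatLoopSketch implements Runnable {

	private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
	private final long heartbeatPeriodMillis = 1000L;
	private volatile boolean stopped = false;

	void start() {
		executor.schedule(this, 0L, TimeUnit.MILLISECONDS);   // first round immediately, as in 12.9
	}

	@Override
	public void run() {
		if (!stopped) {                                       // loop-control flag, as in 12.10
			System.out.println("requestHeartbeat -> all monitored targets");
			// schedule the next round only after this one has finished
			executor.schedule(this, heartbeatPeriodMillis, TimeUnit.MILLISECONDS);
		}
	}

	void stop() {
		stopped = true;
		executor.shutdown();
	}

	public static void main(String[] args) throws InterruptedException {
		HeartbeatLoopSketch loop = new HeartbeatLoopSketch();
		loop.start();
		Thread.sleep(3500);
		loop.stop();
	}
}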

SlotManagerImpl

public class SlotManagerImpl implements SlotManager {
	
	public void start(...) {
		// 12.13.1: start the scheduled task checkTaskManagerTimeouts, which checks the TaskManagers' heartbeats
		taskManagerTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
		
		// 12.13.2: start the scheduled task that handles SlotRequest timeouts
		slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
	}	

}

DefaultDispatcherRunner

public final class DefaultDispatcherRunner implements DispatcherRunner, LeaderContender {

	public static DispatcherRunner create(...) throws Exception {
		// 13.1: create the DefaultDispatcherRunner
		final DefaultDispatcherRunner dispatcherRunner = new DefaultDispatcherRunner();
		// 13.2: open the DefaultDispatcherRunner's lifecycle; leaderElectionService is the leader-election service
		return DispatcherRunnerLeaderElectionLifecycleManager.createFor(dispatcherRunner, leaderElectionService);
	}
}

DispatcherRunnerLeaderElectionLifecycleManager

final class DispatcherRunnerLeaderElectionLifecycleManager implements DispatcherRunner {
	
	private DispatcherRunnerLeaderElectionLifecycleManager(...) throws Exception {
		/*
		13.3: leaderElectionService.start(this);
		the contender object (leaderContender) inside leaderElectionService is the DefaultDispatcherRunner
		*/
		leaderElectionService.start(dispatcherRunner);
	}

	// 13.4: after the election completes and this node becomes the leader...
	public void grantLeadership(UUID leaderSessionID) {
		runActionIfRunning(() -> startNewDispatcherLeaderProcess(leaderSessionID));
	}

	// 13.5: call the DispatcherLeaderProcess's start() method
	private void startNewDispatcherLeaderProcess(UUID leaderSessionID) {
		// stop the existing DispatcherLeaderProcess
		stopDispatcherLeaderProcess();
		
		// create a new DispatcherLeaderProcess
		final DispatcherLeaderProcess newDispatcherLeaderProcess = ...;
		
		//newDispatcherLeaderProcess::start
		FutureUtils.assertNoException(
		previousDispatcherLeaderProcessTerminationFuture.thenRun(
		newDispatcherLeaderProcess::start
		));
	}	
}

AbstractDispatcherLeaderProcess

public abstract class AbstractDispatcherLeaderProcess implements DispatcherLeaderProcess {

	private void startInternal() {
		log.info("Start {}.", getClass().getSimpleName());
		// 13.6: the DispatcherLeaderProcess has started; update its state
		state = State.RUNNING;
		onStart();
	}
	
}

SessionDispatcherLeaderProcess

public class SessionDispatcherLeaderProcess ... {
	protected void onStart() {
		// 13.7: start the services, i.e. the JobGraphStore, a component for storing JobGraphs
		startServices();
		
		// 13.8: begin creating the Dispatcher
		onGoingRecoveryOperation = recoverJobsAsync()
			.thenAccept(this::createDispatcherIfRunning)
			.handle(this::onErrorIfRunning);
	}
}

DefaultDispatcherGatewayServiceFactory

class DefaultDispatcherGatewayServiceFactory implements ... {

	public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(...) {
		// 13.9: create the Dispatcher
		dispatcher = dispatcherFactory.createDispatcher(...);
		// 13.12: once started, the Dispatcher sends a hello message to itself to signal a successful start
		dispatcher.start();
	}
}

Dispatcher

public abstract class Dispatcher ... {
	
	// 13.10: run the onStart() method
	public void onStart() throws Exception {
		// start the Dispatcher services
		startDispatcherServices();
		// 13.11: bootstrap initialization; resumes the execution of all interrupted jobs
		dispatcherBootstrap.initialize(...);
	}

	// 13.11.2: when a client submits a job, the Dispatcher receives it and runs it here
	private CompletableFuture<Void> runJob(JobGraph jobGraph) {
		// submitting a job == starting a JobManagerRunner, which wraps a JobManager
		return jobManagerRunnerFuture
			.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner))
			...
	}
	
}
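
The runJob() chain above is ordinary CompletableFuture composition. Below is a minimal sketch of the same style, where createRunner stands in for the JobManagerRunner creation (all names made up):

import java.util.concurrent.CompletableFuture;

public class AsyncRunJobSketch {

	// stands in for the JobManagerRunner creation inside the Dispatcher
	static CompletableFuture<String> createRunner(String jobId) {
		return CompletableFuture.supplyAsync(() -> "runner-for-" + jobId);
	}

	public static void main(String[] args) {
		CompletableFuture<Void> done = createRunner("job-42")
				.thenApply(runner -> runner + ":started")     // cf. thenApply(this::startJobManagerRunner)
				.thenAccept(System.out::println);
		done.join();
	}
}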

DefaultDispatcherBootstrap

public class DefaultDispatcherBootstrap extends AbstractDispatcherBootstrap {

	public void initialize(...) {
		/*
		13.11.1: recoveredJobs holds the jobs awaiting recovery
		under the hood, AbstractDispatcherBootstrap calls dispatcher.runRecoveredJob(recoveredJob)
		*/
		launchRecoveredJobGraphs(dispatcher, recoveredJobs);
	
		// 13.11.2: clear the list once recovery is done
		recoveredJobs.clear();
	}

}