Flink Source Code Analysis (Part 1): RPC Communication and JobManager Startup

Preface

1. Flink RPC in Detail

Flink implements RPC communication with Akka plus Netty. The Akka-based RPC mechanism was already covered in the earlier Spark source-code analysis, so it is not described at length here. The relevant concepts are as follows:

  • The ActorSystem is the component that manages Actor lifecycles; an Actor is the component that actually carries out communication.
  • Every Actor has a MailBox; messages sent to it by other Actors are stored in the MailBox first, which is how asynchronous communication is achieved.
  • Each Actor processes messages single-threadedly, continuously pulling messages from its MailBox and handling them, so blocking calls should not be made while handling an Actor's messages.
  • An Actor can change its own state, receive messages, send messages, and spawn new Actors.
  • Every ActorSystem and every Actor is given a name at startup.
  • If one Actor wants to communicate with another, it must first obtain the target Actor's ActorRef and then send messages through that reference.
  • tell sends an asynchronous message without expecting a response; ask sends an asynchronous message and returns a Future from which the result can be obtained asynchronously (see the sketch below).
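
To make tell and ask concrete, here is a minimal sketch in classic Akka (plain Akka rather than Flink code; the actor, system, and message names are made up for illustration):

import java.util.concurrent.TimeUnit;

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.pattern.Patterns;
import scala.concurrent.Await;
import scala.concurrent.Future;
import scala.concurrent.duration.Duration;

public class AkkaTellAskDemo {

	// An Actor processes its MailBox single-threadedly in the receive handler.
	static class EchoActor extends AbstractActor {
		@Override
		public Receive createReceive() {
			return receiveBuilder()
				// reply to whoever sent the message; for a tell with noSender()
				// the reply is simply dropped
				.match(String.class, msg -> getSender().tell("echo: " + msg, getSelf()))
				.build();
		}
	}

	public static void main(String[] args) throws Exception {
		ActorSystem system = ActorSystem.create("demo-system");                 // every ActorSystem gets a name
		ActorRef echo = system.actorOf(Props.create(EchoActor.class), "echo");  // every Actor gets a name

		// tell: fire-and-forget, no response expected
		echo.tell("hello", ActorRef.noSender());

		// ask: asynchronous message that yields a Future for the response
		Future<Object> reply = Patterns.ask(echo, "hello", 3000L);
		System.out.println(Await.result(reply, Duration.create(3, TimeUnit.SECONDS)));

		system.terminate();
	}
}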

Flink's RPC implementation lives mainly in the org.apache.flink.runtime.rpc package of the flink-runtime module and centers on four APIs:

  • RpcGateway: the routing interface and the ancestor of the whole RPC hierarchy; every other RPC component is a subtype of RpcGateway
  • RpcServer: the glue layer between RpcService and RpcEndpoint
  • RpcEndpoint: the carrier of business logic, a wrapper corresponding to an Actor
  • RpcService: a wrapper corresponding to an ActorSystem

[Figure: RpcEndpoint subclass relationship diagram]

Note: RpcEndpoint has four particularly important subclasses: TaskExecutor, Dispatcher, JobMaster, and ResourceManager. A hedged sketch of how the four RPC abstractions above fit together follows.
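
In this sketch, a gateway interface declares the remotely callable methods and an endpoint implements them. The HelloGateway/HelloEndpoint names are hypothetical; only RpcGateway, RpcEndpoint, and RpcService are Flink's types:

import java.util.concurrent.CompletableFuture;

import org.apache.flink.runtime.rpc.RpcEndpoint;
import org.apache.flink.runtime.rpc.RpcGateway;
import org.apache.flink.runtime.rpc.RpcService;

// Hypothetical gateway: declares the contract that can be invoked remotely.
interface HelloGateway extends RpcGateway {
	CompletableFuture<String> hello(String name);
}

// Hypothetical endpoint: the business-logic carrier. The super() call starts
// the underlying RpcServer, as the RpcEndpoint excerpt below shows.
class HelloEndpoint extends RpcEndpoint implements HelloGateway {

	protected HelloEndpoint(RpcService rpcService) {
		super(rpcService, "hello-endpoint");
	}

	@Override
	public CompletableFuture<String> hello(String name) {
		// runs in the endpoint's single main thread, so it must never block
		return CompletableFuture.completedFuture("hello " + name);
	}
}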

RpcEndpoint

public abstract class RpcEndpoint implements RpcGateway, AutoCloseableAsync {
	// invoked (indirectly, not as a direct call) once the RpcEndpoint has been instantiated successfully
	protected void onStart() throws Exception {}
	
	// invoked once, just before the RpcEndpoint is destroyed
	public final CompletableFuture<Void> internalCallOnStop()
	
	protected RpcEndpoint(final RpcService rpcService, final String endpointId) {
		// start the RPC server
		// 12.1: starts the ResourceManager's RPC server, which receives the reports of the TaskManagers
		this.rpcServer = rpcService.startServer(this);
	}
}

2. Flink Cluster Startup Script Analysis

The Flink cluster startup scripts live in the flink-dist subproject, in the bin directory under flink-bin; the entry script is start-cluster.sh. That script first calls config.sh to read the masters and workers lists; the master configuration lives in conf/masters and the worker configuration in conf/workers.
start-cluster.sh then runs jobmanager.sh and taskmanager.sh to start the JobManager and the TaskManager respectively.
jobmanager.sh and taskmanager.sh in turn call flink-daemon.sh to launch the JVM process. Specifically, the JobManager is started with the argument standalonesession and the implementing class org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint;
the TaskManager is started with the argument taskexecutor and the implementing class org.apache.flink.runtime.taskexecutor.TaskManagerRunner.

3. Flink Master Node (JobManager) Startup Analysis

The JobManager is the master node of a Flink cluster and contains four important components:

  • ResourceManager: Flink's cluster resource manager; there is exactly one, and it handles resource matters such as managing and granting slots.
  • Dispatcher: receives the JobGraphs submitted by users and starts a JobManager for each of them; comparable to the AppMaster role in a YARN cluster.
  • JobManager: drives the execution of one concrete job; multiple JobManagers can run concurrently in one cluster; comparable to the Driver role of a Spark job. In newer versions this component is the JobMaster.
  • WebMonitorEndpoint: maintains many handlers; when a client submits a job to the Flink cluster with flink run, the WebMonitorEndpoint receives it and decides which handler should process the request.

In short, the master node of a Flink cluster runs the ResourceManager and the Dispatcher. When a client submits a job to the cluster, the Dispatcher spins up a JobManager that drives the execution of the job's tasks, and that JobManager requests the resources needed during execution from the ResourceManager.

:在Flink的心跳机制中,和其他集群不一样:
1、ResourceManager发送心跳给从节点TaskManager
2、从节点收到心跳信息之后,返回相应

StandaloneSessionClusterEntrypoint

public class StandaloneSessionClusterEntrypoint extends SessionClusterEntrypoint {

	public static void main(String[] args){
		// 1: register a shutdown hook so the components can be closed before the cluster goes down
		JvmShutdownSafeguard.installAsShutdownHook(LOG);
		
		// 2: parse the Flink configuration file flink-conf.yaml
		Configuration configuration = loadConfiguration(entrypointClusterConfiguration);
		
		// 3: create the StandaloneSessionClusterEntrypoint object
		StandaloneSessionClusterEntrypoint entrypoint = new StandaloneSessionClusterEntrypoint(configuration);

		// 4: this method takes the parent type ClusterEntrypoint; the other deployment modes start through it as well
		ClusterEntrypoint.runClusterEntrypoint(entrypoint);
	}
	
}

ClusterEntrypoint

public abstract class ClusterEntrypoint implements AutoCloseableAsync, FatalErrorHandler {

	public static void runClusterEntrypoint(ClusterEntrypoint clusterEntrypoint){
		// 5: start the master node, i.e. the JobManager
		clusterEntrypoint.startCluster();
	}

	public void startCluster() throws ClusterEntrypointException {
		// 6: the PluginManager manages cluster plugins, which are loaded with dedicated class loaders so they do not interfere with Flink's own dependencies
		PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);

		/*
		7: initialize the file system according to the configuration
		1. local: on the client side, the JobGraph is written out as a JobGraphFile ✓
		2. HDFS: FileSystem (DistributedFileSystem)
		3. HadoopFileSystem: a wrapper enclosing HDFS's FileSystem object ✓
		*/
		configureFileSystems(configuration, pluginManager);
		runCluster(configuration, pluginManager);
	}

	private void runCluster(Configuration configuration, PluginManager pluginManager) throws Exception{
		/*
		8: initialize the services the master node needs
		1. commonRpcService: an Akka-based RpcService implementation; the RPC service starts actors to receive RPC calls from RpcGateways
		2. haServices: provides access to the services required for high availability, such as the registry, distributed counters, and leader election
		3. blobServer: listens for incoming requests and spawns threads to process them
		4. heartbeatServices: provides everything heartbeats need, including the creation of heartbeat receivers and heartbeat senders
		5. metricRegistry: tracks registered metrics and connects MetricGroups to MetricReporters
		6. archivedExecutionGraphStore: stores serializable forms of ExecutionGraphs
		*/
		initializeServices(configuration, pluginManager);

		/*
		9: internally initializes four factory instances
		1. DispatcherRunnerFactory
		2. ResourceManagerFactory
		3. RestEndpointFactory
		4. return value: a DispatcherResourceManagerComponentFactory that holds the three factories above as member variables
		*/
		final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);

		/*
		10: create and start the three key components: Dispatcher, ResourceManager, and WebMonitorEndpoint
		*/
		clusterComponent = dispatcherResourceManagerComponentFactory.create(...);
	}

	protected void initializeServices(Configuration configuration, PluginManager pluginManager) throws Exception {
		/*
		8.1: commonRpcService is in effect an Akka-based ActorSystem, a TCP-based RPC service listening on port 6123
		1. initialize the ActorSystem
		2. start the Actor
		*/
		commonRpcService = AkkaRpcServiceUtils.createRemoteRpcService(...);

		// 8.2: initialize the ioExecutor; by default the thread count is the number of CPU cores * 4
		ioExecutor = Executors.newFixedThreadPool(
							ClusterEntrypointUtils.getPoolSize(configuration),
							new ExecutorThreadFactory("cluster-io"));
		// 8.3: haServices = ZooKeeperHaServices
		haServices = createHaServices(configuration, ioExecutor);

		// 8.4: initialize a BlobServer, which manages the upload of large files such as user job jars and the log files uploaded by TaskManagers
		// Blob stands for Binary Large Object
		blobServer = new BlobServer(configuration, haServices.createBlobStore());
		blobServer.start();

		/*
		8.5: initialize the heartbeat service
		On the master node, the heartbeat services of the other roles are all built on top of heartbeatServices.
		A role that needs heartbeats obtains a HeartbeatImpl from heartbeatServices to carry them out.
		*/
		heartbeatServices = createHeartbeatServices(configuration);

		/*
		8.6: metrics (performance monitoring) services
		1. metricQueryServiceRpcService is also an ActorSystem
		2. used to track the registered metrics
		*/
		metricRegistry = createMetricRegistry(configuration, pluginManager);
		final RpcService metricQueryServiceRpcService = MetricUtils.startRemoteMetricsRpcService(configuration, commonRpcService.getAddress());
		metricRegistry.startQueryService(metricQueryServiceRpcService, null);

		/*
		8.7: archivedExecutionGraphStore stores ExecutionGraphs and has two implementations
		1. MemoryArchivedExecutionGraphStore: an in-memory cache
		2. FileArchivedExecutionGraphStore: persists to the file system while also caching in memory; this is the default
		These services are used later when the DispatcherResourceManagerComponent is created.
		*/
		archivedExecutionGraphStore = createSerializableExecutionGraphStore(configuration, commonRpcService.getScheduledExecutor());
	}

}
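
As an aside on step 8.2, the default sizing of the ioExecutor (CPU cores * 4, per the comment above) amounts to the following plain-JDK sketch; the class name and the demo task are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoExecutorSketch {
	public static void main(String[] args) {
		// assumed default sizing: CPU cores * 4
		int poolSize = Runtime.getRuntime().availableProcessors() * 4;
		ExecutorService ioExecutor = Executors.newFixedThreadPool(
				poolSize,
				runnable -> new Thread(runnable, "cluster-io"));   // named like Flink's "cluster-io" factory
		ioExecutor.submit(() -> System.out.println("io task on " + Thread.currentThread().getName()));
		ioExecutor.shutdown();
	}
}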

DefaultDispatcherResourceManagerComponentFactory

public class DefaultDispatcherResourceManagerComponentFactory implements DispatcherResourceManagerComponentFactory {
	
	@Override
	public DispatcherResourceManagerComponent create(...){
	// 11: create the WebMonitorEndpoint instance; webMonitorEndpoint = DispatcherRestEndpoint
	webMonitorEndpoint = restEndpointFactory.createRestEndpoint(...);
	webMonitorEndpoint.start();	

	// 12: create the StandaloneResourceManager instance
	resourceManager = resourceManagerFactory.createResourceManager(...);
	
	// 13: create and start the Dispatcher; older versions started it with dispatcher.start()
	dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(...);
	resourceManager.start();
	}

}

Supervisor

private static final class Supervisor implements AutoCloseableAsync {

	// 8.1.1: Supervisor is a wrapper around an Actor
	private Supervisor(ActorRef actor, ExecutorService terminationFutureExecutor) {
		this.actor = actor;
		this.terminationFutureExecutor = terminationFutureExecutor;
	}

}

HighAvailabilityServicesUtils

public class HighAvailabilityServicesUtils {

	public static HighAvailabilityServices createHighAvailabilityServices(...){
		// 8.2.1: read the HA mode; configured in flink-conf.yaml as high-availability: zookeeper
		HighAvailabilityMode highAvailabilityMode = HighAvailabilityMode.fromConfig(configuration);
		
		switch (highAvailabilityMode) {
			case ZOOKEEPER:
				// 8.2.2: create the BlobStoreService
				BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);
				// 8.2.3: create ZooKeeperHaServices, which wraps a ZooKeeper client instance implemented with the Curator framework
				return new ZooKeeperHaServices(...);
		}
	}
	
}

RestServerEndpoint

public abstract class RestServerEndpoint implements AutoCloseableAsync {

	public final void start() throws Exception {
		// 11.1: initialize the various handlers, including the JobSubmitHandler
		handlers = initializeHandlers(restAddressFuture);
		
		// 11.2: sort the handlers with the RestHandlerUrlComparator
		Collections.sort(handlers,RestHandlerUrlComparator.INSTANCE);

		// 11.3: start the Netty server
		ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {...}
		...
		// at this point the Netty server of the WebMonitorEndpoint on the master node is up; when a client submits a job it starts the corresponding Netty client
		state = State.RUNNING;
		
		// 11.4: start the WebMonitorEndpoint service
		startInternal();

	}

}
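
For orientation, the Netty server brought up in step 11.3 follows the standard bootstrap pattern sketched below. This is a generic Netty skeleton, not Flink's actual wiring; the port (Flink's default REST port 8081) and thread counts are illustrative:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyServerSketch {
	public static void main(String[] args) throws InterruptedException {
		NioEventLoopGroup bossGroup = new NioEventLoopGroup(1);
		NioEventLoopGroup workerGroup = new NioEventLoopGroup();
		try {
			ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {
				@Override
				protected void initChannel(SocketChannel ch) {
					// in the real RestServerEndpoint, the sorted REST handlers
					// are registered into this channel pipeline
				}
			};
			new ServerBootstrap()
					.group(bossGroup, workerGroup)
					.channel(NioServerSocketChannel.class)
					.childHandler(initializer)
					.bind(8081)            // Flink's default REST port
					.sync()
					.channel()
					.closeFuture()
					.sync();               // block until the server channel closes
		} finally {
			bossGroup.shutdownGracefully();
			workerGroup.shutdownGracefully();
		}
	}
}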

DispatcherRestEndpoint

public class DispatcherRestEndpoint extends WebMonitorEndpoint<DispatcherGateway> {
	
	protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
		// 11.1.1: the parent class WebMonitorEndpoint initializes the bulk of the handlers
		List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = super.initializeHandlers(localAddressFuture);
		
		// 11.1.3: add the JobSubmitHandler, which processes job submissions
		handlers.add(Tuple2.of(jobSubmitHandler.getMessageHeaders(), jobSubmitHandler));
	}
	
}

WebMonitorEndpoint

public class WebMonitorEndpoint<T extends RestfulGateway> extends RestServerEndpoint implements LeaderContender, JsonArchivist {
	protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(final CompletableFuture<String> localAddressFuture) {
		/*
		11.1.2: initialize an ArrayList container
		ChannelInboundHandler: an inbound handler whose channelRead0() method Netty calls automatically
		Under the hood, channelRead0() ultimately invokes the handler's handleRequest() method
		After a client submits a job, the WebMonitorEndpoint receives it and hands it to the JobSubmitHandler; handleRequest() finally serves the request
		*/
		ArrayList<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = new ArrayList<>(30);
        
        // these handlers back the REST endpoints of the Flink web UI; you can think of each handler as a servlet
        // /jobs/:jobid
		handlers.add(Tuple2.of(JobManagerLogFileHeader.getInstance(), jobManagerLogFileHandler));
		handlers.add(Tuple2.of(JobManagerStdoutFileHeader.getInstance(), jobManagerStdoutFileHandler));
		handlers.add(Tuple2.of(JobManagerCustomLogHeaders.getInstance(), jobManagerCustomLogHandler));
		handlers.add(Tuple2.of(JobManagerLogListHeaders.getInstance(), jobManagerLogListHandler));
		...
	}
	
	public void startInternal() throws Exception {
		/*
		11.4.1: the ZooKeeperLeaderElectionService runs an election; the Dispatcher and the ResourceManager run elections too, which in turn trigger their services to start
		1. if the election is won, leaderElectionService calls isLeader()
		2. if the election is lost, leaderElectionService calls notLeader()
		*/
		leaderElectionService.start(this);
		// 11.4.2: start the periodic cleanup task
		startExecutionGraphCacheCleanupTask();
	}

	private void startExecutionGraphCacheCleanupTask() {
		/*
		11.4.2.1: the method ultimately executed is executionGraphCache.cleanup(), which evicts the ExecutionGraphs of finished jobs:
		cachedExecutionGraphs.values().removeIf((ExecutionGraphEntry entry) -> currentTime >= entry.getTTL());
		*/
		executionGraphCleanupTask = executor.scheduleWithFixedDelay(
			executionGraphCache::cleanup,
			cleanupInterval,
			cleanupInterval,
			TimeUnit.MILLISECONDS);
	}

}
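
The cleanup in 11.4.2.1 is essentially a TTL sweep over a cached map. Below is a self-contained sketch of the same pattern; the Entry type and all names are illustrative stand-ins, not Flink's:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TtlCacheCleanupSketch {

	// illustrative stand-in for Flink's ExecutionGraphEntry
	static final class Entry {
		final long ttlDeadlineMillis;
		Entry(long ttlDeadlineMillis) { this.ttlDeadlineMillis = ttlDeadlineMillis; }
	}

	public static void main(String[] args) {
		Map<String, Entry> cache = new ConcurrentHashMap<>();
		cache.put("job-1", new Entry(System.currentTimeMillis() + 500));

		long cleanupInterval = 1000L;
		ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
		// same shape as executor.scheduleWithFixedDelay(executionGraphCache::cleanup, ...)
		executor.scheduleWithFixedDelay(
				() -> cache.values().removeIf(e -> System.currentTimeMillis() >= e.ttlDeadlineMillis),
				cleanupInterval,
				cleanupInterval,
				TimeUnit.MILLISECONDS);
	}
}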

ZooKeeperLeaderElectionService

public class ZooKeeperLeaderElectionService implements LeaderLatchListener... {

	/*
	How Curator, the ZooKeeper client framework, drives this:
	this class implements LeaderLatchListener,
	so when the election is won, isLeader() is invoked automatically; otherwise notLeader() is invoked
	*/
	public void start(LeaderContender contender) throws Exception {
		leaderContender = contender;
		leaderLatch.addListener(this);
		// 11.4.1.2: run the election
		leaderLatch.start();
	}
	
	public void isLeader() {
		/*
		11.4.1.3: after becoming leader
		leaderElectionService.start(this);
		leaderContender = this = WebMonitorEndpoint
		
		when the other components start, leaderContender = ResourceManager / DefaultDispatcherRunner
		*/
		leaderContender.grantLeadership(issuedLeaderSessionID);
	}
	
}
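
The LeaderLatch mechanics can be tried in isolation with the Curator sketch below; the connection string and latch path are made up, while LeaderLatch and LeaderLatchListener are Curator's real APIs:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
	public static void main(String[] args) throws Exception {
		CuratorFramework client = CuratorFrameworkFactory.newClient(
				"localhost:2181", new ExponentialBackoffRetry(1000, 3));   // assumed ZooKeeper address
		client.start();

		LeaderLatch latch = new LeaderLatch(client, "/demo/leader");       // assumed latch path
		latch.addListener(new LeaderLatchListener() {
			@Override public void isLeader()  { System.out.println("leadership granted"); }
			@Override public void notLeader() { System.out.println("leadership revoked"); }
		});
		latch.start();        // join the election; the listener fires on state changes

		Thread.sleep(10_000); // hold (or lose) leadership for a while
		latch.close();
		client.close();
	}
}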

AkkaRpcService

public class AkkaRpcService implements RpcService {

	public <C extends RpcEndpoint & RpcGateway> RpcServer startServer(C rpcEndpoint){
		// 12.2: obtain the hostname and port
		final String akkaAddress = AkkaUtils.getAkkaURL(actorSystem, actorRef);
		...
		
		// 12.3: define the invocation handler for the interface
		final InvocationHandler akkaInvocationHandler;

		// 12.4: create an RpcServer by means of a dynamic proxy
		RpcServer server = (RpcServer) Proxy.newProxyInstance(...akkaInvocationHandler);
		
	}
}
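
Step 12.4 uses the standard JDK dynamic-proxy mechanism. Stripped of Akka, the pattern looks like the sketch below; the Greeter interface is hypothetical, while Proxy and InvocationHandler are plain JDK APIs:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class DynamicProxySketch {

	// hypothetical interface standing in for an RpcGateway
	interface Greeter {
		String greet(String name);
	}

	public static void main(String[] args) {
		// in AkkaRpcService the handler turns each call into an Akka message;
		// here we just answer locally
		InvocationHandler handler =
				(proxy, method, methodArgs) -> "hello " + methodArgs[0];

		Greeter greeter = (Greeter) Proxy.newProxyInstance(
				Greeter.class.getClassLoader(),
				new Class<?>[]{Greeter.class},
				handler);

		System.out.println(greeter.greet("flink"));   // routed through invoke()
	}
}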

ResourceManager

public abstract class ResourceManager {

	// 12.5: run the onStart() method to bring up the ResourceManager services
	private void startResourceManagerServices() throws Exception {
		// 12.6: run the election; on success, leaderElectionService calls isLeader()
		leaderElectionService.start(this);
	}

	
	public void grantLeadership(final UUID newLeaderSessionID) {
		// 12.7: asynchronously invoke the tryAcceptLeadership(...) method
		acceptLeadershipFuture = clearStateFuture.thenComposeAsync(
		(ignored) -> tryAcceptLeadership(newLeaderSessionID), 
		getUnfencedMainThreadExecutor());
	}

	protected void startServicesOnLeadership() {
		// 12.8: start the heartbeat services
		startHeartbeatServices();

		// 12.13: start the SlotManager
		slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());
	}
	
	private void startHeartbeatServices() {
		// 12.8.1: the service for heartbeats with the TaskManagers, which tracks whether each TaskManager is alive
		taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
		
		// 12.8.2: the service for heartbeats with the JobManagers, the master process each job starts
		jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(...);
	}
}

HeartbeatManagerSenderImpl

public class HeartbeatManagerSenderImpl<I, O> ... implements Runnable {
		
	HeartbeatManagerSenderImpl(...) {
	// 12.9: schedule this instance's run() method for execution
	mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);
	}

	public void run() {
		// 12.10: the flag that controls the loop
		if (!stopped) {
			// 12.11: send the heartbeat request
			requestHeartbeat(heartbeatMonitor);
			
			// 12.12: schedule the next round, forming the loop
			getMainThreadExecutor().schedule(this, heartbeatPeriod, TimeUnit.MILLISECONDS);
		}
	}
	
	/*
	12.11.1: sending heartbeats in detail
	HeartbeatMonitor: manages all heartbeat targets; a worker that returns a heartbeat response is tracked by a HeartbeatMonitor
	heartbeatTarget: a worker started in the cluster, i.e. a TaskExecutor
	*/
	private void requestHeartbeat(HeartbeatMonitor<O> heartbeatMonitor) {
		heartbeatTarget.requestHeartbeat(getOwnResourceID(), payload);
	}


}
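
The control flow in steps 12.9 through 12.12 is a self-rescheduling task rather than a fixed-rate timer: each run schedules the next one, so a slow heartbeat round can never overlap the next. A stripped-down sketch with illustrative names, not Flink's:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatLoopSketch implements Runnable {

	private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
	private final long heartbeatPeriodMillis = 1000L;
	private volatile boolean stopped = false;

	void start() {
		executor.schedule(this, 0L, TimeUnit.MILLISECONDS);   // first round immediately, as in 12.9
	}

	@Override
	public void run() {
		if (!stopped) {                                       // loop-control flag, as in 12.10
			System.out.println("requestHeartbeat -> all monitored targets");
			// schedule the next round only after this one has finished
			executor.schedule(this, heartbeatPeriodMillis, TimeUnit.MILLISECONDS);
		}
	}

	void stop() {
		stopped = true;
		executor.shutdown();
	}

	public static void main(String[] args) throws InterruptedException {
		HeartbeatLoopSketch loop = new HeartbeatLoopSketch();
		loop.start();
		Thread.sleep(3500);
		loop.stop();
	}
}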

SlotManagerImpl

public class SlotManagerImpl implements SlotManager {
	
	public void start(...) {
		// 12.13.1: start the scheduled task checkTaskManagerTimeouts, which checks the TaskManagers' heartbeats
		taskManagerTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
		
		// 12.13.2: start the scheduled task that handles SlotRequest timeouts
		slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(...);
	}	

}

DefaultDispatcherRunner

public final class DefaultDispatcherRunner implements DispatcherRunner, LeaderContender {

	public static DispatcherRunner create(...) throws Exception {
		// 13.1: create the DefaultDispatcherRunner
		final DefaultDispatcherRunner dispatcherRunner = new DefaultDispatcherRunner();
		// 13.2: open the DefaultDispatcherRunner's lifecycle; leaderElectionService is the leader-election service
		return DispatcherRunnerLeaderElectionLifecycleManager.createFor(dispatcherRunner, leaderElectionService);
	}
}

DispatcherRunnerLeaderElectionLifecycleManager

final class DispatcherRunnerLeaderElectionLifecycleManager implements DispatcherRunner {
	
	private DispatcherRunnerLeaderElectionLifecycleManager(...) throws Exception {
		/*
		13.3: leaderElectionService.start(this);
		the contender object (leaderContender) inside leaderElectionService is the DefaultDispatcherRunner
		*/
		leaderElectionService.start(dispatcherRunner);
	}

	// 13.4: after the election completes and this node becomes the leader...
	public void grantLeadership(UUID leaderSessionID) {
		runActionIfRunning(() -> startNewDispatcherLeaderProcess(leaderSessionID));
	}

	// 13.5: call the DispatcherLeaderProcess's start() method
	private void startNewDispatcherLeaderProcess(UUID leaderSessionID) {
		// stop the existing DispatcherLeaderProcess
		stopDispatcherLeaderProcess();
		
		// create a new DispatcherLeaderProcess
		final DispatcherLeaderProcess newDispatcherLeaderProcess = ...;
		
		//newDispatcherLeaderProcess::start
		FutureUtils.assertNoException(
		previousDispatcherLeaderProcessTerminationFuture.thenRun(
		newDispatcherLeaderProcess::start
		));
	}	
}

AbstractDispatcherLeaderProcess

public abstract class AbstractDispatcherLeaderProcess implements DispatcherLeaderProcess {

	private void startInternal() {
		log.info("Start {}.", getClass().getSimpleName());
		// 13.6: the DispatcherLeaderProcess has started; update its state
		state = State.RUNNING;
		onStart();
	}
	
}

SessionDispatcherLeaderProcess

public class SessionDispatcherLeaderProcess ... {
	protected void onStart() {
		// 13.7: start the services, i.e. the JobGraphStore, a component for storing JobGraphs
		startServices();
		
		// 13.8: begin creating the Dispatcher
		onGoingRecoveryOperation = recoverJobsAsync()
			.thenAccept(this::createDispatcherIfRunning)
			.handle(this::onErrorIfRunning);
	}
}

DefaultDispatcherGatewayServiceFactory

class DefaultDispatcherGatewayServiceFactory implements ... {

	public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(...) {
		// 13.9: create the Dispatcher
		dispatcher = dispatcherFactory.createDispatcher(...);
		// 13.12: once started, the Dispatcher sends a hello message to itself to signal a successful start
		dispatcher.start();
	}
}

Dispatcher

public abstract class Dispatcher ... {
	
	// 13.10: run the onStart() method
	public void onStart() throws Exception {
		// start the Dispatcher services
		startDispatcherServices();
		// 13.11: bootstrap initialization; resumes the execution of all interrupted jobs
		dispatcherBootstrap.initialize(...);
	}

	// 13.11.2: when a client submits a job, the Dispatcher receives it and runs it here
	private CompletableFuture<Void> runJob(JobGraph jobGraph) {
		// submitting a job == starting a JobManagerRunner, which wraps a JobManager
		return jobManagerRunnerFuture
			.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner))
			...
	}
	
}
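
The runJob() chain above is ordinary CompletableFuture composition. Below is a minimal sketch of the same style, where createRunner stands in for the JobManagerRunner creation (all names made up):

import java.util.concurrent.CompletableFuture;

public class AsyncRunJobSketch {

	// stands in for the JobManagerRunner creation inside the Dispatcher
	static CompletableFuture<String> createRunner(String jobId) {
		return CompletableFuture.supplyAsync(() -> "runner-for-" + jobId);
	}

	public static void main(String[] args) {
		CompletableFuture<Void> done = createRunner("job-42")
				.thenApply(runner -> runner + ":started")     // cf. thenApply(this::startJobManagerRunner)
				.thenAccept(System.out::println);
		done.join();
	}
}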

DefaultDispatcherBootstrap

public class DefaultDispatcherBootstrap extends AbstractDispatcherBootstrap {

	public void initialize(...) {
		/*
		13.11.1: recoveredJobs holds the jobs awaiting recovery
		under the hood, AbstractDispatcherBootstrap calls dispatcher.runRecoveredJob(recoveredJob)
		*/
		launchRecoveredJobGraphs(dispatcher, recoveredJobs);
	
		// 13.11.2: clear the list once recovery is done
		recoveredJobs.clear();
	}

}