Apache Flink Source Code Analysis --- JobManager Startup Flow

Apache Flink's startup centers on the JobManager and the TaskManager; this article focuses on the JobManager startup flow. Starting from StandaloneSessionClusterEntrypoint, it covers configuration loading, the JobManager entry point, the startup procedure, and the detailed initialization of the services involved: commonRpcService, a TCP RPC service backed by an Akka ActorSystem; haServices, which provide high availability (no HA, ZooKeeper HA, or a custom HA implementation); the blobServer, which stores and cleans up BLOBs; heartbeatServices and metricQueryServiceRpcService for heartbeats and metric queries; plus the roles of the archivedExecutionGraphStore, Dispatcher, ResourceManager, and WebMonitorEndpoint.

Starting Flink mainly means starting the JobManager process and the TaskManager process. This chapter summarizes the startup flow of the JobManager.

JobManager startup flow:

The entry class is org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint.

Configuring and launching the JobManager:

JobManager entry point:

public static void main(String[] args) {
		// startup checks and logging
		//EnvironmentInformation is a utility class that provides access to the JVM execution environment, e.g. the executing user (getHadoopUser()), startup options, or the JVM version.
		EnvironmentInformation.logEnvironmentInfo(LOG, StandaloneSessionClusterEntrypoint.class.getSimpleName(), args);
		SignalHandler.register(LOG);
		JvmShutdownSafeguard.installAsShutdownHook(LOG);

		EntrypointClusterConfiguration entrypointClusterConfiguration = null;
		final CommandLineParser<EntrypointClusterConfiguration> commandLineParser = new CommandLineParser<>(new EntrypointClusterConfigurationParserFactory());

		try {
			entrypointClusterConfiguration = commandLineParser.parse(args);
		} catch (FlinkParseException e) {
			LOG.error("Could not parse command line arguments {}.", args, e);
			commandLineParser.printHelp(StandaloneSessionClusterEntrypoint.class.getSimpleName());
			System.exit(1);
		}
        //parse the configuration parameters
		Configuration configuration = loadConfiguration(entrypointClusterConfiguration);
		//construct the StandaloneSessionClusterEntrypoint object, the entry point of a standalone session cluster
		StandaloneSessionClusterEntrypoint entrypoint = new StandaloneSessionClusterEntrypoint(configuration);
		//start the session cluster
		ClusterEntrypoint.runClusterEntrypoint(entrypoint);
	}

JobManager startup:

public void startCluster() throws ClusterEntrypointException {
		LOG.info("Starting {}.", getClass().getSimpleName());

		try {
			//the PluginManager manages the cluster plugins, which are loaded with separate
			// class loaders so that their dependencies do not interfere with Flink's own dependencies
			PluginManager pluginManager = PluginUtils.createPluginManagerFromRootFolder(configuration);
			//initialize the file systems according to the configuration
			configureFileSystems(configuration, pluginManager);
            //install the security context
			SecurityContext securityContext = installSecurityContext(configuration);
            //start the JobManager services inside the security context
			securityContext.runSecured((Callable<Void>) () -> {
				runCluster(configuration, pluginManager);

				return null;
			});
		} catch (Throwable t) {
			// on failure: shut down any partially started services and rethrow the
			// error wrapped in a ClusterEntrypointException (error handling omitted here)
		}
	}

Initializing the services:

private void runCluster(Configuration configuration, PluginManager pluginManager) throws Exception {
		synchronized (lock) {
			//initialize the services
			//commonRpcService: Akka-based RpcService implementation. The RPC service starts Akka actors to receive RPC calls from RpcGateways
			//haServices: provides access to all services required for high availability, such as registries, distributed counters, and leader election
			//blobServer: listens for incoming requests and spawns threads to handle them; it also creates the directory structure to store BLOBs or cache them temporarily
			//heartbeatServices: provides all services needed for heartbeats, i.e. creating heartbeat receivers and heartbeat senders
			//metricRegistry: tracks all registered metrics and acts as the connection between MetricGroups and MetricReporters
			//archivedExecutionGraphStore: stores serializable forms of ExecutionGraphs
			initializeServices(configuration, pluginManager);

			// write host information into configuration: the JobManager's address and port
			configuration.setString(JobManagerOptions.ADDRESS, commonRpcService.getAddress());
			configuration.setInteger(JobManagerOptions.PORT, commonRpcService.getPort());

			final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);
			//create the dispatcherResourceManagerComponent, which contains three major components
			// Dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure; it also knows the state of the Flink session cluster
			// ResourceManager: responsible for resource allocation and bookkeeping; registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster, requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager
			// WebMonitorEndpoint: REST endpoint serving the REST calls from the web frontend
			clusterComponent = dispatcherResourceManagerComponentFactory.create(
				configuration,
				ioExecutor,
				commonRpcService,
				haServices,
				blobServer,
				heartbeatServices,
				metricRegistry,
				archivedExecutionGraphStore,
				new RpcMetricQueryServiceRetriever(metricRegistry.getMetricQueryServiceRpcService()),
				this);

			clusterComponent.getShutDownFuture().whenComplete(
				(ApplicationStatus applicationStatus, Throwable throwable) -> {
					if (throwable != null) {
						shutDownAsync(
							ApplicationStatus.UNKNOWN,
							ExceptionUtils.stringifyException(throwable),
							false);
					} else {
						// This is the general shutdown path. If a separate more specific shutdown was
						// already triggered, this will do nothing
						shutDownAsync(
							applicationStatus,
							null,
							true);
					}
				});
		}
	}

Services in detail:

commonRpcService:

An Akka-based RpcService implementation. The RPC service starts Akka actors to receive RPC calls from RpcGateways.

commonRpcService is essentially an Akka-based ActorSystem, i.e. a TCP RPC service listening on port 6123. Its main configuration looks like this:

Config(SimpleConfigObject({
  "akka": {
    "actor": {
      "default-dispatcher": {
        "executor": "fork-join-executor",
        "fork-join-executor": {
          "parallelism-factor": 2,
          "parallelism-max": 64,
          "parallelism-min": 8
        },
        "throughput": 15
      },
      "guardian-supervisor-strategy": "org.apache.flink.runtime.akka.EscalatingSupervisorStrategy",
      "provider": "akka.remote.RemoteActorRefProvider",
      "supervisor-dispatcher": {
        "executor": "thread-pool-executor",
        "thread-pool-executor": {
          "core-pool-size-max": 1,
          "core-pool-size-min": 1
        },
        "type": "Dispatcher"
      },
      "warn-about-java-serializer-usage": "off"
    },
    "daemonic": "off",
    "jvm-exit-on-fatal-error": "on",
    "log-config-on-start": "off",
    "log-dead-letters": "off",
    "log-dead-letters-during-shutdown": "off",
    "loggers": ["akka.event.slf4j.Slf4jLogger"],
    "logging-filter": "akka.event.slf4j.Slf4jLoggingFilter",
    "loglevel": "ERROR",
    "remote": {
      "log-remote-lifecycle-events": "off",
      "netty": {
        "tcp": {
          "bind-hostname": "0.0.0.0",
          "bind-port": 6123,
          "client-socket-worker-pool": {
            "pool-size-factor": 1,
            "pool-size-max": 2,
            "pool-size-min": 1
          },
          "connection-timeout": "20000ms",
          "hostname": "localhost",
          "maximum-frame-size": "10485760b",
          "port": 6123,
          "server-socket-worker-pool": {
            "pool-size-factor": 1,
            "pool-size-max": 2,
            "pool-size-min": 1
          },
          "tcp-nodelay": "on",
          "transport-class": "akka.remote.transport.netty.NettyTransport"
        }
      },
      "retry-gate-closed-for": "50 ms",
      "startup-timeout": "100000ms",
      "transport-failure-detector": {
        "acceptable-heartbeat-pause": "6000000ms",
        "heartbeat-interval": "1000000ms",
        "threshold": 300
      }
    },
    "serialize-messages": "off",
    "stdout-loglevel": "OFF"
  }
}))
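To make the "it is just an ActorSystem" point concrete, here is a minimal stand-alone sketch (requiring akka-actor and akka-remote on the classpath) that creates a remoting-enabled ActorSystem using the same remoting keys as the dump above. This is not Flink's actual construction path, which goes through its own AkkaRpcServiceUtils/AkkaUtils helpers; the system name and the hand-picked config values are illustrative only:

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ActorSystemSketch {

	public static void main(String[] args) {
		// only the remoting-related keys from the configuration dump above;
		// everything else falls back to Akka's reference configuration
		Config config = ConfigFactory.parseString(
			"akka.actor.provider = \"akka.remote.RemoteActorRefProvider\"\n" +
			"akka.remote.netty.tcp.hostname = \"localhost\"\n" +
			"akka.remote.netty.tcp.port = 6123\n");

		// the resulting system listens for remote messages on TCP port 6123
		ActorSystem actorSystem = ActorSystem.create("flink", config);
		System.out.println("started ActorSystem " + actorSystem.name());

		actorSystem.terminate();
	}
}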

haServices:

Provides access to all services required for high availability, such as registries, distributed counters, and leader election.

The haServices are created as follows:

switch (highAvailabilityMode) {
			case NONE:
				final Tuple2<String, Integer> hostnamePort = getJobManagerAddress(configuration);

				final String resourceManagerRpcUrl = AkkaRpcServiceUtils.getRpcUrl(
					hostnamePort.f0,
					hostnamePort.f1,
					AkkaRpcServiceUtils.createWildcardName(ResourceManager.RESOURCE_MANAGER_NAME),
					addressResolution,
					configuration);
				final String dispatcherRpcUrl = AkkaRpcServiceUtils.getRpcUrl(
					hostnamePort.f0,
					hostnamePort.f1,
					AkkaRpcServiceUtils.createWildcardName(Dispatcher.DISPATCHER_NAME),
					addressResolution,
					configuration);
				final String webMonitorAddress = getWebMonitorAddress(
					configuration,
					addressResolution);

				return new StandaloneHaServices(
					resourceManagerRpcUrl,
					dispatcherRpcUrl,
					webMonitorAddress);
			case ZOOKEEPER:
				BlobStoreService blobStoreService = BlobUtils.createBlobStoreFromConfig(configuration);

				return new ZooKeeperHaServices(
					ZooKeeperUtils.startCuratorFramework(configuration),
					executor,
					configuration,
					blobStoreService);

			case FACTORY_CLASS:
				return createCustomHAServices(configuration, executor);

As the code shows, the HA services can be created in three ways:

1. NONE: no high availability; the implementing service is StandaloneHaServices

public class StandaloneHaServices extends AbstractNonHaServices {

	/** The fix address of the ResourceManager. */
	private final String resourceManagerAddress;

	/** The fix address of the Dispatcher. */
	private final String dispatcherAddress;

	private final String clusterRestEndpointAddress;

	/**
	 * Creates a new services class for the fix pre-defined leaders.
	 *
	 * @param resourceManagerAddress    The fix address of the ResourceManager
	 * @param clusterRestEndpointAddress
	 */
	public StandaloneHaServices(
			String resourceManagerAddress,
			String dispatcherAddress,
			String clusterRestEndpointAddress) {
		this.resourceManagerAddress = checkNotNull(resourceManagerAddress, "resourceManagerAddress");
		this.dispatcherAddress = checkNotNull(dispatcherAddress, "dispatcherAddress");
		this.clusterRestEndpointAddress = checkNotNull(clusterRestEndpointAddress, clusterRestEndpointAddress);
	}

2. ZOOKEEPER: HA implemented via ZooKeeper; the implementing service is ZooKeeperHaServices

public ZooKeeperHaServices(
			CuratorFramework client,
			Executor executor,
			Configuration configuration,
			BlobStoreService blobStoreService) {
		this.client = checkNotNull(client);
		this.executor = checkNotNull(executor);
		this.configuration = checkNotNull(configuration);
		this.runningJobsRegistry = new ZooKeeperRunningJobsRegistry(client, configuration);

		this.blobStoreService = checkNotNull(blobStoreService);
	}

3. FACTORY_CLASS: a custom HA implementation, created through a HighAvailabilityServicesFactory (a reflection-based sketch follows)
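For the FACTORY_CLASS branch, the pattern boils down to "load the configured factory class reflectively and let it build the services". The following stand-alone sketch shows that pattern; the HaServicesFactory interface and method names are simplified stand-ins, not Flink's actual HighAvailabilityServicesFactory API:

import java.util.Map;
import java.util.concurrent.Executor;

public class CustomHaLoadingSketch {

	/** Simplified stand-in for a factory interface like HighAvailabilityServicesFactory. */
	public interface HaServicesFactory {
		Object createHAServices(Map<String, String> config, Executor executor) throws Exception;
	}

	public static Object createCustomHaServices(
			String factoryClassName,
			Map<String, String> config,
			Executor executor) throws Exception {
		// instantiate the user-provided factory via its no-argument constructor
		HaServicesFactory factory = (HaServicesFactory) Class
			.forName(factoryClassName)
			.getDeclaredConstructor()
			.newInstance();

		// delegate the actual construction of the HA services to the factory
		return factory.createHAServices(config, executor);
	}
}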

 

blobServer:

Listens for incoming requests and spawns threads to process them. It is also responsible for creating the directory structure in which BLOBs are stored or temporarily cached.

The BlobServer constructor:

public BlobServer(Configuration config, BlobStore blobStore) throws IOException {
		this.blobServiceConfiguration = checkNotNull(config);
		this.blobStore = checkNotNull(blobStore);
		this.readWriteLock = new ReentrantReadWriteLock();

		// configure and create the storage directory
		this.storageDir = BlobUtils.initLocalStorageDirectory(config);
		LOG.info("Created BLOB server storage directory {}", storageDir);

		// configure the maximum number of concurrent connections
		final int maxConnections = config.getInteger(BlobServerOptions.FETCH_CONCURRENT);
		if (maxConnections >= 1) {
			this.maxConnections = maxConnections;
		}
		else {
			LOG.warn("Invalid value for maximum connections in BLOB server: {}. Using default value of {}",
					maxConnections, BlobServerOptions.FETCH_CONCURRENT.defaultValue());
			this.maxConnections = BlobServerOptions.FETCH_CONCURRENT.defaultValue();
		}

		// configure the backlog of connections
		int backlog = config.getInteger(BlobServerOptions.FETCH_BACKLOG);
		if (backlog < 1) {
			LOG.warn("Invalid value for BLOB connection backlog: {}. Using default value of {}",
					backlog, BlobServerOptions.FETCH_BACKLOG.defaultValue());
			backlog = BlobServerOptions.FETCH_BACKLOG.defaultValue();
		}

		// Initializing the clean up task
        //set up a timer task that cleans up transient BLOBs whose TTL has expired
		this.cleanupTimer = new Timer(true);

		this.cleanupInterval = config.getLong(BlobServerOptions.CLEANUP_INTERVAL) * 1000;
		this.cleanupTimer
			.schedule(new TransientBlobCleanupTask(blobExpiryTimes, readWriteLock.writeLock(),
				storageDir, LOG), cleanupInterval, cleanupInterval);

		this.shutdownHook = ShutdownHookUtil.addShutdownHook(this, getClass().getSimpleName(), LOG);

		//  ----------------------- start the server -------------------
        //create a ServerSocket
		final String serverPortRange = config.getString(BlobServerOptions.PORT);
		final Iterator<Integer> ports = NetUtils.getPortRangeFromString(serverPortRange);

		final ServerSocketFactory socketFactory;
		if (SSLUtils.isInternalSSLEnabled(config) && config.getBoolean(BlobServerOptions.SSL_ENABLED)) {
			try {
				socketFactory = SSLUtils.createSSLServerSocketFactory(config);
			}
			catch (Exception e) {
				throw new IOException("Failed to initialize SSL for the blob server", e);
			}
		}
		else {
			socketFactory = ServerSocketFactory.getDefault();
		}

		final int finalBacklog = backlog;
		final String bindHost = config.getOptional(JobManagerOptions.BIND_HOST).orElseGet(NetUtils::getWildcardIPAddress);

		this.serverSocket = NetUtils.createSocketFromPorts(ports,
				(port) -> socketFactory.createServerSocket(port, finalBacklog, InetAddress.getByName(bindHost)));

		if (serverSocket == null) {
			throw new IOException("Unable to open BLOB Server in specified port range: " + serverPortRange);
		}

		// start the server thread
		setName("BLOB Server listener at " + getPort());
		setDaemon(true);

		if (LOG.isInfoEnabled()) {
			LOG.info("Started BLOB server at {}:{} - max concurrent requests: {} - max backlog: {}",
					serverSocket.getInetAddress().getHostAddress(), getPort(), maxConnections, backlog);
		}
	}

The constructor mainly does two things (a simplified sketch follows the list):

1. Initializes a timer task that cleans up BLOBs whose TTL has expired

2. Creates a ServerSocket
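A minimal stand-alone sketch of these two responsibilities, using only JDK classes; the interval, backlog, and bind address below are placeholder values, not Flink's defaults:

import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Timer;
import java.util.TimerTask;

public class BlobServerSketch {

	public static void main(String[] args) throws IOException {
		// 1. a daemon timer that periodically removes expired transient BLOBs
		Timer cleanupTimer = new Timer(true);
		long cleanupInterval = 3_600_000L; // placeholder: one hour
		cleanupTimer.schedule(new TimerTask() {
			@Override
			public void run() {
				// delete BLOBs whose TTL has passed (omitted)
			}
		}, cleanupInterval, cleanupInterval);

		// 2. a server socket bound with a fixed connection backlog
		int backlog = 1000;
		try (ServerSocket serverSocket =
				new ServerSocket(0, backlog, InetAddress.getByName("0.0.0.0"))) {
			System.out.println("BLOB-style server listening on port " + serverSocket.getLocalPort());
			// the real BlobServer then accepts connections in its run() loop (shown below)
		}
	}
}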

Starting the BlobServer:

blobServer = new BlobServer(configuration, haServices.createBlobStore());
blobServer.start();

The BlobServer extends Thread; once started, its run() method accepts incoming connections:
@Override
	public void run() {
		try {
			while (!this.shutdownRequested.get()) {
				BlobServerConnection conn = new BlobServerConnection(serverSocket.accept(), this);
				try {
					synchronized (activeConnections) {
						while (activeConnections.size() >= maxConnections) {
							activeConnections.wait(2000);
						}
						activeConnections.add(conn);
					}

					conn.start();
					conn = null;
				}
				finally {
					if (conn != null) {
						conn.close();
						synchronized (activeConnections) {
							activeConnections.remove(conn);
						}
					}
				}
			}
		}
		catch (Throwable t) {
			if (!this.shutdownRequested.get()) {
				LOG.error("BLOB server stopped working. Shutting down", t);

				try {
					close();
				} catch (Throwable closeThrowable) {
					LOG.error("Could not properly close the BlobServer.", closeThrowable);
				}
			}
		}
	}

heartbeatServices:

Provides all services needed for heartbeats. This includes creating heartbeat receivers and heartbeat senders.
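The underlying idea is that one side periodically reports liveness while the other side declares the target dead when reports stop arriving. The sketch below shows that idea with plain JDK scheduling; it is not Flink's HeartbeatServices API, and the interval and timeout values are made up:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatSketch {

	public static void main(String[] args) throws InterruptedException {
		ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

		final long heartbeatInterval = 1_000; // ms, placeholder values
		final long heartbeatTimeout = 5_000;  // ms
		final long[] lastHeartbeat = {System.currentTimeMillis()};

		// sender: periodically report liveness
		scheduler.scheduleAtFixedRate(
			() -> lastHeartbeat[0] = System.currentTimeMillis(),
			0, heartbeatInterval, TimeUnit.MILLISECONDS);

		// monitor: declare the target dead if no report arrived within the timeout
		scheduler.scheduleAtFixedRate(() -> {
			if (System.currentTimeMillis() - lastHeartbeat[0] > heartbeatTimeout) {
				System.out.println("heartbeat timeout, target considered dead");
			}
		}, heartbeatTimeout, heartbeatInterval, TimeUnit.MILLISECONDS);

		Thread.sleep(3_000);
		scheduler.shutdownNow();
	}
}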

metricQueryServiceRpcService:

The MetricRegistry tracks all registered metrics and acts as the connection between MetricGroups and MetricReporters; metricQueryServiceRpcService is the RPC service through which those metrics can be queried remotely.

public static RpcService startRemoteMetricsRpcService(Configuration configuration, String hostname) throws Exception {
		final String portRange = configuration.getString(MetricOptions.QUERY_SERVICE_PORT);

		return startMetricRpcService(configuration, AkkaRpcServiceUtils.remoteServiceBuilder(configuration, hostname, portRange));
	}

metricQueryServiceRpcService is also an Akka-based ActorSystem.
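The metrics that eventually become queryable through this service are registered on MetricGroups, for example from user code. Below is a hypothetical user function using Flink's public metrics API; the registered counter is tracked by the MetricRegistry and forwarded to reporters and the metric query service:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class CountingMapper extends RichMapFunction<String, String> {

	private transient Counter mappedRecords;

	@Override
	public void open(Configuration parameters) {
		// register a counter on this task's MetricGroup
		this.mappedRecords = getRuntimeContext().getMetricGroup().counter("mappedRecords");
	}

	@Override
	public String map(String value) {
		mappedRecords.inc();
		return value;
	}
}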

archivedExecutionGraphStore:

Stores a serializable form of the ExecutionGraph.

@Override
	protected ArchivedExecutionGraphStore createSerializableExecutionGraphStore(
			Configuration configuration,
			ScheduledExecutor scheduledExecutor) throws IOException {
		final File tmpDir = new File(ConfigurationUtils.parseTempDirectories(configuration)[0]);

		final Time expirationTime =  Time.seconds(configuration.getLong(JobManagerOptions.JOB_STORE_EXPIRATION_TIME));
		final int maximumCapacity = configuration.getInteger(JobManagerOptions.JOB_STORE_MAX_CAPACITY);
		final long maximumCacheSizeBytes = configuration.getLong(JobManagerOptions.JOB_STORE_CACHE_SIZE);

		return new FileArchivedExecutionGraphStore(
			tmpDir,
			expirationTime,
			maximumCapacity,
			maximumCacheSizeBytes,
			scheduledExecutor,
			Ticker.systemTicker());
	}

The FileArchivedExecutionGraphStore constructor:

public FileArchivedExecutionGraphStore(
			File rootDir,
			Time expirationTime,
			int maximumCapacity,
			long maximumCacheSizeBytes,
			ScheduledExecutor scheduledExecutor,
			Ticker ticker) throws IOException {

		final File storageDirectory = initExecutionGraphStorageDirectory(rootDir);

		LOG.info(
			"Initializing {}: Storage directory {}, expiration time {}, maximum cache size {} bytes.",
			FileArchivedExecutionGraphStore.class.getSimpleName(),
			storageDirectory,
			expirationTime.toMilliseconds(),
			maximumCacheSizeBytes);
        //storage directory
		this.storageDir = Preconditions.checkNotNull(storageDirectory);
		Preconditions.checkArgument(
			storageDirectory.exists() && storageDirectory.isDirectory(),
			"The storage directory must exist and be a directory.");
		//job details cache
		this.jobDetailsCache = CacheBuilder.newBuilder()
			.expireAfterWrite(expirationTime.toMilliseconds(), TimeUnit.MILLISECONDS)
			.maximumSize(maximumCapacity)
			.removalListener(
				(RemovalListener<JobID, JobDetails>) notification -> deleteExecutionGraphFile(notification.getKey()))
			.ticker(ticker)
			.build();
        //LoadingCache<JobID, ArchivedExecutionGraph>
		this.archivedExecutionGraphCache = CacheBuilder.newBuilder()
			.maximumWeight(maximumCacheSizeBytes)
			.weigher(this::calculateSize)
			.build(new CacheLoader<JobID, ArchivedExecutionGraph>() {
				@Override
				public ArchivedExecutionGraph load(JobID jobId) throws Exception {
					return loadExecutionGraph(jobId);
				}});

		this.cleanupFuture = scheduledExecutor.scheduleWithFixedDelay(
			jobDetailsCache::cleanUp,
			expirationTime.toMilliseconds(),
			expirationTime.toMilliseconds(),
			TimeUnit.MILLISECONDS);

		this.shutdownHook = ShutdownHookUtil.addShutdownHook(this, getClass().getSimpleName(), LOG);

		this.numFinishedJobs = 0;
		this.numFailedJobs = 0;
		this.numCanceledJobs = 0;
	}
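The two caches above are Guava caches (Flink ships a shaded copy). Here is a compact sketch of the same caching pattern using plain Guava, where the value types, sizes, and the loadFromDisk helper are hypothetical:

import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.Weigher;

public class GraphCacheSketch {

	public static void main(String[] args) {
		// expiring cache of job details: entries expire after one hour and the
		// removal listener would delete the backing file (mirrors jobDetailsCache)
		Cache<String, String> jobDetails = CacheBuilder.newBuilder()
			.expireAfterWrite(1, TimeUnit.HOURS)
			.maximumSize(1000)
			.removalListener((RemovalListener<String, String>) notification ->
				System.out.println("would delete file for job " + notification.getKey()))
			.build();

		// size-bounded loading cache: misses are loaded on demand and weighted by
		// their serialized size (mirrors archivedExecutionGraphCache)
		LoadingCache<String, byte[]> graphs = CacheBuilder.newBuilder()
			.maximumWeight(64 * 1024 * 1024)
			.weigher(new Weigher<String, byte[]>() {
				@Override
				public int weigh(String jobId, byte[] bytes) {
					return bytes.length;
				}
			})
			.build(new CacheLoader<String, byte[]>() {
				@Override
				public byte[] load(String jobId) {
					return loadFromDisk(jobId); // hypothetical disk read
				}
			});

		jobDetails.put("job-1", "details");
		System.out.println("loaded " + graphs.getUnchecked("job-1").length + " bytes");
	}

	private static byte[] loadFromDisk(String jobId) {
		return new byte[0]; // placeholder
	}
}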

DispatcherResourceManagerComponent:

public DispatcherResourceManagerComponent create(
			Configuration configuration,
			Executor ioExecutor,
			RpcService rpcService,
			HighAvailabilityServices highAvailabilityServices,
			BlobServer blobServer,
			HeartbeatServices heartbeatServices,
			MetricRegistry metricRegistry,
			ArchivedExecutionGraphStore archivedExecutionGraphStore,
			MetricQueryServiceRetriever metricQueryServiceRetriever,
			FatalErrorHandler fatalErrorHandler) throws Exception {

        //service that retrieves the current leader and notifies a listener
		LeaderRetrievalService dispatcherLeaderRetrievalService = null;
		//service that retrieves the current leader and notifies a listener
		LeaderRetrievalService resourceManagerRetrievalService = null;
		//REST endpoint serving the REST calls from the web frontend
		WebMonitorEndpoint<?> webMonitorEndpoint = null;
		//ResourceManager implementation; the resource manager is responsible for resource allocation and bookkeeping
		ResourceManager<?> resourceManager = null;
		//encapsulates how the Dispatcher is executed
		DispatcherRunner dispatcherRunner = null;

		try {
			dispatcherLeaderRetrievalService = highAvailabilityServices.getDispatcherLeaderRetriever();

			resourceManagerRetrievalService = highAvailabilityServices.getResourceManagerLeaderRetriever();

            //the LeaderGatewayRetriever retrieves and stores the leading {@link RpcGateway}
			final LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever = new RpcGatewayRetriever<>(
				rpcService,
				DispatcherGateway.class,
				DispatcherId::fromUuid,
				10,
				Time.milliseconds(50L));

            //the LeaderGatewayRetriever retrieves and stores the leading {@link RpcGateway}
			final LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever = new RpcGatewayRetriever<>(
				rpcService,
				ResourceManagerGateway.class,
				ResourceManagerId::fromUuid,
				10,
				Time.milliseconds(50L));

			final ScheduledExecutorService executor = WebMonitorEndpoint.createExecutorService(
				configuration.getInteger(RestOptions.SERVER_NUM_THREADS),
				configuration.getInteger(RestOptions.SERVER_THREAD_PRIORITY),
				"DispatcherRestEndpoint");

			final long updateInterval = configuration.getLong(MetricOptions.METRIC_FETCHER_UPDATE_INTERVAL);
			//the MetricFetcher fetches metrics from the JobManager and all registered TaskManagers
			final MetricFetcher metricFetcher = updateInterval == 0
				? VoidMetricFetcher.INSTANCE
				: MetricFetcherImpl.fromConfiguration(
					configuration,
					metricQueryServiceRetriever,
					dispatcherGatewayRetriever,
					executor);

			webMonitorEndpoint = restEndpointFactory.createRestEndpoint(
				configuration,
				dispatcherGatewayRetriever,
				resourceManagerGatewayRetriever,
				blobServer,
				executor,
				metricFetcher,
				highAvailabilityServices.getClusterRestEndpointLeaderElectionService(),
				fatalErrorHandler);

			log.debug("Starting Dispatcher REST endpoint.");
			webMonitorEndpoint.start();

			final String hostname = RpcUtils.getHostname(rpcService);

			resourceManager = resourceManagerFactory.createResourceManager(
				configuration,
				ResourceID.generate(),
				rpcService,
				highAvailabilityServices,
				heartbeatServices,
				fatalErrorHandler,
				new ClusterInformation(hostname, blobServer.getPort()),
				webMonitorEndpoint.getRestBaseUrl(),
				metricRegistry,
				hostname);

			final HistoryServerArchivist historyServerArchivist = HistoryServerArchivist.createHistoryServerArchivist(configuration, webMonitorEndpoint, ioExecutor);

			final PartialDispatcherServices partialDispatcherServices = new PartialDispatcherServices(
				configuration,
				highAvailabilityServices,
				resourceManagerGatewayRetriever,
				blobServer,
				heartbeatServices,
				() -> MetricUtils.instantiateJobManagerMetricGroup(metricRegistry, hostname),
				archivedExecutionGraphStore,
				fatalErrorHandler,
				historyServerArchivist,
				metricRegistry.getMetricQueryServiceGatewayRpcAddress());

			log.debug("Starting Dispatcher.");
			dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(
				highAvailabilityServices.getDispatcherLeaderElectionService(),
				fatalErrorHandler,
				new HaServicesJobGraphStoreFactory(highAvailabilityServices),
				ioExecutor,
				rpcService,
				partialDispatcherServices);

			log.debug("Starting ResourceManager.");
			resourceManager.start();

			resourceManagerRetrievalService.start(resourceManagerGatewayRetriever);
			dispatcherLeaderRetrievalService.start(dispatcherGatewayRetriever);

			return new DispatcherResourceManagerComponent(
				dispatcherRunner,
				resourceManager,
				dispatcherLeaderRetrievalService,
				resourceManagerRetrievalService,
				webMonitorEndpoint);

		} catch (Exception exception) {
			// clean up all started components
			if (dispatcherLeaderRetrievalService != null) {
				try {
					dispatcherLeaderRetrievalService.stop();
				} catch (Exception e) {
					exception = ExceptionUtils.firstOrSuppressed(e, exception);
				}
			}

			if (resourceManagerRetrievalService != null) {
				try {
					resourceManagerRetrievalService.stop();
				} catch (Exception e) {
					exception = ExceptionUtils.firstOrSuppressed(e, exception);
				}
			}

			final Collection<CompletableFuture<Void>> terminationFutures = new ArrayList<>(3);

			if (webMonitorEndpoint != null) {
				terminationFutures.add(webMonitorEndpoint.closeAsync());
			}

			if (resourceManager != null) {
				terminationFutures.add(resourceManager.closeAsync());
			}

			if (dispatcherRunner != null) {
				terminationFutures.add(dispatcherRunner.closeAsync());
			}

			final FutureUtils.ConjunctFuture<Void> terminationFuture = FutureUtils.completeAll(terminationFutures);

			try {
				terminationFuture.get();
			} catch (Exception e) {
				exception = ExceptionUtils.firstOrSuppressed(e, exception);
			}

			throw new FlinkException("Could not create the DispatcherResourceManagerComponent.", exception);
		}
	}
The dispatcherResourceManagerComponent contains six services:
1. Dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure; it also knows the state of the Flink session cluster.
2. ResourceManager: responsible for resource allocation and bookkeeping. registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster; requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager.
3. WebMonitorEndpoint: REST endpoint serving the REST calls from the web frontend.
4. dispatcherLeaderRetrievalService: retrieves the current Dispatcher leader and notifies a listener, the dispatcherGatewayRetriever.
5. resourceManagerRetrievalService: retrieves the current ResourceManager leader and notifies a listener, the resourceManagerGatewayRetriever (a toy sketch of this leader-retrieval pattern follows the list).
6. partialDispatcherServices: the bundle of services handed on to the Dispatcher when it is created.
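Items 4 and 5 follow the same pattern: a retrieval service pushes the current leader's address (and a leader session id) to a registered listener, which then resolves the corresponding gateway. A toy illustration of that pattern follows; the interfaces below are simplified stand-ins, not Flink's LeaderRetrievalService/LeaderRetrievalListener types, and the address string is only an example:

import java.util.UUID;

public class LeaderRetrievalSketch {

	/** Listener that is told whenever a (new) leader is determined. */
	interface LeaderListener {
		void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
	}

	/** Minimal "retrieval service" for a fixed leader, as in standalone mode. */
	static class FixedLeaderRetrievalService {
		private final String leaderAddress;

		FixedLeaderRetrievalService(String leaderAddress) {
			this.leaderAddress = leaderAddress;
		}

		void start(LeaderListener listener) {
			// standalone mode has a fixed leader, so the listener is notified immediately;
			// a ZooKeeper-based implementation would instead watch a znode and notify on change
			listener.notifyLeaderAddress(leaderAddress, UUID.randomUUID());
		}
	}

	public static void main(String[] args) {
		// example address only; real addresses are the RPC URLs built during startup
		new FixedLeaderRetrievalService("akka.tcp://flink@localhost:6123/user/rpc/dispatcher_1")
			.start((address, sessionId) ->
				System.out.println("current leader: " + address + " (session " + sessionId + ")"));
	}
}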

Dispatcher:

Receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them in case of a master failure. It also knows the state of the Flink session cluster.

ResourceManager:

Responsible for resource allocation and bookkeeping. registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster; requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager.

WebMonitorEndpoint:

REST endpoint serving the REST calls from the web frontend.
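As a quick way to see what the WebMonitorEndpoint serves, the cluster overview can be fetched over HTTP. The sketch below assumes a local JobManager with the default REST port 8081 and the /overview path; adjust host, port, and path for your setup:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RestOverviewClient {

	public static void main(String[] args) throws Exception {
		URL url = new URL("http://localhost:8081/overview");
		HttpURLConnection connection = (HttpURLConnection) url.openConnection();
		connection.setRequestMethod("GET");

		try (BufferedReader reader = new BufferedReader(
				new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
			StringBuilder body = new StringBuilder();
			String line;
			while ((line = reader.readLine()) != null) {
				body.append(line);
			}
			// prints a JSON summary of TaskManagers, slots, and job counts
			System.out.println(body);
		} finally {
			connection.disconnect();
		}
	}
}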