Tracing how user code is translated in Flink's local startup mode

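This article traces in detail how, in Flink's local startup mode, user code is converted into Transformations through the DataStream API, starting from the ExecutionEnvironment, and then into a StreamGraph and a JobGraph. It focuses on the role that StreamOperatorFactory and StreamOperator play in the user's computation logic, as well as the roles of the JobManager and TaskManager. Finally, it describes the data processing flow of a running Task.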

This article traces the code path that runs when a LocalStreamEnvironment is started from an IDE.

The test code is taken from Flink's LocalStreamEnvironmentITCase class:

	@Test
	public void testRunIsolatedJob() throws Exception {
		LocalStreamEnvironment env = new LocalStreamEnvironment();
		assertEquals(1, env.getParallelism());

		addSmallBoundedJob(env, 3);
		env.execute();
	}
The ExecutionEnvironment part

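First, the operations the user calls on a DataStream (map, reduce, aggregations, and so on) are executed. Each of them ultimately builds a Transformation for the corresponding logic and then calls getExecutionEnvironment().addOperator() to add that Transformation to the current ExecutionEnvironment.

Take addSink as an example: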

public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction) {

		// read the output type of the input Transform to coax out errors about MissingTypeInfo
		transformation.getOutputType();

		// configure the type if needed
		if (sinkFunction instanceof InputTypeConfigurable) {
			((InputTypeConfigurable) sinkFunction).setInputType(getType(), getExecutionConfig());
		}

		// build the sink operator; the DataStreamSink constructor wraps it in the corresponding Transformation
		StreamSink<T> sinkOperator = new StreamSink<>(clean(sinkFunction));

		DataStreamSink<T> sink = new DataStreamSink<>(this, sinkOperator);

		// add the Transformation to the ExecutionEnvironment
		getExecutionEnvironment().addOperator(sink.getTransformation());
		return sink;
	}
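
To make the pattern above concrete, here is a minimal, self-contained sketch of the same idea: every API call builds a node and registers it with the environment, and nothing is executed yet. The classes ToyTransformation, ToyEnvironment and ToyStream are hypothetical stand-ins invented for illustration; they are not Flink classes and leave out types, operators and parallelism entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Transformation: a named node that remembers its input.
class ToyTransformation {
    final String name;
    final ToyTransformation input; // null for a source

    ToyTransformation(String name, ToyTransformation input) {
        this.name = name;
        this.input = input;
    }
}

// Hypothetical stand-in for the environment: it only collects registered nodes.
class ToyEnvironment {
    private final List<ToyTransformation> transformations = new ArrayList<>();

    void addOperator(ToyTransformation transformation) {
        transformations.add(transformation);
    }

    List<ToyTransformation> getTransformations() {
        return transformations;
    }

    // In this sketch a source is not registered on its own; it stays reachable
    // through the input reference of the nodes built on top of it.
    ToyStream fromSource(String name) {
        return new ToyStream(this, new ToyTransformation(name, null));
    }
}

// Hypothetical stand-in for a DataStream: wraps the latest node of the pipeline.
class ToyStream {
    private final ToyEnvironment env;
    private final ToyTransformation transformation;

    ToyStream(ToyEnvironment env, ToyTransformation transformation) {
        this.env = env;
        this.transformation = transformation;
    }

    // Build a node for the new step, register it, and return a stream wrapping it.
    ToyStream map(String name) {
        ToyTransformation mapped = new ToyTransformation(name, transformation);
        env.addOperator(mapped);
        return new ToyStream(env, mapped);
    }

    // Same pattern for a sink, which ends the chain.
    ToyTransformation addSink(String name) {
        ToyTransformation sink = new ToyTransformation(name, transformation);
        env.addOperator(sink);
        return sink;
    }
}
```

Calling env.fromSource("source").map("map").addSink("sink") on this toy leaves two registered nodes in the environment, and the sink's input chain leads back to the source.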

Next comes the Transformation -> StreamGraph step.
This logic is triggered from within the execute method; the main work happens in StreamGraphGenerator.generate:

public StreamGraph generate() {
		streamGraph = new StreamGraph(executionConfig, checkpointConfig);
		streamGraph.setStateBackend(stateBackend);
		streamGraph.setChaining(chaining);
		streamGraph.setScheduleMode(scheduleMode);
		streamGraph.setUserArtifacts(userArtifacts);
		streamGraph.setTimeCharacteristic(timeCharacteristic);
		streamGraph.setJobName(jobName);
		streamGraph.setBlockingConnectionsBetweenChains(blockingConnectionsBetweenChains);

		alreadyTransformed = new HashMap<>();

		// core conversion logic: translate every registered Transformation
		for (Transformation<?> transformation: transformations) {
			transform(transformation);
		}

		final StreamGraph builtStreamGraph = streamGraph;

		alreadyTransformed.clear();
		alreadyTransformed = null;
		streamGraph = null;

		return builtStreamGraph;
	}

The transform method looks like this:

private Collection<Integer> transform(Transformation<?> transform) {

		if (alreadyTransformed.containsKey(transform)) {
			return alreadyTransformed.get(transform);
		}

		LOG.debug("Transforming " + transform);

		if (transform.getMaxParallelism() <= 0) {

			// if the max parallelism hasn't been set, then first use the job wide max parallelism
			// from the ExecutionConfig.
			int globalMaxParallelismFromConfig = executionConfig.getMaxParallelism();
			if (globalMaxParallelismFromConfig > 0) {
				transform.setMaxParallelism(globalMaxParallelismFromConfig);
			}
		}

		// call at least once to trigger exceptions about MissingTypeInfo
		transform.getOutputType();

		Collection<Integer> transformedIds;
		// dispatch on the concrete type of the transform
		if (transform instanceof OneInputTransformation<?, ?>) {
			transformedIds = transformOneInputTransform((OneInputTransformation<?, ?>) transform);
		} else if (transform instanceof TwoInputTransformation<?, ?, ?>) {
			transformedIds = transformTwoInputTransform((TwoInputTransformation<?, ?, ?>) transform);
		} else if (transform instanceof SourceTransformation<?>) {
			transformedIds = transformSource((SourceTransformation<?>) transform);
		} else if (transform instanceof SinkTransformation<?>) {
			transformedIds = transformSink((SinkTransformation<?>) transform);
		} else if (transform instanceof UnionTransformation<?>) {
			transformedIds = transformUnion((UnionTransformation<?>) transform);
		} else if (transform instanceof SplitTransformation<?>) {
			transformedIds = transformSplit((SplitTransformation<?>) transform);
		} else if (transform instanceof SelectTransformation<?>) {
			transformedIds = transformSelect((SelectTransformation<?>) transform);
		} else if (transform instanceof FeedbackTransformation<?>) {
			transformedIds = transformFeedback((FeedbackTransformation<?>) transform);
		} else if (transform instanceof CoFeedbackTransformation<?>) {
			transformedIds = transformCoFeedback((CoFeedbackTransformation<?>) transform);
		} else if (transform instanceof PartitionTransformation<?>) {
			transformedIds = transformPartition((PartitionTransformation<?>) transform);
		} else if (transform instanceof SideOutputTransformation<?>) {
			transformedIds = transformSideOutput((SideOutputTransformation<?>) transform);
		} else {
			throw new IllegalStateException("Unknown transformation: " + transform);
		}

		// need this check because the iterate transformation adds itself before
		// transforming the feedback edges
		if (!alreadyTransformed.containsKey(transform)) {
			alreadyTransformed.put(transform, transformedIds);
		}

		if (transform.getBufferTimeout() >= 0) {
			streamGraph.setBufferTimeout(transform.getId(), transform.getBufferTimeout());
		} else {
			streamGraph.setBufferTimeout(transform.getId(), defaultBufferTimeout);
		}

		if (transform.getUid() != null) {
			streamGraph.setTransformationUID(transform.getId(), transform.getUid());
		}
		if (transform.getUserProvidedNodeHash() != null) {
			streamGraph.setTransformationUserHash(transform.getId(), transform.getUserProvidedNodeHash());
		}

		if (!streamGraph.getExecutionConfig().hasAutoGeneratedUIDsEnabled()) {
			if (transform.getUserProvidedNodeHash() == null && transform.getUid() == null) {
				throw new IllegalStateException("Auto generated UIDs have been disabled " +
					"but no UID or hash has been assigned to operator " + transform.getName());
			}
		}

		if (transform.getMinResources() != null && transform.getPreferredResources() != null) {
			streamGraph.setResources(transform.getId(), transform.getMinResources(), transform.getPreferredResources());
		}

		return transformedIds;
	}
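
The two ideas that carry this method are the alreadyTransformed cache and the recursion into a node's inputs. The following is a minimal sketch of just those two ideas, under the assumption that a node is nothing more than an id plus a list of inputs. SketchNode and SketchGenerator are hypothetical names made up for this illustration; the type dispatch, buffer timeouts, UIDs and resources handled by the real method are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type: an id and the upstream nodes it reads from.
class SketchNode {
    final int id;
    final List<SketchNode> inputs;

    SketchNode(int id, SketchNode... inputs) {
        this.id = id;
        this.inputs = Arrays.asList(inputs);
    }
}

// Hypothetical generator that mirrors only the caching and recursion of transform().
class SketchGenerator {
    private final Map<SketchNode, Collection<Integer>> alreadyTransformed = new HashMap<>();

    // Records the order in which nodes were actually translated.
    final List<Integer> translationOrder = new ArrayList<>();

    Collection<Integer> transform(SketchNode node) {
        // A node shared by several downstream nodes is translated only once.
        Collection<Integer> cached = alreadyTransformed.get(node);
        if (cached != null) {
            return cached;
        }

        // Inputs first, so upstream nodes exist before anything connects to them.
        for (SketchNode input : node.inputs) {
            transform(input);
        }

        translationOrder.add(node.id);
        Collection<Integer> ids = Collections.singleton(node.id);
        alreadyTransformed.put(node, ids);
        return ids;
    }
}
```

With a source feeding a map that fans out into two sinks, calling transform on each sink in turn translates the source and the map exactly once; the second call hits the cache for both and only adds the second sink.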

Let's pick one of these conversions and look at it; the others should follow much the same logic.

private <IN, OUT> Collection<Integer> transformOneInputTransform(OneInputTransformation<IN, OUT> transform) {
		// recursively transform the upstream input Transformation first
		Collection<Integer> inputIds = transform(transform.getInput());

		// the recursive call might have already transformed this
		if (alreadyTransformed.containsKey(transform)) {
			return alreadyTransformed.get(transform);
		}

		String slotSharingGroup = determineSlotSharingGroup(transform.getSlotSharingGroup(), inputIds);

		// this is where the user logic is carried over; the key piece is getOperatorFactory
		streamGraph.addOperator(transform.getId(),
				slotSharingGroup,
				transform.getCoLocationGroupKey(),
				transform.getOperatorFactory(),
				transform.getInputType(),
				transform.getOutputType(),
				transform.getName());

		if (transform.getStateKeySelector() != null) {
			TypeSerializer<?> keySerializer = transform.getStateKeyType().createSerializer(executionConfig);
			streamGraph.setOneInputStateKey(transform.getId(), transform.getStateKeySelector(), keySerializer);
		}

		int parallelism = transform.getParallelism() != ExecutionConfig.PARALLELISM_DEFAULT ?
			transform.getParallelism() : executionConfig.getParallelism();
		streamGraph.setParallelism(transform.getId(), parallelism);
		streamGraph.setMaxParallelism(transform.getId(), transform.getMaxParallelism());

		for (Integer inputId: inputIds) {
			streamGraph.addEdge(inputId, transform.getId(), 0);
		}

		return Collections.singleton(transform.getId());
	}
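
Stripped of slot sharing, key selectors and parallelism, the method above does two things to the graph: it adds one node for the transformation, then adds one edge from each already-translated input to that node. A rough sketch of that shape follows. MiniStreamGraph and MiniOneInputTranslator are hypothetical names invented here, and their simplified addOperator and addEdge signatures are assumptions for illustration, not the signatures of Flink's StreamGraph.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph: node id -> name, plus a list of {upstreamId, downstreamId} pairs.
class MiniStreamGraph {
    final Map<Integer, String> nodes = new LinkedHashMap<>();
    final List<int[]> edges = new ArrayList<>();

    void addOperator(int id, String name) {
        nodes.put(id, name);
    }

    void addEdge(int upstreamId, int downstreamId) {
        // Both ends must already be nodes, which is why inputs are translated first.
        if (!nodes.containsKey(upstreamId) || !nodes.containsKey(downstreamId)) {
            throw new IllegalStateException(
                "Unknown node in edge " + upstreamId + " -> " + downstreamId);
        }
        edges.add(new int[] {upstreamId, downstreamId});
    }
}

// Hypothetical translator for a node with a single logical input.
class MiniOneInputTranslator {
    static Collection<Integer> translate(
            MiniStreamGraph graph, int id, String name, Collection<Integer> inputIds) {
        // 1. Register the node itself.
        graph.addOperator(id, name);

        // 2. Connect every upstream id to it. There can be more than one id even
        //    for a single logical input, because translating the input returns a
        //    collection of ids.
        for (Integer inputId : inputIds) {
            graph.addEdge(inputId, id);
        }

        // 3. Downstream nodes will connect to this single id.
        return Collections.singleton(id);
    }
}
```

Returning a collection of ids rather than one id is what lets a caller connect to several upstream nodes through the same loop.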

StreamOperator and StreamOperatorFactory

As noted above, the core is the getOperatorFactory method, which returns a StreamOperatorFactory object. StreamOperatorFactory is an interface, and its most important method is

/**
	 * Create