







































More specifically, the stream execution environments also include: LocalStreamEnvironment, RemoteStreamEnvironment, StreamContextEnvironment, and StreamPlanEnvironment.




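  1. LocalEnvironment: a local execution environment that simulates a Flink cluster inside a single JVM; used for local development and testing
  2. RemoteEnvironment: the execution environment for a remotely deployed Flink cluster
  3. CollectionEnvironment: a collection-based execution environment that runs a Flink program serially on local collections
  4. OptimizerPlanEnvironment: does not execute the job; it only creates the execution plan
  5. ContextEnvironment: used for remote execution from the client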



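  1. RuntimeEnvironment: the runtime environment, initialized when a task starts executing
  2. DummyEnvironment: a runtime environment used for testing
  3. MockEnvironment: a runtime environment used for unit tests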




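  1. StreamingRuntimeContext: the context used in stream processing
  2. DistributedRuntimeUDFContext: created at runtime by the batch operator that hosts the user-defined function; used in DataSet batch processing
  3. RuntimeUDFContext: used in the user-defined functions of batch applications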





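  1. StreamRecord: the business data; it can also be thought of as an event
  2. Watermark: a timestamp that tells an operator that all data earlier than the watermark has arrived, so window computations or timers can be triggered
  3. StreamStatus: the status of the data stream, which tells a task whether it will keep receiving data from upstream; it is generated in the source operator and passed downstream along the dataflow. The statuses are:
    1. IDLE
    2. ACTIVE
  4. LatencyMarker: used to monitor processing latency; it is generated in the source operator and passed downstream along the dataflow, bypassing the business logic, and the overall latency is estimated at the sink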






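  1. SourceTransformation: physical. The starting point of a Flink job; it has no input, so no meaningful transformation takes place. A job can have multiple SourceTransformations
  2. SinkTransformation: physical. The end point of a Flink job; it writes data to external storage and has no downstream transformation. A job can have multiple SinkTransformations
  3. OneInputTransformation: physical. A single-input, single-output transformation
  4. TwoInputTransformation: physical. A two-input, single-output transformation
  5. SplitTransformation: virtual. Splits a single DataStream into multiple streams by condition, using an OutputSelector; it does not actually transform data, it only connects upstream and downstream
  6. SelectTransformation: virtual. Used together with SplitTransformation to select one of the split DataStreams
  7. PartitionTransformation: virtual. Selects partitions for the stream according to the given StreamPartitioner; it only connects upstream and downstream
  8. UnionTransformation: virtual. Merges multiple upstream DataStreams into one; the input DataStreams must all have the same structure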








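  1. setup: instantiates the operator; initialization covers the environment, time-related settings, metric registration, and so on
  2. open: its implementation usually contains the operator's initialization logic; only after this method has run does the operator start processing data with its function
  3. close: called after all elements have entered the operator and been processed; it guarantees that buffered results are sent downstream
  4. dispose: executed in the last phase of the operator lifecycle, mainly to release resources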
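
The call order described above can be sketched in plain Java. This is only a simplified model: `SketchOperator`, `RecordingOperator`, and `OperatorLifecycleSketch` are hypothetical names made up for this illustration, not Flink classes, and the real task also handles state initialization and failures in more detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified operator contract; not the real Flink StreamOperator. */
interface SketchOperator {
    void setup();
    void open();
    void processElement(String element);
    void close();
    void dispose();
}

/** Records every lifecycle call so the order can be inspected afterwards. */
class RecordingOperator implements SketchOperator {
    final List<String> calls = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>();

    public void setup() { calls.add("setup"); }
    public void open() { calls.add("open"); }

    public void processElement(String element) {
        buffer.add(element); // buffered instead of being emitted immediately
        calls.add("process:" + element);
    }

    public void close() {
        // flush whatever is still buffered before the operator finishes
        calls.add("close(flush " + buffer.size() + ")");
        buffer.clear();
    }

    public void dispose() { calls.add("dispose"); }
}

class OperatorLifecycleSketch {
    /** Drives an operator through its lifecycle; dispose runs even if processing fails. */
    static void run(SketchOperator op, List<String> input) {
        op.setup();
        try {
            op.open();
            for (String element : input) {
                op.processElement(element);
            }
            op.close();
        } finally {
            op.dispose();
        }
    }

    public static void main(String[] args) {
        RecordingOperator op = new RecordingOperator();
        run(op, Arrays.asList("a", "b"));
        System.out.println(op.calls);
    }
}
```

Putting `dispose` in a `finally` block reflects the idea that resources should be released whether or not the operator finished normally.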


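  1. Storing state
  2. Taking a snapshot when a checkpoint is triggered
  3. Persisting the snapshot to external storage
  4. Restoring state from a snapshot when the job fails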





public interface OneInputStreamOperator<IN, OUT> extends StreamOperator<OUT> {

	/**
	 * Processes one element that arrived at this operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 */
	void processElement(StreamRecord<IN> element) throws Exception;

	/**
	 * Processes a {@link Watermark}.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 *
	 * @see org.apache.flink.streaming.api.watermark.Watermark
	 */
	void processWatermark(Watermark mark) throws Exception;

	void processLatencyMarker(LatencyMarker latencyMarker) throws Exception;
}


public interface TwoInputStreamOperator<IN1, IN2, OUT> extends StreamOperator<OUT> {

	/**
	 * Processes one element that arrived on the first input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 */
	void processElement1(StreamRecord<IN1> element) throws Exception;

	/**
	 * Processes one element that arrived on the second input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 */
	void processElement2(StreamRecord<IN2> element) throws Exception;

	/**
	 * Processes a {@link Watermark} that arrived on the first input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 *
	 * @see org.apache.flink.streaming.api.watermark.Watermark
	 */
	void processWatermark1(Watermark mark) throws Exception;

	/**
	 * Processes a {@link Watermark} that arrived on the second input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 *
	 * @see org.apache.flink.streaming.api.watermark.Watermark
	 */
	void processWatermark2(Watermark mark) throws Exception;

	/**
	 * Processes a {@link LatencyMarker} that arrived on the first input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 *
	 * @see org.apache.flink.streaming.runtime.streamrecord.LatencyMarker
	 */
	void processLatencyMarker1(LatencyMarker latencyMarker) throws Exception;

	/**
	 * Processes a {@link LatencyMarker} that arrived on the second input of this two-input operator.
	 * This method is guaranteed to not be called concurrently with other methods of the operator.
	 *
	 * @see org.apache.flink.streaming.runtime.streamrecord.LatencyMarker
	 */
	void processLatencyMarker2(LatencyMarker latencyMarker) throws Exception;
}




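  1. Ordered output mode: guarantees that the output order matches the input order, at the cost of higher latency and lower operator throughput. Internally a queue ensures that elements received first are emitted first; an element whose response arrives early still waits for the elements ahead of it
  2. Unordered output mode: an element is emitted as soon as its response arrives; order is not guaranteed, but latency is lower and throughput is higher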
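
The difference between the two modes can be shown with a small plain-Java model. This is a sketch under simplifying assumptions: `AsyncOutputModesSketch` and its members are hypothetical names, and the real operator works with futures, watermarks, and checkpoints rather than a bare queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified model of the two async output modes; not Flink's actual queue classes. */
class AsyncOutputModesSketch {

    /** One in-flight request; result stays null until the response arrives. */
    static final class Entry {
        final String input;
        String result;
        Entry(String input) { this.input = input; }
        boolean isDone() { return result != null; }
    }

    /** Builds a queue of pending entries in arrival order. */
    static Deque<Entry> queueOf(String... inputs) {
        Deque<Entry> queue = new ArrayDeque<>();
        for (String input : inputs) {
            queue.addLast(new Entry(input));
        }
        return queue;
    }

    /** Simulates the response for one input arriving (here: the upper-cased input). */
    static void complete(Deque<Entry> queue, String input) {
        for (Entry e : queue) {
            if (e.input.equals(input)) {
                e.result = input.toUpperCase();
            }
        }
    }

    /** Ordered mode: emit only from the head of the queue, so input order is preserved. */
    static List<String> drainOrdered(Deque<Entry> queue) {
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty() && queue.peekFirst().isDone()) {
            out.add(queue.pollFirst().result);
        }
        return out;
    }

    /** Unordered mode: emit every completed entry, wherever it sits in the queue. */
    static List<String> drainUnordered(Deque<Entry> queue) {
        List<String> out = new ArrayList<>();
        queue.removeIf(e -> {
            if (e.isDone()) {
                out.add(e.result);
                return true;
            }
            return false;
        });
        return out;
    }

    public static void main(String[] args) {
        Deque<Entry> ordered = queueOf("a", "b", "c");
        complete(ordered, "c"); // the last request answers first
        System.out.println("ordered:   " + drainOrdered(ordered));

        Deque<Entry> unordered = queueOf("a", "b", "c");
        complete(unordered, "c");
        System.out.println("unordered: " + drainUnordered(unordered));
    }
}
```

In the ordered queue the completed entry for `c` stays blocked behind `a` and `b`, which is where the extra latency comes from.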







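  1. SourceFunction: reads data from external systems; the operator that hosts it is a starting point and has no upstream operator
  2. SinkFunction: writes data to external storage; the operator that hosts it is an end point and has no downstream operator
  3. Function: processes data, so it has both upstream and downstream operators; to keep things simple and effective, UDFs are designed like operators and come in only two kinds: single-input and two-input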


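In the DataStream API, functions fall into three layers. From the highest level of encapsulation to the lowest, they are:

  1. Function: a stateless UDF interface; you do not need to care about low-level concepts, you only implement the business logic
  2. RichFunction: UDF interface + state + lifecycle; you can implement open and close to manage initialization and cleanup, and use getRuntimeContext/setRuntimeContext to access parameters of the runtime environment, which is often very useful
  3. ProcessFunction: UDF interface + state + lifecycle + triggers (timers)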



  1. Keyed vs. Non-Keyed: a Keyed function can only be applied to a KeyedStream
  2. Co vs. Non-Co: a Co function takes two input streams









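  1. processElement: can only use the read-only ReadOnlyContext. Broadcast state must be identical on every parallel instance of the operator; if modification were allowed here, the state could diverge and cause unpredictable errors. In addition, parallel instances cannot communicate with each other, so a broadcast update from this side is not possible by design.
  2. processBroadcastElement: may use the read-write Context
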
/**
 * A function to be applied to a
 * {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream BroadcastConnectedStream} that
 * connects {@link org.apache.flink.streaming.api.datastream.BroadcastStream BroadcastStream}, i.e. a stream
 * with broadcast state, with a <b>non-keyed</b> {@link org.apache.flink.streaming.api.datastream.DataStream DataStream}.
 *
 * <p>The stream with the broadcast state can be created using the
 * {@link org.apache.flink.streaming.api.datastream.DataStream#broadcast(MapStateDescriptor[])
 * stream.broadcast(MapStateDescriptor)} method.
 *
 * <p>The user has to implement two methods:
 * <ol>
 *     <li>the {@link #processBroadcastElement(Object, Context, Collector)} which will be applied to
 *     each element in the broadcast side
 *     <li> and the {@link #processElement(Object, ReadOnlyContext, Collector)} which will be applied to the
 *     non-broadcasted/keyed side.
 * </ol>
 *
 * <p>The {@code processElementOnBroadcastSide()} takes as argument (among others) a context that allows it to
 * read/write to the broadcast state, while the {@code processElement()} has read-only access to the broadcast state.
 *
 * @param <IN1> The input type of the non-broadcast side.
 * @param <IN2> The input type of the broadcast side.
 * @param <OUT> The output type of the operator.
 */
public abstract class BroadcastProcessFunction<IN1, IN2, OUT> extends BaseBroadcastProcessFunction {

	private static final long serialVersionUID = 8352559162119034453L;

	/**
	 * This method is called for each element in the (non-broadcast)
	 * {@link org.apache.flink.streaming.api.datastream.DataStream data stream}.
	 *
	 * <p>This function can output zero or more elements using the {@link Collector} parameter,
	 * query the current processing/event time, and also query and update the local keyed state.
	 * Finally, it has <b>read-only</b> access to the broadcast state.
	 * The context is only valid during the invocation of this method, do not store it.
	 *
	 * @param value The stream element.
	 * @param ctx A {@link ReadOnlyContext} that allows querying the timestamp of the element,
	 *            querying the current processing/event time and updating the broadcast state.
	 *            The context is only valid during the invocation of this method, do not store it.
	 * @param out The collector to emit resulting elements to
	 * @throws Exception The function may throw exceptions which cause the streaming program
	 *                   to fail and go into recovery.
	 */
	public abstract void processElement(final IN1 value, final ReadOnlyContext ctx, final Collector<OUT> out) throws Exception;

	/**
	 * This method is called for each element in the
	 * {@link org.apache.flink.streaming.api.datastream.BroadcastStream broadcast stream}.
	 *
	 * <p>This function can output zero or more elements using the {@link Collector} parameter,
	 * query the current processing/event time, and also query and update the internal
	 * {@link org.apache.flink.api.common.state.BroadcastState broadcast state}. These can be done
	 * through the provided {@link Context}.
	 * The context is only valid during the invocation of this method, do not store it.
	 *
	 * @param value The stream element.
	 * @param ctx A {@link Context} that allows querying the timestamp of the element,
	 *            querying the current processing/event time and updating the broadcast state.
	 *            The context is only valid during the invocation of this method, do not store it.
	 * @param out The collector to emit resulting elements to
	 * @throws Exception The function may throw exceptions which cause the streaming program
	 *                   to fail and go into recovery.
	 */
	public abstract void processBroadcastElement(final IN2 value, final Context ctx, final Collector<OUT> out) throws Exception;

	/**
	 * A {@link BaseBroadcastProcessFunction.Context context} available to the broadcast side of
	 * a {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream}.
	 */
	public abstract class Context extends BaseBroadcastProcessFunction.Context {}

	/**
	 * A {@link BaseBroadcastProcessFunction.Context context} available to the non-keyed side of
	 * a {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream} (if any).
	 */
	public abstract class ReadOnlyContext extends BaseBroadcastProcessFunction.ReadOnlyContext {}
}
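
The read-only versus read-write split can be modelled without Flink at all. The sketch below is only an analogy: `BroadcastStateSketch` is a hypothetical class, a plain `Map` stands in for the broadcast state, and the two methods merely borrow the names of the real callbacks.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the two contexts; a plain Map stands in for broadcast state. */
class BroadcastStateSketch {
    private final Map<String, Integer> broadcastState = new HashMap<>();

    /** Broadcast side: works on a writable view, like Context in processBroadcastElement. */
    void processBroadcastElement(String ruleName, int threshold) {
        broadcastState.put(ruleName, threshold);
    }

    /** Data side: works on a read-only view, like ReadOnlyContext in processElement. */
    boolean processElement(String ruleName, int value) {
        Map<String, Integer> readOnly = Collections.unmodifiableMap(broadcastState);
        Integer threshold = readOnly.get(ruleName);
        return threshold != null && value > threshold;
    }

    /** Returns false because a write through the read-only view is rejected. */
    boolean tryWriteFromDataSide(String ruleName, int threshold) {
        try {
            Collections.unmodifiableMap(broadcastState).put(ruleName, threshold);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        BroadcastStateSketch sketch = new BroadcastStateSketch();
        sketch.processBroadcastElement("max", 10);            // rule arrives on the broadcast side
        System.out.println(sketch.processElement("max", 11)); // data element checked against the rule
        System.out.println(sketch.tryWriteFromDataSide("max", 99));
    }
}
```

Because only the broadcast side can write, and every parallel instance sees the same broadcast elements, each instance ends up with the same state.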




public interface AsyncFunction<IN, OUT> extends Function, Serializable {

	/**
	 * Trigger async operation for each stream input.
	 *
	 * @param input element coming from an upstream task
	 * @param resultFuture to be completed with the result data
	 * @exception Exception in case of a user code error. An exception will make the task fail and
	 * trigger fail-over process.
	 */
	void asyncInvoke(IN input, ResultFuture<OUT> resultFuture) throws Exception;

	/**
	 * {@link AsyncFunction#asyncInvoke} timeout occurred.
	 * By default, the result future is exceptionally completed with a timeout exception.
	 *
	 * @param input element coming from an upstream task
	 * @param resultFuture to be completed with the result data
	 */
	default void timeout(IN input, ResultFuture<OUT> resultFuture) throws Exception {
		resultFuture.completeExceptionally(
			new TimeoutException("Async function call has timed out."));
	}
}






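  1. Lifecycle: implementations usually extend AbstractRichFunction, so the lifecycle consists of three methods: open, close, and cancel
  2. Reading data: continuous reading can be implemented for different external systems, for example Kafka
  3. Emitting data: straightforward; the records that were read are sent downstream
  4. Generating watermarks and sending them downstream
  5. Idle marking: if no data is being read, the task is marked as idle and an IDLE status is sent downstream, so that downstream tasks do not wait for watermarks from this source
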
/**
 * Base interface for all stream data sources in Flink. The contract of a stream source
 * is the following: When the source should start emitting elements, the {@link #run} method
 * is called with a {@link SourceContext} that can be used for emitting elements.
 * The run method can run for as long as necessary. The source must, however, react to an
 * invocation of {@link #cancel()} by breaking out of its main loop.
 *
 * <h3>CheckpointedFunction Sources</h3>
 *
 * <p>Sources that also implement the {@link org.apache.flink.streaming.api.checkpoint.CheckpointedFunction}
 * interface must ensure that state checkpointing, updating of internal state and emission of
 * elements are not done concurrently. This is achieved by using the provided checkpointing lock
 * object to protect update of state and emission of elements in a synchronized block.
 *
 * <p>This is the basic pattern one should follow when implementing a checkpointed source:
 *
 * <pre>{@code
 *  public class ExampleCountSource implements SourceFunction<Long>, CheckpointedFunction {
 *      private long count = 0L;
 *      private volatile boolean isRunning = true;
 *
 *      private transient ListState<Long> checkpointedCount;
 *
 *      public void run(SourceContext<Long> ctx) {
 *          while (isRunning && count < 1000) {
 *              // this synchronized block ensures that state checkpointing,
 *              // internal state updates and emission of elements are an atomic operation
 *              synchronized (ctx.getCheckpointLock()) {
 *                  ctx.collect(count);
 *                  count++;
 *              }
 *          }
 *      }
 *
 *      public void cancel() {
 *          isRunning = false;
 *      }
 *
 *      public void initializeState(FunctionInitializationContext context) {
 *          this.checkpointedCount = context
 *              .getOperatorStateStore()
 *              .getListState(new ListStateDescriptor<>("count", Long.class));
 *
 *          if (context.isRestored()) {
 *              for (Long count : this.checkpointedCount.get()) {
 *                  this.count = count;
 *              }
 *          }
 *      }
 *
 *      public void snapshotState(FunctionSnapshotContext context) {
 *          this.checkpointedCount.clear();
 *          this.checkpointedCount.add(count);
 *      }
 * }
 * }</pre>
 *
 * <h3>Timestamps and watermarks:</h3>
 * Sources may assign timestamps to elements and may manually emit watermarks.
 * However, these are only interpreted if the streaming program runs on
 * {@link TimeCharacteristic#EventTime}. On other time characteristics
 * ({@link TimeCharacteristic#IngestionTime} and {@link TimeCharacteristic#ProcessingTime}),
 * the watermarks from the source function are ignored.
 *
 * <h3>Gracefully Stopping Functions</h3>
 * Functions may additionally implement the {@link org.apache.flink.api.common.functions.StoppableFunction}
 * interface. "Stopping" a function, in contrast to "canceling" means a graceful exit that leaves the
 * state and the emitted elements in a consistent state.
 *
 * <p>When a source is stopped, the executing thread is not interrupted, but expected to leave the
 * {@link #run(SourceContext)} method in reasonable time on its own, preserving the atomicity
 * of state updates and element emission.
 *
 * @param <T> The type of the elements produced by this source.
 *
 * @see org.apache.flink.api.common.functions.StoppableFunction
 * @see org.apache.flink.streaming.api.TimeCharacteristic
 */
public interface SourceFunction<T> extends Function, Serializable {

	/**
	 * Starts the source. Implementations can use the {@link SourceContext} to emit
	 * elements.
	 *
	 * <p>Sources that implement {@link org.apache.flink.streaming.api.checkpoint.CheckpointedFunction}
	 * must lock on the checkpoint lock (using a synchronized block) before updating internal
	 * state and emitting elements, to make both an atomic operation. The {@code ExampleCountSource}
	 * in the class-level comment shows this pattern.
	 *
	 * @param ctx The context to emit elements to and for accessing locks.
	 */
	void run(SourceContext<T> ctx) throws Exception;

	/**
	 * Cancels the source. Most sources will have a while loop inside the
	 * {@link #run(SourceContext)} method. The implementation needs to ensure that the
	 * source will break out of that loop after this method is called.
	 *
	 * <p>A typical pattern is to have an {@code "volatile boolean isRunning"} flag that is set to
	 * {@code false} in this method. That flag is checked in the loop condition.
	 *
	 * <p>When a source is canceled, the executing thread will also be interrupted
	 * (via {@link Thread#interrupt()}). The interruption happens strictly after this
	 * method has been called, so any interruption handler can rely on the fact that
	 * this method has completed. It is good practice to make any flags altered by
	 * this method "volatile", in order to guarantee the visibility of the effects of
	 * this method to any interruption handler.
	 */
	void cancel();

	// ------------------------------------------------------------------------
	//  source context
	// ------------------------------------------------------------------------

	/**
	 * Interface that source functions use to emit elements, and possibly watermarks.
	 *
	 * @param <T> The type of the elements produced by the source.
	 */
	@Public // Interface might be extended in the future with additional methods.
	interface SourceContext<T> {

		/**
		 * Emits one element from the source, without attaching a timestamp. In most cases,
		 * this is the default way of emitting elements.
		 *
		 * <p>The timestamp that the element will get assigned depends on the time characteristic of
		 * the streaming program:
		 * <ul>
		 *     <li>On {@link TimeCharacteristic#ProcessingTime}, the element has no timestamp.</li>
		 *     <li>On {@link TimeCharacteristic#IngestionTime}, the element gets the system's
		 *         current time as the timestamp.</li>
		 *     <li>On {@link TimeCharacteristic#EventTime}, the element will have no timestamp initially.
		 *         It needs to get a timestamp (via a {@link TimestampAssigner}) before any time-dependent
		 *         operation (like time windows).</li>
		 * </ul>
		 *
		 * @param element The element to emit
		 */
		void collect(T element);

		/**
		 * Emits one element from the source, and attaches the given timestamp. This method
		 * is relevant for programs using {@link TimeCharacteristic#EventTime}, where the
		 * sources assign timestamps themselves, rather than relying on a {@link TimestampAssigner}
		 * on the stream.
		 *
		 * <p>On certain time characteristics, this timestamp may be ignored or overwritten.
		 * This allows programs to switch between the different time characteristics and behaviors
		 * without changing the code of the source functions.
		 * <ul>
		 *     <li>On {@link TimeCharacteristic#ProcessingTime}, the timestamp will be ignored,
		 *         because processing time never works with element timestamps.</li>
		 *     <li>On {@link TimeCharacteristic#IngestionTime}, the timestamp is overwritten with the
		 *         system's current time, to realize proper ingestion time semantics.</li>
		 *     <li>On {@link TimeCharacteristic#EventTime}, the timestamp will be used.</li>
		 * </ul>
		 *
		 * @param element The element to emit
		 * @param timestamp The timestamp in milliseconds since the Epoch
		 */
		void collectWithTimestamp(T element, long timestamp);

		/**
		 * Emits the given {@link Watermark}. A Watermark of value {@code t} declares that no
		 * elements with a timestamp {@code t' <= t} will occur any more. If further such
		 * elements will be emitted, those elements are considered <i>late</i>.
		 *
		 * <p>This method is only relevant when running on {@link TimeCharacteristic#EventTime}.
		 * On {@link TimeCharacteristic#ProcessingTime}, Watermarks will be ignored. On
		 * {@link TimeCharacteristic#IngestionTime}, the Watermarks will be replaced by the
		 * automatic ingestion time watermarks.
		 *
		 * @param mark The Watermark to emit
		 */
		void emitWatermark(Watermark mark);

		/**
		 * Marks the source to be temporarily idle. This tells the system that this source will
		 * temporarily stop emitting records and watermarks for an indefinite amount of time. This
		 * is only relevant when running on {@link TimeCharacteristic#IngestionTime} and
		 * {@link TimeCharacteristic#EventTime}, allowing downstream tasks to advance their
		 * watermarks without the need to wait for watermarks from this source while it is idle.
		 *
		 * <p>Source functions should make a best effort to call this method as soon as they
		 * acknowledge themselves to be idle. The system will consider the source to resume activity
		 * again once {@link SourceContext#collect(T)}, {@link SourceContext#collectWithTimestamp(T, long)},
		 * or {@link SourceContext#emitWatermark(Watermark)} is called to emit elements or watermarks from the source.
		 */
		void markAsTemporarilyIdle();

		/**
		 * Returns the checkpoint lock. Please refer to the class-level comment in
		 * {@link SourceFunction} for details about how to write a consistent checkpointed
		 * source.
		 *
		 * @return The object to use as the lock
		 */
		Object getCheckpointLock();

		/**
		 * This method is called by the system to shut down the context.
		 */
		void close();
	}
}


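  1. NonTimestampContext: no time semantics; the timestamp of every element is set to -1, which means watermarks are never sent downstream
  2. WatermarkContext: time-aware; defines the watermark-related behavior
    1. Manages the current StreamStatus and passes it downstream
    2. Idle check: if neither data nor a watermark has been received within the configured time interval, the task is marked as idle
  3. AutomaticWatermarkContext: used with ingestion time, where watermarks are generated automatically. It relies on a timer (WatermarkEmittingTask) that fires roughly at (job start timestamp + watermark interval * n) and keeps sending watermarks downstream; each emitted watermark is rounded down to a multiple of the watermark interval
  4. ManualWatermarkContext: used with event time; it does not generate watermarks itself, it only passes the watermarks handed to it on to downstream operators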

		private AutomaticWatermarkContext(/* ... */) {
			// ...

			long now = this.timeService.getCurrentProcessingTime();
			this.nextWatermarkTimer = this.timeService.registerTimer(now + watermarkInterval,
				new WatermarkEmittingTask(this.timeService, checkpointLock, output));
		}

		// ...

		private class WatermarkEmittingTask implements ProcessingTimeCallback {

			private final ProcessingTimeService timeService;
			private final Object lock;
			private final Output<StreamRecord<T>> output;

			private WatermarkEmittingTask(
					ProcessingTimeService timeService,
					Object checkpointLock,
					Output<StreamRecord<T>> output) {
				this.timeService = timeService;
				this.lock = checkpointLock;
				this.output = output;
			}

			@Override
			public void onProcessingTime(long timestamp) {
				final long currentTime = timeService.getCurrentProcessingTime();

				synchronized (lock) {
					// we should continue to automatically emit watermarks if we are active
					if (streamStatusMaintainer.getStreamStatus().isActive()) {
						if (idleTimeout != -1 && currentTime - lastRecordTime > idleTimeout) {
							// if we are configured to detect idleness, piggy-back the idle detection check on the
							// watermark interval, so that we may possibly discover idle sources faster before waiting
							// for the next idle check to fire
							markAsTemporarilyIdle();

							// no need to finish the next check, as we are now idle.
							cancelNextIdleDetectionTask();
						} else if (currentTime > nextWatermarkTime) {
							// align the watermarks across all machines. this will ensure that we
							// don't have watermarks that creep along at different intervals because
							// the machine clocks are out of sync
							final long watermarkTime = currentTime - (currentTime % watermarkInterval);

							output.emitWatermark(new Watermark(watermarkTime));
							nextWatermarkTime = watermarkTime + watermarkInterval;
						}
					}
				}

				long nextWatermark = currentTime + watermarkInterval;
				nextWatermarkTimer = this.timeService.registerTimer(
						nextWatermark, new WatermarkEmittingTask(this.timeService, lock, output));
			}
		}






public interface CheckpointedFunction {

	/**
	 * This method is called when a snapshot for a checkpoint is requested. This acts as a hook to the function to
	 * ensure that all state is exposed by means previously offered through {@link FunctionInitializationContext} when
	 * the Function was initialized, or offered now by {@link FunctionSnapshotContext} itself.
	 *
	 * @param context the context for drawing a snapshot of the operator
	 * @throws Exception
	 */
	void snapshotState(FunctionSnapshotContext context) throws Exception;

	/**
	 * This method is called when the parallel function instance is created during distributed
	 * execution. Functions typically set up their state storing data structures in this method.
	 *
	 * @param context the context for initializing the operator
	 * @throws Exception
	 */
	void initializeState(FunctionInitializationContext context) throws Exception;
}



public interface ListCheckpointed<T extends Serializable> {

	/**
	 * Gets the current state of the function. The state must reflect the result of all prior
	 * invocations to this function.
	 *
	 * <p>The returned list should contain one entry for redistributable unit of state. See
	 * the {@link ListCheckpointed class docs} for an illustration how list-style state
	 * redistribution works.
	 *
	 * <p>As special case, the returned list may be null or empty (if the operator has no state)
	 * or it may contain a single element (if the operator state is indivisible).
	 *
	 * @param checkpointId The ID of the checkpoint - a unique and monotonously increasing value.
	 * @param timestamp The wall clock timestamp when the checkpoint was triggered by the master.
	 *
	 * @return The operator state in a list of redistributable, atomic sub-states.
	 *         Should not return null, but empty list instead.
	 *
	 * @throws Exception Thrown if the creation of the state object failed. This causes the
	 *                   checkpoint to fail. The system may decide to fail the operation (and trigger
	 *                   recovery), or to discard this checkpoint attempt and to continue running
	 *                   and to try again with the next checkpoint attempt.
	 */
	List<T> snapshotState(long checkpointId, long timestamp) throws Exception;

	/**
	 * Restores the state of the function or operator to that of a previous checkpoint.
	 * This method is invoked when the function is executed after a failure recovery.
	 * The state list may be empty if no state is to be recovered by the particular parallel instance
	 * of the function.
	 *
	 * <p>The given state list will contain all the <i>sub states</i> that this parallel
	 * instance of the function needs to handle. Refer to the  {@link ListCheckpointed class docs}
	 * for an illustration how list-style state redistribution works.
	 *
	 * <p><b>Important:</b> When implementing this interface together with {@link RichFunction},
	 * then the {@code restoreState()} method is called before {@link RichFunction#open(Configuration)}.
	 *
	 * @param state The state to be restored as a list of atomic sub-states.
	 *
	 * @throws Exception Throwing an exception in this method causes the recovery to fail.
	 *                   The exact consequence depends on the configured failure handling strategy,
	 *                   but typically the system will re-attempt the recovery, or try recovering
	 *                   from a different checkpoint.
	 */
	void restoreState(List<T> state) throws Exception;
}
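
The Javadoc above talks about a list of "redistributable, atomic sub-states". The sketch below shows one way such a list could be spread over a different number of parallel instances after rescaling. It assumes a simple round-robin scheme chosen for illustration; `ListStateRedistributionSketch` is a hypothetical class, and this is not a claim about the exact algorithm Flink uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative round-robin redistribution of list-style state; an assumed scheme. */
class ListStateRedistributionSketch {

    /**
     * Collects the sub-states of all old parallel instances in order and deals them
     * out one by one to the new parallel instances.
     */
    static <T> List<List<T>> redistribute(List<List<T>> oldStates, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> subtaskState : oldStates) {
            for (T unit : subtaskState) {
                result.get(next % newParallelism).add(unit);
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // three old instances, each having snapshotted a list of atomic sub-states
        List<List<Integer>> old = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(redistribute(old, 2)); // scale down to 2 instances
        System.out.println(redistribute(old, 4)); // scale up to 4 instances
    }
}
```

Each new instance would then receive its own list in `restoreState`, which may be empty if there are more instances than sub-states.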







/**
 * The {@link ChannelSelector} determines to which logical channels a record
 * should be written to.
 *
 * @param <T> the type of record which is sent through the attached output gate
 */
public interface ChannelSelector<T extends IOReadableWritable> {

	/**
	 * Returns the logical channel indexes, to which the given record should be
	 * written.
	 *
	 * @param record      the record to the determine the output channels for
	 * @param numChannels the total number of output channels which are attached to respective output gate
	 * @return a (possibly empty) array of integer numbers which indicate the indices of the output channels through
	 * which the record shall be forwarded
	 */
	int[] selectChannels(T record, int numChannels);
}


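  1. partitionCustom: custom partitioning of a DataStream; it selects the target partition for each element and produces a new DataStream
  2. ForwardPartitioner: data from the upstream operator is forwarded directly to the downstream operator; it produces a new DataStream
  3. ShufflePartitioner: selects a channel at random
  4. RebalancePartitioner: sends data downstream in round-robin fashion to avoid data skew
  5. RescalePartitioner: partitions according to the number of upstream and downstream tasks
  6. BroadcastPartitioner: broadcasts to all channels
  7. KeyGroupStreamPartitioner: partitions a KeyedStream by the key group of each key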
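
Two of these strategies are easy to sketch as channel-selection functions. The code below is a simplified illustration with hypothetical names (`ChannelSelectionSketch`, `RoundRobin`, `keyGroupFor`, `channelForKeyGroup`). In particular, the key-group hash here is a plain `floorMod` of `hashCode()`, which is an assumption made to keep the example short; a real implementation would typically apply a stronger hash first.

```java
/** Simplified channel selection for round-robin and key-group partitioning. */
class ChannelSelectionSketch {

    /** Round-robin selection: cycles through the channels one after another. */
    static final class RoundRobin {
        private int next = -1;

        int select(int numChannels) {
            next = (next + 1) % numChannels;
            return next;
        }
    }

    /** Maps a key to a key group in [0, maxParallelism). Simplified hash (assumption). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to one of `parallelism` channels, keeping neighbouring groups together. */
    static int channelForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin();
        for (int i = 0; i < 4; i++) {
            System.out.println("round-robin -> channel " + rr.select(3));
        }

        int maxParallelism = 128;
        int parallelism = 4;
        int keyGroup = keyGroupFor("a", maxParallelism);
        System.out.println("key \"a\" -> key group " + keyGroup
                + " -> channel " + channelForKeyGroup(keyGroup, maxParallelism, parallelism));
    }
}
```

The two-step mapping (key to key group, key group to channel) means that the same key always lands on the same channel for a given parallelism, while whole key groups can be moved when the parallelism changes.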







/**
 * A statistically unique identification number.
 */
public class AbstractID implements Comparable<AbstractID>, java.io.Serializable {

	private static final long serialVersionUID = 1L;

	private static final Random RND = new Random();

	/** The size of a long in bytes. */
	private static final int SIZE_OF_LONG = 8;

	/** The size of the ID in byte. */
	public static final int SIZE = 2 * SIZE_OF_LONG;

	// ------------------------------------------------------------------------

	/** The upper part of the actual ID. */
	protected final long upperPart;

	/** The lower part of the actual ID. */
	protected final long lowerPart;

	/** The memoized value returned by toString(). */
	private transient String toString;

	// --------------------------------------------------------------------------------------------

	/**
	 * Constructs a new ID with a specific bytes value.
	 */
	public AbstractID(byte[] bytes) {
		if (bytes == null || bytes.length != SIZE) {
			throw new IllegalArgumentException("Argument bytes must be an array of " + SIZE + " bytes");
		}

		this.lowerPart = byteArrayToLong(bytes, 0);
		this.upperPart = byteArrayToLong(bytes, SIZE_OF_LONG);
	}

	/**
	 * Constructs a new abstract ID.
	 *
	 * @param lowerPart the lower bytes of the ID
	 * @param upperPart the higher bytes of the ID
	 */
	public AbstractID(long lowerPart, long upperPart) {
		this.lowerPart = lowerPart;
		this.upperPart = upperPart;
	}

	/**
	 * Copy constructor: Creates a new abstract ID from the given one.
	 *
	 * @param id the abstract ID to copy
	 */
	public AbstractID(AbstractID id) {
		if (id == null) {
			throw new IllegalArgumentException("Id must not be null.");
		}
		this.lowerPart = id.lowerPart;
		this.upperPart = id.upperPart;
	}

	/**
	 * Constructs a new random ID from a uniform distribution.
	 */
	public AbstractID() {
		this.lowerPart = RND.nextLong();
		this.upperPart = RND.nextLong();
	}

	// --------------------------------------------------------------------------------------------

	/**
	 * Gets the lower 64 bits of the ID.
	 *
	 * @return The lower 64 bits of the ID.
	 */
	public long getLowerPart() {
		return lowerPart;
	}

	/**
	 * Gets the upper 64 bits of the ID.
	 *
	 * @return The upper 64 bits of the ID.
	 */
	public long getUpperPart() {
		return upperPart;
	}

	/**
	 * Gets the bytes underlying this ID.
	 *
	 * @return The bytes underlying this ID.
	 */
	public byte[] getBytes() {
		byte[] bytes = new byte[SIZE];
		longToByteArray(lowerPart, bytes, 0);
		longToByteArray(upperPart, bytes, SIZE_OF_LONG);
		return bytes;
	}

	// --------------------------------------------------------------------------------------------
	//  Standard Utilities
	// --------------------------------------------------------------------------------------------

	@Override
	public boolean equals(Object obj) {
		if (obj == this) {
			return true;
		} else if (obj != null && obj.getClass() == getClass()) {
			AbstractID that = (AbstractID) obj;
			return that.lowerPart == this.lowerPart && that.upperPart == this.upperPart;
		} else {
			return false;
		}
	}

	@Override
	public int hashCode() {
		return ((int)  this.lowerPart) ^
				((int) (this.lowerPart >>> 32)) ^
				((int)  this.upperPart) ^
				((int) (this.upperPart >>> 32));
	}

	@Override
	public String toString() {
		if (this.toString == null) {
			final byte[] ba = new byte[SIZE];
			longToByteArray(this.lowerPart, ba, 0);
			longToByteArray(this.upperPart, ba, SIZE_OF_LONG);

			this.toString = StringUtils.byteToHexString(ba);
		}

		return this.toString;
	}

	@Override
	public int compareTo(AbstractID o) {
		int diff1 = Long.compare(this.upperPart, o.upperPart);
		int diff2 = Long.compare(this.lowerPart, o.lowerPart);
		return diff1 == 0 ? diff2 : diff1;
	}

	// --------------------------------------------------------------------------------------------
	//  Conversion Utilities
	// --------------------------------------------------------------------------------------------

	/**
	 * Converts the given byte array to a long.
	 *
	 * @param ba the byte array to be converted
	 * @param offset the offset indicating at which byte inside the array the conversion shall begin
	 * @return the long variable
	 */
	private static long byteArrayToLong(byte[] ba, int offset) {
		long l = 0;

		for (int i = 0; i < SIZE_OF_LONG; ++i) {
			l |= (ba[offset + SIZE_OF_LONG - 1 - i] & 0xffL) << (i << 3);
		}

		return l;
	}

	/**
	 * Converts a long to a byte array.
	 *
	 * @param l the long variable to be converted
	 * @param ba the byte array to store the result the of the conversion
	 * @param offset offset indicating at what position inside the byte array the result of the conversion shall be stored
	 */
	private static void longToByteArray(long l, byte[] ba, int offset) {
		for (int i = 0; i < SIZE_OF_LONG; ++i) {
			final int shift = i << 3; // i * 8
			ba[offset + SIZE_OF_LONG - 1 - i] = (byte) ((l & (0xffL << shift)) >>> shift);
		}
	}
}
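
The two conversion helpers write each long with its most significant byte first (big-endian), lower part first and upper part second. The sketch below reproduces that 16-byte layout with `java.nio.ByteBuffer`, whose default byte order is also big-endian. `IdBytesSketch` is a hypothetical helper written for this illustration, not part of Flink.

```java
import java.nio.ByteBuffer;

/** Reproduces the 16-byte layout of the ID with ByteBuffer (big-endian by default). */
class IdBytesSketch {

    /** Lower part goes into bytes 0..7, upper part into bytes 8..15. */
    static byte[] toBytes(long lowerPart, long upperPart) {
        return ByteBuffer.allocate(16).putLong(lowerPart).putLong(upperPart).array();
    }

    /** Reads the two longs back in the same order: {lowerPart, upperPart}. */
    static long[] fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new long[] {buffer.getLong(), buffer.getLong()};
    }

    /** Hex rendering, two lowercase digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(1L, 2L);
        System.out.println(toHex(bytes)); // 00000000000000010000000000000002
        long[] parts = fromBytes(bytes);
        System.out.println(parts[0] + ", " + parts[1]); // 1, 2
    }
}
```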



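  • Abstractions for developers
  • Core runtime abstractions
    • Dataflow and operation abstractions
    • Data transformation abstractions
    • Operator, function, and data partitioning abstractions
    • Data I/O abstractions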


