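Introduction
When reading source code, I usually start from the top level and the core abstractions rather than diving straight into specific code. A mature framework can easily run to a million lines of code, so reading all of it is not realistic. Only by starting from the architecture and the top-level abstractions can you grasp the essentials of the design. Then, when you need to know how a particular method or algorithm is implemented, you can locate it quickly and read it in detail. That is the essence of reading source code.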
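Environment objects
Flink's environment objects fall into three kinds: the development-time execution environment (StreamExecutionEnvironment), the runtime execution environment (ExecutionEnvironment), and the runtime context objects.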
StreamExecutionEnvironment
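The entry point for developing Flink programs (located in flink-streaming-java). It represents the execution environment of a streaming job and covers: the job development entry point, data source interfaces, DataStream creation and transformation interfaces, sink interfaces, job configuration interfaces, and the job launch entry point.
In more detail, its variants include LocalStreamEnvironment, RemoteStreamEnvironment, StreamContextEnvironment, and StreamPlanEnvironment.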
ExecutionEnvironment
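The job-level environment object at run time (located in flink-java), derived from StreamExecutionEnvironment. When a job starts, the required context data is extracted from the StreamExecutionEnvironment, and a runtime execution environment is chosen according to the job's situation:
- LocalEnvironment: local execution environment; simulates a Flink cluster inside a single JVM, for local development and testing
- RemoteEnvironment: execution environment for a Flink cluster deployed remotely
- CollectionEnvironment: collection-based execution environment; runs a Flink program over local Java collections
- OptimizerPlanEnvironment: does not create an execution environment, only an execution plan
- ContextEnvironment: used for remote execution from the client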
Environment
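Flink's runtime environment. As an interface, it defines the configuration a task needs at run time.
- RuntimeEnvironment: the runtime environment, initialized when a task starts executing
- DummyEnvironment: a runtime environment used for testing
- MockEnvironment: a runtime environment used for unit tests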
RuntimeContext
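The context of a function at run time. Every function instance has its own RuntimeContext object, which can be obtained through RichFunction.getRuntimeContext().
- StreamingRuntimeContext: the context used in stream processing
- DistributedRuntimeUDFContext: created at run time by the batch operator that hosts the user-defined function; used in DataSet batch processing
- RuntimeUDFContext: used in the user-defined functions of batch applications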
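Data stream elements
StreamElement
There are four kinds of element, each with its own purpose. At the execution layer they are serialized into a binary stream, and inside an operator they are deserialized again for processing.
- StreamRecord: business data; can also be thought of as an event
- Watermark: a timestamp that tells the operator all data earlier than the watermark has arrived, so windows can be evaluated or timers fired
- StreamStatus: the status of the stream, telling a task whether it should keep receiving data from upstream; generated in the source operator and passed downstream along the dataflow. The statuses are:
  - IDLE
  - ACTIVE
- LatencyMarker: used to monitor processing latency; generated in the source operator and passed downstream along the dataflow, bypassing the business logic, until the sink uses it to estimate the end-to-end latency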
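The following is a minimal sketch of how an operator might tell the four element kinds apart after deserialization. The classes `Element`, `Record`, `WatermarkElement`, `StatusElement`, and `LatencyElement` are hypothetical stand-ins invented for this illustration; they are not Flink's real classes, and the returned strings only describe what a real operator would do.

```java
/** Hypothetical, simplified stand-ins for the four element kinds; not Flink's real classes. */
class StreamElementSketch {

    /** Common base type for everything that travels through the dataflow. */
    static abstract class Element {}

    static final class Record extends Element {
        final String value;
        Record(String value) { this.value = value; }
    }

    static final class WatermarkElement extends Element {
        final long timestamp;
        WatermarkElement(long timestamp) { this.timestamp = timestamp; }
    }

    static final class StatusElement extends Element {
        final boolean idle;
        StatusElement(boolean idle) { this.idle = idle; }
    }

    static final class LatencyElement extends Element {
        final long markedTime;
        LatencyElement(long markedTime) { this.markedTime = markedTime; }
    }

    /** Routes one deserialized element to the handling that matches its kind. */
    static String dispatch(Element element) {
        if (element instanceof Record) {
            return "process record " + ((Record) element).value;
        } else if (element instanceof WatermarkElement) {
            return "advance event time to " + ((WatermarkElement) element).timestamp;
        } else if (element instanceof StatusElement) {
            return ((StatusElement) element).idle ? "mark input IDLE" : "mark input ACTIVE";
        } else if (element instanceof LatencyElement) {
            // Latency markers bypass the business logic and are simply passed on.
            return "forward latency marker from " + ((LatencyElement) element).markedTime;
        }
        throw new IllegalArgumentException("unknown element kind");
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new Record("click")));
        System.out.println(dispatch(new WatermarkElement(1000L)));
    }
}
```

Only the record branch would reach user code; the other three kinds are control elements handled by the framework.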
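Data transformation
Transformation
Transformation is the structure that bridges DataStream and the Flink core: DataStream faces the developer, Transformation faces the core. During processing, the DataStream pipeline is converted into a Transformation pipeline.
Transformations fall into two broad categories, physical and virtual:
- SourceTransformation: physical. The starting point of a Flink job; it has no input, so no meaningful transformation takes place. A job can have multiple SourceTransformations
- SinkTransformation: physical. The end point of a Flink job; it writes data to external storage and has no downstream transformation. A job can have multiple SinkTransformations
- OneInputTransformation: physical. A single-input, single-output transformation
- TwoInputTransformation: physical. A two-input, single-output transformation
- SplitTransformation: virtual. Splits one DataStream into several streams by condition, using an OutputSelector; it performs no real data transformation and only connects upstream and downstream
- SelectTransformation: virtual. Used together with SplitTransformation to select one of the DataStreams produced by the split
- PartitionTransformation: virtual. Chooses partitions for the stream according to the given StreamPartitioner; it only connects upstream and downstream
- UnionTransformation: virtual. Merges several upstream DataStreams into one; the input DataStreams must all have the same structure
As the intermediary, a transformation carries what is needed to build the StreamTask and the operator factory; the operator in turn is the container in which the UDF executes.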
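To make the physical/virtual distinction concrete, here is a small sketch under stated assumptions. The `Transformation` class and the `edgesInto` helper are hypothetical and are not how Flink builds its stream graph; they only illustrate the idea that a physical transformation becomes a node, while a virtual one contributes nothing but a label on the edge between its physical neighbors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: physical transformations become nodes, virtual ones only label edges. */
class TransformationSketch {

    static final class Transformation {
        final String name;
        final boolean physical;
        final List<Transformation> inputs;

        Transformation(String name, boolean physical, Transformation... inputs) {
            this.name = name;
            this.physical = physical;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Returns all edges between physical transformations that lead into the given node. */
    static List<String> edgesInto(Transformation node) {
        List<String> out = new ArrayList<>();
        collectEdges(node, out);
        return out;
    }

    private static void collectEdges(Transformation node, List<String> out) {
        for (Transformation input : node.inputs) {
            link(input, node, "", out);
        }
    }

    /** Walks upstream until a physical transformation is found, collecting virtual ones as labels. */
    private static void link(Transformation upstream, Transformation target, String via, List<String> out) {
        if (upstream.physical) {
            out.add(upstream.name + " -" + via + "-> " + target.name);
            collectEdges(upstream, out);
        } else {
            for (Transformation input : upstream.inputs) {
                link(input, target, "[" + upstream.name + "]" + via, out);
            }
        }
    }

    public static void main(String[] args) {
        Transformation source = new Transformation("source", true);
        Transformation map = new Transformation("map", true, source);
        Transformation rebalance = new Transformation("rebalance", false, map);
        Transformation sink = new Transformation("sink", true, rebalance);
        edgesInto(sink).forEach(System.out::println);
    }
}
```

In this sketch the virtual `rebalance` step never appears as a node of its own; it survives only as the annotation on the edge from `map` to `sink`.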
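Operators
StreamOperator
The operator of stream processing. An operator is one computation step, while the actual computation is done by the function the operator contains.
DataStream and DataSet have two separate operator hierarchies. The trend is toward unified batch and stream processing; this article discusses only the DataStream operators.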
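Operator lifecycle
- setup: instantiates the operator; initialization covers the environment, the time characteristic, metrics registration, and so on
- open: its implementation usually contains the operator's initialization logic; only after this method has run does the operator start processing data with its function
- close: called after all elements have entered the operator and been processed; it ensures that any buffered results are sent downstream
- dispose: runs in the final stage of the operator's lifecycle and is mainly used to release resources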
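The call order can be sketched with a toy operator. Everything here is hypothetical (the class, the pair-buffering behavior, and the `runOn` driver are invented for illustration and do not reflect Flink's real StreamTask code); the point is only that `close` flushes what is still buffered and that `dispose` runs last, even on failure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-operator that emits records in pairs; models only the lifecycle order. */
class LifecycleSketch {
    final List<String> calls = new ArrayList<>();       // lifecycle methods, in call order
    final List<String> downstream = new ArrayList<>();  // what was emitted downstream
    private List<String> buffer;

    void setup() { calls.add("setup"); }

    void open() {
        buffer = new ArrayList<>();                      // initialization logic lives in open
        calls.add("open");
    }

    void processElement(String value) {
        buffer.add(value);
        if (buffer.size() == 2) {
            flush();
        }
    }

    void close() {
        flush();                                         // push buffered results downstream
        calls.add("close");
    }

    void dispose() {
        buffer = null;                                   // release resources
        calls.add("dispose");
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            downstream.add(String.join("+", buffer));
            buffer.clear();
        }
    }

    /** Drives one operator instance through its whole lifecycle. */
    static LifecycleSketch runOn(List<String> input) {
        LifecycleSketch op = new LifecycleSketch();
        op.setup();
        try {
            op.open();
            for (String value : input) {
                op.processElement(value);
            }
            op.close();
        } finally {
            op.dispose();                                // always last, even after an exception
        }
        return op;
    }

    public static void main(String[] args) {
        LifecycleSketch op = runOn(java.util.Arrays.asList("a", "b", "c"));
        System.out.println(op.calls + " emitted " + op.downstream);
    }
}
```

Without the flush in `close`, the trailing record `c` would be lost, which is why the description above stresses that buffered data is sent downstream at that point.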
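State and fault tolerance
- Storing state
- Taking a snapshot when a checkpoint is triggered
- Persisting the snapshot to external storage
- Restoring state from the snapshot when the job fails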
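Data processing
Alongside the business data, an operator also processes the Watermark and LatencyMarker elements in the stream.
Operators define two behavioral interfaces, one for a single input and one for two inputs: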
OneInputStreamOperator
public interface OneInputStreamOperator<IN, OUT> extends StreamOperator<OUT> {
/**
* Processes one element that arrived at this operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*/
void processElement(StreamRecord<IN> element) throws Exception;
/**
* Processes a {@link Watermark}.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
void processWatermark(Watermark mark) throws Exception;
void processLatencyMarker(LatencyMarker latencyMarker) throws Exception;
}
TwoInputStreamOperator
public interface TwoInputStreamOperator<IN1, IN2, OUT> extends StreamOperator<OUT> {
/**
* Processes one element that arrived on the first input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*/
void processElement1(StreamRecord<IN1> element) throws Exception;
/**
* Processes one element that arrived on the second input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*/
void processElement2(StreamRecord<IN2> element) throws Exception;
/**
* Processes a {@link Watermark} that arrived on the first input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
void processWatermark1(Watermark mark) throws Exception;
/**
* Processes a {@link Watermark} that arrived on the second input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
void processWatermark2(Watermark mark) throws Exception;
/**
* Processes a {@link LatencyMarker} that arrived on the first input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*
* @see org.apache.flink.streaming.runtime.streamrecord.LatencyMarker
*/
void processLatencyMarker1(LatencyMarker latencyMarker) throws Exception;
/**
* Processes a {@link LatencyMarker} that arrived on the second input of this two-input operator.
* This method is guaranteed to not be called concurrently with other methods of the operator.
*
* @see org.apache.flink.streaming.runtime.streamrecord.LatencyMarker
*/
void processLatencyMarker2(LatencyMarker latencyMarker) throws Exception;
}
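Async operators
Async operators address the latency bottleneck that comes from interacting with external systems: requests can be issued and responses handled concurrently, with no blocking wait. Two modes are supported for the order of the results:
- Ordered output: guarantees that output order matches input order, at the cost of higher latency and lower operator throughput. Internally a queue ensures that the record received first is emitted first; even if a later record gets its response earlier, it waits
- Unordered output: a record is emitted as soon as its response arrives. Order is not guaranteed, but latency is lower and throughput higher
Note that even unordered mode is not entirely without order. Remember watermarks? Flink still guarantees that a watermark does not overtake the records that arrived before it. In other words, watermarks divide the stream into groups: output within a group may be out of order, but the groups themselves stay in order.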
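The difference between the two modes can be sketched with plain `CompletableFuture`s. The `OrderedEmitter` and `UnorderedEmitter` classes below are hypothetical and far simpler than Flink's async operator; in particular, the watermark grouping described above is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of ordered vs. unordered result emission for async requests. */
class AsyncOrderSketch {

    /** Ordered mode: results leave in arrival order, so a finished request waits for earlier ones. */
    static final class OrderedEmitter {
        private final Deque<CompletableFuture<String>> pending = new ArrayDeque<>();
        final List<String> output = new ArrayList<>();

        /** Registers one in-flight request and returns the future its response will complete. */
        CompletableFuture<String> submit() {
            CompletableFuture<String> future = new CompletableFuture<>();
            pending.addLast(future);
            return future;
        }

        /** Emits completed results from the head of the queue; stops at the first unfinished one. */
        void drain() {
            while (!pending.isEmpty() && pending.peekFirst().isDone()) {
                output.add(pending.pollFirst().join());
            }
        }
    }

    /** Unordered mode: each result is emitted as soon as its own request completes. */
    static final class UnorderedEmitter {
        final List<String> output = new ArrayList<>();

        CompletableFuture<String> submit() {
            CompletableFuture<String> future = new CompletableFuture<>();
            future.thenAccept(output::add);   // runs when the future is completed
            return future;
        }
    }

    public static void main(String[] args) {
        OrderedEmitter ordered = new OrderedEmitter();
        CompletableFuture<String> first = ordered.submit();
        CompletableFuture<String> second = ordered.submit();
        second.complete("second");            // the later request answers first
        ordered.drain();
        System.out.println("before first completes: " + ordered.output);
        first.complete("first");
        ordered.drain();
        System.out.println("after first completes: " + ordered.output);
    }
}
```

The ordered emitter pays for its guarantee with head-of-line blocking: one slow request delays every result queued behind it, which is the latency and throughput cost mentioned above.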
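Functions
Function
User-defined functions are abbreviated UDF, and there are also many built-in functions. By type they fall roughly into three categories:
- SourceFunction: reads data from external systems; its operator is the starting point and has no upstream operator
- SinkFunction: writes data to external storage; its operator is the end point and has no downstream operator
- Function: processes data, so it has both upstream and downstream operators. To keep things simple and effective, the design mirrors that of operators: UDFs also come in only single-input and two-input variants
Layers
In the DataStream API, functions come in three layers. From the highest-level encapsulation to the lowest:
- Function: stateless UDF interface. You do not need to care about the underlying concepts; you only implement the business logic
- RichFunction: UDF interface + state + lifecycle. You can implement open and close to manage initialization and cleanup, and use get/setRuntimeContext to reach the runtime environment's parameters, which can be very useful
- ProcessFunction: UDF interface + state + lifecycle + triggers
Note that a stateless Function can be used without a second thought, but a stateful function has to consider how intermediate results are saved and restored.
The class hierarchy shows the following differences:
- Keyed vs. non-keyed: a keyed function can only be applied to a KeyedStream
- Co vs. non-Co: a Co function takes two input streams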
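Deferred computation
This concept is a very important part of the unified batch/stream design.
In stream processing, data can arrive out of order or late. For efficiency, computation is triggered in small batches rather than once per event.
Typical cases are the timers used in joins and the watermarks used in windows.
Operators that support deferred computation implement the Triggerable interface, which lets them act on both event time and processing time.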
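The event-time half of this mechanism can be sketched as a queue of timers that fire once the watermark passes them. The class below is a hypothetical simplification: the names `registerEventTimeTimer` and `advanceWatermark` are chosen for this illustration, and the `fired` list stands in for the callback a real operator would receive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: event-time timers fire when the watermark reaches their timestamp. */
class EventTimeTimerSketch {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private long currentWatermark = Long.MIN_VALUE;
    final List<Long> fired = new ArrayList<>();

    /** Defers a computation until event time reaches the given timestamp. */
    void registerEventTimeTimer(long timestamp) {
        timers.add(timestamp);
    }

    /** Advances event time and fires every timer that is now due, earliest first. */
    void advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark); // event time never goes back
        while (!timers.isEmpty() && timers.peek() <= currentWatermark) {
            fired.add(timers.poll());  // stands in for invoking the operator's callback
        }
    }

    public static void main(String[] args) {
        EventTimeTimerSketch service = new EventTimeTimerSketch();
        service.registerEventTimeTimer(10L);
        service.registerEventTimeTimer(5L);
        service.registerEventTimeTimer(20L);
        service.advanceWatermark(12L);
        System.out.println(service.fired);
    }
}
```

A single watermark can release many timers at once, which is the small-batch triggering described above: work accumulates per event but is evaluated only when time advances.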
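Broadcast functions
Broadcast functions build on the RichFunction interface, the AbstractRichFunction abstract class, and the BaseBroadcastProcessFunction abstract class.
There are two main abstract classes, BroadcastProcessFunction and KeyedBroadcastProcessFunction; the difference is that the keyed variant can only be applied to a KeyedStream.
- processElement: can only use the read-only ReadOnlyContext. Broadcast state must be identical on every parallel instance of the operator; if modification were allowed here, the instances could diverge and cause unpredictable errors. In addition, parallel operator instances cannot communicate with each other, so a broadcast update from this side is not possible by design.
- processBroadcastElement: can use the read-write Context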
/**
* A function to be applied to a
* {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream BroadcastConnectedStream} that
* connects {@link org.apache.flink.streaming.api.datastream.BroadcastStream BroadcastStream}, i.e. a stream
* with broadcast state, with a <b>non-keyed</b> {@link org.apache.flink.streaming.api.datastream.DataStream DataStream}.
*
* <p>The stream with the broadcast state can be created using the
* {@link org.apache.flink.streaming.api.datastream.DataStream#broadcast(MapStateDescriptor[])}
* stream.broadcast(MapStateDescriptor)} method.
*
* <p>The user has to implement two methods:
* <ol>
* <li>the {@link #processBroadcastElement(Object, Context, Collector)} which will be applied to
* each element in the broadcast side
* <li> and the {@link #processElement(Object, ReadOnlyContext, Collector)} which will be applied to the
* non-broadcasted/keyed side.
* </ol>
*
* <p>The {@code processElementOnBroadcastSide()} takes as argument (among others) a context that allows it to
* read/write to the broadcast state, while the {@code processElement()} has read-only access to the broadcast state.
*
* @param <IN1> The input type of the non-broadcast side.
* @param <IN2> The input type of the broadcast side.
* @param <OUT> The output type of the operator.
*/
@PublicEvolving
public abstract class BroadcastProcessFunction<IN1, IN2, OUT> extends BaseBroadcastProcessFunction {
private static final long serialVersionUID = 8352559162119034453L;
/**
* This method is called for each element in the (non-broadcast)
* {@link org.apache.flink.streaming.api.datastream.DataStream data stream}.
*
* <p>This function can output zero or more elements using the {@link Collector} parameter,
* query the current processing/event time, and also query and update the local keyed state.
* Finally, it has <b>read-only</b> access to the broadcast state.
* The context is only valid during the invocation of this method, do not store it.
*
* @param value The stream element.
* @param ctx A {@link ReadOnlyContext} that allows querying the timestamp of the element,
* querying the current processing/event time and updating the broadcast state.
* The context is only valid during the invocation of this method, do not store it.
* @param out The collector to emit resulting elements to
* @throws Exception The function may throw exceptions which cause the streaming program
* to fail and go into recovery.
*/
public abstract void processElement(final IN1 value, final ReadOnlyContext ctx, final Collector<OUT> out) throws Exception;
/**
* This method is called for each element in the
* {@link org.apache.flink.streaming.api.datastream.BroadcastStream broadcast stream}.
*
* <p>This function can output zero or more elements using the {@link Collector} parameter,
* query the current processing/event time, and also query and update the internal
* {@link org.apache.flink.api.common.state.BroadcastState broadcast state}. These can be done
* through the provided {@link Context}.
* The context is only valid during the invocation of this method, do not store it.
*
* @param value The stream element.
* @param ctx A {@link Context} that allows querying the timestamp of the element,
* querying the current processing/event time and updating the broadcast state.
* The context is only valid during the invocation of this method, do not store it.
* @param out The collector to emit resulting elements to
* @throws Exception The function may throw exceptions which cause the streaming program
* to fail and go into recovery.
*/
public abstract void processBroadcastElement(final IN2 value, final Context ctx, final Collector<OUT> out) throws Exception;
/**
* A {@link BaseBroadcastProcessFunction.Context context} available to the broadcast side of
* a {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream}.
*/
public abstract class Context extends BaseBroadcastProcessFunction.Context {}
/**
* A {@link BaseBroadcastProcessFunction.Context context} available to the non-keyed side of
* a {@link org.apache.flink.streaming.api.datastream.BroadcastConnectedStream} (if any).
*/
public abstract class ReadOnlyContext extends BaseBroadcastProcessFunction.ReadOnlyContext {}
}
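The read-only versus read-write split can be illustrated without Flink. In this hypothetical sketch a plain map stands in for the broadcast state, and `Collections.unmodifiableMap` plays the role of the read-only context; the class and method names only mirror the two callbacks and are not Flink's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the broadcast side may write the state, the data side may only read it. */
class BroadcastStateSketch {
    private final Map<String, String> rules = new HashMap<>();

    /** Broadcast side: every parallel instance receives the same element and applies the same update. */
    void processBroadcastElement(String key, String rule) {
        rules.put(key, rule);
    }

    /** Data side: looks up the state through a read-only view. */
    String processElement(String key) {
        return readOnlyView().getOrDefault(key, "no-rule");
    }

    /** The view handed to the data side; any attempt to modify it throws. */
    Map<String, String> readOnlyView() {
        return Collections.unmodifiableMap(rules);
    }

    public static void main(String[] args) {
        BroadcastStateSketch function = new BroadcastStateSketch();
        function.processBroadcastElement("user-1", "block");
        System.out.println(function.processElement("user-1"));
        System.out.println(function.processElement("user-2"));
    }
}
```

Because each instance sees a different slice of the data stream, letting `processElement` write would make the instances' copies diverge; restricting writes to the broadcast side keeps them identical.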
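Async functions
The RichAsyncFunction abstract class implements the AsyncFunction interface and extends AbstractRichFunction, which gives it lifecycle management and access to the RuntimeContext.
The AsyncFunction interface defines two behaviors: the asynchronous call, which wraps its result in a ResultFuture, and timeout handling, which keeps resources from being held indefinitely.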
public interface AsyncFunction<IN, OUT> extends Function, Serializable {
/**
* Trigger async operation for each stream input.
*
* @param input element coming from an upstream task
* @param resultFuture to be completed with the result data
* @exception Exception in case of a user code error. An exception will make the task fail and
* trigger fail-over process.
*/
void asyncInvoke(IN input, ResultFuture<OUT> resultFuture) throws Exception;
/**
* {@link AsyncFunction#asyncInvoke} timeout occurred.
* By default, the result future is exceptionally completed with a timeout exception.
*
* @param input element coming from an upstream task
* @param resultFuture to be completed with the result data
*/
default void timeout(IN input, ResultFuture<OUT> resultFuture) throws Exception {
resultFuture.completeExceptionally(
new TimeoutException("Async function call has timed out."));
}
}
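Source functions
The SourceFunction interface only defines the business-related behavior. In practice you usually extend RichSourceFunction or RichParallelSourceFunction; these two abstract classes extend AbstractRichFunction and thereby gain function lifecycle management and access to the RuntimeContext.
The difference between the two is that they implement SourceFunction and ParallelSourceFunction respectively, which gives RichParallelSourceFunction the ability to run in parallel.
The key behaviors are:
- Lifecycle: implementations generally extend AbstractRichFunction, so the lifecycle covers three methods: open and close, plus cancel from SourceFunction
- Reading data: continuously reads from an external system such as Kafka
- Emitting data: sends the records that were read to the downstream operators
- Generating watermarks and sending them downstream
- Idle marking: if no data is being read, the task is marked idle and IDLE is sent downstream, so that downstream tasks no longer wait for watermarks from this source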
/**
* Base interface for all stream data sources in Flink. The contract of a stream source
* is the following: When the source should start emitting elements, the {@link #run} method
* is called with a {@link SourceContext} that can be used for emitting elements.
* The run method can run for as long as necessary. The source must, however, react to an
* invocation of {@link #cancel()} by breaking out of its main loop.
*
* <h3>CheckpointedFunction Sources</h3>
*
* <p>Sources that also implement the {@link org.apache.flink.streaming.api.checkpoint.CheckpointedFunction}
* interface must ensure that state checkpointing, updating of internal state and emission of
* elements are not done concurrently. This is achieved by using the provided checkpointing lock
* object to protect update of state and emission of elements in a synchronized block.
*
* <p>This is the basic pattern one should follow when implementing a checkpointed source:
*
* <pre>{@code
* public class ExampleCountSource implements SourceFunction<Long>, CheckpointedFunction {
* private long count = 0L;
* private volatile boolean isRunning = true;
*
* private transient ListState<Long> checkpointedCount;
*
* public void run(SourceContext<T> ctx) {
* while (isRunning && count < 1000) {
* // this synchronized block ensures that state checkpointing,
* // internal state updates and emission of elements are an atomic operation
* synchronized (ctx.getCheckpointLock()) {
* ctx.collect(count);
* count++;
* }
* }
* }
*
* public void cancel() {
* isRunning = false;
* }
*
* public void initializeState(FunctionInitializationContext context) {
* this.checkpointedCount = context
* .getOperatorStateStore()
* .getListState(new ListStateDescriptor<>("count", Long.class));
*
* if (context.isRestored()) {
* for (Long count : this.checkpointedCount.get()) {
* this.count = count;
* }
* }
* }
*
* public void snapshotState(FunctionSnapshotContext context) {
* this.checkpointedCount.clear();
* this.checkpointedCount.add(count);
* }
* }
* }</pre>
*
*
* <h3>Timestamps and watermarks:</h3>
* Sources may assign timestamps to elements and may manually emit watermarks.
* However, these are only interpreted if the streaming program runs on
* {@link TimeCharacteristic#EventTime}. On other time characteristics
* ({@link TimeCharacteristic#IngestionTime} and {@link TimeCharacteristic#ProcessingTime}),
* the watermarks from the source function are ignored.
*
* <h3>Gracefully Stopping Functions</h3>
* Functions may additionally implement the {@link org.apache.flink.api.common.functions.StoppableFunction}
* interface. "Stopping" a function, in contrast to "canceling" means a graceful exit that leaves the
* state and the emitted elements in a consistent state.
*
* <p>When a source is stopped, the executing thread is not interrupted, but expected to leave the
* {@link #run(SourceContext)} method in reasonable time on its own, preserving the atomicity
* of state updates and element emission.
*
* @param <T> The type of the elements produced by this source.
*
* @see org.apache.flink.api.common.functions.StoppableFunction
* @see org.apache.flink.streaming.api.TimeCharacteristic
*/
@Public
public interface SourceFunction<T> extends Function, Serializable {
/**
* Starts the source. Implementations can use the {@link SourceContext} emit
* elements.
*
* <p>Sources that implement {@link org.apache.flink.streaming.api.checkpoint.CheckpointedFunction}
* must lock on the checkpoint lock (using a synchronized block) before updating internal
* state and emitting elements, to make both an atomic operation:
*
* <pre>{@code
* public class ExampleCountSource implements SourceFunction<Long>, CheckpointedFunction {
* private long count = 0L;
* private volatile boolean isRunning = true;
*
* private transient ListState<Long> checkpointedCount;
*
* public void run(SourceContext<T> ctx) {
* while (isRunning && count < 1000) {
* // this synchronized block ensures that state checkpointing,
* // internal state updates and emission of elements are an atomic operation
* synchronized (ctx.getCheckpointLock()) {
* ctx.collect(count);
* count++;
* }
* }
* }
*
* public void cancel() {
* isRunning = false;
* }
*
* public void initializeState(FunctionInitializationContext context) {
* this.checkpointedCount = context
* .getOperatorStateStore()
* .getListState(new ListStateDescriptor<>("count", Long.class));
*
* if (context.isRestored()) {
* for (Long count : this.checkpointedCount.get()) {
* this.count = count;
* }
* }
* }
*
* public void snapshotState(FunctionSnapshotContext context) {
* this.checkpointedCount.clear();
* this.checkpointedCount.add(count);
* }
* }
* }</pre>
*
* @param ctx The context to emit elements to and for accessing locks.
*/
void run(SourceContext<T> ctx) throws Exception;
/**
* Cancels the source. Most sources will have a while loop inside the
* {@link #run(SourceContext)} method. The implementation needs to ensure that the
* source will break out of that loop after this method is called.
*
* <p>A typical pattern is to have an {@code "volatile boolean isRunning"} flag that is set to
* {@code false} in this method. That flag is checked in the loop condition.
*
* <p>When a source is canceled, the executing thread will also be interrupted
* (via {@link Thread#interrupt()}). The interruption happens strictly after this
* method has been called, so any interruption handler can rely on the fact that
* this method has completed. It is good practice to make any flags altered by
* this method "volatile", in order to guarantee the visibility of the effects of
* this method to any interruption handler.
*/
void cancel();
// ------------------------------------------------------------------------
// source context
// ------------------------------------------------------------------------
/**
* Interface that source functions use to emit elements, and possibly watermarks.
*
* @param <T> The type of the elements produced by the source.
*/
@Public // Interface might be extended in the future with additional methods.
interface SourceContext<T> {
/**
* Emits one element from the source, without attaching a timestamp. In most cases,
* this is the default way of emitting elements.
*
* <p>The timestamp that the element will get assigned depends on the time characteristic of
* the streaming program:
* <ul>
* <li>On {@link TimeCharacteristic#ProcessingTime}, the element has no timestamp.</li>
* <li>On {@link TimeCharacteristic#IngestionTime}, the element gets the system's
* current time as the timestamp.</li>
* <li>On {@link TimeCharacteristic#EventTime}, the element will have no timestamp initially.
* It needs to get a timestamp (via a {@link TimestampAssigner}) before any time-dependent
* operation (like time windows).</li>
* </ul>
*
* @param element The element to emit
*/
void collect(T element);
/**
* Emits one element from the source, and attaches the given timestamp. This method
* is relevant for programs using {@link TimeCharacteristic#EventTime}, where the
* sources assign timestamps themselves, rather than relying on a {@link TimestampAssigner}
* on the stream.
*
* <p>On certain time characteristics, this timestamp may be ignored or overwritten.
* This allows programs to switch between the different time characteristics and behaviors
* without changing the code of the source functions.
* <ul>
* <li>On {@link TimeCharacteristic#ProcessingTime}, the timestamp will be ignored,
* because processing time never works with element timestamps.</li>
* <li>On {@link TimeCharacteristic#IngestionTime}, the timestamp is overwritten with the
* system's current time, to realize proper ingestion time semantics.</li>
* <li>On {@link TimeCharacteristic#EventTime}, the timestamp will be used.</li>
* </ul>
*
* @param element The element to emit
* @param timestamp The timestamp in milliseconds since the Epoch
*/
@PublicEvolving
void collectWithTimestamp(T element, long timestamp);
/**
* Emits the given {@link Watermark}. A Watermark of value {@code t} declares that no
* elements with a timestamp {@code t' <= t} will occur any more. If further such
* elements will be emitted, those elements are considered <i>late</i>.
*
* <p>This method is only relevant when running on {@link TimeCharacteristic#EventTime}.
* On {@link TimeCharacteristic#ProcessingTime},Watermarks will be ignored. On
* {@link TimeCharacteristic#IngestionTime}, the Watermarks will be replaced by the
* automatic ingestion time watermarks.
*
* @param mark The Watermark to emit
*/
@PublicEvolving
void emitWatermark(Watermark mark);
/**
* Marks the source to be temporarily idle. This tells the system that this source will
* temporarily stop emitting records and watermarks for an indefinite amount of time. This
* is only relevant when running on {@link TimeCharacteristic#IngestionTime} and
* {@link TimeCharacteristic#EventTime}, allowing downstream tasks to advance their
* watermarks without the need to wait for watermarks from this source while it is idle.
*
* <p>Source functions should make a best effort to call this method as soon as they
* acknowledge themselves to be idle. The system will consider the source to resume activity
* again once {@link SourceContext#collect(T)}, {@link SourceContext#collectWithTimestamp(T, long)},
* or {@link SourceContext#emitWatermark(Watermark)} is called to emit elements or watermarks from the source.
*/
@PublicEvolving
void markAsTemporarilyIdle();
/**
* Returns the checkpoint lock. Please refer to the class-level comment in
* {@link SourceFunction} for details about how to write a consistent checkpointed
* source.
*
* @return The object to use as the lock
*/
Object getCheckpointLock();
/**
* This method is called by the system to shut down the context.
*/
void close();
}
}
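The SourceContext used by SourceFunction: the StreamSourceContexts class contains two main kinds of SourceContext
- NonTimestampContext: no time semantics; the timestamp of every element is set to -1, which means no watermark is ever sent downstream
- WatermarkContext: time-aware; defines the watermark-related behavior
  - Manages the current StreamStatus and passes it downstream
  - Idle detection: when neither data nor a watermark has been received within the configured time interval, the task is marked idle
  - AutomaticWatermarkContext: used with ingestion time; watermarks are generated automatically. It works through a timer (WatermarkEmittingTask) whose trigger time = (job start timestamp + watermark interval * n), and it keeps sending watermarks downstream
  - ManualWatermarkContext: used with event time; it generates no watermarks itself and only forwards the watermarks handed to it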
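private AutomaticWatermarkContext(/* constructor parameters elided in this excerpt */) {
// ... field initialization elided in this excerpt ...
// schedule the first watermark timer one interval from now
long now = this.timeService.getCurrentProcessingTime();
this.nextWatermarkTimer = this.timeService.registerTimer(now + watermarkInterval,
new WatermarkEmittingTask(this.timeService, checkpointLock, output));
}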
private class WatermarkEmittingTask implements ProcessingTimeCallback {
private final ProcessingTimeService timeService;
private final Object lock;
private final Output<StreamRecord<T>> output;
private WatermarkEmittingTask(
ProcessingTimeService timeService,
Object checkpointLock,
Output<StreamRecord<T>> output) {
this.timeService = timeService;
this.lock = checkpointLock;
this.output = output;
}
@Override
public void onProcessingTime(long timestamp) {
final long currentTime = timeService.getCurrentProcessingTime();
synchronized (lock) {
// we should continue to automatically emit watermarks if we are active
if (streamStatusMaintainer.getStreamStatus().isActive()) {
if (idleTimeout != -1 && currentTime - lastRecordTime > idleTimeout) {
// if we are configured to detect idleness, piggy-back the idle detection check on the
// watermark interval, so that we may possibly discover idle sources faster before waiting
// for the next idle check to fire
markAsTemporarilyIdle();
// no need to finish the next check, as we are now idle.
cancelNextIdleDetectionTask();
} else if (currentTime > nextWatermarkTime) {
// align the watermarks across all machines. this will ensure that we
// don't have watermarks that creep along at different intervals because
// the machine clocks are out of sync
final long watermarkTime = currentTime - (currentTime % watermarkInterval);
output.emitWatermark(new Watermark(watermarkTime));
nextWatermarkTime = watermarkTime + watermarkInterval;
}
}
}
long nextWatermark = currentTime + watermarkInterval;
nextWatermarkTimer = this.timeService.registerTimer(
nextWatermark, new WatermarkEmittingTask(this.timeService, lock, output));
}
}
}
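The alignment step in `onProcessingTime` is plain modular arithmetic: the current time is rounded down to a multiple of the watermark interval. The following worked example isolates that one expression; the class name and the sample timestamps are made up for illustration.

```java
/** Worked example of the rounding used to align automatic watermarks across machines. */
class WatermarkAlignmentSketch {

    /** Rounds currentTime down to the nearest multiple of watermarkInterval. */
    static long alignedWatermark(long currentTime, long watermarkInterval) {
        return currentTime - (currentTime % watermarkInterval);
    }

    public static void main(String[] args) {
        long interval = 200L;
        // Two machines whose timers fire at slightly different moments...
        System.out.println(alignedWatermark(1_000_457L, interval)); // prints 1000400
        System.out.println(alignedWatermark(1_000_599L, interval)); // prints 1000400
        // ...emit the same watermark until the next interval boundary is crossed.
        System.out.println(alignedWatermark(1_000_600L, interval)); // prints 1000600
    }
}
```

Since 1,000,457 mod 200 = 57 and 1,000,599 mod 200 = 199, both readings collapse to 1,000,400, so timers that fire at different offsets within the same interval still emit identical watermark values.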
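Sink functions
SinkFunction is purely a data output function with no lifecycle management of its own; lifecycle management comes from AbstractRichFunction.
When implementing a sink, we almost always extend RichSinkFunction or TwoPhaseCommitSinkFunction. TwoPhaseCommitSinkFunction is the key function behind Flink's exactly-once semantics: it provides a framework-level exactly-once implementation and is integrated with the checkpoint mechanism.
Checkpoint functions
These handle saving and restoring state at the function level. We generally implement the CheckpointedFunction or ListCheckpointed interface, which define how state snapshots are taken and restored.
CheckpointedFunction: snapshotState() is called when a snapshot for a checkpoint is requested, so the function can make sure its state is captured. initializeState() is called when the parallel function instance is created; it sets up the state and, on recovery, runs the logic that restores state from the last checkpoint.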
public interface CheckpointedFunction {
/**
* This method is called when a snapshot for a checkpoint is requested. This acts as a hook to the function to
* ensure that all state is exposed by means previously offered through {@link FunctionInitializationContext} when
* the Function was initialized, or offered now by {@link FunctionSnapshotContext} itself.
*
* @param context the context for drawing a snapshot of the operator
* @throws Exception
*/
void snapshotState(FunctionSnapshotContext context) throws Exception;
/**
* This method is called when the parallel function instance is created during distributed
* execution. Functions typically set up their state storing data structures in this method.
*
* @param context the context for initializing the operator
* @throws Exception
*/
void initializeState(FunctionInitializationContext context) throws Exception;
}
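ListCheckpointed: a list-style interface whose state is a list of sub-states; when the job's parallelism is changed, it supports redistributing that state across the new instances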
public interface ListCheckpointed<T extends Serializable> {
/**
* Gets the current state of the function. The state must reflect the result of all prior
* invocations to this function.
*
* <p>The returned list should contain one entry for redistributable unit of state. See
* the {@link ListCheckpointed class docs} for an illustration how list-style state
* redistribution works.
*
* <p>As special case, the returned list may be null or empty (if the operator has no state)
* or it may contain a single element (if the operator state is indivisible).
*
* @param checkpointId The ID of the checkpoint - a unique and monotonously increasing value.
* @param timestamp The wall clock timestamp when the checkpoint was triggered by the master.
*
* @return The operator state in a list of redistributable, atomic sub-states.
* Should not return null, but empty list instead.
*
* @throws Exception Thrown if the creation of the state object failed. This causes the
* checkpoint to fail. The system may decide to fail the operation (and trigger
* recovery), or to discard this checkpoint attempt and to continue running
* and to try again with the next checkpoint attempt.
*/
List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
/**
* Restores the state of the function or operator to that of a previous checkpoint.
* This method is invoked when the function is executed after a failure recovery.
* The state list may be empty if no state is to be recovered by the particular parallel instance
* of the function.
*
* <p>The given state list will contain all the <i>sub states</i> that this parallel
* instance of the function needs to handle. Refer to the {@link ListCheckpointed class docs}
* for an illustration how list-style state redistribution works.
*
* <p><b>Important:</b> When implementing this interface together with {@link RichFunction},
* then the {@code restoreState()} method is called before {@link RichFunction#open(Configuration)}.
*
* @param state The state to be restored as a list of atomic sub-states.
*
* @throws Exception Throwing an exception in this method causes the recovery to fail.
* The exact consequence depends on the configured failure handling strategy,
* but typically the system will re-attempt the recovery, or try recovering
* from a different checkpoint.
*/
void restoreState(List<T> state) throws Exception;
}
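List-style redistribution can be sketched as follows: the sub-state lists returned by all old parallel instances are pooled, then dealt out to the new instances. The round-robin assignment used here is an assumption made for illustration; the excerpt above does not show which assignment Flink actually uses, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of redistributing list-style state when the parallelism changes. */
class ListStateRedistributionSketch {

    /**
     * Pools the sub-states of all old instances and deals them out round-robin
     * (an assumed policy) to newParallelism instances.
     */
    static <T> List<List<T>> redistribute(List<List<T>> oldStates, int newParallelism) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> subStates : oldStates) {
            for (T subState : subStates) {
                result.get(next % newParallelism).add(subState);
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two old instances snapshotted [1, 2] and [3]; the job is rescaled to three instances.
        List<List<Integer>> old = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        System.out.println(redistribute(old, 3));
    }
}
```

This only works because each list entry is an atomic, independently restorable unit, which is why the `snapshotState` documentation asks for one entry per redistributable unit of state. An instance may also end up with an empty list, matching the note in `restoreState`.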
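Data partitioning
Partition
Flink is a stream processing framework, and distributed computation is at its core. Put simply, a job is split into subtasks and different data is handed to different tasks, so that each task computes over part of the data.
StreamPartitioner is the abstraction for partitioning a data stream; its behavior determines how data is distributed.
ChannelSelector is the key to load balancing. Every partitioner implements it, and its behavior determines the load-balancing mode.
The selectChannels method is told the number of downstream channels. This number is fixed for the duration of a job unless we change the parallelism.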
/**
* The {@link ChannelSelector} determines to which logical channels a record
* should be written to.
*
* @param <T> the type of record which is sent through the attached output gate
*/
public interface ChannelSelector<T extends IOReadableWritable> {
/**
* Returns the logical channel indexes, to which the given record should be
* written.
*
* @param record the record to the determine the output channels for
* @param numChannels the total number of output channels which are attached to respective output gate
* @return a (possibly empty) array of integer numbers which indicate the indices of the output channels through
* which the record shall be forwarded
*/
int[] selectChannels(T record, int numChannels);
}
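Commonly used partitioning strategies:
- partitionCustom: custom partitioning on a DataStream; selects the target partition for each element and produces a new DataStream
- ForwardPartitioner: forwards data from the upstream operator directly to the downstream operator and produces a new DataStream
- ShufflePartitioner: picks a channel at random
- RebalancePartitioner: sends data downstream round-robin, which avoids data skew
- RescalePartitioner: partitions according to the number of upstream and downstream tasks
- BroadcastPartitioner: broadcasts to every channel
- KeyGroupStreamPartitioner: partitions a KeyedStream by the key's group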
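Three of these strategies can be sketched against an interface shaped like `selectChannels` above. The `Selector` interface and the three classes are hypothetical simplifications; in particular, the key-based selector uses a plain hash modulo, which is an assumption and simpler than Flink's key-group assignment.

```java
/** Hypothetical channel selectors illustrating round-robin, broadcast, and key-based partitioning. */
class ChannelSelectorSketch {

    /** Same shape as selectChannels above, without the Flink record type bound. */
    interface Selector<T> {
        int[] selectChannels(T record, int numChannels);
    }

    /** Rebalance-style: cycles through the channels so each gets an equal share. */
    static final class RoundRobin<T> implements Selector<T> {
        private int next = -1;

        @Override
        public int[] selectChannels(T record, int numChannels) {
            next = (next + 1) % numChannels;
            return new int[] {next};
        }
    }

    /** Broadcast-style: every record goes to every channel. */
    static final class Broadcast<T> implements Selector<T> {
        @Override
        public int[] selectChannels(T record, int numChannels) {
            int[] all = new int[numChannels];
            for (int i = 0; i < numChannels; i++) {
                all[i] = i;
            }
            return all;
        }
    }

    /** Key-style (assumed hash-modulo): equal keys always land on the same channel. */
    static final class ByKey<T> implements Selector<T> {
        @Override
        public int[] selectChannels(T key, int numChannels) {
            return new int[] {Math.floorMod(key.hashCode(), numChannels)};
        }
    }

    public static void main(String[] args) {
        Selector<String> roundRobin = new RoundRobin<>();
        for (String record : new String[] {"a", "b", "c", "d"}) {
            System.out.println(record + " -> channel " + roundRobin.selectChannels(record, 3)[0]);
        }
    }
}
```

Round-robin ignores the record content and therefore balances load perfectly, while key-based selection trades balance for the guarantee that all records with one key meet in the same task.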
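Connectors
Connectors integrate Flink with external data systems.
Distributed IDs
To pass data across the network, a distributed framework needs to generate identifiers for its various objects.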
/**
* A statistically unique identification number.
*/
@PublicEvolving
public class AbstractID implements Comparable<AbstractID>, java.io.Serializable {
private static final long serialVersionUID = 1L;
private static final Random RND = new Random();
/** The size of a long in bytes. */
private static final int SIZE_OF_LONG = 8;
/** The size of the ID in byte. */
public static final int SIZE = 2 * SIZE_OF_LONG;
// ------------------------------------------------------------------------
/** The upper part of the actual ID. */
protected final long upperPart;
/** The lower part of the actual ID. */
protected final long lowerPart;
/** The memoized value returned by toString(). */
private transient String toString;
// --------------------------------------------------------------------------------------------
/**
* Constructs a new ID with a specific bytes value.
*/
public AbstractID(byte[] bytes) {
if (bytes == null || bytes.length != SIZE) {
throw new IllegalArgumentException("Argument bytes must by an array of " + SIZE + " bytes");
}
this.lowerPart = byteArrayToLong(bytes, 0);
this.upperPart = byteArrayToLong(bytes, SIZE_OF_LONG);
}
/**
* Constructs a new abstract ID.
*
* @param lowerPart the lower bytes of the ID
* @param upperPart the higher bytes of the ID
*/
public AbstractID(long lowerPart, long upperPart) {
this.lowerPart = lowerPart;
this.upperPart = upperPart;
}
/**
* Copy constructor: Creates a new abstract ID from the given one.
*
* @param id the abstract ID to copy
*/
public AbstractID(AbstractID id) {
if (id == null) {
throw new IllegalArgumentException("Id must not be null.");
}
this.lowerPart = id.lowerPart;
this.upperPart = id.upperPart;
}
/**
* Constructs a new random ID from a uniform distribution.
*/
public AbstractID() {
this.lowerPart = RND.nextLong();
this.upperPart = RND.nextLong();
}
// --------------------------------------------------------------------------------------------
/**
* Gets the lower 64 bits of the ID.
*
* @return The lower 64 bits of the ID.
*/
public long getLowerPart() {
return lowerPart;
}
/**
* Gets the upper 64 bits of the ID.
*
* @return The upper 64 bits of the ID.
*/
public long getUpperPart() {
return upperPart;
}
/**
* Gets the bytes underlying this ID.
*
* @return The bytes underlying this ID.
*/
public byte[] getBytes() {
byte[] bytes = new byte[SIZE];
longToByteArray(lowerPart, bytes, 0);
longToByteArray(upperPart, bytes, SIZE_OF_LONG);
return bytes;
}
// --------------------------------------------------------------------------------------------
// Standard Utilities
// --------------------------------------------------------------------------------------------
@Override
public boolean equals(Object obj) {
if (obj == this) {
return true;
} else if (obj != null && obj.getClass() == getClass()) {
AbstractID that = (AbstractID) obj;
return that.lowerPart == this.lowerPart && that.upperPart == this.upperPart;
} else {
return false;
}
}
@Override
public int hashCode() {
return ((int) this.lowerPart) ^
((int) (this.lowerPart >>> 32)) ^
((int) this.upperPart) ^
((int) (this.upperPart >>> 32));
}
@Override
public String toString() {
if (this.toString == null) {
final byte[] ba = new byte[SIZE];
longToByteArray(this.lowerPart, ba, 0);
longToByteArray(this.upperPart, ba, SIZE_OF_LONG);
this.toString = StringUtils.byteToHexString(ba);
}
return this.toString;
}
@Override
public int compareTo(AbstractID o) {
int diff1 = Long.compare(this.upperPart, o.upperPart);
int diff2 = Long.compare(this.lowerPart, o.lowerPart);
return diff1 == 0 ? diff2 : diff1;
}
// --------------------------------------------------------------------------------------------
// Conversion Utilities
// --------------------------------------------------------------------------------------------
/**
* Converts the given byte array to a long.
*
* @param ba the byte array to be converted
* @param offset the offset indicating at which byte inside the array the conversion shall begin
* @return the long variable
*/
private static long byteArrayToLong(byte[] ba, int offset) {
long l = 0;
for (int i = 0; i < SIZE_OF_LONG; ++i) {
l |= (ba[offset + SIZE_OF_LONG - 1 - i] & 0xffL) << (i << 3);
}
return l;
}
/**
* Converts a long to a byte array.
*
* @param l the long variable to be converted
* @param ba the byte array to store the result the of the conversion
* @param offset offset indicating at what position inside the byte array the result of the conversion shall be stored
*/
private static void longToByteArray(long l, byte[] ba, int offset) {
for (int i = 0; i < SIZE_OF_LONG; ++i) {
final int shift = i << 3; // i * 8
ba[offset + SIZE_OF_LONG - 1 - i] = (byte) ((l & (0xffL << shift)) >>> shift);
}
}
}
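The two conversion helpers at the end of `AbstractID` write each long with its most significant byte first, which is the same layout `java.nio.ByteBuffer` uses by default. As a sketch of that observation (the class `IdBytesSketch` is hypothetical and is not part of Flink), the 16-byte encoding can be reproduced like this:

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: the 16-byte ID layout, reproduced with a big-endian ByteBuffer. */
class IdBytesSketch {
    static final int SIZE = 16;   // two longs of 8 bytes each

    /** Writes lowerPart into bytes 0..7 and upperPart into bytes 8..15, most significant byte first. */
    static byte[] toBytes(long lowerPart, long upperPart) {
        return ByteBuffer.allocate(SIZE).putLong(lowerPart).putLong(upperPart).array();
    }

    /** Reads the two longs back in the same order: {lowerPart, upperPart}. */
    static long[] fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != SIZE) {
            throw new IllegalArgumentException("Argument bytes must be an array of " + SIZE + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new long[] {buffer.getLong(), buffer.getLong()};
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(1L, 2L);
        long[] parts = fromBytes(bytes);
        System.out.println(parts[0] + ", " + parts[1]);
    }
}
```

With 128 random bits per ID, the class relies on collisions being statistically negligible rather than on coordination between nodes, which is what the "statistically unique" wording in its Javadoc refers to.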
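Summary
- Abstractions facing the developer
- Core runtime abstractions
- Data stream and operation abstractions
- Data transformation abstractions
- Abstractions for operators, functions, and data partitioning
- Data I/O abstractions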