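From data input to data processing: OneInputStreamOperator & AbstractUdfStreamOperator
StreamSource is the operator that starts the whole stream. The operators that receive the incoming data and process it are OneInputStreamOperator, TwoInputStreamOperator, and so on.
The full inheritance hierarchy of StreamOperator is shown in the figure above (the figure is large, so it is best viewed enlarged).
The OneInputStreamOperator interface itself is simple: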
public interface OneInputStreamOperator<IN, OUT> extends StreamOperator<OUT> {
    /**
     * Processes one element that arrived at this operator.
     * This method is guaranteed to not be called concurrently with other methods of the operator.
     */
    void processElement(StreamRecord<IN> element) throws Exception;
    /**
     * Processes a {@link Watermark}.
     * This method is guaranteed to not be called concurrently with other methods of the operator.
     *
     * @see org.apache.flink.streaming.api.watermark.Watermark
     */
    void processWatermark(Watermark mark) throws Exception;
    void processLatencyMarker(LatencyMarker latencyMarker) throws Exception;
}
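StreamFlatMap, an operator that implements this interface, is just as simple. It stamps the collector with the timestamp of the incoming record and then hands the record's value to the user function: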
public class StreamFlatMap<IN, OUT>
        extends AbstractUdfStreamOperator<OUT, FlatMapFunction<IN, OUT>>
        implements OneInputStreamOperator<IN, OUT> {
    private static final long serialVersionUID = 1L;
    private transient TimestampedCollector<OUT> collector;
    public StreamFlatMap(FlatMapFunction<IN, OUT> flatMapper) {
        super(flatMapper);
        chainingStrategy = ChainingStrategy.ALWAYS;
    }
    @Override
    public void open() throws Exception {
        super.open();
        collector = new TimestampedCollector<>(output);
    }
    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        collector.setTimestamp(element);
        userFunction.flatMap(element.getValue(), collector);
    }
}
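To make the delegation pattern concrete without pulling in Flink, here is a minimal standalone sketch. Every type in it (Collector, FlatMapFunction, Element, TimestampedCollector, FlatMapOperator) is a simplified, hypothetical stand-in written for this illustration, not the real Flink class of the same name. It only mirrors the two steps of processElement: copy the input timestamp into the collector, then call the user function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-ins for the Flink types; this is not the real Flink API.
final class FlatMapOperatorSketch {

    /** Receives whatever the user function emits. */
    interface Collector<T> {
        void collect(T value);
    }

    /** User function: one input may yield zero, one, or many outputs. */
    interface FlatMapFunction<I, O> {
        void flatMap(I value, Collector<O> out) throws Exception;
    }

    /** A value plus its timestamp (stand-in for StreamRecord). */
    static final class Element<T> {
        final T value;
        final long timestamp;

        Element(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Stamps every emitted value with the timestamp of the current input. */
    static final class TimestampedCollector<T> implements Collector<T> {
        private final List<Element<T>> output;
        private long timestamp;

        TimestampedCollector(List<Element<T>> output) {
            this.output = output;
        }

        void setTimestamp(Element<?> input) {
            this.timestamp = input.timestamp;
        }

        @Override
        public void collect(T value) {
            output.add(new Element<T>(value, timestamp));
        }
    }

    /** The operator only adapts elements to the user function. */
    static final class FlatMapOperator<I, O> {
        private final FlatMapFunction<I, O> userFunction;
        private final TimestampedCollector<O> collector;

        FlatMapOperator(FlatMapFunction<I, O> userFunction, List<Element<O>> output) {
            this.userFunction = userFunction;
            this.collector = new TimestampedCollector<O>(output);
        }

        void processElement(Element<I> element) throws Exception {
            collector.setTimestamp(element);
            userFunction.flatMap(element.value, collector);
        }
    }

    /** Splits each line into words and renders the output as "word@timestamp". */
    static List<String> splitWords(List<Element<String>> input) throws Exception {
        List<Element<String>> emitted = new ArrayList<>();
        FlatMapOperator<String, String> operator = new FlatMapOperator<String, String>(
                (line, out) -> {
                    for (String word : line.split(" ")) {
                        out.collect(word);
                    }
                },
                emitted);
        for (Element<String> element : input) {
            operator.processElement(element);
        }
        List<String> rendered = new ArrayList<>();
        for (Element<String> e : emitted) {
            rendered.add(e.value + "@" + e.timestamp);
        }
        return rendered;
    }

    public static void main(String[] args) throws Exception {
        List<Element<String>> input = Arrays.asList(
                new Element<String>("a b", 1L),
                new Element<String>("c", 2L));
        System.out.println(splitWords(input)); // prints [a@1, b@1, c@2]
    }
}
```

Both words produced from the first line carry timestamp 1, which is the point of setting the timestamp on the collector before calling the user function.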
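As the class diagram shows, Flink wraps the common operator logic in a base class, AbstractUdfStreamOperator. It provides shared functionality such as passing the runtime context to the wrapped user function and saving snapshots. The two methods most readers will recognize are probably these: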
@Override
public void open() throws Exception {
    super.open();
    FunctionUtils.openFunction(userFunction, new Configuration());
}
@Override
public void close() throws Exception {
    super.close();
    functionsClosed = true;
    FunctionUtils.closeFunction(userFunction);
}
These two methods are where the open and close methods of Flink's Rich***Function family of user functions get invoked.
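The forwarding of lifecycle calls can be sketched in plain Java. Everything below is a hypothetical, stripped-down model: MapFunction, RichFunction, UdfOperator, MapOperator, and PrefixMapper are names invented for this illustration, and the instanceof check is only meant to resemble what a helper like FunctionUtils does, not to reproduce it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an operator forwarding open/close to its user function; not Flink code.
final class UdfLifecycleSketch {

    /** A plain user function: transformation only. */
    interface MapFunction<I, O> {
        O map(I value) throws Exception;
    }

    /** Optional lifecycle hooks, loosely modeled on the Rich***Function idea. */
    interface RichFunction {
        void open() throws Exception;

        void close() throws Exception;
    }

    /** Base operator that owns a user function and forwards lifecycle calls to it. */
    static class UdfOperator<F> {
        protected final F userFunction;

        UdfOperator(F userFunction) {
            this.userFunction = userFunction;
        }

        void open() throws Exception {
            // Only functions that opted into lifecycle hooks are opened.
            if (userFunction instanceof RichFunction) {
                ((RichFunction) userFunction).open();
            }
        }

        void close() throws Exception {
            if (userFunction instanceof RichFunction) {
                ((RichFunction) userFunction).close();
            }
        }
    }

    /** Concrete operator: delegates each element to the user function. */
    static final class MapOperator<I, O> extends UdfOperator<MapFunction<I, O>> {
        MapOperator(MapFunction<I, O> userFunction) {
            super(userFunction);
        }

        O processElement(I value) throws Exception {
            return userFunction.map(value);
        }
    }

    /** A "rich" user function that initializes state in open() and logs every call. */
    static final class PrefixMapper implements MapFunction<String, String>, RichFunction {
        private final List<String> events;
        private String prefix;

        PrefixMapper(List<String> events) {
            this.events = events;
        }

        @Override
        public void open() {
            prefix = "x-";
            events.add("open");
        }

        @Override
        public String map(String value) {
            events.add("map:" + value);
            return prefix + value;
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    /** Runs open, one element, close, and returns the recorded call order. */
    static List<String> runLifecycle() throws Exception {
        List<String> events = new ArrayList<>();
        MapOperator<String, String> operator =
                new MapOperator<String, String>(new PrefixMapper(events));
        operator.open();
        try {
            String result = operator.processElement("a");
            events.add("out:" + result);
        } finally {
            operator.close();
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runLifecycle()); // prints [open, map:a, out:x-a, close]
    }
}
```

The user function never calls open or close itself. The operator drives both, which is why state prepared in open is available by the time the first element arrives.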
4.3 StreamSink
There is not much to say about StreamSink. Its logic is simple, and only two methods are worth mentioning:
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    sinkContext.element = element;
    userFunction.invoke(element.getValue(), sinkContext);
}
@Override
protected void reportOrForwardLatencyMarker(LatencyMarker maker) {
    // all operators are tracking latencies
    this.latencyGauge.reportLatency(maker, true);
    // sinks don't forward latency markers
}
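Of these, processElement is the method declared by OneInputStreamOperator, shown earlier. reportOrForwardLatencyMarker is used to compute latency: as mentioned before, StreamSource emits LatencyMarker records that capture when they were created, and this method is where the elapsed time is finally reported. Because a sink has nothing downstream, it reports the latency and does not forward the marker.
The operator logic is relatively simple and clear, so that is all for this part.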
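The report-versus-forward distinction can be illustrated with a toy pipeline. This is a rough sketch under stated assumptions, not Flink's implementation: FakeClock, Operator, Sink, the per-operator cost, and the log format are all hypothetical, and the clock is advanced by hand so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of latency markers travelling through a pipeline; all names here are hypothetical.
final class LatencyMarkerSketch {

    /** Hand-driven clock so the example is deterministic. */
    static final class FakeClock {
        long now;
    }

    /** Marker stamped with its creation time by the source. */
    static final class LatencyMarker {
        final long markedTime;

        LatencyMarker(long markedTime) {
            this.markedTime = markedTime;
        }
    }

    /** Intermediate operator: reports latency, then forwards the marker downstream. */
    static class Operator {
        private final String name;
        private final long costMillis;
        private final FakeClock clock;
        private final List<String> log;
        private final Operator downstream;

        Operator(String name, long costMillis, FakeClock clock, List<String> log,
                 Operator downstream) {
            this.name = name;
            this.costMillis = costMillis;
            this.clock = clock;
            this.log = log;
            this.downstream = downstream;
        }

        final void processLatencyMarker(LatencyMarker marker) {
            clock.now += costMillis; // simulate the time it took the marker to get here
            reportOrForwardLatencyMarker(marker);
        }

        protected void reportOrForwardLatencyMarker(LatencyMarker marker) {
            report(marker);
            downstream.processLatencyMarker(marker);
        }

        /** Latency is the current time minus the time the source stamped on the marker. */
        protected final void report(LatencyMarker marker) {
            log.add(name + "=" + (clock.now - marker.markedTime));
        }
    }

    /** Sink: reports latency but has nowhere to forward the marker. */
    static final class Sink extends Operator {
        Sink(String name, long costMillis, FakeClock clock, List<String> log) {
            super(name, costMillis, clock, log, null);
        }

        @Override
        protected void reportOrForwardLatencyMarker(LatencyMarker marker) {
            report(marker);
            // sinks do not forward latency markers
        }
    }

    /** Builds map -> sink, sends one marker through, and returns the reported latencies. */
    static List<String> run() {
        FakeClock clock = new FakeClock();
        clock.now = 1000L;
        List<String> log = new ArrayList<>();
        Sink sink = new Sink("sink", 7L, clock, log);
        Operator map = new Operator("map", 5L, clock, log, sink);
        LatencyMarker marker = new LatencyMarker(clock.now); // the source stamps the marker
        map.processLatencyMarker(marker);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [map=5, sink=12]
    }
}
```

The value reported at the sink covers the whole path from the source, which is why the sink is the natural place to close out the measurement.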