Flink: ProcessFunction Source Code Analysis

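This article walks through the source code of Flink's Process Function family, including KeyedProcessFunction, ProcessWindowFunction, CoProcessFunction, ProcessJoinFunction, and BroadcastProcessFunction. Examples show how they apply to per-element stream processing, window aggregation, two-stream processing, stream joins, and broadcast processing.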

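Source version

flink-release-1.11.0
Code location: org.apache.flink.streaming.api.functions (the connected-stream and window variants live in its co and windowing sub-packages)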

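Flink provides 8 Process Functions:
ProcessFunction: for a DataStream
KeyedProcessFunction: for a KeyedStream, i.e. processing after keyBy
CoProcessFunction: for two streams combined with connect
ProcessJoinFunction: for stream join operations (interval join)
BroadcastProcessFunction: for a non-keyed stream connected to a broadcast stream
KeyedBroadcastProcessFunction: for a keyed stream (after keyBy) connected to a broadcast stream
ProcessWindowFunction: for keyed windows; it receives all elements of a window at once (full-window processing, which can optionally be combined with an incremental aggregate)
ProcessAllWindowFunction: for non-keyed windows created with windowAll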

Source analysis of KeyedProcessFunction and ProcessFunction

The source of KeyedProcessFunction and ProcessFunction is very similar, so only KeyedProcessFunction is analyzed here.
[Figure: KeyedProcessFunction structure]
[Figure: ProcessFunction structure]

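KeyedProcessFunction class structure
1 Context
Information available in an invocation of processElement(Object, Context, Collector) or onTimer(long, OnTimerContext, Collector).
2 OnTimerContext
Information available in an invocation of onTimer(long, OnTimerContext, Collector).
3 processElement
Processes one element from the input stream. The function can emit zero or more elements through the Collector parameter, and can update internal state or set timers through the Context parameter.
4 onTimer
Called when a timer set using TimerService fires.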
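KeyedProcessFunction operates on a KeyedStream.
It processes every element of the stream (each record can be handled as it arrives) and emits zero, one, or more elements.
All Process Functions implement the RichFunction interface (a rich function has lifecycle methods and access to the runtime context, and through it to state; the watermark and timers are reached through the Context),
so they all provide open(), close(), getRuntimeContext(), and similar methods.
KeyedProcessFunction<K, I, O> additionally provides two methods:
 1. processElement(I value, Context ctx, Collector<O> out) is invoked for every element in the stream; results are emitted through the Collector.
    The Context gives access to the element's timestamp, the element's key, and the TimerService. The Context can also emit records to other streams (side outputs).
 2. onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) is a callback, invoked when a previously registered timer fires.
    The timestamp parameter is the time the timer was set to fire at. The Collector is used to emit results. OnTimerContext provides the same context information as the Context of processElement,
    plus the time domain of the firing timer: event time or processing time.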
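The interplay of the two callbacks can be sketched in plain Java without any Flink dependency. The toy harness below only imitates the contract described above: a processElement analogue updates per-key state and registers a timer, and an onTimer analogue emits a result only if no newer element arrived for that key. All names here (IdleCountSketch, IDLE_MS, advanceWatermark) are hypothetical and are not part of Flink's API; a real job would extend KeyedProcessFunction and use ValueState and ctx.timerService() instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java toy harness; it imitates the callback contract only and is NOT Flink code. */
final class IdleCountSketch {
    static final long IDLE_MS = 60_000L; // hypothetical idle timeout

    // key -> {count, lastModified}; stands in for keyed state
    private final Map<String, long[]> state = new HashMap<>();
    // timer timestamp -> keys; TreeMap keeps timers in timestamp order
    private final TreeMap<Long, TreeSet<String>> timers = new TreeMap<>();
    // stands in for Collector<O>
    final List<String> output = new ArrayList<>();

    /** Analogue of processElement: update per-key state, then register a timer. */
    void processElement(String key, long timestamp) {
        long[] s = state.computeIfAbsent(key, k -> new long[2]);
        s[0]++;           // count
        s[1] = timestamp; // lastModified
        timers.computeIfAbsent(timestamp + IDLE_MS, t -> new TreeSet<>()).add(key);
    }

    /** Analogue of onTimer: emit only if this timer belongs to the latest element of the key. */
    void onTimer(String key, long timestamp) {
        long[] s = state.get(key);
        if (s != null && timestamp == s[1] + IDLE_MS) {
            output.add(key + "=" + s[0]);
        }
    }

    /** Fires every timer whose timestamp is <= the watermark, in timestamp order. */
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, TreeSet<String>> e = timers.pollFirstEntry();
            for (String key : e.getValue()) {
                onTimer(key, e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        IdleCountSketch f = new IdleCountSketch();
        f.processElement("flink", 1_000L);
        f.processElement("flink", 30_000L); // a newer element makes the first timer stale
        f.advanceWatermark(61_000L);         // stale timer fires, nothing is emitted
        f.advanceWatermark(90_000L);         // idle timeout of the latest element is reached
        System.out.println(f.output);        // prints [flink=2]
    }
}
```

The check inside onTimer is the important design point: earlier timers are not deleted when a new element arrives, so the callback has to compare the firing timestamp against the stored lastModified value before emitting.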
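TimerService and timers
The TimerService held by Context and OnTimerContext provides the following methods:

  long currentProcessingTime() returns the current processing time.
  long currentWatermark() returns the timestamp of the current event-time watermark.
  void registerProcessingTimeTimer(long timestamp) registers a processing-time timer for the current key. The timer fires when processing time reaches the given timestamp.
  void registerEventTimeTimer(long timestamp) registers an event-time timer for the current key. The timer fires, and the callback runs, once the watermark is greater than or equal to the timer's timestamp.
  void deleteProcessingTimeTimer(long timestamp) deletes a previously registered processing-time timer. If no timer with this timestamp exists, the call has no effect.
  void deleteEventTimeTimer(long timestamp) deletes a previously registered event-time timer. If no timer with this timestamp exists, the call has no effect.

When a timer fires, the onTimer() callback is executed. Note that timers can only be used on keyed streams.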
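To make the event-time rules above concrete, here is a small plain-Java model of the registration semantics. It is a simplified sketch, not Flink's implementation: the class ToyEventTimeTimers and its advanceWatermark method are hypothetical, and the key is passed explicitly because the toy has no notion of a "current key". It models three behaviours: timers are scoped to a key, registering the same key and timestamp twice yields a single timer, and deleting a timer that does not exist has no effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of event-time timers; a simplified sketch, NOT Flink's TimerService. */
final class ToyEventTimeTimers {
    // timestamp -> keys holding a timer for it; the set deduplicates (key, timestamp) pairs
    private final TreeMap<Long, TreeSet<String>> timers = new TreeMap<>();
    private long currentWatermark = Long.MIN_VALUE;

    long currentWatermark() {
        return currentWatermark;
    }

    /** Registers a timer for the given key; a repeated registration is a no-op. */
    void registerEventTimeTimer(String key, long timestamp) {
        timers.computeIfAbsent(timestamp, t -> new TreeSet<>()).add(key);
    }

    /** Deletes a timer; does nothing if no such timer was registered. */
    void deleteEventTimeTimer(String key, long timestamp) {
        TreeSet<String> keys = timers.get(timestamp);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                timers.remove(timestamp);
            }
        }
    }

    /** Advances the watermark and returns "key@timestamp" for each fired timer, in time order. */
    List<String> advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        List<String> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.firstKey() <= currentWatermark) {
            Map.Entry<Long, TreeSet<String>> e = timers.pollFirstEntry();
            for (String key : e.getValue()) {
                fired.add(key + "@" + e.getKey());
            }
        }
        return fired;
    }
}
```

In this model a timer registered at 100 for key "a" does not fire while the watermark is 99 and fires exactly once when the watermark reaches 100, even if it was registered twice.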
The source of KeyedProcessFunction is as follows:
package org.apache.flink.streaming.api.functions;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.functions.AbstractRichFunction;
import org.apache.flink.streaming.api.TimeDomain;
import org.apache.flink.streaming.api.TimerService;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

/**
 * A keyed function that processes elements of a stream.
 *
 * <p>For every element in the input stream {@link #processElement(Object, Context, Collector)}
 * is invoked. This can produce zero or more elements as output. Implementations can also
 * query the time and set timers through the provided {@link Context}. For firing timers
 * {@link #onTimer(long, OnTimerContext, Collector)} will be invoked. This can again produce
 * zero or more elements as output and register further timers.
 *
 * <p><b>NOTE:</b> Access to keyed state and timers (which are also scoped to a key) is only
 * available if the {@code KeyedProcessFunction} is applied on a {@code KeyedStream}.
 *
 * <p><b>NOTE:</b> A {@code KeyedProcessFunction} is always a
 * {@link org.apache.flink.api.common.functions.RichFunction}. Therefore, access to the
 * {@link org.apache.flink.api.common.functions.RuntimeContext} is always available and setup and
 * teardown methods can be implemented. See
 * {@link org.apache.flink.api.common.functions.RichFunction#open(org.apache.flink.configuration.Configuration)}
 * and {@link org.apache.flink.api.common.functions.RichFunction#close()}.
 *
 * @param <K> Type of the key.
 * @param <I> Type of the input elements.
 * @param <O> Type of the output elements.
 */
@PublicEvolving
public abstract class KeyedProcessFunction<K, I, O> extends AbstractRichFunction {

	private static final long serialVersionUID = 1L;

	/**
	 * Process one element from the input stream.
	 *
	 * <p>This function can output zero or more elements using the {@link Collector} parameter
	 * and also update internal state or set timers using the {@link Context} parameter.
	 *
	 * @param value The input value.
	 * @param ctx A {@link Context} that allows querying the timestamp of the element and getting
	 *            a {@link TimerService} for registering timers and querying the time. The
	 *            context is only valid during the invocation of this method, do not store it.
	 * @param out The collector for returning result values.
	 *
	 * @throws Exception This method may throw exceptions. Throwing an exception will cause the operation
	 *                   to fail and may trigger recovery.
	 */
	public abstract void processElement(I value, Context ctx, Collector<O> out) throws Exception;

	/**
	 * Called when a timer set using {@link TimerService} fires.
	 *
	 * @param timestamp The timestamp of the firing timer.
	 * @param ctx An {@link OnTimerContext} that allows querying the timestamp, the {@link TimeDomain}, and the key
	 *            of the firing timer and getting a {@link TimerService} for registering timers and querying the time.
	 *            The context is only valid during the invocation of this method, do not store it.
	 * @param out The collector for returning result values.
	 *
	 * @throws Exception This method may throw exceptions. Throwing an exception will cause the operation
	 *                   to fail and may trigger recovery.
	 */
	public void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception {}

	/**
	 * Information available in an invocation of {@link #processElement(Object, Context, Collector)}
	 * or {@link #onTimer(long, OnTimerContext, Collector)}.
	 */
	public abstract class Context {

		/**
		 * Timestamp of the element currently being processed or timestamp of a firing timer.
		 *
		 * <p>This might be {@code null}, for example if the time characteristic of your program
		 * is set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
		 */
		public abstract Long timestamp();

		/**
		 * A {@link TimerService} for querying time and registering timers.
		 */
		public abstract TimerService timerService();

		/**
		 * Emits a record to the side output identified by the {@link OutputTag}.
		 *
		 * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
		 * @param value The record to emit.
		 */
		public abstract <X> void output(OutputTag<X> outputTag, X value);

		/**
		 * Get key of the element being processed.
		 */
		public abstract K getCurrentKey();
	}

	/**
	 * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
	 */
	public abstract class OnTimerContext extends Context {
		/**
		 * The {@link TimeDomain} of the firing timer.
		 */
		public abstract TimeDomain timeDomain();

		/**
		 * Get key of the firing timer.
		 */
		@Override
		public abstract K getCurrentKey();
	}

}

KeyedProcessFunction usage examples
Example 1
Class that holds the state
public class CountWithTimestampState {
    private String key;
    private long count;
    private long lastModified;

    public CountWithTimestampState() {
    }

    public CountWithTimestampState(String key, long count, long lastModified) {
        this.key = key;
        this.count = count;
        this.lastModified = lastModified;
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }

    public long getLastModified() {
        return lastModified;
    }

    public void setLastModified(long lastModified) {
        this.lastModified = lastModified;
    }

    @Override
    public String toString() {
        return "CountWithTimestampState{" +
                "key='" + key + '\'' +
                ", count=" + count +
                ", lastModified=" + lastModified +
                '}';
    }
}
Input element data class
public class WordWithCount {
    private String key;
    private long count;

    public WordWithCount() {
    }

    public WordWithCount(String key, long count) {
        this.key = key;
        this.count = count;
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }

    @Override
    public String toString() {
        return "WordWithCount{" +
                "key='" + key + '\'' +
                ", count=" + count +
                '}';
    }
}
Process function
import com.scallion.bean.CountWithTimestampState;
import com.scallion.bean.WordWithCount;
import com.scallion.utils.TimeUtil;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api