[Flink] [Chapter 8] ProcessFunction API

1. Introduction to ProcessFunction

1.1 What the API documentation says

A function that processes elements of a stream.

For every element in the input stream processElement(Object, 
ProcessFunction.Context, Collector) is invoked. This can produce 
zero or more elements as output. 

Implementations can also query the time and set timers through the 
provided ProcessFunction.Context. 

For firing timers onTimer(long, ProcessFunction.OnTimerContext, 
Collector) will be invoked. This can again produce zero or more 
elements as output and register further timers.


NOTE: Access to keyed state and timers (which are also scoped to a key)
is only available if the ProcessFunction is applied on a KeyedStream.

NOTE: A ProcessFunction is always a
org.apache.flink.api.common.functions.RichFunction. Therefore,
access to the org.apache.flink.api.common.functions.RuntimeContext is
always available and setup and teardown methods can be implemented.
See
org.apache.flink.api.common.functions.RichFunction.open(org.apache.flink.configuration.Configuration)
and org.apache.flink.api.common.functions.RichFunction.close().

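(1) A ProcessFunction is a function that processes the elements of a stream.
(2) processElement() is called for every element and can emit zero or more outputs.
(3) Through ProcessFunction.Context you can query the current time and register timers.
(4) When a timer fires, onTimer() is called; it can also emit zero or more outputs and register further timers.
Notes:
(1) Keyed state and timers are only available when the function is applied on a KeyedStream.
(2) A ProcessFunction is also a RichFunction, so it can use the runtime context (for example, for state) and the open()/close() lifecycle methods.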
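To make this call contract concrete without a Flink cluster, here is a plain-Java model of how a runtime drives such a function: open() once, processElement() once per element with a collector that may receive zero or more records, then close(). `MiniProcessFunction`, `MiniRuntime` and `WordSplitter` are hypothetical names for this sketch, not Flink classes; the real operator additionally manages timers, state and checkpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for ProcessFunction: NOT the Flink API, only a model of its call contract.
abstract class MiniProcessFunction<I, O> {
    void open() {}                                           // setup hook, called once before any element
    abstract void processElement(I value, Consumer<O> out);  // may call out.accept zero or more times
    void close() {}                                          // teardown hook, called once after the last element
}

class MiniRuntime {
    // Drives a function over a finite input and returns everything it emitted.
    static <I, O> List<O> run(MiniProcessFunction<I, O> fn, List<I> input) {
        List<O> output = new ArrayList<>();
        fn.open();
        for (I value : input) {
            fn.processElement(value, output::add);
        }
        fn.close();
        return output;
    }
}

// Example: split comma-separated lines into words; a blank line produces no output at all.
class WordSplitter extends MiniProcessFunction<String, String> {
    final List<String> lifecycle = new ArrayList<>();   // records the order of lifecycle calls

    @Override
    void open() { lifecycle.add("open"); }

    @Override
    void processElement(String line, Consumer<String> out) {
        lifecycle.add("process");
        for (String word : line.split(",")) {
            String w = word.trim();
            if (!w.isEmpty()) {
                out.accept(w);   // one input line can yield 0, 1 or many outputs
            }
        }
    }

    @Override
    void close() { lifecycle.add("close"); }
}
```

Running `MiniRuntime.run(new WordSplitter(), Arrays.asList("a,b", "", "c"))` yields `[a, b, c]`: the first line emits two records, the empty line emits none.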

1.2 Class structure

[Figures: ProcessFunction class hierarchy]

1.3 Use cases for ProcessFunction

Because ProcessFunction is an abstract subclass of AbstractRichFunction, it can be used in every scenario where a RichFunction can be used.

Use cases of a RichFunction:

  1. Writing to third-party storage systems (for example, databases)
  2. Accessing the runtime context for stateful programming

Use cases specific to ProcessFunction:

  1. Timers
  2. Side outputs

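A ProcessFunction only processes data; it cannot define how data is redistributed between tasks or how it is windowed (it cannot do what keyBy and window do).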

Additional note: many of Flink SQL's runtime operators are implemented with ProcessFunctions.


1.4 The eight ProcessFunctions

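Flink provides eight process functions, each designed for a different kind of stream.
Each one is passed as the argument of the corresponding stream's process() operator.

  • ProcessFunction (general purpose, for a DataStream)
  • KeyedProcessFunction (KeyedStream)
  • CoProcessFunction (ConnectedStreams)
  • ProcessJoinFunction (interval join of two KeyedStreams)
  • BroadcastProcessFunction (a non-keyed stream connected with a broadcast stream)
  • KeyedBroadcastProcessFunction (a KeyedStream connected with a broadcast stream)
  • ProcessWindowFunction (window function of a WindowedStream, i.e. windows after keyBy)
  • ProcessAllWindowFunction (window function of an AllWindowedStream, i.e. windowAll on a DataStream)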

2. A tour of ProcessFunction features

package No08_process;

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class _01_ProcessFunction功能展示 {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> source = env.socketTextStream("hadoop102", 9999);
        // This class only lists what a ProcessFunction can do; MyProcessFunc is not applied here.
        // If it were applied with source.process(...), the timer calls below would fail at runtime,
        // because timers are only supported on a KeyedStream.
    }

    public static class MyProcessFunc extends ProcessFunction<String, String> {
        @Override
        public void open(Configuration parameters) throws Exception {
            // Feature 1: get the runtime context for stateful programming (inherited from RichFunction)
            RuntimeContext runtimeContext = getRuntimeContext();
            // State is obtained through a state descriptor, e.g. runtimeContext.getState(descriptor)
        }

        @Override
        public void close() throws Exception {
            super.close();
        }

        // Called once for every element of the DataStream. The return type is void,
        // so the function itself decides whether (and how often) to emit, using the Collector.
        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            // Feature 3: emit to the main output through out
            out.collect(value);

            // Feature 2: through ctx, read the processing time and register/delete processing-time timers
            ctx.timerService().currentProcessingTime();
            ctx.timerService().registerProcessingTimeTimer(1L);
            ctx.timerService().deleteProcessingTimeTimer(1L);
            // Feature 2: read the current watermark and register/delete event-time timers
            ctx.timerService().currentWatermark();
            ctx.timerService().registerEventTimeTimer(1L);
            ctx.timerService().deleteEventTimeTimer(1L);

            // Feature 4: write to a side output with ctx.output (needs import org.apache.flink.util.OutputTag)
            //ctx.output(new OutputTag<String>("outPutTag"){}, value);
        }

        // Feature 5: called when a timer fires.
        // ctx can register further timers; out emits to the main output.
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            super.onTimer(timestamp, ctx, out);
        }
    }

}

2.1 Timers

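The TimerService object held by Context and OnTimerContext provides the following methods:

  • long currentProcessingTime(): returns the current processing time.
  • long currentWatermark(): returns the timestamp of the current watermark.
  • void registerProcessingTimeTimer(long time): registers a processing-time timer for the current key. The timer fires when the processing time reaches the given time.
  • void registerEventTimeTimer(long time): registers an event-time timer for the current key. The timer fires, and the callback is executed, when the watermark becomes greater than or equal to the timer's timestamp.
  • void deleteProcessingTimeTimer(long time): deletes a previously registered processing-time timer. If no timer with this timestamp exists, nothing happens.
  • void deleteEventTimeTimer(long time): deletes a previously registered event-time timer. If no timer with this timestamp exists, nothing happens.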

When a timer fires, the onTimer() callback is invoked. Note that timers can only be used on keyed streams.
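The event-time firing rule (a timer fires once the watermark reaches its timestamp) and the fact that Flink keeps at most one timer per key and timestamp can be illustrated with a small plain-Java model. `MiniEventTimerService` is a hypothetical class written for this sketch, not Flink's implementation; in particular, the key order within one timestamp is just an artifact of the TreeSet used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of event-time timers; NOT Flink's TimerService implementation.
class MiniEventTimerService {
    // timestamp -> keys that have a timer at that timestamp; the Set deduplicates per (key, timestamp)
    private final TreeMap<Long, TreeSet<String>> timers = new TreeMap<>();

    void registerEventTimeTimer(String key, long timestamp) {
        timers.computeIfAbsent(timestamp, t -> new TreeSet<>()).add(key);
    }

    void deleteEventTimeTimer(String key, long timestamp) {
        TreeSet<String> keys = timers.get(timestamp);
        if (keys == null) {
            return;                      // no timer with this timestamp: nothing happens
        }
        keys.remove(key);
        if (keys.isEmpty()) {
            timers.remove(timestamp);
        }
    }

    // Advances the watermark and returns "key@timestamp" for every timer whose
    // timestamp <= watermark, in timestamp order. Fired timers are removed.
    List<String> advanceWatermark(long watermark) {
        List<String> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, TreeSet<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                fired.add(key + "@" + due.getKey());   // Flink would call onTimer(timestamp, ...) here
            }
        }
        return fired;
    }
}
```

For example, registering key `a` at 1000 twice and key `b` at 500, then advancing the watermark to 999 fires only `b@500`; advancing it to 1000 fires `a@1000` exactly once.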

package No08_process;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class _03_定时器 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> source = env.socketTextStream("hadoop102", 9999);


        SingleOutputStreamOperator<String> res = source.keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String s) throws Exception {
                return s;
            }
        }).process(new MyOnTimerProcessFunc());

        // Timers can only be used on a KeyedStream

        res.print();

        env.execute();
    }


    // Emit one extra record two seconds (processing time) after each element has been processed
    public static class MyOnTimerProcessFunc extends KeyedProcessFunction<String, String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            out.collect(value);

            // Register a processing-time timer that fires two seconds from now
            ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + 2000L);
        }


        // The timer has fired: run the scheduled task and emit a record to the main output
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            out.collect("Timer fired for key " + ctx.getCurrentKey() + " at " + timestamp);
        }
    }


}
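A common variation of this example re-arms the timer on every element (delete the old timer, register a new one), so that the timer only fires when a key has been silent for two seconds. The plain-Java model below shows that logic with an explicit clock. `MiniInactivityDetector` is hypothetical; in a real KeyedProcessFunction the pending timestamp would typically be kept in keyed state (for example a ValueState<Long>) so that it can be passed to deleteProcessingTimeTimer().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model (not Flink code) of a per-key inactivity timer.
class MiniInactivityDetector {
    private final long timeoutMs;
    // key -> timestamp of that key's single pending timer (stands in for keyed ValueState<Long>)
    private final Map<String, Long> pendingTimer = new TreeMap<>();

    MiniInactivityDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Corresponds to processElement: overwriting the entry models
    // deleteProcessingTimeTimer(oldTimestamp) followed by registerProcessingTimeTimer(now + timeoutMs).
    void onElement(String key, long now) {
        pendingTimer.put(key, now + timeoutMs);
    }

    // Corresponds to processing time advancing: returns the keys whose timer fired (onTimer).
    List<String> advanceTo(long now) {
        List<String> silentKeys = new ArrayList<>();
        pendingTimer.entrySet().removeIf(entry -> {
            boolean due = entry.getValue() <= now;
            if (due) {
                silentKeys.add(entry.getKey());
            }
            return due;
        });
        return silentKeys;
    }
}
```

With a 2000 ms timeout, an element for `sensor_1` at time 0 followed by another at 1500 moves its timer from 2000 to 3500, so nothing fires at 2000.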

2.2 Side outputs

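Most DataStream API operators have a single output, i.e. they produce one stream of one data type. The split operator could divide one stream into several streams, but all of them had the same data type (split has since been deprecated and removed in newer Flink versions). The side-output feature of process functions can produce multiple streams, and these streams may have different data types. A side output is defined as an OutputTag<X> object, where X is the data type of the output stream. A process function can send an event to one or more side outputs through its Context object (ctx.output(tag, value)).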
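The idea that a tag carries both an id and the element type of its side stream can be sketched in plain Java. `MiniTag`, `MiniSplitter` and `LowTemp` are hypothetical classes for this model, not Flink's OutputTag or Context; the input format "sensorId,timestamp,temperature" is an assumption matching the example that follows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a typed side-output tag; NOT Flink's OutputTag.
final class MiniTag<T> {
    final String id;
    MiniTag(String id) { this.id = id; }
}

// Record type for the low-temperature side output: different from the String main output.
final class LowTemp {
    final String sensorId;
    final double temp;
    LowTemp(String sensorId, double temp) { this.sensorId = sensorId; this.temp = temp; }
}

class MiniSplitter {
    static final MiniTag<LowTemp> LOW = new MiniTag<>("<30");

    final List<String> mainOutput = new ArrayList<>();
    private final Map<String, List<Object>> sideOutputs = new HashMap<>();

    // Like ctx.output(tag, value): the compiler only accepts values of the tag's type T.
    <T> void output(MiniTag<T> tag, T value) {
        sideOutputs.computeIfAbsent(tag.id, id -> new ArrayList<>()).add(value);
    }

    // Like getSideOutput(tag): side records are looked up by the tag's id.
    @SuppressWarnings("unchecked")
    <T> List<T> getSideOutput(MiniTag<T> tag) {
        return (List<T>) (List<?>) sideOutputs.getOrDefault(tag.id, new ArrayList<>());
    }

    // Route one line: >= 30 degrees to the main output, < 30 to the typed side output.
    void process(String line) {
        String[] fields = line.split(",");
        double temp = Double.parseDouble(fields[2].trim());
        if (temp >= 30) {
            mainOutput.add(line);
        } else {
            output(LOW, new LowTemp(fields[0], temp));
        }
    }
}
```

Because the side output is typed by its tag, a `String` cannot be written to `LOW` by mistake; the mismatch is a compile-time error rather than a runtime one.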

package No08_process;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class _04_侧输出流 {
    // One shared tag for the low-temperature side output; the anonymous subclass {} preserves the generic type
    private static final OutputTag<Tuple2<String, Double>> LOW_TEMP_TAG =
            new OutputTag<Tuple2<String, Double>>("<30") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Input lines are assumed to look like "sensorId,timestamp,temperature"
        DataStreamSource<String> source = env.socketTextStream("hadoop102", 9999);
        KeyedStream<String, String> keyedStream = source.keyBy(new KeySelector<String, String>() {
            @Override
            public String getKey(String s) throws Exception {
                // Key by sensor id (first field)
                return s.split(",")[0];
            }
        });

        // Readings of 30 degrees or more go to the main output; readings below 30 go to the side output
        SingleOutputStreamOperator<String> result = keyedStream.process(new mySplit());

        result.print("high");
        result.getSideOutput(LOW_TEMP_TAG).print("sideOut");
        env.execute();

    }
    public static class mySplit extends KeyedProcessFunction<String, String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
            // Extract the temperature (third field)
            String[] fields = value.split(",");
            double temp = Double.parseDouble(fields[2]);

            if (temp >= 30) {
                out.collect(value);
            } else {
                // The side output's type is fixed by its OutputTag and need not match the main output's type
                ctx.output(LOW_TEMP_TAG, new Tuple2<String, Double>(fields[0], temp));
            }
            // Note: the official documentation recommends splitting streams this way
            // because a side output can have a different data type from the main stream
        }
    }

}
