flink_sideOutput侧流输出

sideOutput

对于当前处理流中的部分或者全部数据单独输出到另外的流中

作用:

  1. 数据选择

    ​ 对于当前处理的流可以进行自定义选择然后输出到独立的侧流中

  2. 异常数据保留

    对于异常数据,输出到独立的侧流中,进行单独处理。如窗口中过期的延迟数据保留。

使用心法

可在如下算子中使用(别的方法貌似没有 Context 对象提供):

process方法提供了context 对象, 其output(OutputTag outputTag, X value);方法可收集需要输出到侧流的数据,并使用该streamdateData.getSideOutput(outputTag);方法可获取到该侧流数据。

DataStream<Integer> input = ...;

//定义输出标签
final OutputTag<String> outputTag = new OutputTag<String>("side-output"){};

SingleOutputStreamOperator<Integer> mainDataStream = input
  .process(new ProcessFunction<Integer, Integer>() {

      @Override
      public void processElement(
          Integer value,
          Context ctx,
          Collector<Integer> out) throws Exception {
        // 发送数据到主要的输出
        out.collect(value);

        // 发送数据到旁路输出
        ctx.output(outputTag, "sideout-" + String.valueOf(value));
      }
    });

//获取上面收集的侧流
DataStream<String> sideOutputStream = mainDataStream.getSideOutput(outputTag);

数据选择应用(分流)

可以将当前处理流中的数据输出到多个侧流中

e.g:

将数据源中格式化失败的数据输出到 formatExceptionData 侧流中;将窗口中为偶数的数据输出到dayle_data2 流中。

public class TransformOperator {

    //定义侧流标签
    private static OutputTag<Tuple2<String, Integer>> outputTag = new OutputTag<Tuple2<String, Integer>>("dayle_data2"){};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStreamSource<Tuple2<String, Integer>> dataStreams = env
                .addSource(new SourceFunction<Tuple2<String, Integer>>() {
                    Random random = new Random();
                    int tp2 = 0;
                    int speed = 0;
                    int f1 = -1;

                    @Override
                    public void run(SourceContext<Tuple2<String, Integer>> ctx) throws Exception {
                        while (true) {
                            TimeUnit.SECONDS.sleep(1);
                            tp2 = Math.abs(random.nextInt() % 7);
                            f1 = Math.abs(++speed + tp2);
                            ctx.collect(Tuple2.of("a", f1));
                            System.out.println("source generator :\t" + f1);
                        }
                    }

                    @Override
                    public void cancel() {

                    }
                });

        //延迟数据触发边界测试
        //必须为匿名内部类
        OutputTag<String> erroOutputTag = new OutputTag<String>("formatExceptionData"){} ;

        SingleOutputStreamOperator<Tuple2<String, Integer>> socketTextStream = env
                .socketTextStream("10.164.29.143", 10086)
                .process(new FlatMapProcess(erroOutputTag));

        SingleOutputStreamOperator<Tuple2<String, Integer>> tuple2Stream = ReduceWindowPrint(socketTextStream, 5);

        //获取测流
        socketTextStream.getSideOutput(erroOutputTag).print();
        tuple2Stream.getSideOutput(outputTag).print();

        //正常处理的主流
        tuple2Stream.print();

        env.execute("event time process ");
    }

    private static SingleOutputStreamOperator<Tuple2<String, Integer>> ReduceWindowPrint(SingleOutputStreamOperator<Tuple2<String, Integer>> dataStreams, int dayleTime) {
        return dataStreams
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String,Integer>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                .withTimestampAssigner((e, t) -> e.f1*1000)
                                .withIdleness(Duration.ofSeconds(10))
                )
                .keyBy(e -> e.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .process(new ProcessWindowFunction<Tuple2<String,Integer>, Tuple2<String,Integer>, String, TimeWindow>() {

                    int i;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);

                        i = 0;
                    }

                    @Override
                    public void process(String s, Context context, Iterable<Tuple2<String, Integer>> elements, Collector<Tuple2<String, Integer>> out) throws Exception {
                        Iterator<Tuple2<String, Integer>> iterator = elements.iterator();
                        while (iterator.hasNext()) {
                            Tuple2<String, Integer> next = iterator.next();
                            out.collect(next);
                            if (next.f1%2 == 0) {
                                //收集数据到侧流
                                context.<Tuple2<String, Integer>>output(outputTag,Tuple2.of("side-" + next.f0, next.f1));
                            }
                        }
                        System.out.println("=======================  ^^^ window_num:\t" + i++ + " ^^^ ==================");
                        System.out.println("======================= range: [" + context.window().getStart() + "___" + context.window().getEnd() + " )");
                        System.out.println("======================= key: " + s );
                        System.out.println("======================= watermark: " + context.currentWatermark());
                        System.out.println("======================================================");
                    }
                });
    }
}

class FlatMapProcess extends ProcessFunction<String, Tuple2<String, Integer>> {

    private OutputTag<String> outputTag;

    public FlatMapProcess(OutputTag<String> outputTag) {
        this.outputTag = outputTag;
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Integer>> out) throws Exception {
        try {
            String[] infos = value.split("\\s+");
            if (infos.length == 2) {
                out.collect(Tuple2.of(infos[0], Integer.valueOf(infos[1])));
            } else {
                ctx.output(outputTag, "format exception format_:\t" + value);
            }
        } catch (NumberFormatException e) {
            ctx.output(outputTag, "format exception format_:\t" + value);
        }
    }
}

异常数据保留应用

e.g: 将窗口中的不在窗口范围的异常数据输出到daly_dataStream中

心法

1.定义outputTag

2.设置窗口数据允许延迟时间

3.获取延迟数据

代码

public class TransformOperator {

    //定义侧流标签 必须为匿名内部类
    //输入数据为偶数的数据
    private static OutputTag<Tuple2<String, Integer>> outputTag = new OutputTag<Tuple2<String, Integer>>("dayle_data2"){};
    //解析错误的数据
    private static OutputTag<Tuple2<String, Integer>> sideOutput_dayle_data1 = new OutputTag<Tuple2<String, Integer>>("sideOutput_dayle_data1"){};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);

        //必须为匿名内部类
        OutputTag<String> erroOutputTag = new OutputTag<String>("formatExceptionData"){} ;

        SingleOutputStreamOperator<Tuple2<String, Integer>> socketTextStream = env
                .socketTextStream("10.164.29.143", 10086)
                //解析输入数据并将异常数据输出到侧流
                .process(new FlatMapProcess(erroOutputTag));

        SingleOutputStreamOperator<Tuple2<String, Integer>> tuple2Stream = ReduceWindowPrint(socketTextStream, 5);

        //败获取源数据解析失的侧流
        socketTextStream.getSideOutput(erroOutputTag).map((MapFunction<String, String>) e -> "parseException:\t" + e).print();
        //获取迟到的数据
        tuple2Stream.getSideOutput(sideOutput_dayle_data1).map(new MapFunction<Tuple2<String,Integer>, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(Tuple2<String, Integer> e) throws Exception {
                return Tuple2.of("lateData:\t"+e.f0, e.f1);
            }
        }).print();
        //获取值为偶数的输入数据
        tuple2Stream.getSideOutput(outputTag).map(new MapFunction<Tuple2<String,Integer>, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(Tuple2<String, Integer> value) throws Exception {
                return Tuple2.of("o_"+value.f0, value.f1);
            }
        }).print();

        //正常处理的主流
        tuple2Stream.print();

        env.execute("event time process ");
    }

    /**
     *
     * @param dataStreams   Tuple2<String, Integer>的数据流
     * @param dayleTime     watermark的延迟数据
     * @return
     */
    private static SingleOutputStreamOperator<Tuple2<String, Integer>> ReduceWindowPrint(SingleOutputStreamOperator<Tuple2<String, Integer>> dataStreams, int dayleTime) {

        return dataStreams
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String,Integer>>forBoundedOutOfOrderness(Duration.ofSeconds(dayleTime))
                                .withTimestampAssigner((e, t) -> e.f1*1000)
                                //多并行下如果有窗口数据为空,那么窗口需要barre对齐,不会触发。
                                // 空窗口将导致下游算子都无法进行计算,
                                // 设置idleness时间那么如果存在空窗口,当别的窗口有数据并且到达设置的时间(下面为10s)
                                // 窗口就会触发
                                .withIdleness(Duration.ofSeconds(10))
                )
                .keyBy(e -> e.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                //允许上个窗口范围内的数据迟到时间
                .allowedLateness(Time.seconds(2))
                //迟到数据输出到指定侧流
                .sideOutputLateData(sideOutput_dayle_data1)
                .process(new ProcessWindowFunction<Tuple2<String,Integer>, Tuple2<String,Integer>, String, TimeWindow>() {

                    int i;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        i = 0;
                    }

                    @Override
                    public synchronized void process(String s, Context context, Iterable<Tuple2<String, Integer>> elements, Collector<Tuple2<String, Integer>> out) throws Exception {
                        Iterator<Tuple2<String, Integer>> iterator = elements.iterator();
                        while (iterator.hasNext()) {
                            Tuple2<String, Integer> next = iterator.next();
                            out.collect(next);
                            if (next.f1%2 == 0) {
                                //收集数据到侧流
                                context.<Tuple2<String, Integer>>output(outputTag,Tuple2.of("side-" + next.f0, next.f1));
                            }
                        }
                        System.out.println("=======================  ^^^ window_num:\t" + i++ + " ^^^ ==================");
                        System.out.println("======================= range: [" + context.window().getStart() + "___" + context.window().getEnd() + " )");
                        System.out.println("======================= key: " + s );
                        System.out.println("======================= watermark: " + context.currentWatermark());
                        System.out.println("======================================================");
                    }
                });
    }
}

/**
 * 将sock 输入的string字符串解析,并将解析失败的数据输入到侧流,
 * 返回 Tuple2<String, Integer>
 */
class FlatMapProcess extends ProcessFunction<String, Tuple2<String, Integer>> {

    private OutputTag<String> outputTag;

    public FlatMapProcess(OutputTag<String> outputTag) {
        this.outputTag = outputTag;
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Integer>> out) throws Exception {
        try {
            String[] infos = value.split("\\s+");
            if (infos.length == 2) {
                out.collect(Tuple2.of(infos[0], Integer.valueOf(infos[1])));
            } else {
                ctx.output(outputTag, "format exception format_:\t" + value);
            }
        } catch (NumberFormatException e) {
            //异常数据收集到outputTag指定的侧流
            ctx.output(outputTag, "format exception format_:\t" + value);
        }
    }
}

输入

a 

a 0
a 1
a 10
a 15
a 16
a 10
a 9
a 8
a 1
a 12
a 1
a 15
a 1
a 17
a 1

输出

2> parseException:	format exception format_:	a 
1> parseException:	format exception format_:	
2> (a,0)
2> (o_side-a,0)
2> (a,1)
=======================  ^^^ window_num:	0 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 9999
======================================================
2> (a,0)
2> (o_side-a,0)
2> (a,1)
2> (a,9)
=======================  ^^^ window_num:	1 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 10999
======================================================
2> (a,0)
2> (o_side-a,0)
2> (a,1)
2> (a,9)
2> (a,8)
2> (o_side-a,8)
=======================  ^^^ window_num:	2 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 10999
======================================================
2> (a,0)
2> (o_side-a,0)
2> (a,1)
2> (a,9)
2> (a,8)
2> (o_side-a,8)
2> (a,1)
=======================  ^^^ window_num:	3 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 10999
======================================================
2> (a,0)
2> (o_side-a,0)
2> (a,1)
2> (a,9)
2> (a,8)
2> (o_side-a,8)
2> (a,1)
2> (a,1)
=======================  ^^^ window_num:	4 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 10999
======================================================
2> (a,0)
2> (o_side-a,0)
2> (a,1)
2> (a,9)
2> (a,8)
2> (o_side-a,8)
2> (a,1)
2> (a,1)
2> (a,1)
=======================  ^^^ window_num:	5 ^^^ ==================
======================= range: [0___10000 )
======================= key: a
======================= watermark: 10999
======================================================
2> (lateData:	a,1)    -- 输入17后该条数据失效
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值