flink java旁路输出(Side Output)，对原始流进行分流、复制

最新推荐文章于 2023-06-18 17:09:43 发布

呆萌的代Ma

最新推荐文章于 2023-06-18 17:09:43 发布

阅读量1.8k

点赞数 1

分类专栏：大数据 java 文章标签： Flink ProcessFunction 数据分流旁路输出文本流

本文为CSDN博主"呆萌的代Ma"原创文章，转载请注明博客链接：https://blog.csdn.net/weixin_35757704/

本文链接：https://blog.csdn.net/weixin_35757704/article/details/120614409

版权

大数据同时被 2 个专栏收录

51 篇文章 6 订阅

订阅专栏

java

43 篇文章 1 订阅

订阅专栏

flink通过ProcessFunction来分流，可以将一份流进行拆分、复制等操作，比如下面的代码通过读取一个基本的文本流，将流分别做处理后进行输出：

案例代码

package wordcount;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;


public class manyOutWordCount {

    public static void main(String[] args) throws Exception {
        // 1.创建流式执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2.从文件中读取数据
        DataStream<String> dataStream = env.readTextFile("src/main/resources/hello.txt");
        // 执行环境并行度设置3
        env.setParallelism(3);
        // 3.按照空格分词,流的类型是new Tuple2<>(wordLine, 1)
        DataStream<Tuple2<String, Integer>> sensorStream = dataStream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] wordString = value.split(" ");
                for (String wordLine : wordString) {
                    out.collect(new Tuple2<>(wordLine, 1));
                }
            }
        });

        //旁路输出，拆分流
        final OutputTag<Tuple2<String, Integer>> sideStream = new OutputTag<Tuple2<String, Integer>>("te") {
        };
        SingleOutputStreamOperator<Tuple2<String, Integer>> mainDataStream = sensorStream.process(new ProcessFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            @Override
            public void processElement(Tuple2<String, Integer> value, ProcessFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>.Context ctx, Collector<Tuple2<String, Integer>> out) throws Exception {
                out.collect(new Tuple2<>(value.f0, 2)); // 这里把 mainDataStream 的输出变为 Tuple(单词,2)
                ctx.output(sideStream, value); // 这里把 sideStream 的输出变为 Tuple(单词,1)
            }
        });
        DataStream<Tuple2<String, Integer>> sideOutput = mainDataStream.getSideOutput(sideStream);//获取sideOutput的数据

        sideOutput.print();
        mainDataStream.print();

        //执行
        env.execute();
    }
}

其中数据hello.txt的文件内容是：

hello world
hello flink
hello spark
When we have shuffled off this mortal coil
When we have shuffled off this mortal coil
ack
hello world
hello flink
hello spark
When we have shuffled off this mortal coil
When we have shuffled off this mortal coil
ack

呆萌的代Ma

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
0
评论
flink java旁路输出(Side Output)，对原始流进行分流、复制

flink通过ProcessFunction来分流，可以将一份流进行拆分、复制等操作，比如下面的代码通过读取一个基本的文本流，将流分别做处理后进行输出：案例代码package wordcount;import org.apache.flink.api.common.functions.FlatMapFunction;import org.apache.flink.api.java.tuple.Tuple2;import org.apache.flink.streaming.api.datastr
复制链接

扫一扫