Flink Study Notes (5): Operators

Basic Transformation Operators

  1. map: takes one record and emits exactly one result; it must always emit something

  2. flatMap: takes one record and can emit zero or more results

  3. filter: emits the record only if the predicate returns true

    package transform;
    
    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;
    import wc.WordCountSet;
    
    /**
     * Created with IntelliJ IDEA.
     *
     * @Author: yingtian
     * @Date: 2021/05/13/9:55
     * @Description: basic operators: map, flatMap, filter
     */
    public class TransformTest1_Base {
    
        public static void main(String[] args) throws Exception{
            // create the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
    
            // read the data
            String inputPath = WordCountSet.class.getClassLoader().getResource("sensor.txt").getFile();
            DataStreamSource<String> dataStream = env.readTextFile(inputPath);
    
            // emit the length of each line; exactly one output per input, via a MapFunction
            SingleOutputStreamOperator<Integer> mapStream = dataStream.map(new MapFunction<String, Integer>() {
                @Override
                public Integer map(String s) throws Exception {
                    return s.length();
                }
            });
    
            // split each line on commas and emit every token; zero or more outputs per input
            SingleOutputStreamOperator<String> flatmapStream = dataStream.flatMap(new FlatMapFunction<String, String>() {
                @Override
                public void flatMap(String s, Collector<String> collector) throws Exception {
                    String[] split = s.split(",");
                    for (String str : split)
                        collector.collect(str);
                }
            });
    
            // keep only the records whose id starts with sensor_1
            SingleOutputStreamOperator<String> filterStream = dataStream.filter(new FilterFunction<String>() {
                @Override
                public boolean filter(String s) throws Exception {
                    return s.startsWith("sensor_1");
                }
            });
    
            // print the results
            mapStream.print("map");
            flatmapStream.print("flatmap");
            filterStream.print("filter");
    
            env.execute();
    
        }
    }
    
    

Aggregation Operators

In Flink's design, data must be grouped before it can be aggregated: first call keyBy to obtain a KeyedStream, then call its aggregation methods such as reduce and sum (group first, aggregate second).

The common aggregation operators are:

  • keyBy

  • Rolling aggregation operators (sum, min, max, minBy, maxBy)

    package transform;
    
    import bean.SensorReading;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import wc.WordCountSet;
    
    /**
     * Created with IntelliJ IDEA.
     *
     * @Author: yingtian
     * @Date: 2021/05/08/15:04
     * @Description: test max and maxBy
     */
    public class TransformTest2_RollingAggregation {
    
        public static void main(String[] args) throws Exception{
            // create the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
    
            // read the data
            String inputPath = WordCountSet.class.getClassLoader().getResource("sensor.txt").getFile();
            DataStreamSource<String> dataStream = env.readTextFile(inputPath);
    
            // parse each line into a SensorReading
            SingleOutputStreamOperator<SensorReading> sensorStream = dataStream.map((MapFunction<String, SensorReading>) line -> {
                String[] split = line.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            });
    
            // key the stream by sensor id
            KeyedStream<SensorReading, String> keyedStream = sensorStream.keyBy(SensorReading::getId);
    
            // per key, take the maximum temperature
            // max: only the temperature field is updated; the other fields keep the first record's values
            SingleOutputStreamOperator<SensorReading> maxStream = keyedStream.max("temperature");

            // maxBy: the entire record that holds the maximum temperature is emitted
            SingleOutputStreamOperator<SensorReading> maxByStream = keyedStream.maxBy("temperature");
    
            maxStream.print("max");
            maxByStream.print("maxBy");
    
            env.execute();
    
        }
    }
    
  • reduce
    Reduce covers more general aggregation scenarios. In Java you implement the ReduceFunction functional interface.

    package transform;
    
    import bean.SensorReading;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import wc.WordCountSet;
    
    /**
     * Created with IntelliJ IDEA.
     *
     * @Author: yingtian
     * @Date: 2021/05/14/16:00
     * @Description: test reduce
     */
    public class TransformTest2_Reduce {
    
        public static void main(String[] args) throws Exception{
            // create the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
    
            // read the data
            String inputPath = WordCountSet.class.getClassLoader().getResource("sensor.txt").getFile();
            DataStreamSource<String> dataStream = env.readTextFile(inputPath);
    
            // parse each line into a SensorReading
            SingleOutputStreamOperator<SensorReading> sensorStream = dataStream.map((MapFunction<String, SensorReading>) line -> {
                String[] split = line.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            });
    
            // key the stream by sensor id
            KeyedStream<SensorReading, String> keyedStream = sensorStream.keyBy(SensorReading::getId);
    
            // for each key, keep the historical maximum temperature while always carrying the latest timestamp
            // value2 is the latest incoming record
            SingleOutputStreamOperator<SensorReading> reduceStream = keyedStream.reduce(
                    (ReduceFunction<SensorReading>) (value1, value2) -> new SensorReading(
                            value1.getId(),
                            value2.getTimestamp(),
                            Math.max(value1.getTemperature(), value2.getTemperature())));
    
            reduceStream.print();
    
            env.execute();
        }
    }
    
    
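The examples in this section rely on a SensorReading bean that the snippets never show. Below is a minimal sketch of it, with the fields and accessors inferred from the calls above (the three-argument constructor, getId, getTimestamp, getTemperature); the actual class in the original project may differ.

```java
package bean;

// Minimal SensorReading POJO, reconstructed from its usage in the examples.
// Flink POJOs need a public no-arg constructor plus getters/setters.
public class SensorReading {

    private String id;
    private Long timestamp;
    private Double temperature;

    // no-arg constructor required for Flink's POJO serialization
    public SensorReading() {
    }

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Long getTimestamp() { return timestamp; }
    public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }

    public Double getTemperature() { return temperature; }
    public void setTemperature(Double temperature) { this.temperature = temperature; }

    @Override
    public String toString() {
        return "SensorReading{id='" + id + "', timestamp=" + timestamp + ", temperature=" + temperature + "}";
    }
}
```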

Multi-Stream Operators

  • OutputTag
    An OutputTag splits one stream into several streams according to a condition (Flink's side-output mechanism).

    package transform;
    
    import bean.SensorReading;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;
    import wc.WordCountSet;
    
    /**
     * Created with IntelliJ IDEA.
     *
     * @Author: yingtian
     * @Date: 2021/05/14/16:22
     * @Description: split a stream with side outputs (OutputTag)
     */
    public class TransformTest4_MultipleStreams {
    
        public static void main(String[] args) throws Exception{
    
            // create the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);
    
            // read the data
            String inputPath = WordCountSet.class.getClassLoader().getResource("sensor.txt").getFile();
            DataStreamSource<String> dataStream = env.readTextFile(inputPath);
    
            // parse each line into a SensorReading
            SingleOutputStreamOperator<SensorReading> sensorStream = dataStream.map((MapFunction<String, SensorReading>) line -> {
                String[] split = line.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            });
    
            // define side-output tags; split into two streams at the 30-degree mark
            OutputTag<SensorReading> high = new OutputTag<SensorReading>("high"){};
            OutputTag<SensorReading> low = new OutputTag<SensorReading>("low"){};
    
            // route records to the side outputs in a ProcessFunction
            SingleOutputStreamOperator<SensorReading> outputStream = sensorStream.process(new ProcessFunction<SensorReading, SensorReading>() {
                @Override
                public void processElement(SensorReading sensorReading, Context context, Collector<SensorReading> collector) throws Exception {
                    collector.collect(sensorReading); // regular output
    
                    if (sensorReading.getTemperature() > 30) { // side output
                        context.output(high, sensorReading);
                    } else {
                        context.output(low, sensorReading);
                    }
                }
            });
    
            // fetch the side-output streams
            DataStream<SensorReading> highStream = outputStream.getSideOutput(high);
            DataStream<SensorReading> lowStream = outputStream.getSideOutput(low);
    
            // print
            outputStream.print("out");
            highStream.print("high");
            lowStream.print("low");
    
            env.execute();
        }
    }
    
    
  • Connect
    DataStream, DataStream -> ConnectedStreams: connects two data streams while preserving their respective types. After the connect, the two streams are merely placed inside one stream; internally each keeps its own data and form unchanged, and the two remain independent of each other.

  • CoMap

ConnectedStreams -> DataStream: operates on a ConnectedStreams; like map and flatMap, it applies a separate map (CoMap) or flatMap (CoFlatMap) to each of the two streams inside the ConnectedStreams.

public static void main(String[] args) throws Exception{

        // create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // read the data
        String inputPath = WordCountSet.class.getClassLoader().getResource("sensor.txt").getFile();
        DataStreamSource<String> dataStream = env.readTextFile(inputPath);

        // parse each line into a SensorReading
        SingleOutputStreamOperator<SensorReading> sensorStream = dataStream.map((MapFunction<String, SensorReading>) line -> {
            String[] split = line.split(",");
            return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
        });

        // define side-output tags; split into two streams at the 30-degree mark
        OutputTag<SensorReading> high = new OutputTag<SensorReading>("high"){};
        OutputTag<SensorReading> low = new OutputTag<SensorReading>("low"){};

        // route records to the side outputs in a ProcessFunction
        SingleOutputStreamOperator<SensorReading> outputStream = sensorStream.process(new ProcessFunction<SensorReading, SensorReading>() {
            @Override
            public void processElement(SensorReading sensorReading, Context context, Collector<SensorReading> collector) throws Exception {
                collector.collect(sensorReading); // regular output

                if (sensorReading.getTemperature() > 30) { // side output
                    context.output(high, sensorReading);
                } else {
                    context.output(low, sensorReading);
                }
            }
        });

        // fetch the side-output streams
        DataStream<SensorReading> highStream = outputStream.getSideOutput(high);
        DataStream<SensorReading> lowStream = outputStream.getSideOutput(low);

        // connect the two streams
        ConnectedStreams<SensorReading, SensorReading> connectStream = highStream.connect(lowStream);
        
        // operate on each of the two streams separately with a CoMapFunction
        SingleOutputStreamOperator<String> coMapStream = connectStream.map(new CoMapFunction<SensorReading, SensorReading, String>() {
            @Override
            public String map1(SensorReading value) throws Exception {
                return value.getId() + ":" + value.getTemperature() + ":high warning";
            }

            @Override
            public String map2(SensorReading value) throws Exception {
                return value.getId() + ":" + value.getTemperature() + ":low warning";
            }
        });

        // print
        coMapStream.print();
        env.execute();
    }
  • Union
    DataStream -> DataStream: unions two or more DataStreams, producing a new DataStream that contains all the elements of every input stream.
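A minimal sketch of union, using inline test data instead of the sensor file (the class name and sample values are made up for illustration):

```java
package transform;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Hypothetical demo: union merges any number of streams of the SAME element type.
 */
public class TransformTest5_Union {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // three small string streams standing in for separate sources
        DataStream<String> s1 = env.fromElements("sensor_1,1547718199,35.8");
        DataStream<String> s2 = env.fromElements("sensor_6,1547718201,15.4");
        DataStream<String> s3 = env.fromElements("sensor_7,1547718202,6.7");

        // union takes varargs, so more than two streams can be merged at once
        DataStream<String> all = s1.union(s2, s3);
        all.print("union");

        env.execute();
    }
}
```

Unlike connect, union requires identical element types but is not limited to two inputs.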

Summary

  1. map must emit exactly one result per input
  2. flatMap can emit zero or more results
  3. filter emits the record only when the predicate returns true
  4. min, max, minBy, maxBy, reduce and the other aggregation operators require a preceding keyBy. If the elements are tuples, index positions can be used as the grouping and aggregation arguments; for a POJO only field names work, and the POJO must have getters/setters and a no-arg constructor
  5. Connect can merge only two streams, and their element types may differ; coMap then operates on the two streams separately and returns a single stream
  6. Union can merge any number of streams, but all of them must have the same element type
  7. Stream conversion diagram (figure not preserved)
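Point 4 above can be sketched with a tuple stream, where index positions select the key and the aggregated field (the class name and data values are made up for illustration):

```java
package transform;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Hypothetical demo for summary point 4: with tuples, field positions
 * (rather than field names) select the key and the aggregated field.
 */
public class TransformTest_TupleKeyBy {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.fromElements(
                        Tuple2.of("sensor_1", 35.8),
                        Tuple2.of("sensor_1", 36.3),
                        Tuple2.of("sensor_6", 15.4))
                // group by field 0 (the id); with a POJO this would be keyBy("id") or a getter reference
                .keyBy(0)
                // rolling maximum of field 1 (the temperature)
                .max(1)
                .print("max");

        env.execute();
    }
}
```

Note that position-based keyBy is deprecated in newer Flink versions in favor of key selectors such as `keyBy(t -> t.f0)`.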

P.S.: the notes above are compiled from the SGG tutorial.
