Flink: Transformation Examples

Creating the initial Flink environment

        // Configure the port of the web UI
        Configuration configuration=new Configuration();
        configuration.setInteger(RestOptions.PORT,8848);
        // Create a local Flink execution environment with the web UI enabled
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(configuration);
        environment.setParallelism(4);
        environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        DataStream<Person> source = environment.addSource(new DataDB());
        source.print("source");
        DataStream<Times> timesSource = environment.addSource(new TimeDB());
        timesSource.print("timesource");

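Map: takes one element and produces exactly one element. The example below converts each Person into a comma-separated string of its fields.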

  DataStream<String> map = source.map(new MapFunction<Person, String>() {
        @Override
        public String map(Person person) throws Exception {
            String log=person.getPname()+","+person.getPage()+","+person.getPsex()+","+person.getPid();
            return log;
        } 
    });
    map.print("map");

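FlatMap: takes one element and produces zero, one, or more elements. The example splits each comma-separated string into its individual fields.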

  DataStream<String> flatMap = map.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String s, Collector<String> collector) throws Exception {
                for(String word:s.split(",")){
                    collector.collect(word);
                }
            }
        });
        flatMap.print("flatMap");

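Filter: evaluates a boolean function for each element. Elements for which it returns true are passed downstream; elements for which it returns false are dropped.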

 DataStream<Person> filter = source.filter(new FilterFunction<Person>() {
            @Override
            public boolean filter(Person person) throws Exception {
                return person.getPage() > 18;
            }
        });
        filter.print("filter");

keyBy: logically partitions the stream into groups; records with the same key are assigned to the same group. keyBy() is implemented with hash partitioning.

KeyedStream<Person, Tuple> keyBy = source.keyBy("psex");
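
To make hash partitioning concrete, here is a minimal plain-Java sketch that does not depend on Flink. It assumes a simplified scheme (key hash modulo parallelism); Flink itself routes keys through key groups with an additional hashing step, so the real subtask indexes will differ. The class `KeyByPartitionSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of hash partitioning; NOT Flink's actual key-group algorithm.
class KeyByPartitionSketch {

    // Maps a key to a partition index in [0, parallelism).
    static int partitionFor(Object key, int parallelism) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Groups the given keys by the partition each one is routed to.
    static Map<Integer, List<String>> partition(List<String> keys, int parallelism) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, parallelism), p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Every record with the same key ends up in the same partition.
        System.out.println(partition(Arrays.asList("male", "female", "male", "female", "male"), 4));
    }
}
```

The property that matters is that equal keys always map to the same partition, which is what allows per-key state in the operators that follow.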

Reduce: applied to a KeyedStream. Within each key group, the current element is combined with the result of the previous reduce; the combining logic is supplied by the developer.

 DataStream<Person> reduce = keyBy.reduce(new ReduceFunction<Person>() {
            @Override
            public Person reduce(Person person, Person t1) throws Exception {
                int a = person.getPage() + t1.getPage();
                return new Person(person.getPid(), person.getPname(), person.getPsex(), a);
            }
        });
        reduce.print("reduce");
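
The rolling behavior of a keyed reduce can be sketched in plain Java without Flink. This is a simplified model under stated assumptions: records carry only a key (`sex`) and a value (`age`), per-key state is a `HashMap`, and one result is emitted for every input record. `RollingReduceSketch` and `Rec` are hypothetical names, not part of the original example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a rolling reduce on a keyed stream (illustration only).
class RollingReduceSketch {

    // Minimal stand-in for the Person class: a key and a numeric field.
    static final class Rec {
        final String sex;
        final int age;
        Rec(String sex, int age) { this.sex = sex; this.age = age; }
    }

    // Returns the value emitted after each input record.
    static List<Integer> rollingAgeSum(List<Rec> input) {
        Map<String, Integer> state = new HashMap<>();   // last reduce result per key
        List<Integer> emitted = new ArrayList<>();
        for (Rec r : input) {
            // First record of a key is stored as-is; later ones are combined with the stored result.
            int reduced = state.merge(r.sex, r.age, Integer::sum);
            emitted.add(reduced);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
                new Rec("m", 20), new Rec("f", 30), new Rec("m", 5), new Rec("f", 1), new Rec("m", 10));
        System.out.println(rollingAgeSum(input)); // prints [20, 30, 25, 31, 35]
    }
}
```

Each key accumulates independently, so the output interleaves the running sums of the two groups in input order.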

Fold: similar to Reduce, except that Fold starts from a developer-supplied initial value, which is combined with the first element of each key.

DataStream<Integer> fold = keyBy.fold(3, new FoldFunction<Person, Integer>() {
        @Override
        public Integer fold(Integer integer, Person o) throws Exception {
            return integer + o.getPage();
        }
    });
    fold.print("fold");
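
The effect of the initial value can be seen in a small plain-Java model. It makes the same simplifying assumptions as any toy model of keyed state: a `HashMap` holds the accumulator per key, and a result is emitted for every input. `RollingFoldSketch` and `Rec` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a rolling fold with an initial value (illustration only).
class RollingFoldSketch {

    // Minimal stand-in for the Person class: a key and a numeric field.
    static final class Rec {
        final String sex;
        final int age;
        Rec(String sex, int age) { this.sex = sex; this.age = age; }
    }

    // Each key starts from initialValue; one accumulator value is emitted per input record.
    static List<Integer> foldAges(int initialValue, List<Rec> input) {
        Map<String, Integer> state = new HashMap<>();
        List<Integer> emitted = new ArrayList<>();
        for (Rec r : input) {
            int acc = state.getOrDefault(r.sex, initialValue) + r.age;
            state.put(r.sex, acc);
            emitted.add(acc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(new Rec("m", 20), new Rec("f", 30), new Rec("m", 5));
        System.out.println(foldAges(3, input)); // prints [23, 33, 28]
    }
}
```

Note that the initial value is applied once per key, not once per stream.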

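Aggregation: rolling aggregations over a KeyedStream, computed separately for each key. Taking min and minBy as an example: min returns the minimum value of the specified field, while minBy returns the whole element that holds the minimum value in that field. The same distinction applies to max and maxBy.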

        keyBy.sum("page").print("sum");
        keyBy.min("page").print("min");
        keyBy.max("page").print("max");
        keyBy.minBy("page").print("minBy");
        keyBy.maxBy("page").print("maxBy");
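
The difference between min and minBy is easiest to see on the non-aggregated fields. The following plain-Java sketch models it under the assumption that min keeps the other fields of the first element seen for a key and only updates the aggregated field, while minBy keeps the entire element holding the minimum. `MinVsMinBySketch` and `Rec` are hypothetical names; the code does not call Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model contrasting min and minBy on a keyed stream (illustration only).
class MinVsMinBySketch {

    static final class Rec {
        final String name;
        final String sex;
        final int age;
        Rec(String name, String sex, int age) { this.name = name; this.sex = sex; this.age = age; }
        @Override public String toString() { return name + ":" + age; }
    }

    // min: only the aggregated field is updated; other fields stay from the first element.
    static List<String> min(List<Rec> input) {
        Map<String, Rec> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Rec r : input) {
            Rec prev = state.get(r.sex);
            Rec next = (prev == null) ? r : new Rec(prev.name, prev.sex, Math.min(prev.age, r.age));
            state.put(r.sex, next);
            out.add(next.toString());
        }
        return out;
    }

    // minBy: the whole element with the smallest field value is kept (first one wins on ties).
    static List<String> minBy(List<Rec> input) {
        Map<String, Rec> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Rec r : input) {
            Rec prev = state.get(r.sex);
            Rec next = (prev == null || r.age < prev.age) ? r : prev;
            state.put(r.sex, next);
            out.add(next.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(new Rec("Ann", "f", 30), new Rec("Bea", "f", 25), new Rec("Cy", "f", 40));
        System.out.println(min(input));   // prints [Ann:30, Ann:25, Ann:25]
        System.out.println(minBy(input)); // prints [Ann:30, Bea:25, Bea:25]
    }
}
```

With min, the name stays "Ann" even though the minimum age belongs to "Bea", which is why minBy is usually the safer choice when the rest of the record matters.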

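Window: splits the data of a KeyedStream into time windows per key, for example one tumbling window every 5 seconds; each key has its own windows.
TumblingEventTimeWindows assigns windows by the timestamp carried in each event, while TumblingProcessingTimeWindows assigns them by the system clock. In the example, the optional second argument to of(...) is an offset that shifts the window boundaries by 2 seconds.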

DataStream<String> aggregate = keyBy.window(TumblingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(2)))
            .aggregate(new AggregateFunction<Person, Integer, String>() {
                // Create the initial accumulator
                @Override
                public Integer createAccumulator() {
                    Integer integer = 0;
                    return integer;
                }

                // Intermediate result: called once for every incoming element
                @Override
                public Integer add(Person person, Integer integer) {
                    return person.getPage() + integer;
                }

                // Called once when the window fires, to produce the final result
                @Override
                public String getResult(Integer integer) {
                    return integer.toString();
                }

                // Merge two accumulators (needed when windows are merged, e.g. session windows)
                @Override
                public Integer merge(Integer integer, Integer acc1) {
                    return integer + acc1;
                }
            });
    aggregate.print("aggregate");
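
How a timestamp is mapped to a tumbling window, including the offset, can be sketched with a little arithmetic. This is a simplified model that assumes non-negative millisecond timestamps and an offset smaller than the window size; it is meant to illustrate the idea, not to reproduce Flink's source code. `TumblingWindowSketch` and `windowStart` are hypothetical names.

```java
// Plain-Java model of tumbling window assignment with an offset (illustration only).
class TumblingWindowSketch {

    // Start (inclusive) of the window that contains the timestamp.
    // Assumes timestamp >= 0 and 0 <= offset < size, all in milliseconds.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    public static void main(String[] args) {
        long size = 5000L;   // 5-second windows
        long offset = 2000L; // boundaries shifted by 2 seconds
        for (long ts : new long[] {6999L, 7000L, 8000L}) {
            long start = windowStart(ts, offset, size);
            System.out.println(ts + " -> [" + start + ", " + (start + size) + ")");
        }
        // prints:
        // 6999 -> [2000, 7000)
        // 7000 -> [7000, 12000)
        // 8000 -> [7000, 12000)
    }
}
```

With the 2-second offset, window boundaries fall at 2 s, 7 s, 12 s and so on, rather than at multiples of 5 s.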

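WindowAll: applied to a plain (non-keyed) DataStream. All elements share a single set of global windows, so the operator runs without parallelism. The example uses a sliding window of 20 seconds that advances every 5 seconds.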

 DataStream<Person> windowAll = source.windowAll(SlidingProcessingTimeWindows.of(Time.seconds(20), Time.seconds(5)))
                .reduce(new ReduceFunction<Person>() {
                    @Override
                    public Person reduce(Person person, Person t1) throws Exception {
                        Integer s = person.getPage() + t1.getPage();
                        return new Person(person.getPid(), person.getPname(), person.getPsex(), s);
                    }
                });
        windowAll.print("windowAll");
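
Unlike a tumbling window, a sliding window places each element into several overlapping windows (size divided by slide, here 20 / 5 = 4). The following plain-Java sketch computes the windows for a timestamp under the assumption of non-negative millisecond timestamps and no offset. `SlidingWindowSketch` and `windowStarts` are hypothetical names, and the code does not call Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of sliding window assignment (illustration only).
class SlidingWindowSketch {

    // Returns the start of every window [start, start + size) that contains the timestamp.
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide); // latest window that already began
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A 20-second window sliding every 5 seconds, element at t = 23 s.
        System.out.println(windowStarts(23000L, 20000L, 5000L));
        // prints [20000, 15000, 10000, 5000]
    }
}
```

Because every element is counted in four windows, the reduce result printed every 5 seconds covers the last 20 seconds of data.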

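Window Apply: applies a window function to each window; the function processes all elements of a window as a whole. There are two kinds of windowed streams: the keyed WindowedStream and the non-keyed AllWindowedStream.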

DataStream<String> apply = source.keyBy("psex").countWindow(5)
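        // Trigger: fire the computation every 5 elements. Note that countWindow(5) already
        // installs a purging count trigger; replacing it with a plain CountTrigger means the
        // window contents are not cleared after each firing.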
        .trigger(CountTrigger.of(5))
        .apply(new WindowFunction<Person, String, Tuple, GlobalWindow>() {
            @Override
            public void apply(Tuple tuple, GlobalWindow globalWindow, Iterable<Person> iterable, Collector<String> collector) throws Exception {
                int sum = 0;
                for (Person person : iterable) {
                    sum += person.getPage();
                }
                collector.collect("" + sum);
            }
        });
apply.print("apply");
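
Whether a count window is purged after firing changes what the window function sees. The sketch below models a single key in plain Java: with purging, each firing sees only the latest 5 elements; without purging, each firing sees everything collected so far. This is a simplified model of the trigger behavior, and `CountWindowSketch` and `windowSums` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of a count window for one key, with and without purging (illustration only).
class CountWindowSketch {

    // Emits the sum of the buffered values every `size` elements.
    static List<Integer> windowSums(List<Integer> ages, int size, boolean purge) {
        List<Integer> buffer = new ArrayList<>();
        List<Integer> emitted = new ArrayList<>();
        int sinceLastFire = 0;
        for (int age : ages) {
            buffer.add(age);
            sinceLastFire++;
            if (sinceLastFire == size) {
                int sum = 0;
                for (int a : buffer) {
                    sum += a;
                }
                emitted.add(sum);
                sinceLastFire = 0;      // the trigger's counter always resets
                if (purge) {
                    buffer.clear();     // only a purging trigger drops the window contents
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
        System.out.println(windowSums(ages, 5, true));  // prints [15, 40]
        System.out.println(windowSums(ages, 5, false)); // prints [15, 55]
    }
}
```

In both cases the last two elements (11 and 12) produce no output, because the window has not yet reached 5 new elements.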

Window Reduce: applies a ReduceFunction to a WindowedStream; the result is a DataStream.

DataStream<Person> reducewindows = source.keyBy("psex").countWindow(5).reduce(new ReduceFunction<Person>() {
    @Override
    public Person reduce(Person person, Person t1) throws Exception {
        Integer s = person.getPage() + t1.getPage();
        return new Person(person.getPid(), person.getPname(), person.getPsex(), s);
    }
});
reducewindows.print("reducewindows");

Window Fold: applies a FoldFunction to a WindowedStream.

 DataStream<Integer> windowfold = source.keyBy("psex").countWindow(5).fold(5, new FoldFunction<Person, Integer>() {
            @Override
            public Integer fold(Integer integer, Person o) throws Exception {
                return integer + o.getPage();
            }
        });
        windowfold.print("windowfold");

Window Aggregation: applies an AggregateFunction to a WindowedStream.

    SingleOutputStreamOperator<String> windowaggregation = source.keyBy("psex").countWindow(5)
            .aggregate(new AggregateFunction<Person, Integer, String>() {
                // Create the initial accumulator
                @Override
                public Integer createAccumulator() {
                    Integer integer = 0;
                    return integer;
                }

                // Intermediate result: called once for every incoming element
                @Override
                public Integer add(Person person, Integer integer) {
                    return person.getPage() + integer;
                }

                // Called once when the window fires, to produce the final result
                @Override
                public String getResult(Integer integer) {
                    return integer.toString();
                }

                // Merge two accumulators (needed when windows are merged, e.g. session windows)
                @Override
                public Integer merge(Integer integer, Integer acc1) {
                    return integer + acc1;
                }
            });
   windowaggregation.print("windowaggregation");

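Union: merges two or more DataStreams of the same type into a new DataStream that contains all of their elements; duplicates are not removed. In the example, keyBy is derived from source, so every element appears twice in the result.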

DataStream<Person> union = source.union(keyBy);
        union.print("union");

Window Join: joins two DataStreams on a key within the same time window; the result is a DataStream.

  DataStream<Tuple7<Integer, String, String, Integer, Integer, Double, Long>> join = source.join(timesSource)
                .where(person -> person.getPid()).equalTo(times -> times.getTid())
                .window(TumblingProcessingTimeWindows.of(Time.seconds(3)))
                .apply(new JoinFunction<Person, Times, Tuple7<Integer, String, String, Integer, Integer, Double, Long>>() {
                    @Override
                    public Tuple7<Integer, String, String, Integer, Integer, Double, Long> join(Person person, Times times) throws Exception {
                        return new Tuple7<Integer, String, String, Integer, Integer, Double, Long>(person.getPid(), person.getPname(), person.getPsex(), person.getPage(), times.getTid(), times.getTem(), times.getTimes());
                    }
                });
        join.print("join");
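
Within one window, the join behaves like an inner join: every pair of elements from the two sides that share a key is passed to the JoinFunction, and elements without a partner produce nothing. The plain-Java sketch below models the contents of a single window as two lists. `WindowJoinSketch` and `Item` are hypothetical names, and the code does not call Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of an inner join over the contents of one window (illustration only).
class WindowJoinSketch {

    // Minimal stand-in for a stream element: a join key and a payload.
    static final class Item {
        final int key;
        final String value;
        Item(int key, String value) { this.key = key; this.value = value; }
    }

    // Emits one joined record for every left/right pair with an equal key.
    static List<String> join(List<Item> left, List<Item> right) {
        List<String> joined = new ArrayList<>();
        for (Item l : left) {
            for (Item r : right) {
                if (l.key == r.key) {
                    joined.add(l.value + "|" + r.value);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Item> persons = Arrays.asList(new Item(1, "Ann"), new Item(2, "Bob"), new Item(1, "Al"));
        List<Item> times = Arrays.asList(new Item(1, "36.5"), new Item(3, "37.0"), new Item(1, "36.8"));
        System.out.println(join(persons, times));
        // prints [Ann|36.5, Ann|36.8, Al|36.5, Al|36.8]
    }
}
```

Key 2 on the left and key 3 on the right have no match in the window, so they do not appear in the output.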

Execute: none of the transformations above run until the job is started with environment.execute();
