Real-Time Analytics with Flink

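This article shows how to process front-end log data with Apache Flink: defining a UserPageView class, generating data with a custom source, setting event time and watermarks, and counting user activity within a time window in two ways (aggregate and reduce). The example counts how many times each user visits within one hour.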

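Preface:

In recent years, organizations have paid increasing attention to mining and using massive data sets, because statistical analysis of big data gives business decision-makers solid evidence to work from. For example, analyzing a website's log data yields its daily visit count, which indicates how popular the site is; analyzing the download figures of a mobile app indicates how popular the app is. Deeper analysis along different dimensions then provides reliable data for operations analysis and promotion decisions.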

Using Flink to run ETL on front-end log data (exercise example)

1. Define a UserPageView class with the fields user, url, and ts

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class UserPageView {
    private String user;
    private String url;
    private Long ts;
}

2. Create a custom data source: user is picked at random from "Mary,Alice,Bob,Cary", url is picked at random from "./home,./cart,./fav,./prod?id=1,./prod?id=2", and ts is the current timestamp

GeneratorFunction<Long, UserPageView> generatorFunction = new GeneratorFunction<Long, UserPageView>() {
            @Override
            public UserPageView map(Long aLong) throws Exception {
                String[] users = "Mary,Alice,Bob,Cary".split(",");
                String[] urls = "./home,./cart,./fav,./prod?id=1,./prod?id=2".split(",");
                Random random = new Random();
                UserPageView userPageView = new UserPageView();
                userPageView.setUser(users[random.nextInt(users.length)]);
                userPageView.setUrl(urls[random.nextInt(urls.length)]);
                userPageView.setTs(System.currentTimeMillis());
                return userPageView;
            }
        };

3. Use event-time semantics and set a watermark delay of 5 seconds

DataStreamSource<UserPageView> stream =
                env.fromSource(source,
                        WatermarkStrategy
                        .<UserPageView>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event,ts)->{
                            return event.getTs();
                        }),
                        "Generator Source");
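
To make the 5-second delay concrete, the following plain-Java sketch mimics how a bounded-out-of-orderness watermark advances: it remembers the largest timestamp seen so far and reports that value minus the allowed delay minus 1 ms. The class `BoundedWatermarkSketch` and its methods are hypothetical names used only for illustration; they are not Flink API, and in a real job Flink also emits watermarks periodically rather than on every event.

```java
// Hypothetical illustration, not Flink API: tracks the watermark for a stream
// whose events may arrive up to maxOutOfOrdernessMs late.
class BoundedWatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen;

    BoundedWatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start high enough that the first watermark cannot underflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMs + 1;
    }

    // Record an event; a late event never moves the maximum backwards.
    void onEvent(long eventTimestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
    }

    // Everything with a timestamp <= this value is considered complete.
    long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMs - 1;
    }

    public static void main(String[] args) {
        BoundedWatermarkSketch wm = new BoundedWatermarkSketch(5_000L);
        wm.onEvent(10_000L);
        wm.onEvent(8_000L);   // out-of-order event, watermark stays put
        wm.onEvent(16_000L);
        System.out.println(wm.currentWatermark()); // prints 10999
    }
}
```

With a 5-second delay, an event stamped 16000 only guarantees completeness up to 10999, which is why window results lag behind the newest event.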

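4. Convert each record into a 2-tuple of (user, count) (two methods)

5. Key by user and open a 1-hour window to count how many times each user appears within the hour [steps 4 and 5 are combined into one piece of code]

(1). Method 1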

SingleOutputStreamOperator<Tuple2<String, Integer>> operator = stream.map(value -> Tuple2.of(value.getUser(), 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(value -> value.f0)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .aggregate(new AggregateFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>>() {

                    @Override
                    public Tuple2<String, Integer> createAccumulator() {
                        return Tuple2.of(null, 0);
                    }

                    @Override
                    public Tuple2<String, Integer> add(Tuple2<String, Integer> value, Tuple2<String, Integer> accumulator) {
                        accumulator.f0 = value.f0;
                        accumulator.f1 += value.f1;
                        return accumulator;
                    }

                    @Override
                    public Tuple2<String, Integer> getResult(Tuple2<String, Integer> accumulator) {
                        return accumulator;
                    }

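                    @Override
                    public Tuple2<String, Integer> merge(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
                        // Combine two partial accumulators instead of returning null
                        return Tuple2.of(a.f0 != null ? a.f0 : b.f0, a.f1 + b.f1);
                    }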
                });
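
`merge` is only invoked when two partial accumulators have to be combined, for example with merging window assigners such as session windows; tumbling windows do not call it. As a minimal sketch of what a merge for this count accumulator should compute, here is a plain-Java version that uses a hypothetical `UserCount` class in place of Flink's `Tuple2<String, Integer>`:

```java
// Hypothetical stand-in for Tuple2<String, Integer>; not part of Flink.
class UserCount {
    final String user;
    final int count;

    UserCount(String user, int count) {
        this.user = user;
        this.count = count;
    }

    // Combine two partial accumulators that belong to the same key.
    // A fresh accumulator has user == null, so take whichever name is set.
    static UserCount merge(UserCount a, UserCount b) {
        String user = (a.user != null) ? a.user : b.user;
        return new UserCount(user, a.count + b.count);
    }

    public static void main(String[] args) {
        UserCount merged = merge(new UserCount("Mary", 3), new UserCount(null, 0));
        System.out.println(merged.user + "," + merged.count); // prints Mary,3
    }
}
```

Returning a new object rather than mutating one of the inputs keeps the merge free of side effects, at the cost of one extra allocation.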

(2). Method 2

SingleOutputStreamOperator<Tuple2<String, Integer>> reduce = stream.map(value -> Tuple2.of(value.getUser(), 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(value -> value.f0)
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> v1, Tuple2<String, Integer> v2) {
                        return new Tuple2<>(v1.f0, v1.f1 + v2.f1);
                    }
                });
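
Both methods compute the same result: each event falls into the tumbling window that contains its timestamp, and events are counted per user within that window. The plain-Java sketch below reproduces that logic without Flink, assuming a zero window offset and non-negative timestamps. `TumblingCountSketch`, `windowStart`, and `count` are hypothetical names, and the `"windowStart|user"` string key is only a convenient way to show the grouping.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Flink API.
class TumblingCountSketch {
    // Start of the tumbling window containing timestampMs
    // (assumes zero offset and a non-negative timestamp).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Count events per "windowStart|user" key, mirroring keyBy + window + count.
    static Map<String, Integer> count(String[] users, long[] timestampsMs, long windowSizeMs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < users.length; i++) {
            String key = windowStart(timestampsMs[i], windowSizeMs) + "|" + users[i];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] users = {"Mary", "Bob", "Mary", "Mary"};
        long[] ts = {1_000L, 2_000L, 9_999L, 10_000L};
        // prints {0|Bob=1, 0|Mary=2, 10000|Mary=1}
        System.out.println(count(users, ts, 10_000L));
    }
}
```

The event stamped 10000 lands in the next window because a window covers the half-open range [start, start + size).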

6. Print the result to check it [either one is enough]

operator.print();
reduce.print();

Complete code:

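// Imports assume Flink 1.17+ with the flink-connector-datagen dependency on the classpath
import java.time.Duration;
import java.util.Random;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.util.ratelimit.RateLimiterStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.datagen.source.DataGeneratorSource;
import org.apache.flink.connector.datagen.source.GeneratorFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class UserPageViewCountDmo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Set parallelism to 1
        env.setParallelism(1);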
        GeneratorFunction<Long, UserPageView> generatorFunction = new GeneratorFunction<Long, UserPageView>() {
            @Override
            public UserPageView map(Long aLong) throws Exception {
                String[] users = "Mary,Alice,Bob,Cary".split(",");
                String[] urls = "./home,./cart,./fav,./prod?id=1,./prod?id=2".split(",");
                Random random = new Random();
                UserPageView userPageView = new UserPageView();
                userPageView.setUser(users[random.nextInt(users.length)]); // user is picked at random from "Mary,Alice,Bob,Cary"
                userPageView.setUrl(urls[random.nextInt(urls.length)]); // url is picked at random from "./home,./cart,./fav,./prod?id=1,./prod?id=2"
                userPageView.setTs(System.currentTimeMillis()); // ts is the current timestamp
                return userPageView;
            }
        };
        long numberOfRecords = Long.MAX_VALUE; // simulate an unbounded stream

        DataGeneratorSource<UserPageView> source =
                new DataGeneratorSource<>(
                        generatorFunction,
                        numberOfRecords,
                        RateLimiterStrategy.perSecond(1), // emit one record per second
                        Types.POJO(UserPageView.class));

        DataStreamSource<UserPageView> stream =
                env.fromSource(source,
                        WatermarkStrategy
                                // Use event-time semantics with a 5-second watermark delay
                        .<UserPageView>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((event,ts)->{
                            return event.getTs();
                        }),
                        "Generator Source");

//        stream.print();

        // Convert to a 2-tuple of (user, count) (method 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> operator = stream.map(value -> Tuple2.of(value.getUser(), 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(value -> value.f0)
                // Key by user and open a 1-hour window to count how often each user appears
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                .aggregate(new AggregateFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>>() {

                    @Override
                    public Tuple2<String, Integer> createAccumulator() {
                        return Tuple2.of(null, 0);
                    }

                    @Override
                    public Tuple2<String, Integer> add(Tuple2<String, Integer> value, Tuple2<String, Integer> accumulator) {
                        accumulator.f0 = value.f0; // remember the user name
                        accumulator.f1 += value.f1; // add to the user's running count
                        return accumulator;
                    }

                    @Override
                    public Tuple2<String, Integer> getResult(Tuple2<String, Integer> accumulator) {
                        return accumulator;
                    }

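                    @Override
                    public Tuple2<String, Integer> merge(Tuple2<String, Integer> a, Tuple2<String, Integer> b) {
                        // Combine two partial accumulators instead of returning null
                        return Tuple2.of(a.f0 != null ? a.f0 : b.f0, a.f1 + b.f1);
                    }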
                });


        // Convert to a 2-tuple of (user, count) (method 2)
        SingleOutputStreamOperator<Tuple2<String, Integer>> reduce = stream.map(value -> Tuple2.of(value.getUser(), 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(value -> value.f0)
                // Key by user; the exercise asks for a 1-hour window, but a 10-second
                // window is used here, which matches the test output shown after the code
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> v1, Tuple2<String, Integer> v2) {
                        return new Tuple2<>(v1.f0, v1.f1 + v2.f1);
                    }
                });

//        operator.print();
        reduce.print();

        // Run the job
        env.execute();
    }
}

Test output:

// Results are printed once every 10 seconds (for reference only)
(Alice,2)
(Mary,2)
(Bob,1)
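
The results for a window do not appear the moment the window ends. An event-time window [start, end) fires once the watermark reaches end - 1, and with a 5-second out-of-orderness bound the watermark trails the newest timestamp by 5 seconds plus 1 ms. The following plain-Java sketch works through that arithmetic for a 10-second window; `WindowFiringSketch` and its methods are hypothetical names, not Flink API, and the sketch ignores the fact that Flink emits watermarks periodically.

```java
// Hypothetical illustration, not Flink API.
class WindowFiringSketch {
    // Watermark produced after seeing maxTimestampMs, with the given allowed delay.
    static long watermark(long maxTimestampMs, long delayMs) {
        return maxTimestampMs - delayMs - 1;
    }

    // A window [start, end) can fire once the watermark reaches end - 1.
    static boolean canFire(long windowEndMs, long maxTimestampMs, long delayMs) {
        return watermark(maxTimestampMs, delayMs) >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        long windowEnd = 10_000L;
        long delay = 5_000L;
        System.out.println(canFire(windowEnd, 14_999L, delay)); // prints false
        System.out.println(canFire(windowEnd, 15_000L, delay)); // prints true
    }
}
```

Under these assumptions, each 10-second window is printed roughly 5 seconds after it closes, once an event stamped at least end + delay has arrived.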
