Big Data Technology - Flink - Computing PV/UV Traffic Statistics with RichMapFunction and AggregateFunction

Inconsistent watermarks across the partitions of a Flink operator

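During development I ran into a problem: in a real-time Flink job, the different partitions (parallel subtasks) of one operator showed inconsistent watermarks. The cause is fairly clear. After the upstream keyBy, data is distributed unevenly across the downstream partitions, so watermarks arrive at different times in different partitions. In this situation, consider reducing the number of partitions.

The core code for computing PV/UV is as follows: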
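As background for why an uneven partition matters: a Flink operator with several input channels advances its own watermark to the minimum of the latest watermarks received on those channels, so one slow or sparse channel holds the whole operator back. The following is a minimal plain-Java sketch of that rule only (no Flink dependency); the class name `WatermarkSketch` and the sample values are hypothetical.

```java
import java.util.Arrays;

// Plain-Java sketch, not Flink code: models how an operator derives its
// watermark from the latest watermark seen on each input channel.
class WatermarkSketch {

    // The operator watermark is the minimum over all input channels.
    // With no channels at all, this sketch returns Long.MIN_VALUE.
    static long operatorWatermark(long[] channelWatermarks) {
        return Arrays.stream(channelWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Hypothetical values: the second channel receives few records and lags.
        long[] watermarks = {1_653_915_600_000L, 1_653_912_000_000L};
        System.out.println(operatorWatermark(watermarks));
    }
}
```

With fewer partitions, each partition tends to receive data from more keys, which makes it less likely that a single partition stays far behind the others.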

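        int parallelism = 2;    // parallelism of the map and window operators
        int dbParallelism = 1;  // parallelism of the database sink
        int days = 7;           // window size in days
        // sourceStream stands for the upstream DataStream<UVModel>; the name is
        // a placeholder, because the original snippet reused "dataStream" here.
        DataStream<PVModel> resultStream = sourceStream
                .keyBy(item -> item.getName())
                .map(new MyRepeatMapFunction(days)).setParallelism(parallelism)
                .keyBy(item -> item.getName())
                // sliding window: `days` days long, evaluated every 10 seconds
                .timeWindow(Time.days(days), Time.seconds(10))
                .aggregate(new MyPVStatisticFunction()).setParallelism(parallelism)
                .name(this.days + "Day");
        resultStream.addSink(sink).setParallelism(dbParallelism).name("ToDbSink");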
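To get a feel for the cost of `timeWindow(Time.days(days), Time.seconds(10))`: in a sliding window, every record belongs to size / slide overlapping windows. The sketch below is plain Java, not Flink code. It enumerates the window start times for one timestamp the way a sliding window assigner typically does, assuming windows aligned to the epoch (offset 0) and non-negative timestamps. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: which sliding windows contain a given timestamp?
class SlidingWindowSketch {

    // Returns the start time (ms) of every window of length sizeMs, sliding by
    // slideMs, that contains the timestamp. Assumes offset 0 and timestamp >= 0.
    static List<Long> windowStarts(long timestamp, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestamp - (timestamp % slideMs);
        // Walk backwards while the window [start, start + sizeMs) still covers it.
        for (long start = lastStart; start > timestamp - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000;
        long tenSecondsMs = 10_000L;
        // Arbitrary sample timestamp in epoch milliseconds.
        int windows = windowStarts(1_653_915_600_000L, sevenDaysMs, tenSecondsMs).size();
        System.out.println(windows); // 60480 = 7 days / 10 seconds
    }
}
```

So with a 7-day window and a 10-second slide, each record contributes to 60480 windows per key. An incremental AggregateFunction keeps only one accumulator per window, but the number of live windows is still worth keeping in mind when choosing the slide interval.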

Here a RichMapFunction is used in the map step to preprocess the data, turning each page-view record into a PVModel, as shown below:

date                  name    uv  pv
2022-05-30 13:00:00   ikong   1   1
2022-05-30 14:00:00   ikong   0   1
2022-05-30 15:00:00   ikong   0   1
2022-05-30 14:00:00   lilei   1   1
2022-05-30 15:00:00   lilei   0   1

Approach: use state to record whether the user has already been seen on the current day. If not, uv = 1; otherwise uv = 0. pv is always 1.
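The idea can be tried out without a Flink cluster. The sketch below is plain Java under stated assumptions: a `HashSet` stands in for the keyed `MapState`, dates are formatted in UTC, and the class name `UvFlagSketch` and its helper `millis` are hypothetical. It replays the five sample visits from the table above.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch of the "first visit of the day" rule.
class UvFlagSketch {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Stand-in for Flink state: keys of the form name + yyyyMMdd.
    private final Set<String> seen = new HashSet<>();

    // Returns {uv, pv}: uv is 1 only for the first visit of the user that day,
    // pv is always 1.
    int[] flag(String name, long viewTimeMs) {
        String key = name + DAY.format(Instant.ofEpochMilli(viewTimeMs));
        int uv = seen.add(key) ? 1 : 0; // add() returns false if already present
        return new int[] {uv, 1};
    }

    // Helper: epoch milliseconds for 2022-05-30 at the given hour (UTC).
    static long millis(int hour) {
        return LocalDateTime.of(2022, 5, 30, hour, 0)
                .toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        UvFlagSketch sketch = new UvFlagSketch();
        String[] names = {"ikong", "ikong", "ikong", "lilei", "lilei"};
        int[] hours = {13, 14, 15, 14, 15};
        for (int i = 0; i < names.length; i++) {
            int[] r = sketch.flag(names[i], millis(hours[i]));
            System.out.println(names[i] + " uv=" + r[0] + " pv=" + r[1]);
        }
    }
}
```

In the real job the state is scoped per key by keyBy and survives restarts through checkpoints, which a local set does not model.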

MyRepeatMapFunction.java

 public static class MyRepeatMapFunction extends RichMapFunction<UVModel, PVModel> {

        private int days;

        public MyRepeatMapFunction(int days) {
            this.days = days;
        }


        MapState<String, Integer> uvState;
        MapState<String, Integer> pvState;

        @Override
        public void open(Configuration parameters) throws Exception {

            uvState = this.getRuntimeContext()
                    .getMapState(new MapStateDescriptor<>("uv-state" + days
                            , BasicTypeInfo.STRING_TYPE_INFO
                            , BasicTypeInfo.INT_TYPE_INFO)
                    );

            pvState = this.getRuntimeContext()
                    .getMapState(new MapStateDescriptor<>("pv-state" + days
                            , BasicTypeInfo.STRING_TYPE_INFO
                            , BasicTypeInfo.INT_TYPE_INFO)
                    );
        }


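        @Override
        public PVModel map(UVModel uvm) throws Exception {
            // One UV key per user per calendar day, e.g. "ikong20220530".
            // DateTime is assumed to be org.joda.time.DateTime.
            String uvKey = uvm.getName() + new DateTime(uvm.getViewTime()).toString("yyyyMMdd");
            String pvKey = uvm.getName();
            // uv = 1 only for the first visit of this user on this day
            int uv = 0;
            if (!uvState.contains(uvKey)) {
                uv = 1;
                uvState.put(uvKey, uv);
            }

            // every record counts as exactly one page view
            int pv = 1;
            pvState.put(pvKey, pv);
            PVModel vm = new PVModel();
            vm.setDays(uv);   // PVModel.days carries the uv flag
            vm.setCount(pv);  // PVModel.count carries the pv value
            vm.setName(uvm.getName());

            return vm;
        }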

    }

The AggregateFunction then sums each user's uv and pv values, giving the user's total UV and PV over the last x days.

public static class MyPVStatisticFunction implements AggregateFunction<PVModel, PVModel, PVModel> {


        @Override
        public PVModel createAccumulator() {
            return new PVModel();
        }

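        @Override
        public PVModel add(PVModel uvm, PVModel acc) {

            // each incoming record is one page view
            acc.setCount(acc.getCount() + 1);

            acc.setName(uvm.getName());

            // days holds the uv flag (0 or 1), so the sum is the UV total
            acc.setDays(acc.getDays() + uvm.getDays());

            return acc;
        }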

        @Override
        public PVModel getResult(PVModel tp) {
            return tp;
        }

        @Override
        public PVModel merge(PVModel tp1, PVModel tp2) {
            PVModel uvm = new PVModel();
            uvm.setCount(tp1.getCount() + tp2.getCount());
            uvm.setDays(tp1.getDays() + tp2.getDays());
            uvm.setName(tp1.getName());
            return uvm;
        }
    }

After accumulating the PVModel records, the result is as follows:

date                  name    uv  pv
2022-05-30            ikong   1   3
2022-05-30 14:00:00   lilei   1   2
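The totals in the table above can be checked with a small plain-Java fold that mirrors the add and merge steps of the aggregate function. This is a sketch without Flink: each row is an `int[]{uv, pv}` pair instead of a PVModel, and the class name `PvUvFoldSketch` is hypothetical.

```java
// Plain-Java sketch of the accumulate and merge logic.
class PvUvFoldSketch {

    // Folds rows of {uv, pv} into one accumulator {uvTotal, pvTotal},
    // adding one row at a time like AggregateFunction.add.
    static int[] fold(int[][] rows) {
        int[] acc = {0, 0}; // counterpart of createAccumulator()
        for (int[] row : rows) {
            acc[0] += row[0]; // sum of uv flags
            acc[1] += 1;      // one page view per record
        }
        return acc;
    }

    // Combines two partial accumulators like AggregateFunction.merge.
    static int[] merge(int[] a, int[] b) {
        return new int[] {a[0] + b[0], a[1] + b[1]};
    }

    public static void main(String[] args) {
        int[][] ikong = {{1, 1}, {0, 1}, {0, 1}};
        int[][] lilei = {{1, 1}, {0, 1}};
        int[] i = fold(ikong);
        int[] l = fold(lilei);
        System.out.println("ikong uv=" + i[0] + " pv=" + i[1]); // ikong uv=1 pv=3
        System.out.println("lilei uv=" + l[0] + " pv=" + l[1]); // lilei uv=1 pv=2
    }
}
```

Because merge only adds the two partial sums, the result does not depend on how the records were split between accumulators.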

Other supporting code

UVModel.java

public class UVModel {

    private String name = "";

    private String url = "";

    private Long viewTime = 0L;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public Long getViewTime() {
        return viewTime;
    }

    public void setViewTime(Long viewTime) {
        this.viewTime = viewTime;
    }

    @Override
    public String toString() {
        return "UVModel{" +
                "name='" + name + '\'' +
                ", url='" + url + '\'' +
                ", viewTime=" + viewTime +
                '}';
    }
}

PVModel.java

public class PVModel implements Serializable {
    private String name = "";

    private int days = 0;

    private int count = 0;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getDays() {
        return days;
    }

    public void setDays(int days) {
        this.days = days;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }


    @Override
    public String toString() {
        return "PVModel{" +
                "name='" + name + '\'' +
                ", days=" + days +
                ", count=" + count +
                '}';
    }
}
