Flink Real-Time Data Warehouse Project: DWS-Layer Design and Implementation

This article walks through building the DWS layer of a real-time data warehouse with Flink, covering the design and implementation of the region theme table and the keyword theme table: the choice of dimension and measure fields, the Flink SQL operations, and writing the results into ClickHouse.


Preface

The first two themes were implemented with the DataStream API; the remaining two themes are implemented with Flink SQL.


III. DWS Layer: Implementing the Region Theme Table

1. Region Theme Requirements


2. Designing the Region Theme Entity Class

2.1 Dimension Fields

Province dimension fields: province id, province name, province area code, province iso_code, and province 3166_2 code (all of these dimensions can be looked up by province id).

2.2 Measure Fields

Measure fields: total order count and total order amount.
(PV and UV were already computed in the visitor theme, so they do not need to be computed again here.)

2.3 Other Fields

Window fields: window start time and window end time.
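Putting the three groups of fields together, the entity class might look like the sketch below. The snake_case field names are an assumption chosen to match the SQL column aliases used later, since Flink maps table columns to POJO fields by name (Flink's POJO rules require a public class, a public no-arg constructor, and public fields or getters/setters):

```java
import java.math.BigDecimal;

// Sketch of the ProvinceStats entity class; field names are assumptions chosen
// to match the query's column aliases so that
// tableEnv.toAppendStream(table, ProvinceStats.class) can map columns by name.
public class ProvinceStats {
    // Window fields
    public String stt;                // window start time
    public String edt;                // window end time
    // Dimension fields
    public Long province_id;
    public String province_name;
    public String province_area_code;
    public String province_iso_code;
    public String province_3166_2_code;
    // Measure fields
    public Long order_count;          // total number of orders
    public BigDecimal order_amount;   // total order amount
    // Statistics timestamp (epoch millis)
    public Long ts;
}
```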

3. Implementation

3.1 Obtaining the Table Environment

    //1. Get the stream execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1);

    //2. Get the table environment built on top of the stream environment
    StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

3.2 Reading Data with DDL

Use DDL to create a table over the Kafka topic, extracting the event timestamp and declaring event-time semantics (via the watermark) in the same statement:

    //3. Connect to Kafka with DDL, declaring the event-time attribute in the statement
    tableEnv.executeSql("create table order_wide(" +
            "province_id BIGINT," +
            "province_name STRING," +
            "province_area_code STRING," +
            "province_iso_code STRING," +
            "province_3166_2_code STRING," +
            "order_id BIGINT," +
            "total_amount DECIMAL(10,2)," +
            "create_time STRING," +
            "ts AS TO_TIMESTAMP(create_time)," +
            "WATERMARK FOR ts AS ts - INTERVAL '1' SECOND) " +
            "WITH(" +
            "'connector'='kafka'," +
            "'topic'='dwm_order_wide'," +
            "'properties.group.id'='province_stats'," +
            "'properties.bootstrap.servers'='hadoop102:9092,hadoop103:9092,hadoop104:9092'," +
            "'format'='json'," +
            "'scan.startup.mode'='latest-offset'" +
            ")");
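To make the DDL concrete, a message on `dwm_order_wide` consumed by this table might look like the JSON below. The values are invented for illustration; note that `create_time` must be in `yyyy-MM-dd HH:mm:ss` form for `TO_TIMESTAMP` to parse it:

```json
{
  "province_id": 12,
  "province_name": "上海",
  "province_area_code": "310000",
  "province_iso_code": "CN-31",
  "province_3166_2_code": "CN-SH",
  "order_id": 26449,
  "total_amount": 299.00,
  "create_time": "2022-05-01 12:30:45"
}
```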

3.3 Windowed Group Aggregation

    //4. Tumbling-window group aggregation (windowing TVF syntax).
    //   Note the trailing spaces inside the string literals: without them the
    //   concatenated SQL would run tokens together (e.g. "ts" + "from" -> "tsfrom").
    Table table = tableEnv.sqlQuery("select " +
            "DATE_FORMAT(window_start,'yyyy-MM-dd HH:mm:ss') stt," +
            "DATE_FORMAT(window_end,'yyyy-MM-dd HH:mm:ss') edt," +
            "province_id," +
            "province_name," +
            "province_area_code," +
            "province_iso_code," +
            "province_3166_2_code," +
            "count(distinct order_id) order_count," +
            "sum(total_amount) order_amount," +
            "UNIX_TIMESTAMP()*1000 ts " +
            "from table(" +
            "TUMBLE(table order_wide," +
            "DESCRIPTOR(ts)," +
            "INTERVAL '10' SECOND)) " +
            "group by " +
            "window_start," +
            "window_end," +
            "province_id," +
            "province_name," +
            "province_area_code," +
            "province_iso_code," +
            "province_3166_2_code");
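As a side note, the `window_start`/`window_end` values that `TUMBLE` assigns can be reproduced with simple epoch arithmetic. The sketch below is illustrative only (not part of the project) and shows how an event timestamp maps into its 10-second window:

```java
// Mirrors how a 10-second tumbling window buckets event timestamps:
// every event falls into exactly one window [start, start + size).
public class TumbleWindowDemo {
    static final long WINDOW_SIZE_MS = 10_000L;

    // Start of the tumbling window containing the given epoch-millis timestamp
    static long windowStart(long tsMillis) {
        return tsMillis - (tsMillis % WINDOW_SIZE_MS);
    }

    static long windowEnd(long tsMillis) {
        return windowStart(tsMillis) + WINDOW_SIZE_MS;
    }

    public static void main(String[] args) {
        long ts = 1_660_000_012_345L; // some event time in epoch millis
        // All events whose timestamps share the same 10s bucket are aggregated together
        System.out.println("[" + windowStart(ts) + ", " + windowEnd(ts) + ")");
    }
}
```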

3.4 Writing to ClickHouse

Convert the result table into a stream, then sink the stream to ClickHouse:

    //5. Convert the dynamic table into an append-only stream
    DataStream<ProvinceStats> provinceStatsDataStream = tableEnv.toAppendStream(table, ProvinceStats.class);

    //6. Write the result into ClickHouse
    provinceStatsDataStream.addSink(ClickHouseUtil.getProvinceStatsSink("insert into province_stats_2022 values(?,?,?,?,?,?,?,?,?,?)"));

    env.execute("ProvinceStatsSqlApp");

The sink itself is built in `ClickHouseUtil` on top of Flink's JDBC connector. The original snippet was cut off after the statement-builder lambda; the completion below is a sketch, and the driver class, JDBC URL, and batch size are assumptions that must match your environment:

    public static SinkFunction<ProvinceStats> getProvinceStatsSink(String sql) {
        return JdbcSink.<ProvinceStats>sink(sql,
                (preparedStatement, provinceStats) -> {
                    // Bind the POJO's fields to the 10 placeholders, in column order
                    preparedStatement.setString(1, provinceStats.stt);
                    preparedStatement.setString(2, provinceStats.edt);
                    preparedStatement.setLong(3, provinceStats.province_id);
                    preparedStatement.setString(4, provinceStats.province_name);
                    preparedStatement.setString(5, provinceStats.province_area_code);
                    preparedStatement.setString(6, provinceStats.province_iso_code);
                    preparedStatement.setString(7, provinceStats.province_3166_2_code);
                    preparedStatement.setLong(8, provinceStats.order_count);
                    preparedStatement.setBigDecimal(9, provinceStats.order_amount);
                    preparedStatement.setLong(10, provinceStats.ts);
                },
                JdbcExecutionOptions.builder().withBatchSize(5).build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withDriverName("ru.yandex.clickhouse.ClickHouseDriver")
                        .withUrl("jdbc:clickhouse://hadoop102:8123/default")
                        .build());
    }
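For the INSERT above to work, ClickHouse needs a `province_stats_2022` table whose column order matches the 10 placeholders. The DDL below is a plausible sketch; the engine choice, partitioning, and ordering keys are assumptions:

```sql
create table if not exists province_stats_2022 (
    stt DateTime,
    edt DateTime,
    province_id UInt64,
    province_name String,
    province_area_code String,
    province_iso_code String,
    province_3166_2_code String,
    order_count UInt64,
    order_amount Decimal(16, 2),
    ts UInt64
) engine = ReplacingMergeTree(ts)
partition by toYYYYMMDD(toDate(stt))
order by (stt, edt, province_id);
```

ReplacingMergeTree keyed on `ts` lets re-runs of the job overwrite rows for the same window and province during background merges, rather than accumulating duplicates.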