Flink Real-Time Data Warehouse Project: DWS Layer Design and Implementation
Preface
The first two themes were implemented with the DataStream API; the remaining two themes are implemented with Flink SQL.
III. DWS Layer: Implementing the Region Theme Table
1. Region Theme Requirements
2. Designing the Region Theme Entity Class
2.1 Dimension Fields
Province dimension fields: province id, province name, province area code, province iso_code, and province 3166_2 code (all of these dimensions can be looked up by province id).
2.2 Measure Fields
Measure fields: total order count and total order amount.
(PV and UV were already computed in the visitor theme, so they do not need to be computed again here.)
2.3 Other Fields
Window fields: window start time and window end time.
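Putting the three groups of fields together, the entity class can be sketched as below. This is a minimal sketch: the field names are assumed to match the column aliases used in the aggregation query later, and the real class may use Lombok annotations instead of plain public fields.

```java
import java.math.BigDecimal;

// Sketch of the ProvinceStats entity for the region theme.
// Field names are assumed to line up with the query's column aliases.
public class ProvinceStats {
    // Window fields
    public String stt;                    // window start time
    public String edt;                    // window end time
    // Province dimension fields
    public Long province_id;
    public String province_name;
    public String province_area_code;
    public String province_iso_code;
    public String province_3166_2_code;
    // Measure fields
    public Long order_count;              // total order count
    public BigDecimal order_amount;       // total order amount
    // Other
    public Long ts;                       // statistics timestamp in ms

    public ProvinceStats() { }            // no-arg constructor, required for Flink's POJO mapping
}
```

Flink's table-to-POJO conversion needs a public no-arg constructor and public fields (or getters/setters) whose names match the query's output columns.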
3. Implementation
3.1 Creating the Table Environment
//1. Create the stream execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
//2. Create the stream table environment
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
3.2 Reading Data with DDL
Use DDL to create a table over the Kafka topic, extracting the event-time timestamp from create_time and declaring a watermark for event-time semantics:
//3. Connect to Kafka with DDL, declaring event time in the statement
tableEnv.executeSql("create table order_wide(" +
"province_id BIGINT," +
"province_name STRING," +
"province_area_code STRING," +
"province_iso_code STRING," +
"province_3166_2_code STRING," +
"order_id BIGINT," +
"total_amount DECIMAL," +
"create_time STRING," +
"ts AS TO_TIMESTAMP(create_time)," +
"WATERMARK FOR ts AS ts - INTERVAL '1' SECOND)" +
"WITH(" +
"'connector'='kafka'," +
"'topic'='dwm_order_wide'," +
"'properties.group.id'='province_stats'," +
"'properties.bootstrap.servers'='hadoop102:9092,hadoop103:9092,hadoop104:9092'," +
"'format'='json'," +
"'scan.startup.mode'='latest-offset'" +
")");
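Note that TO_TIMESTAMP(create_time) assumes create_time arrives in the default 'yyyy-MM-dd HH:mm:ss' format. The same parse can be sanity-checked in plain Java (the sample value below is hypothetical):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CreateTimeParseDemo {
    public static void main(String[] args) {
        // Same default pattern that Flink's TO_TIMESTAMP applies to create_time
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime ts = LocalDateTime.parse("2022-04-01 10:00:03", fmt);
        System.out.println(ts); // 2022-04-01T10:00:03
    }
}
```

If create_time arrives in any other format, TO_TIMESTAMP returns NULL and the row gets no event time, so the format is worth verifying against the actual dwm_order_wide messages.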
3.3 Windowed Grouping and Aggregation
//4. Window, group, and aggregate
Table table = tableEnv.sqlQuery("select " +
"DATE_FORMAT(window_start,'yyyy-MM-dd HH:mm:ss') stt," +
"DATE_FORMAT(window_end,'yyyy-MM-dd HH:mm:ss') edt," +
"province_id," +
"province_name," +
"province_area_code," +
"province_iso_code," +
"province_3166_2_code," +
"count(distinct order_id) order_count," +
"sum(total_amount) order_amount," +
"UNIX_TIMESTAMP()*1000 ts " +
"from TABLE(" +
"TUMBLE(TABLE order_wide," +
"DESCRIPTOR(ts)," +
"INTERVAL '10' SECOND)) " +
"group by " +
"window_start," +
"window_end," +
"province_id," +
"province_name," +
"province_area_code," +
"province_iso_code," +
"province_3166_2_code");
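The TUMBLE table-valued function assigns every row to exactly one fixed 10-second bucket based on its event time. The bucket arithmetic can be sketched in plain Java (the timestamp below is hypothetical):

```java
public class TumbleWindowDemo {
    // Start of the tumbling window that contains the given event time
    static long windowStart(long epochMillis, long sizeMillis) {
        return epochMillis - Math.floorMod(epochMillis, sizeMillis);
    }

    public static void main(String[] args) {
        long eventTime = 1_648_771_203_456L; // hypothetical event timestamp (ms)
        long start = windowStart(eventTime, 10_000L);
        long end = start + 10_000L;          // window end is exclusive
        // eventTime falls into [1_648_771_200_000, 1_648_771_210_000)
        System.out.println("[" + start + ", " + end + ")");
    }
}
```

Grouping by window_start and window_end alongside the province dimensions then produces one aggregated row per province per 10-second window, which is exactly the grain of the target ClickHouse table.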
3.4 Writing to ClickHouse
Convert the result table into a stream, then write the stream to ClickHouse:
//5. Convert the dynamic table to an append-only stream
DataStream<ProvinceStats> provinceStatsDataStream = tableEnv.toAppendStream(table, ProvinceStats.class);
//6. Write the results to ClickHouse
provinceStatsDataStream.addSink(ClickHouseUtil.getProvinceStatsSink("insert into province_stats_2022 values(?,?,?,?,?,?,?,?,?,?)"));
The getProvinceStatsSink helper in ClickHouseUtil is a thin wrapper around Flink's JdbcSink:
public static SinkFunction<ProvinceStats> getProvinceStatsSink(String sql){
return JdbcSink.<ProvinceStats>sink(sql,
((preparedStatement, provinceStats) ->