数仓之订单事实表【累积型快照事实表】

数仓之订单事实表【累积型快照事实表】

1.创建订单事实表

DWD订单事实表(累积型快照事实表)
drop table if exists dwd_fact_order_info;
create external table dwd_fact_order_info (
                                              `id` string COMMENT '订单编号',
                                              `order_status` string COMMENT '订单状态',
                                              `user_id` string COMMENT '用户id',
                                              `out_trade_no` string COMMENT '支付流水号',
                                              `create_time` string COMMENT '创建时间(未支付状态)',
                                              `payment_time` string COMMENT '支付时间(已支付状态)',
                                              `cancel_time` string COMMENT '取消时间(已取消状态)',
                                              `finish_time` string COMMENT '完成时间(已完成状态)',
                                              `refund_time` string COMMENT '退款时间(退款中状态)',
                                              `refund_finish_time` string COMMENT '退款完成时间(退款完成状态)',
                                              `province_id` string COMMENT '省份ID',
                                              `activity_id` string COMMENT '活动ID',
                                              `original_total_amount` decimal(16,2) COMMENT '原价金额',
                                              `benefit_reduce_amount` decimal(16,2) COMMENT '优惠金额',
                                              `feight_fee` decimal(16,2) COMMENT '运费',
                                              `final_total_amount` decimal(16,2) COMMENT '订单金额'
) COMMENT '订单事实表'
    PARTITIONED BY (`dt` string)--按天进行分区
    stored as parquet--列式存储parquet
    location '/warehouse/gmall/dwd/dwd_fact_order_info/'--HDFS数据的位置
    tblproperties ("parquet.compression"="lzo");--采用LZO对数据压缩

2.编写脚本

#!/bin/bash

#指定数据库&hive路径
APP=gmall
hive=/opt/module/hive-3.1.2/bin/hive

#如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$2" ] ;then
    do_date=$2
else 
    do_date=`date -d "-1 day" +%F`
fi

sql1="
insert overwrite table ${APP}.dwd_fact_order_info partition(dt)
select
    if(new.id is null,old.id,new.id),
    if(new.order_status is null,old.order_status,new.order_status),
    if(new.user_id is null,old.user_id,new.user_id),
    if(new.out_trade_no is null,old.out_trade_no,new.out_trade_no),
    if(new.tms['1001'] is null,old.create_time,new.tms['1001']),--1001对应未支付状态
    if(new.tms['1002'] is null,old.payment_time,new.tms['1002']),--1002对应已支付状态
    if(new.tms['1003'] is null,old.cancel_time,new.tms['1003']),--1003对应已取消状态
    if(new.tms['1004'] is null,old.finish_time,new.tms['1004']),--1004对应已完成状态
    if(new.tms['1005'] is null,old.refund_time,new.tms['1005']),--1005对应退款中状态
    if(new.tms['1006'] is null,old.refund_finish_time,new.tms['1006']),--1006对应退款完成状态
    if(new.province_id is null,old.province_id,new.province_id),
    if(new.activity_id is null,old.activity_id,new.activity_id),
    if(new.original_total_amount is null,old.original_total_amount,new.original_total_amount),
    if(new.benefit_reduce_amount is null,old.benefit_reduce_amount,new.benefit_reduce_amount),
    if(new.feight_fee is null,old.feight_fee,new.feight_fee),
    if(new.final_total_amount is null,old.final_total_amount,new.final_total_amount),
    date_format(if(new.tms['1001'] is null,old.create_time,new.tms['1001']),'yyyy-MM-dd')
from
(
    select
        id,
        order_status,
        user_id,
        out_trade_no,
        create_time,
        payment_time,
        cancel_time,
        finish_time,
        refund_time,
        refund_finish_time,
        province_id,
        activity_id,
        original_total_amount,
        benefit_reduce_amount,
        feight_fee,
        final_total_amount
    from ${APP}.dwd_fact_order_info
    where dt
    in
    (
        select
          date_format(create_time,'yyyy-MM-dd')
        from ${APP}.ods_order_info
        where dt='$do_date'
    )
)old
full outer join
(
    select
        info.id,
        info.order_status,
        info.user_id,
        info.out_trade_no,
        info.province_id,
        act.activity_id,
        log.tms,
        info.original_total_amount,
        info.benefit_reduce_amount,
        info.feight_fee,
        info.final_total_amount
    from
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') tms
        from ${APP}.ods_order_status_log
        where dt='$do_date'
        group by order_id
    )log
    join
    (
        select * from ${APP}.ods_order_info where dt='$do_date'
    )info
    on log.order_id=info.id
    left join
    (
        select * from ${APP}.ods_activity_order where dt='$do_date'
    )act
    on log.order_id=act.order_id
)new
on old.id=new.id;
"

#sql的运行
 $hive -e "$sql"

3.思路

1.新旧表关联:
查询 [ods层订单状态表&订单信息表&活动订单表(新表:三表关联) + dwd层订单信息表(旧表:要查询ods层是否有当天的数据)]进行关联
2.三种情况:
   2.1[新表]有数据,[旧表]无数据 =>数据插入到当天的分区
   2.2[新表]有数据,[旧表]有数据 =>需要更新数据
   2.3[新表]无数据,[旧表]有数据 =>保留数据
3.重新写入覆盖到dwd层事实订单表

★sql细节突破

一条订单信息——对应——多个订单状态的处理

1.拼接:concat,订单状态操作时间
2.列转行:collect_set,按订单id进行分组
3.拼接每种订单状态信息:concat_ws,
4.将字符串转成<k,v>格式的map集合:str_to_map,数据样例如下:

“{”“1001"”:"“2020-10-12 10:31:30.0"”,"“1002"”:"“2020-10-12 10:31:30.0"”,"“1004"”:"“2020-10-12 10:31:30.0"”}"

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值