9 (11) Chapter 6, Requirement 3: Brand Repurchase Rate

Latest recommended article published on 2022-08-10 15:37:35

佑熙


Column: 电商数仓3 (E-commerce Data Warehouse 3)   Tag: big data

Article link: https://blog.csdn.net/weixin_42871374/article/details/105412000


Chapter 6, Requirement 3: Brand Repurchase Rate
6.2 DWS Layer
6.2.1 User Purchase Detail Table (Wide Table)

hive (gmall)>
drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount
( user_id string comment 'user id',
sku_id string comment 'SKU id',
user_gender string comment 'user gender',
user_age string comment 'user age',
user_level string comment 'user level',
order_price decimal(10,2) comment 'product price',
sku_name string comment 'product name',
sku_tm_id string comment 'brand id',
sku_category3_id string comment 'level-3 category id',
sku_category2_id string comment 'level-2 category id',
sku_category1_id string comment 'level-1 category id',
sku_category3_name string comment 'level-3 category name',
sku_category2_name string comment 'level-2 category name',
sku_category1_name string comment 'level-1 category name',
spu_id string comment 'product SPU',
sku_num int comment 'quantity purchased',
order_count string comment 'number of orders placed that day',
order_amount string comment 'order amount for the day'
) COMMENT 'user purchase detail table'
PARTITIONED BY (dt string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_sale_detail_daycount/'
tblproperties ("parquet.compression"="snappy");
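Because the table is external and PARTITIONED BY (dt string), Hive keeps each daily partition in a subdirectory named dt=<value> under the LOCATION path. Note that this directory is named dws_user_sale_detail_daycount, not dws_sale_detail_daycount like the table. The helper below is a hypothetical sketch (partition_path is not part of the tutorial) that just prints where one day's Parquet files should end up:

```shell
#!/bin/bash
# Hypothetical helper: print the HDFS directory of one daily partition of
# dws_sale_detail_daycount. Hive stores each partition of a table under
# <table location>/<partition column>=<value>.
TABLE_LOCATION=/warehouse/gmall/dws/dws_user_sale_detail_daycount/

partition_path() {
  local dt="$1"
  # Strip the trailing slash of the LOCATION before appending dt=<value>
  echo "${TABLE_LOCATION%/}/dt=${dt}"
}
```

After a load, hadoop fs -ls "$(partition_path 2019-02-10)" would list the files written for that day.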
6.2.2 Loading the Data

hive (gmall)>
with
tmp_detail as
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(od.order_price*sku_num) order_amount
from dwd_order_detail od
where od.dt='2019-02-10'
group by user_id, sku_id
)
insert overwrite table dws_sale_detail_daycount partition(dt='2019-02-10')
select
tmp_detail.user_id,
tmp_detail.sku_id,
u.gender,
months_between('2019-02-10', u.birthday)/12 age,
u.user_level,
price,
sku_name,
tm_id,
category3_id,
category2_id,
category1_id,
category3_name,
category2_name,
category1_name,
spu_id,
tmp_detail.sku_num,
tmp_detail.order_count,
tmp_detail.order_amount
from tmp_detail
left join dwd_user_info u on tmp_detail.user_id=u.id and u.dt='2019-02-10'
left join dwd_sku_info s on tmp_detail.sku_id=s.id and s.dt='2019-02-10'
;
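The tmp_detail CTE collapses dwd_order_detail to one row per (user_id, sku_id) before the joins add user and product attributes. To make the three aggregates concrete, here is a rough sketch that reproduces the same grouping with awk on a hypothetical comma-separated extract with columns user_id,sku_id,order_price,sku_num (the function name and input layout are assumptions for illustration, not part of the tutorial). order_count counts detail rows in the group, which is typically the number of orders that contained that SKU.

```shell
#!/bin/bash
# Sketch of the tmp_detail aggregation, NOT the Hive job itself.
# Input (stdin): hypothetical CSV rows user_id,sku_id,order_price,sku_num,
# one row per order detail line, like dwd_order_detail.
# Output: user_id,sku_id,sku_num,order_count,order_amount per (user_id, sku_id).
aggregate_order_detail() {
  awk -F',' '
    {
      key = $1 "," $2
      sku_num[key]      += $4        # sum(sku_num)
      order_count[key]  += 1         # count(*): rows in the group
      order_amount[key] += $3 * $4   # sum(order_price*sku_num)
    }
    END {
      for (key in sku_num)
        printf "%s,%d,%d,%.2f\n", key, sku_num[key], order_count[key], order_amount[key]
    }' | LC_ALL=C sort
}
```

For example, printf 'u1,s1,10.00,2\nu1,s1,10.00,1\n' | aggregate_order_detail prints u1,s1,3,2,30.00: three units over two detail rows, with 10.00*2 + 10.00*1 as the amount.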

6.2.3 Data Load Script

1) Create the script dws_sale.sh in the /home/atguigu/bin directory
[atguigu@hadoop102 bin]$ vim dws_sale.sh
Fill in the script with the following content:
#!/bin/bash

# Define variables here so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# If a date is passed in, use it; otherwise use the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

with
tmp_detail as
(
select
user_id,
sku_id,
sum(sku_num) sku_num,
count(*) order_count,
sum(od.order_price*sku_num) order_amount
from "$APP".dwd_order_detail od
where od.dt='$do_date'
group by user_id, sku_id
)
insert overwrite table "$APP".dws_sale_detail_daycount partition(dt='$do_date')
select
tmp_detail.user_id,
tmp_detail.sku_id,
u.gender,
months_between('$do_date', u.birthday)/12 age,
u.user_level,
price,
sku_name,
tm_id,
category3_id,
category2_id,
category1_id,
category3_name,
category2_name,
category1_name,
spu_id,
tmp_detail.sku_num,
tmp_detail.order_count,
tmp_detail.order_amount
from tmp_detail
left join "$APP".dwd_user_info u on tmp_detail.user_id=u.id and u.dt='$do_date'
left join "$APP".dwd_sku_info s on tmp_detail.sku_id=s.id and s.dt='$do_date';
"

$hive -e "$sql"
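The date handling at the top of dws_sale.sh can be isolated and checked on its own. The sketch below wraps the same rule (use $1 if given, else yesterday) in a hypothetical resolve_do_date function. The YYYY-MM-DD shape check is an extra assumption of this sketch, not something the original script does, and date -d requires GNU date.

```shell
#!/bin/bash
# Hypothetical helper mirroring dws_sale.sh: print $1 if it is non-empty,
# otherwise yesterday's date (GNU date). The format check below is an
# addition for illustration; the original script does not validate input.
resolve_do_date() {
  local d
  if [ -n "$1" ]; then
    d="$1"
  else
    d=$(date -d "-1 day" +%F)
  fi
  # Reject anything that does not look like YYYY-MM-DD
  case "$d" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "invalid date: $d (expected YYYY-MM-DD)" >&2; return 1 ;;
  esac
  echo "$d"
}
```

Inside dws_sale.sh it could stand in for the if/else block as do_date=$(resolve_do_date "$1") || exit 1, so that a mistyped argument stops the script before Hive overwrites a partition with a bad dt value.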
2) Make the script executable
[atguigu@hadoop102 bin]$ chmod 777 dws_sale.sh
3) Run the script to load the data
[atguigu@hadoop102 bin]$ dws_sale.sh 2019-02-11
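Each call loads a single day. To fill several consecutive days, one option is to loop over a date range. date_range and backfill below are hypothetical helpers, assuming GNU date and that dws_sale.sh is on the PATH as in the command above.

```shell
#!/bin/bash
# Hypothetical helpers for loading several days with dws_sale.sh.
# Requires GNU date (-d); TZ=UTC avoids daylight-saving surprises.

# Print every date from $1 to $2 (inclusive), one per line, as YYYY-MM-DD.
date_range() {
  local d="$1" end_s
  end_s=$(TZ=UTC date -d "$2" +%s) || return 1
  while [ "$(TZ=UTC date -d "$d" +%s)" -le "$end_s" ]; do
    echo "$d"
    d=$(TZ=UTC date -d "$d + 1 day" +%F)
  done
}

# Run the DWS load once per day in the range; stop at the first failure.
backfill() {
  local d
  for d in $(date_range "$1" "$2"); do
    dws_sale.sh "$d" || return 1
  done
}
```

For instance, backfill 2019-02-10 2019-02-11 would run the load for both days in order. Because the script uses insert overwrite on a fixed dt partition, rerunning a day replaces that day's data rather than duplicating it.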
4) Check the loaded data
hive (gmall)>
select * from dws_sale_detail_daycount where dt='2019-02-11' limit 2;