新零售电商数据仓库系统项目
)
- 有需要项目资料的小伙伴请下方留言哦
一、前言
- 本次项目提供的代码资源仅为数据导入以及数据分析(注:该项目只实现了用户访问时间分析和订单分析两个模块)的部分。
- 用户使用浏览器(PC 端)和 app 对该平台进行访问,记录用户的注册信息,通过调研平台和数据挖掘记录用户的扩展信息,业务系统对用户的每次点击和下单信息记录到数据库中,并提供一台从库将主库的业务数据实时同步到从库中,同时分配对从库的只读访问权限,运营平台通过 Sqoop 将数据抽取到运营分析平台,运营分析平台进行指标的计算,将结果存入到 MySQL 中,将结果展示到报表。该项目的下游为可视化展示。由项目组其他人员完成。他们主要是将我们分析的结果,通过可视化展示,供一线人员使用。我们这里不提供该部分内容。但实际项目中,每个人一般是负责项目的一部分。
- 该项目业务实现部分,为一个运营分析平台中的众多模块的一部分
二、业务表模型
- 当我们对用户访问时间进行分析的时候需要用到用户表,用户的扩展表,用户的 app 端点击日志表,用户的 pc 端点击日志表,在这个需求当中,我们的事实表为用户的点击日志,而我们的维度表只有用户维度包括用户信息表和用户扩展信息,所以需要列出这四张表
- 当我们对用户订单分布进行分析的时候需要用到用户表,用户扩展表,用户订单表,用户收货地址表,订单收货地址表,交易表,订单商品表。在这个需求当中我们的事实表是用户订单表,维度表是用户维度,商品维度,收货地址维度,购物车维度等等,所以需要列出这 8 张表
- 在上图中还有一些和其他的表没有连线的表意味着在我们实现的部分当没有用到这些表,但是在实现其他的指标时候可能会用到,所以在这里选择保留
三、准备
安装部署文档书写标准
- 安装前准备(硬件、软件、其他)
- 安装中的步骤(傻瓜式操作)
- 安装过程中可能出现的异常处理(回滚)
- 安装完成后的测试
- 完成整个部署后的环境清理(安装残留)
数据来源技术:
- 网页埋点/日志:logstash/flume等日志收集工具
- 数据库信息:sqoop等数据抽取工具
- 各种形式的文件:可以放置hdfs或直接load进hive表
1、导入表格
将表导入到mysql库
mysql>source /root/snbap_ods.sql
- hive中创建ods层并从mysql库导入表格
##进入hive
hive
##创建ods库
create database snbap_ods;
##创建dw库
create database snbap_dw;
##创建dm库
create database snbap_dm;
##退出hive
exit;
##导入表格
sqoop import --connect jdbc:mysql://hadoopwei:3306/default \
--username root --password ok --table mysql表名 --hive-import --hive-table snbap_ods.hive中表名
2、创建用户宽表
将用户的基本信息表和用户的扩展表进行合并形成用户宽表.
包括信息如下:
用户 id,登录名,用户性别,出生日期,年龄,星座,省份,城市,城市等级,邮箱,邮箱运营商,手机号,手机号段(前三位),手机运营商,注册时间,登录 ip,登录来源,邀请人,会员积分,已使用积分,会员等级名称,是否黑名单,是否结婚,学历,月收入,职业,是否孕妇,是否有小孩,是否有车,使用手机品牌,使用手机等级,使用手机种类数量,更换手机数量 ,是否马甲用户,马甲账户数量,用户忠诚度,用户购物类型,体重,身高。
create table snbap_dw.user_basic
as select u.user_name,u.user_gender,
user_birthday, user_age,
constellation, province, city, city_level, e_mail,
op_mail, mobile, num_seg_mobile, op_Mobile, register_time,
login_ip, login_source, request_user, total_score, used_score,
is_blacklist, is_married, education, monthly_income, user_order_flag,
is_pregnant_woman,is_have_children,is_have_car,phone_brand,phone_brand_level,
phone_cnt,change_phone_cnt,is_maja,majia_account_cnt,
loyal_model,shopping_type_model,weight,height
from snbap_ods.user u join snbap_ods.user_extend e on u.user_id=e.user_id;
3、生成pc分区表
create table snbap_dw.user_pc_click_partition(
`log_id` bigint,
`user_id` bigint,
`session_id` string,
`cookie_id` string,
`visit_time` string,
`visit_url` string,
`visit_os` string,
`browser_name` string,
`visit_ip` string,
`province` string,
`city` string,
`pv` int,
`goods_id` bigint,
`shop_id` bigint
)
partitioned by (visit_date string);
##因为数据倾斜所以增加分片数
set mapred.reduce.tasks = 15;
##开启分区及修改最大分片数
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=600000;
set hive.exec.max.dynamic.partitions=6000000;
set hive.exec.max.created.files=6000000;
insert into snbap_dw.user_pc_click_partition partition (visit_date)
SELECT
`log_id` ,
`user_id` ,
`session_id` ,
`cookie_id` ,
`visit_time` ,
`visit_url` ,
`visit_os` ,
`browser_name` ,
`visit_ip` ,
`province`,
`city` ,
count(`page_id`) over(partition by user_id,visit_time) pv,
`goods_id`,
`shop_id` ,
date(visit_time)
from snbap_ods.user_pc_click_log ;
4、生成app分区表
create table snbap_dw.user_app_click_partition(
`user_id` bigint,
`imei` string,
`log_time` string,
`visit_os` string,
`os_version` string,
`app_name` string,
`app_version` string,
`device_token` string,
`visit_ip` string,
pv int,
`province` string,
`city` string
)
partitioned by(log_date string) ;
set mapred.reduce.tasks = 15;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=600000;
set hive.exec.max.dynamic.partitions=6000000;
set hive.exec.max.created.files=6000000;
insert into snbap_dw.user_app_click_partition partition(log_date)
SELECT
`user_id` ,
`imei` ,
`log_time` ,
`visit_os` ,
`os_version` ,
`app_name` ,
`app_version` ,
`device_token` ,
`visit_ip` ,
count(1) over(partition by user_id,log_time) pv ,
`province` ,
`city` ,
date(log_time)
from snbap_ods.user_app_click_log;
四、用户访问时间分析
- 按照用户 id 进行分组,统计用户 PC 端和 APP 端不同时间粒度的访问频率
- 根据用户的访问日志信息,统计以下指标,结果存入用户访问模型表 user_visit
1、pc端指标
最近一次访问时间
最近一次访问使用的 session
最近一次使用的 cookie
最近一次的 pc 端的 pv 量
最近一次访问使用的浏览器
最近一次访问使用的操作系统
create table snbap_dw.user_last_pc_click as
select distinct * from
(select user_id,session_id,cookie_id,visit_os,browser_name,visit_time,
rank() over(partition by user_id order by visit_time desc) r,
count(page_id) over(partition by user_id,visit_time order by visit_time ) cnt
from snbap_ods.user_pc_click_log) a
where r=1;
第一次 pc 端访问的日期
第一次 pc 端访问的 session
第一次 pc 端访问的 cookie
第一次访问的 pv
第一次访问使用的浏览器
第一次访问的 os
create table snbap_dw.user_first_pc_click as
select distinct * from
(select user_id,session_id,cookie_id,visit_os,browser_name,visit_time,
rank() over(partition by user_id order by visit_time ) r,
count(page_id) over(partition by user_id,visit_time order by visit_time desc ) cnt
from snbap_ods.user_pc_click_log) a
where r=1;
PC 连续 7 天访问次数(跑任务的日期的前 7 天,以下相同不再赘述)
连续 15 天访问次数
连续 30 天访问次数
连续 60 天访问的次数
连续 90 天访问的次数
create table snbap_dw.user_pc_visit_total as
with
t1 as (select user_id,count(visit_time) c7 from snbap_dw.user_pc_click_partition
where visit_time between date_sub(current_date(),7)
and current_date()
group by user_id),
t2 as (select user_id, count(visit_time) c15 from snbap_dw.user_pc_click_partition
where visit_time between date_sub(current_date(),15)
and current_date()
group by user_id),
t3 as (select user_id,count(visit_time) c30 from snbap_dw.user_pc_click_partition
where visit_time between date_sub(current_date(),30)
and current_date()
group by user_id),
t4 as (select user_id, count(visit_time) c60 from snbap_dw.user_pc_click_partition
where visit_time between date_sub(current_date(),60)
and current_date() group by user_id),
t5 as (select user_id, count(visit_time) c90 from snbap_dw.user_pc_click_partition
where visit_time between date_sub(current_date(),90)
and current_date() group by user_id)
select coalesce(t1.user_id,t2.user_id,t3.user_id,t4.user_id,t5.user_id) user_id,nvl(c7,0) week,
nvl(c15,0) half,nvl(c30,0) mon,nvl(c60,0) mon2,nvl(c90,0) season
from t1 full join t2 on t1.user_id=t2.user_id
full join t3 on t1.user_id=t2.user_id full join t4 on t1.user_id=t4.user_id
full join t5 on t1.user_id=t5.user_id;
近 30 天 pc 端每天的平均 pv
select cp.user_id, round(pv_30_total/30,2) from snbap_dw.user_pc_click_partition cp
join snbap_dw.user_pv_30_total t on cp.user_id=t.user_id
where visit_time between date_sub(current_date(),30)
and current_date();
近 30 天 pc 端的 pv
近 30 天的 0 到 5 点的 pv 数量
近 30 天的 6 到 7 点的 pv 数量
近 30 天的 8 到 9 的 pv 数量
近 30 天的 10 到 11 的 pv 数量
近 30 天的 12 到 13 的 pv 数量
近 30 天的 14 到 15 点的 pv 数量
近 30 天的 16 到 17 点的 pv 数量
近 30 天的 18 到 19 点的 pv 数量
近 30 天的 20 到 21 点的 pv 数量
近 30 天的 22 到 23 点的 pv 数量
近 30 天访问使用的不同 ip 数量、
近 30 天使用的 cookie 的数量
create table snbap_dw.user_pv_30_total as
select user_id, count(pv) pv_30_total,
sum(case when hour(visit_time) between 0 and 5 then 1 else 0 end ) pv_30_0_5,
sum(case when hour(visit_time) between 6 and 7 then 1 else 0 end ) pv_30_6_7,
sum(case when hour(visit_time) between 8 and 9 then 1 else 0 end ) pv_30_8_9,
sum(case when hour(visit_time) between 10 and 11 then 1 else 0 end ) pv_30_10_11,
sum(case when hour(visit_time) between 12 and 13 then 1 else 0 end ) pv_30_12_13,
sum(case when hour(visit_time) between 14 and 15 then 1 else 0 end ) pv_30_14_15,
sum(case when hour(visit_time) between 16 and 17 then 1 else 0 end ) pv_30_16_17,
sum(case when hour(visit_time) between 18 and 19 then 1 else 0 end ) pv_30_18_19,
sum(case when hour(visit_time) between 20 and 21 then 1 else 0 end ) pv_30_20_21,
sum(case when hour(visit_time) between 22 and 23 then 1 else 0 end ) pv_30_22_23,
count(distinct visit_ip) visit_ip_30_cnt,
count(distinct cookie_id) cookie_id_30_cnt
from snbap_dw.user_pc_click_partition
where visit_date between date_sub(current_date(),30) and current_date()
group by user_id;
近 30 天最常用的 ip
create table snbap_dw.user_30_visit_popular_ip as
select user_id,visit_ip from (
select user_id,visit_ip,
dense_rank() over(partition by user_id order by ip_cnt desc) rn
from
(select user_id,visit_ip,count(visit_ip) ip_cnt
from snbap_dw.user_pc_click_partition
where visit_date between date_sub(current_date(),30)
and current_date()
group by user_id,visit_ip) a) b
where rn=1;
近 30 使用最常用的 cookie_id
create table snbap_dw.user_30_cookie_popular_id as
select user_id,cookie_id from (
select user_id,cookie_id,
dense_rank() over(partition by user_id order by id_cnt desc) rn
from
(select user_id,cookie_id,count(cookie_id) id_cnt
from snbap_dw.user_pc_click_partition
where visit_date between date_sub(current_date(),30)
and current_date()
group by user_id,cookie_id) a) b
where rn=1;
近 30pc 最常用浏览器
create table snbap_dw.user_30_popular_browser as
select user_id,browser_name from (
select user_id,browser_name,
dense_rank() over(partition by user_id order by id_cnt desc) rn
from
(select user_id,browser_name,count(browser_name) id_cnt
from snbap_dw.user_pc_click_partition
where visit_date between date_sub(current_date(),30)
and current_date()
group by user_id,browser_name) a) b
where rn=1;
近 30 天使用最常用系统
create table snbap_dw.user_30_popular_os as
select user_id,visit_os from (
select user_id,visit_os,
dense_rank() over(partition by user_id order by id_cnt desc) rn
from
(select user_id,visit_os,count(visit_os) id_cnt
from snbap_dw.user_pc_click_partition
where visit_date between date_sub(current_date(),30)
and current_date()
group by user_id,visit_os) a) b
where rn=1;
创建user_pc_visit总表
create table snbap_dm.user_pc_visit as
select lc.user_id,lc.visit_time last_visit,lc.session_id last_session,lc.cookie_id last_cookie,
lc.cnt last_pv,lc.browser_name last_browser,lc.visit_os last_os,
fc.visit_time first_visit,fc.session_id first_session,fc.cookie_id first_cookie,
fc.cnt first_pv,fc.browser_name first_browser,fc.visit_os first_os,
week visit_week_cnt,half visit_2weeks_cnt,mon visit_month_cnt,mon2 visit_mon2_cnt,
season visit_season_cnt,mon pc_30_cnt,
round(pv_30_total/30,2) avg_30_total,pv_30_0_5,pv_30_6_7,pv_30_8_9,pv_30_10_11,
pv_30_12_13,pv_30_14_15,pv_30_16_17,pv_30_18_19,pv_30_20_21,pv_30_22_23,
pv.visit_ip_30_cnt,vi.visit_ip popular_ip,cookie_id_30_cnt,ci.cookie_id popular_cookie,
b.browser_name popular_browser,vo.visit_os popular_os
from
snbap_dw.user_last_pc_click lc join
snbap_dw.user_first_pc_click fc on lc.user_id=fc.user_id left join
snbap_dw.user_pc_visit_total vt on lc.user_id=vt.user_id left join
snbap_dw.user_pv_30_total pv on lc.user_id=pv.user_id left join
snbap_dw.user_30_visit_popular_ip vi on lc.user_id=vi.user_id left join
snbap_dw.user_30_cookie_popular_id ci on lc.user_id=ci.user_id left join
snbap_dw.user_30_popular_browser b on lc.user_id=b.user_id left join
snbap_dw.user_30_popular_os vo on lc.user_id=vo.user_id ;
2、app端指标
最近一次 app 访问的日期
最近一次访问 app 的名称
最近一次 app 访问的操作系统
create table snbap_dw.user_last_app_visit as
select distinct * from (
select user_id,log_time, app_name,visit_os,rank() over(partition by user_id order by log_date desc) r
from snbap_dw.user_app_click_partition) a
where r=1;
第一次 app 访问日期
第一 app 访问 app 的名称
第一次 app 访问 os
app 第一次访问 ip
create table snbap_dw.user_first_app_visit as
select distinct * from (
select user_id,log_time, app_name,visit_os,visit_ip,rank() over(partition by user_id order by log_date ) r
from snbap_dw.user_app_click_partition) a
where r=1;
app 近 7 天访问 pv 数
app 近 15 天访问 pv 数
app 近 30 天的访问 pv 数
app 近 60 天的访问 pv 数
app 近 90 天的访问 pv 数
create table snbap_dw.user_app_visit_count as
with
t1 as (select user_id, count(log_time) c7 from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),7)
and current_date() group by user_id),
t2 as (select user_id, count(log_time) c15 from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),15)
and current_date() group by user_id),
t3 as (select user_id, count(log_time) c30 from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),30)
and current_date() group by user_id),
t4 as (select user_id, count(log_time) c60 from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),60)
and current_date() group by user_id),
t5 as (select user_id, count(log_time) c90 from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),90)
and current_date() group by user_id)
select coalesce(t1.user_id,t2.user_id,t3.user_id,t4.user_id,t5.user_id) user_id,nvl(c7,0) week,nvl(c15,0) half,
nvl(c30,0) mon,nvl(c60,0) mon2,nvl(c90,0) season from t1
full join t2 on t1.user_id=t2.user_id
full join t3 on t1.user_id=t3.user_id
full join t4 on t1.user_id=t4.user_id
full join t5 on t1.user_id=t5.user_id;
app 近 30 天 0 到 5 点的访问 pv 数
app 近 30 天的 6 到 7 点的访问 pv 数
app 近 30 天 8 到 9 的访问 pv 数
app 近 30 天 10 到 11 访问 pv 数
app 近 30 天 12 到 13 点的访问 pv 数
app 近 30 天 14 到 15 点的访问 pv 数
app 近 30 天 16 到 17 点的访问 pv 数
app 近 30 天 18 到 19 点的访问 pv 数
app 近 30 天 20 到 21 点的访问 pv 数
app 近 30 天 22 到 23 点的访问 pv 数
create table snbap_dw.user_app_30_cnt_time as
select user_id, sum(case when hour(log_time) between 0 and 5 then 1 else 0 end) sum1,
sum( case when hour(log_time) between 6 and 7 then 1 else 0 end ) sum2,
sum( case when hour(log_time) between 8 and 9 then 1 else 0 end ) sum3,
sum( case when hour(log_time) between 10 and 11 then 1 else 0 end ) sum4,
sum( case when hour(log_time) between 12 and 13 then 1 else 0 end ) sum5,
sum( case when hour(log_time) between 14 and 15 then 1 else 0 end ) sum6,
sum( case when hour(log_time) between 16 and 17 then 1 else 0 end ) sum7,
sum( case when hour(log_time) between 18 and 19 then 1 else 0 end ) sum8,
sum( case when hour(log_time) between 20 and 21 then 1 else 0 end ) sum9,
sum( case when hour(log_time) between 22 and 23 then 1 else 0 end ) sum10
from snbap_dw.user_app_click_partition
where log_time between date_sub(current_date(),30)
and current_date()
group by user_id;
创建user_app_visit总表
create table snbap_dm.user_app_visit as
select fv.user_id,lv.log_time last_time,lv.app_name last_app,lv.visit_os last_os,
fv.log_time first_time,fv.app_name first_name,fv.visit_os first_os,fv.visit_ip first_ip,
week,half,mon,mon2,season,
sum1,sum2,sum3,sum4,sum5,sum6,sum7,sum8,sum9,sum10
from
snbap_dw.user_first_app_visit fv left join
snbap_dw.user_last_app_visit lv on fv.user_id=lv.user_id left join
snbap_dw.user_app_visit_count avc on fv.user_id=avc.user_id left join
snbap_dw.user_app_30_cnt_time ct on fv.user_id=ct.user_id;
3、综合指标
最近一次访问的 ip
最近一次访问的城市
最近一次访问的省份
create table snbap_dm.user_last_visit as
with
t1 as (select distinct * from
(select user_id,log_time time, visit_ip,province,city,rank() over(partition by user_id order by log_time desc) r
from snbap_dw.user_app_click_partition) app
where r=1),
t2 as (select distinct * from
(select user_id, visit_time time, visit_ip,province,city,rank() over(partition by user_id order by visit_time desc) r
from snbap_dw.user_pc_click_partition) pc
where r=1),
t3 as (select t1.user_id,t1.time,t1.visit_ip,t1.province,t1.city from t1 union all
select t2.user_id,t2.time,t2.visit_ip,t2.province,t2.city from t2 ),
t4 as (select distinct user_id,time,visit_ip,province,city ,rank() over(partition by t3.user_id order by t3.time desc) r from t3)
select distinct * from t4 where r=1;
##简化上述代码
create table snbap_dw.user_last_visit as
with
t1 as (select user_id,log_time time, visit_ip,province,city
from snbap_dw.user_app_click_partition
union all
select user_id, visit_time time, visit_ip,province,city
from snbap_dw.user_pc_click_partition),
t2 as (select distinct user_id,time,visit_ip,province,city ,rank() over(partition by t1.user_id order by t1.time desc ) r from t1)
select distinct user_id,time,visit_ip,province,city from t2 where r=1;
第一次访问的 ip
第一次访问的城市
第一次访问的省份
create table snbap_dw.user_first_visit as
with
t1 as (select user_id,log_time time, visit_ip,province,city
from snbap_dw.user_app_click_partition
union all
select user_id, visit_time time, visit_ip,province,city
from snbap_dw.user_pc_click_partition),
t2 as (select distinct user_id,time,visit_ip,province,city ,rank() over(partition by t1.user_id order by t1.time ) r from t1)
select distinct user_id,time,visit_ip,province,city from t2 where r=1;
综合app和pc访问的总表user_visit
create table snbap_dm.user_visit as
select lv.user_id,lv.visit_ip last_ip,lv.province last_province,lv.city last_city,
fv.visit_ip first_ip ,fv.province first_province,fv.city first_city from
snbap_dw.user_last_visit lv join
snbap_dw.user_first_visit fv on lv.user_id=fv.user_id;
五、用户订单分布
- 由于数据严重倾斜,所以人为插入一些数据,以下代码我这边在hive里执行两次
INSERT INTO snbap_ods.us_order(order_no,user_id,user_name,order_money,order_type,
order_status,pay_status,pay_type,order_date)
SELECT order_no,user_id,user_name,ROUND(RAND()*100,2),ROUND(RAND()),ROUND(RAND()*2),
ROUND(RAND()),ROUND(RAND()*2),CURRENT_TIMESTAMP()
FROM snbap_ods.us_order;
- 类型代表含义:
按照用户 id 进行分组,统计订单时间以及地域的分布情况
1、订单指标
第一次下单时间,
最近一次下单时间,
首单距今时间,
尾单距今时间,
近 30 天订单数量(不含退拒),
近 30 天订单金额(不含退拒),
近 60 天订单数量(不含退拒),
近 60 天订单金额(不含退拒),
近 90 天订单数量(不含退拒),
近 90 天订单金额(不含退拒),
近 30 天订单数量(含退拒),
近 30 天订单金额(含退拒),
近 60 天购买次数(含退拒),
近 60 天购买金额(含退拒),
近 90 天购买次数(含退拒),
近 90 天购买金额(含退拒),
近 90 天的客单价(含退拒),
最大消费金额 ,
最小消费金额,
累计消费次数(不含退拒),
累计消费金额(不含退拒),
客单价
最近一次退货时间
create table snbap_dw.user_order_info as
select user_id, min(order_date) first_order_date,max(order_date) last_order_date,
datediff(current_date(),min(order_date)) first_order_days,
datediff(current_date(),max(order_date)) last_order_days,
sum(case when order_status<>2 and order_date between date_sub(current_date(),30)
and current_date() then 1 else 0 end) order_cnt_30,
sum(case when order_status<>2 and order_date between date_sub(current_date(),30)
and current_date() then order_money else 0 end) order_money_30,
sum(case when order_status<>2 and order_date between date_sub(current_date(),60)
and current_date() then 1 else 0 end) order_cnt_60,
sum(case when order_status<>2 and order_date between date_sub(current_date(),60)
and current_date() then order_money else 0 end) order_money_60,
sum(case when order_status<>2 and order_date between date_sub(current_date(),90)
and current_date() then order_money else 0 end) order_money_90,
sum(case when order_status<>2 and order_date between date_sub(current_date(),90)
and current_date() then 1 else 0 end) order_cnt_90,
sum(case when order_date between date_sub(current_date(),30)
and current_date() then 1 else 0 end) order_cnt_30_all,
sum(case when order_date between date_sub(current_date(),30)
and current_date() then order_money else 0 end) order_money_30_all,
sum(case when order_date between date_sub(current_date(),60)
and current_date() then 1 else 0 end) order_cnt_60_all,
sum(case when order_date between date_sub(current_date(),60)
and current_date() then order_money else 0 end) order_money_60_all,
sum(case when order_date between date_sub(current_date(),90)
and current_date() then 1 else 0 end) order_cnt_90_all,
sum(case when order_date between date_sub(current_date(),90)
and current_date() then order_money else 0 end) order_money_90_all,
max(order_money) max_order_money,
min(order_money) min_order_money,
sum(case when order_status<>2 then 1 else 0 end) total_order_cnt,
sum(case when order_status<>2 then order_money else 0 end) total_order_money,
sum(order_money) total_order_money_all,
max(case when order_status=2 then update_time else null end) last_retreat_time
from snbap_ods.us_order
group by user_id;
退/拒货商品数量,
退/拒货商品金额,
create table snbap_dw.dwd_user_retreat as
select i.user_id,
sum(case when order_status=2 then goods_amount else 0 end) retreat_goods_num,
sum(case when order_status=2 then cost_price else 0 end) retreat_goods_money
from snbap_ods.us_order u join snbap_ods.order_item i on u.order_id=i.order_id
group by i.user_id;
最常用收货地址,
create table snbap_dw.dwd_user_address as
select user_id,address popular_address from (
select user_id,address,rank() over(partition by user_id order by addr_cnt desc ) rn from
(select user_id,address,count(1) addr_cnt
from snbap_ods.order_delivery d join snbap_ods.us_order u on d.order_id=u.order_id group by user_id,address) a) b
where rn=1;
最常用支付方式
create table snbap_dw.dwd_user_popular_pay as
select user_id,pay_type popular_type from (
select user_id,pay_type,rank() over(partition by user_id order by pay_cnt desc) rn from
(select user_id,pay_type,count(1) pay_cnt
from snbap_ods.us_order group by user_id,pay_type) a) b
where rn=1;
订单指标终表
create table snbap_dm.user_order_info_final as
select u.*,popular_address,popular_type
from snbap_dw.user_order_info u join
snbap_dw.dwd_user_address a on u.user_id=a.user_id
join snbap_dw.dwd_user_popular_pay p on u.user_id=p.user_id
join snbap_dw.dwd_user_retreat r on u.user_id=r.user_id;
2、下单分布
学校下单总数,
公司下单总数,
家里下单总数,
create table snbap_dw.user_addr_countself as
select u.user_id,
sum(case when user_order_flag=1 then 1 else 0 end) user_school_count,
sum(case when user_order_flag=2 then 1 else 0 end) user_company_count,
sum(case when user_order_flag=3 then 1 else 0 end) user_home_count
from user_addr u join order_delivery o
on u.addr_id=o.addr_id
group by u.user_id;
凌晨下单总数,
上午下单总数,
中午下单总数,
下午下单总数,
晚上下单总数,
指标口径:凌晨:0‐05 上午:06‐12 中午:13‐15 下午 16‐20 晚上:21‐24
create table snbap_dw.user_count_timeduan as
select user_id,
sum(case when hour(order_date) between 0 and 5 then 1 else 0 end) sum1,
sum( case when hour(order_date) between 6 and 12 then 1 else 0 end ) sum2,
sum( case when hour(order_date) between 13 and 15 then 1 else 0 end ) sum3,
sum( case when hour(order_date) between 16 and 20 then 1 else 0 end ) sum4,
sum( case when hour(order_date) between 21 and 24 then 1 else 0 end ) sum5
from snbap_ods.us_order
group by user_id;
下单分布总表
create table snbap_dm.user_order_distribute as
select t.user_id,nvl(user_school_count,0),nvl(user_company_count,0),nvl(user_home_count,0),
nvl(sum1,0),nvl(sum2,0),nvl(sum3,0),nvl(sum4,0),nvl(sum5,0)
from snbap_dw.user_count_timeduan t full join
snbap_dw.user_addr_countself a
on t.user_id=a.user_id;