9(10)6.1 DWS层10

6.1 DWS层
目标:统计当日、当周、当月活动的每个设备明细
6.1.1 每日活跃设备明细

1)建表语句
hive (gmall)>
drop table if exists dws_uv_detail_day;
create external table dws_uv_detail_day
(
mid_id string COMMENT ‘设备唯一标识’,
user_id string COMMENT ‘用户标识’,
version_code string COMMENT ‘程序版本号’,
version_name string COMMENT ‘程序版本名’,
lang string COMMENT ‘系统语言’,
source string COMMENT ‘渠道号’,
os string COMMENT ‘安卓系统版本’,
area string COMMENT ‘区域’,
model string COMMENT ‘手机型号’,
brand string COMMENT ‘手机品牌’,
sdk_version string COMMENT ‘sdkVersion’,
gmail string COMMENT ‘gmail’,
height_width string COMMENT ‘屏幕宽高’,
app_time string COMMENT ‘客户端日志产生时的时间’,
network string COMMENT ‘网络模式’,
lng string COMMENT ‘经度’,
lat string COMMENT ‘纬度’
)
partitioned by(dt string)
stored as parquet
location ‘/warehouse/gmall/dws/dws_uv_detail_day’
;
2)数据导入
以用户单日访问为key进行聚合,如果某个用户在一天中使用了两种操作系统、两个系统版本、多个地区,登录不同账号,只取其中之一
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_day
partition(dt=‘2019-02-10’)
select
mid_id,
concat_ws(’|’, collect_set(user_id)) user_id,
concat_ws(’|’, collect_set(version_code)) version_code,
concat_ws(’|’, collect_set(version_name)) version_name,
concat_ws(’|’, collect_set(lang))lang,
concat_ws(’|’, collect_set(source)) source,
concat_ws(’|’, collect_set(os)) os,
concat_ws(’|’, collect_set(area)) area,
concat_ws(’|’, collect_set(model)) model,
concat_ws(’|’, collect_set(brand)) brand,
concat_ws(’|’, collect_set(sdk_version)) sdk_version,
concat_ws(’|’, collect_set(gmail)) gmail,
concat_ws(’|’, collect_set(height_width)) height_width,
concat_ws(’|’, collect_set(app_time)) app_time,
concat_ws(’|’, collect_set(network)) network,
concat_ws(’|’, collect_set(lng)) lng,
concat_ws(’|’, collect_set(lat)) lat
from dwd_start_log
where dt=‘2019-02-10’
group by mid_id;
3)查询导入结果
hive (gmall)> select * from dws_uv_detail_day limit 1;
hive (gmall)> select count(*) from dws_uv_detail_day;
4)思考:不同渠道来源的每日活跃数统计怎么计算?

6.1.2 每周活跃设备明细

根据日用户访问明细,获得周用户访问明细。
1)建表语句
hive (gmall)>
drop table if exists dws_uv_detail_wk;
create external table dws_uv_detail_wk(
mid_id string COMMENT ‘设备唯一标识’,
user_id string COMMENT ‘用户标识’,
version_code string COMMENT ‘程序版本号’,
version_name string COMMENT ‘程序版本名’,
lang string COMMENT ‘系统语言’,
source string COMMENT ‘渠道号’,
os string COMMENT ‘安卓系统版本’,
area string COMMENT ‘区域’,
model string COMMENT ‘手机型号’,
brand string COMMENT ‘手机品牌’,
sdk_version string COMMENT ‘sdkVersion’,
gmail string COMMENT ‘gmail’,
height_width string COMMENT ‘屏幕宽高’,
app_time string COMMENT ‘客户端日志产生时的时间’,
network string COMMENT ‘网络模式’,
lng string COMMENT ‘经度’,
lat string COMMENT ‘纬度’,
monday_date string COMMENT ‘周一日期’,
sunday_date string COMMENT ‘周日日期’
) COMMENT ‘活跃用户按周明细’
PARTITIONED BY (wk_dt string)
stored as parquet
location ‘/warehouse/gmall/dws/dws_uv_detail_wk/’
;
2)数据导入
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_wk partition(wk_dt)
select
mid_id,
concat_ws(’|’, collect_set(user_id)) user_id,
concat_ws(’|’, collect_set(version_code)) version_code,
concat_ws(’|’, collect_set(version_name)) version_name,
concat_ws(’|’, collect_set(lang)) lang,
concat_ws(’|’, collect_set(source)) source,
concat_ws(’|’, collect_set(os)) os,
concat_ws(’|’, collect_set(area)) area,
concat_ws(’|’, collect_set(model)) model,
concat_ws(’|’, collect_set(brand)) brand,
concat_ws(’|’, collect_set(sdk_version)) sdk_version,
concat_ws(’|’, collect_set(gmail)) gmail,
concat_ws(’|’, collect_set(height_width)) height_width,
concat_ws(’|’, collect_set(app_time)) app_time,
concat_ws(’|’, collect_set(network)) network,
concat_ws(’|’, collect_set(lng)) lng,
concat_ws(’|’, collect_set(lat)) lat,
date_add(next_day(‘2019-02-10’,‘MO’),-7),
date_add(next_day(‘2019-02-10’,‘MO’),-1),
concat(date_add( next_day(‘2019-02-10’,‘MO’),-7), ‘_’ , date_add(next_day(‘2019-02-10’,‘MO’),-1)
)
from dws_uv_detail_day
where dt>=date_add(next_day(‘2019-02-10’,‘MO’),-7) and dt<=date_add(next_day(‘2019-02-10’,‘MO’),-1)
group by mid_id;
3)查询导入结果
hive (gmall)> select * from dws_uv_detail_wk limit 1;
hive (gmall)> select count(*) from dws_uv_detail_wk;

6.1.3 每月活跃设备明细

1)建表语句
hive (gmall)>
drop table if exists dws_uv_detail_mn;

create external table dws_uv_detail_mn(
mid_id string COMMENT ‘设备唯一标识’,
user_id string COMMENT ‘用户标识’,
version_code string COMMENT ‘程序版本号’,
version_name string COMMENT ‘程序版本名’,
lang string COMMENT ‘系统语言’,
source string COMMENT ‘渠道号’,
os string COMMENT ‘安卓系统版本’,
area string COMMENT ‘区域’,
model string COMMENT ‘手机型号’,
brand string COMMENT ‘手机品牌’,
sdk_version string COMMENT ‘sdkVersion’,
gmail string COMMENT ‘gmail’,
height_width string COMMENT ‘屏幕宽高’,
app_time string COMMENT ‘客户端日志产生时的时间’,
network string COMMENT ‘网络模式’,
lng string COMMENT ‘经度’,
lat string COMMENT ‘纬度’
) COMMENT ‘活跃用户按月明细’
PARTITIONED BY (mn string)
stored as parquet
location ‘/warehouse/gmall/dws/dws_uv_detail_mn/’
;
2)数据导入
hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dws_uv_detail_mn partition(mn)
select
mid_id,
concat_ws(’|’, collect_set(user_id)) user_id,
concat_ws(’|’, collect_set(version_code)) version_code,
concat_ws(’|’, collect_set(version_name)) version_name,
concat_ws(’|’, collect_set(lang)) lang,
concat_ws(’|’, collect_set(source)) source,
concat_ws(’|’, collect_set(os)) os,
concat_ws(’|’, collect_set(area)) area,
concat_ws(’|’, collect_set(model)) model,
concat_ws(’|’, collect_set(brand)) brand,
concat_ws(’|’, collect_set(sdk_version)) sdk_version,
concat_ws(’|’, collect_set(gmail)) gmail,
concat_ws(’|’, collect_set(height_width)) height_width,
concat_ws(’|’, collect_set(app_time)) app_time,
concat_ws(’|’, collect_set(network)) network,
concat_ws(’|’, collect_set(lng)) lng,
concat_ws(’|’, collect_set(lat)) lat,
date_format(‘2019-02-10’,‘yyyy-MM’)
from dws_uv_detail_day
where date_format(dt,‘yyyy-MM’) = date_format(‘2019-02-10’,‘yyyy-MM’)
group by mid_id;
3)查询导入结果
hive (gmall)> select * from dws_uv_detail_mn limit 1;
hive (gmall)> select count(*) from dws_uv_detail_mn ;

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值