CC00021.pbpositions——|Hadoop&PB级数仓.V05|——|PB数仓.v05|会员活跃度分析|DWD建表&加载数据|

一、DWD层建表和数据加载
### --- DWD层建表和数据加载

~~~     主要任务:ODS(包含json串) => DWD
~~~     json数据解析,丢弃无用数据(数据清洗),保留有效信息,并将数据展开,形成每日启动明细表。
2021-09-28 18:19:32.966 [main] INFO com.yanqi.ecommerce.AppStart - 
{"app_active":{"name":"app_active","json":
{"entry":"1","action":"1","error_code":"0"},"time":1596309585861},"attr":
{"area":"绍兴","uid":"2F10092A10","app_v":"1.1.16","event_type":"common","device_id":"1FB872-9A10010","os_type":"3.0","channel":"ML",
"language":"chinese","brand":"Huawei-2"}}
2020-07-21
二、DWD层表创建和加载数据
### --- 创建DWD层表:表的格式:parquet、分区表
~~~     # 语法

use dwd;
drop table if exists dwd.dwd_start_log;
CREATE TABLE dwd.dwd_start_log(
`device_id` string,
`area` string,
`uid` string,
`app_v` string,
`event_type` string,
`os_type` string,
`channel` string,
`language` string,
`brand` string,
`entry` string,
`action` string,
`error_code` string
)
PARTITIONED BY (dt string)
STORED AS parquet;
hive (default)> use dwd;

hive (dwd)> CREATE TABLE dwd.dwd_start_log(
          > `device_id` string,
          > `area` string,
          > `uid` string,
          > `app_v` string,
          > `event_type` string,
          > `os_type` string,
          > `channel` string,
          > `language` string,
          > `brand` string,
          > `entry` string,
          > `action` string,
          > `error_code` string
          > )
          > PARTITIONED BY (dt string)
          > STORED AS parquet;

hive (dwd)> show tables;
dwd_start_log
二、加载DWD层数据
### --- 加载DWD层数据

[root@hadoop02 ~]# vim /data/yanqidw/script/member_active/dwd_load_start.sh
#!/bin/bash
source /etc/profile

# 可以输入日期;如果未输入日期取昨天的时间
if [ -n "$1" ]
then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

# 定义要执行的SQL
sql="with tmp as
( 
select split(str, ' ')[7] line 
from ods.ods_start_log where dt='$do_date'
)
insert overwrite table dwd.dwd_start_log partition(dt='$do_date')
select get_json_object(line, '$.attr.device_id'),
get_json_object(line, '$.attr.area'),
get_json_object(line, '$.attr.uid'),
get_json_object(line, '$.attr.app_v'),
get_json_object(line, '$.attr.event_type'),
get_json_object(line, '$.attr.os_type'),
get_json_object(line, '$.attr.channel'),
get_json_object(line, '$.attr.language'),
get_json_object(line, '$.attr.brand'),
get_json_object(line, '$.app_active.json.entry'),
get_json_object(line, '$.app_active.json.action'),
get_json_object(line, '$.app_active.json.error_code')
from tmp;
"
hive -e "$sql"
### --- 为脚本授予执行权限

[root@hadoop02 ~]# chmod +x /data/yanqidw/script/member_active/dwd_load_start.sh
三、执行该脚本,验证结果
### --- 执行该脚本,验证结果
~~~     # 加载数据到ODS分区下
~~~     # 确保ods分区下是有数据存在

hive (ods)> show partitions ods.ods_start_log;
OK
partition
dt=2021-09-28
hive (ods)> select * from ods.ods_start_log limit 10;
~~~     # 执行该脚本

[root@hadoop02 ~]# /data/yanqidw/script/member_active/dwd_load_start.sh 
~~~输出参数
Total MapReduce CPU Time Spent: 0 msec
OK
_c0 _c1 _c2 _c3 _c4 _c5 _c6 _c7 _c8 _c9 _c10    _c11
~~~     # 查看数据是否通过ODS加载到DWD下

hive (dwd)> show partitions dwd_start_log;
OK
partition
dt=2021-09-28
hive (dwd)> select * from dwd.dwd_start_log limit 5;
OK
dwd_start_log.device_id dwd_start_log.area  dwd_start_log.uid   dwd_start_log.app_v dwd_start_log.event_type    dwd_start_log.os_type   dwd_start_log.channel   dwd_start_log.language  dwd_start_log.brand dwd_start_log.entry dwd_start_log.action    dwd_start_log.error_code    dwd_start_log.dt
1FB872-9A1001   连云港 2F10092A1   1.1.8   common  0.43    PN  chinese iphone-7    1   0   0   2021-09-28
1FB872-9A1002   金昌  2F10092A2   1.1.5   common  5.8.7   OF  chinese xiaomi-0    1   1   0   2021-09-28
1FB872-9A1003   句容  2F10092A3   1.1.16  common  0.99    YI  chinese iphone-3    1   0   0   2021-09-28
1FB872-9A1004   肇庆  2F10092A4   1.1.6   common  6.6.2   CD  chinese xiaomi-6    1   1   0   2021-09-28
1FB872-9A1005   武汉  2F10092A5   1.1.2   common  6.9 WG  chinese xiaomi-1    1   1   0   2021-09-28
### --- 日志文件 =》 Flume =》 HDFS =》 ODS =》 DWD ODS =》 DWD;

~~~     json数据的解析;数据清洗
~~~     下一步任务:DWD(会员的每日启动信息明细) => DWS(如何建表,如何加载数据)
~~~     活跃会员 ===> 新增会员 ===> 会员留存
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

yanqi_vip

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值