广告业务需求分析
-
用户事件日志数据如下所示:
{ "lagou_event": [{ "name": "goods_detail_loading", "json": { "entry": "3", "goodsid": "0", "loading_time": "80", "action": "4", "staytime": "68", "showtype": "4" }, "time": 1596225273755 }, { "name": "loading", "json": { "loading_time": "18", "action": "1", "loading_type": "2", "type": "3" }, "time": 1596231657803 }, { "name": "ad", "json": { "duration": "17", "ad_action": "0", "shop_id": "786", "event_type": "ad", "ad_type": "4", "show_style": "1", "product_id": "2772", "place": "placeindex_left", "sort": "0" }, "time": 1596278404415 }, { "name": "favorites", "json": { "course_id": 0, "id": 0, "userid": 0 }, "time": 1596239532527 }, { "name": "praise", "json": { "id": 2, "type": 3, "add_time": "1596258672095", "userid": 8, "target": 6 }, "time": 1596274343507 }], "attr": { "area": "拉萨", "uid": "2F10092A86", "app_v": "1.1.12", "event_type": "common", "device_id": "1FB872-9A10086", "os_type": "4.1", "channel": "KS", "language": "chinese", "brand": "xiaomi-2" } }
-
用户事件日志数据中的信息包括:
- 商品详情页加载:goods_detail_loading
- 商品列表:loading
- 消息通知:notification
- 商品评论:comment
- 收藏:favorites
- 点赞:praise
- 广告:ad
- action。用户行为;0 曝光;1 曝光后点击;2 购买
- duration。停留时长
- shop_id。商家id
- event_type。“ad”
- ad_type。格式类型;1 JPG;2 PNG;3 GIF;4 SWF
- show_style。显示风格,0 静态图;1 动态图
- product_id。产品id
- place。广告位置;首页=1,左侧=2,右侧=3,列表页=4
- sort。排序位置
广告业务需求指标:
- 点击次数统计(分时统计):
- 曝光次数、不同用户id数、不同用户数
- 点击次数、不同用户id数、不同用户数
- 购买次数、不同用户id数、不同用户数
- 转化率-漏斗分析
- 点击率 = 点击次数 / 曝光次数
- 购买率 = 购买次数 / 点击次数
- 活动曝光效果评估:
- 行为(曝光、点击、购买)、时间段、广告位、产品,统计对应的次数
- 时间段、广告位、商品,曝光次数最多的前N个
广告业务ODS层建表及数据加载
-
ODS创建广告业务明细数据表【ods_log_event】,
CREATE EXTERNAL TABLE ods.ods_log_event(`str` string) PARTITIONED BY (`dt` string) STORED AS TEXTFILE LOCATION '/user/data/logs/event';
-
【ods_load_event_log.sh】
#!/bin/bash source /etc/profile if [ -n "$1" ] then do_date=$1 else do_date=`date -d "-1 day" +%F` fi sql=" alter table ods.ods_log_event add partition (dt='$do_date'); " hive -e "$sql"
-
此时【ods_log_event】表中数据为:
hive (ods)> select * from ods_log_event limit 1; OK ods_log_event.str ods_log_event.dt 2020-07-21 16:54:41.491 [main] INFO com.lagou.ecommerce.AppEvent - {"lagou_event":[{"name":"favorites","json":{"course_id":2,"id":0,"userid":0},"time":1595260800000},{"name":"praise","json":{"id":5,"type":1,"add_time":"1631749438222","userid":3,"target":4},"time":1595260800000}],"attr":{"area":"常州","uid":"2F10092A2","app_v":"1.1.18","event_type":"common","device_id":"1FB872-9A1002","os_type":"7.7","channel":"LJ","language":"chinese","brand":"xiaomi-3"}} 2020-07-21 Time taken: 10.121 seconds, Fetched: 1 row(s)
广告业务DWD层建表及数据加载
- ODS层的【ods_log_event】表中记录用户事件日志数据信息,事件的主要信息在json串中(json数组),公共信息在另外一个json串中;
- DWD层数据来源于ODS层,即ODS => 解析json,从json串中,提取jsonArray数据;将公共信息从json串中解析出来 => 所有事件的明细
- 所有事件的明细包含:分区,事件(json串),公共信息字段,然后从所有事件的明细 => 广告json串解析 => 广告事件的明细
- 广告事件的明细包含:分区,广告信息字段,公共信息字段
DWD层创建所有事件明细表【dwd_event_log】以及广告点击明细
表【dwd_ad】
-- 所有事件明细
drop table if exists dwd.dwd_event_log;
CREATE EXTERNAL TABLE dwd.dwd_event_log(
`device_id` string,
`uid` string,
`app_v` string,
`os_type` string,
`event_type` string,
`language` string,
`channel` string,
`area` string,
`brand` string,
`name` string,
`event_json` string,
`report_time` string)
PARTITIONED BY (`dt` string)
stored as parquet;
-- 广告点击明细
drop table if exists dwd.dwd_ad;
CREATE TABLE dwd.dwd_ad(
`device_id` string,
`uid` string,
`app_v` string,
`os_type` string,
`event_type` string,
`language` string,
`channel` string,
`area` string,
`brand` string,
`report_time` string,
`duration` int,
`ad_action` int,
`shop_id` int,
`ad_type` int,
`show_style` smallint,
`product_id` int,
`place` string,
`sort` int,
`hour` string
)
PARTITIONED BY (`dt` string)
stored as parquet;
DWD层数据加载全部的事件日志【dwd_load_event_log.sh】
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
use dwd;
add jar /data/lagoudw/jars/cn.lagou.dw-1.0-SNAPSHOT-jar-with-dependencies.jar;
create temporary function json_array as 'cn.lagou.dw.hive.udf.ParseJsonArray';
with tmp_start as
(
select split(str, ' ')[7] as line
from ods.ods_log_event
where dt='$do_date'
)
insert overwrite table dwd.dwd_event_log
PARTITION (dt='$do_date')
select
device_id,
uid,
app_v,
os_type,
event_type,
language,
channel,
area,
brand,
get_json_object(k,'$.name') as name,
get_json_object(k,'$.json') as json,
get_json_object(k,'$.time') as time
from
(
select
get_json_object(line,'$.attr.device_id') as device_id,
get_json_object(line,'$.attr.uid') as uid,
get_json_object(line,'$.attr.app_v') as app_v,
get_json_object(line,'$.attr.os_type') as os_type,
get_json_object(line,'$.attr.event_type') as event_type,
get_json_object(line,'$.attr.language') as language,
get_json_object(line,'$.attr.channel') as channel,
get_json_object(line,'$.attr.area') as area,
get_json_object(line,'$.attr.brand') as brand,
get_json_object(line,'$.lagou_event') as lagou_event
from tmp_start
) A lateral view explode(json_array(lagou_event)) B as k
"
hive -e "$sql"
DWD层从全部的事件日志中获取广告事件详细信【dwd_load_ad_log.sh】:
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dwd.dwd_ad
PARTITION (dt='$do_date')
select
device_id,
uid,
app_v,
os_type,
event_type,
language,
channel,
area,
brand,
report_time,
get_json_object(event_json,'$.duration') ,
get_json_object(event_json,'$.ad_action') ,
get_json_object(event_json,'$.shop_id') ,
get_json_object(event_json,'$.ad_type'),
get_json_object(event_json,'$.show_style'),
get_json_object(event_json,'$.product_id'),
get_json_object(event_json,'$.place'),
get_json_object(event_json,'$.sort'),
from_unixtime(ceil(report_time/1000), 'HH')
from dwd.dwd_event_log
where dt='$do_date' and name='ad';
"
hive -e "$sql"
广告点击次数分析
- 广告点击次数需求分析,
- 广告的详细数据为:
- action。用户行为;0 曝光;1 曝光后点击;2 购买
- duration。停留时长
- shop_id。商家id
- event_type。“ad”
- ad_type。格式类型;1 JPG;2 PNG;3 GIF;4 SWF
- show_style。显示风格,0 静态图;1 动态图
- product_id。产品id
- place。广告位置;首页=1,左侧=2,右侧=3,列表页=4
- sort。排序位置
- 分时统计:
- 曝光次数、不同用户id数(公共信息中的uid)、不同用户数(公共信息中的device_id)
- 点击次数、不同用户id数、不同用户数(device_id)
- 购买次数、不同用户id数、不同用户数(device_id)
创建ADS层表【ads_ad_show】
create table ads.ads_ad_show(
cnt bigint,
u_cnt bigint,
device_cnt bigint,
ad_action tinyint,
hour string
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
加载ADS层数据【ads_load_ad_show.sh】
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table ads.ads_ad_show
partition (dt='$do_date')
select count(1),
count(distinct uid),
count(distinct device_id),
ad_action,
hour
from dwd.dwd_ad
where dt='$do_date'
group by ad_action, hour
"
hive -e "$sql"
点击率购买率
点击率购买率需求分析
分时统计:
- 点击率 = 点击次数 / 曝光次数
- 购买率 = 购买次数 / 点击次数
创建ADS层表点击率购买率表【ads_ad_show_rate】
create table ads.ads_ad_show_rate(
hour string,
click_rate double,
buy_rate double
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
加载ADS层数据
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
with tmp as(
select max(case when ad_action='0' then cnt end) show_cnt,
max(case when ad_action='1' then cnt end) click_cnt,
max(case when ad_action='2' then cnt end) buy_cnt,
hour
from ads.ads_ad_show
where dt='$do_date'
group by hour
)
insert overwrite table ads.ads_ad_show_rate
partition (dt='$do_date')
select hour,
click_cnt / show_cnt as click_rate,
buy_cnt / click_cnt as buy_rate
from tmp;
"
hive -e "$sql"
广告效果分析
广告效果需求分析:
- 活动曝光效果评估:
- 效果表:行为(曝光、点击、购买)、时间段、广告位、商品,统计对应的次数
- 排名表:时间段、广告位、商品,曝光次数最多的前100个
创建ADS层广告效果表【ads_ad_show_place】,广告排名表【ads_ad_show_place_window】
create table ads.ads_ad_show_place(
ad_action tinyint,
hour string,
place string,
product_id int,
cnt bigint
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
create table ads.ads_ad_show_place_window(
hour string,
place string,
product_id int,
cnt bigint,
rank int
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
广告效果加载ADS层数据【ads_ad_show_place.sh】
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table ads.ads_ad_show_place partition (dt='$do_date')
select ad_action,
hour,
place,
product_id,
count(1)
from dwd.dwd_ad
where dt='$do_date'
group by ad_action, hour, place, product_id;
"
hive -e "$sql"
广告效果加载ADS层数据【ads_ad_show_place_window.sh】
#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table ads.ads_ad_show_place_window
partition (dt='$do_date')
select *
from (
select hour,
place,
product_id,
cnt,
row_number() over (partition by hour, place,
product_id order by cnt desc) rank
from ads.ads_ad_show_place
where dt='$do_date' and ad_action='0'
) t
where rank <= 100
"
hive -e "$sql"