用户行为表:t_user_video_action_d
分区:ds(格式 yyyyMMdd)
主键:user_id、video_id
含义:一个 user 对一个视频的所有行为聚合,每天增量
字段名 | 字段含义 | 类型 |
user_id | 用户 | string |
video_id | 视频 id | string |
expose_cnt | 曝光次数 | int |
like_cnt | 点赞次数 | int |
视频表:t_video_d
分区:ds(格式 yyyyMMdd)
主键:video_id
含义:当天全量视频数据
字段名 | 字段含义 | 类型 | 枚举 |
video_id | 视频 id | string | |
video_type | 视频类型 | string | 娱乐、新闻、搞笑 |
video_user_id | 视频创建者 user_id | string | |
video_create_time | 视频创建时间 | bigint |
作者表:t_video_user_d
分区:ds(格式 yyyyMMdd)
主键:video_user_id
含义:当天全量视频创建者数据
字段名 | 字段含义 | 类型 | 枚举 |
video_user_id | 视频创建者user_id | string | |
video_user_name | 名称 | string | |
video_user_type | 视频创建者类型 | string | 娱乐、新闻、搞笑 |
1、需求方需要视频号搞笑类型视频的曝光点赞时长等数据,请提供一张 ads 表。 搞笑类型视频定义:视频类型为搞笑或者视频创建者类型为搞笑,需要产出字段:视频 id,视频创建者 user_id,视频创建者名称、当天曝光次数、当天点赞次数、近 30 天曝光次数、近 30 天点赞次数
-- 创建ads表
CREATE TABLE IF NOT EXISTS ads_video_funny_stats (
video_id STRING,
video_user_id STRING,
video_user_name STRING,
expose_cnt_today INT,
like_cnt_today INT,
expose_cnt_30_days INT,
like_cnt_30_days INT
) PARTITIONED BY (ds STRING);
-- 插入数据到ads表
INSERT OVERWRITE TABLE ads_video_funny_stats PARTITION (ds)
SELECT
a.video_id,
b.video_user_id,
b.video_user_name,
SUM(a.expose_cnt) AS expose_cnt_today,
SUM(a.like_cnt) AS like_cnt_today,
SUM(a.expose_cnt) OVER (PARTITION BY a.video_id ORDER BY a.ds ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS expose_cnt_30_days,
SUM(a.like_cnt) OVER (PARTITION BY a.video_id ORDER BY a.ds ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS like_cnt_30_days,
a.ds
FROM t_user_video_action_d a
JOIN t_video_d v ON a.video_id = v.video_id and a.ds=v.ds
JOIN t_video_user_d b ON v.video_user_id = b.video_user_id and v.ds=b.ds
WHERE (v.video_type = '搞笑' OR b.video_user_type = '搞笑')
GROUP BY a.video_id, b.video_user_id, b.video_user_name, a.ds;
2、需要输出一张中间 dws 层表:要求能够查询每个视频创建者 user_id 最近发布的 3 个 视频,在过去一周内的曝光,点赞数据 需要产出字段:视频创建者 user_id,曝光用户数,点赞用户数,曝光次数、点赞次数
-- 创建dws表
CREATE TABLE IF NOT EXISTS dws_video_creator_recent_stats (
video_user_id STRING,
expose_user_count INT,
like_user_count INT,
expose_cnt INT,
like_cnt INT
) PARTITIONED BY (ds STRING);
-- 插入数据到dws表
WITH recent_videos AS (
SELECT
v.video_user_id,
a.user_id,
a.video_id,
a.expose_cnt,
a.like_cnt,
ROW_NUMBER() OVER (PARTITION BY v.video_user_id ORDER BY v.video_create_time DESC) AS rn
FROM t_user_video_action_d a
JOIN t_video_d v ON a.video_id = v.video_id and a.ds=v.ds
WHERE a.ds BETWEEN DATE_SUB(CURRENT_DATE(), 6) AND CURRENT_DATE()
),
top_3_videos AS (
SELECT
video_user_id,
user_id,
video_id,
expose_cnt,
like_cnt
FROM recent_videos
WHERE rn <= 3
)
INSERT OVERWRITE TABLE dws_video_creator_recent_stats PARTITION (ds)
SELECT
video_user_id,
COUNT(DISTINCT user_id) AS expose_user_count,
COUNT(DISTINCT CASE WHEN like_cnt > 0 THEN user_id END) AS like_user_count,
SUM(expose_cnt) AS expose_cnt,
SUM(like_cnt) AS like_cnt,
CURRENT_DATE() AS ds
FROM top_3_videos
GROUP BY video_user_id;