Hive实战

Hive实战训练

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-a0XI3RtQ-1643005625659)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220123220117228.png)]

准备工作:

1)目标表分析

创建原始数据表:gulivideo_ori,gulivideo_user_ori,
创建最终表:gulivideo_orc,gulivideo_user_orc

2)创建原始数据表:

create table gulivideo_ori(
videoId string,uploader string,age int,category array<string>,length int,views int,rate float,ratings int,comments int,relatedId array<string>)
row format delimited fields terminated by "\t"
collection items terminated by "&"
stored as textfile;
create table gulivideo_user_ori(uploader string,videos int,friends int)
row format delimited fields terminated by "\t"
stored as textfile;

3)创建 orc 存储格式带 snappy 压缩的表

create table gulivideo_orc(videoId string,uploader string,age int,category array<string>,length int,
views int,rate float,ratings int,comments int,relatedId array<string>)
stored as orc
tblproperties("orc.compress"="SNAPPY");
create table gulivideo_user_orc(uploader string,videos int,friends int)
row format delimited fields terminated by "\t"
stored as orc
tblproperties("orc.compress"="SNAPPY");

4)向 ori 表插入数据

load data local inpath "/opt/module/hive/datas/guliVideo/1.txt" into table gulivideo_ori;
load data local inpath "/opt/module/hive/datas/guliVideo/user.txt" into table gulivideo_user_ori;

5)向 orc 表插入数据

insert into table gulivideo_orc select * from gulivideo_ori;
insert into table gulivideo_user_orc select * from gulivideo_user_ori;

统计视频观看数Top10

思路:使用 order by 按照 views 字段做一个全局排序即可,同时我们设置只显示前 10 条。

SELECT videoId,views
FROM gulivideo_orc
ORDER BY views DESC LIMIT 10;

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-VnVTpCJM-1643005625660)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124141149171.png)]

统计视频类别热度 Top10

(1)即统计每个类别有多少个视频,显示出包含视频最多的前 10 个类别。

(2)我们需要按照类别 group by 聚合,然后 count 组内的 videoId 个数即可。

(3)因为当前表结构为:一个视频对应一个或多个类别。所以如果要 group by 类别, 需要先将类别进行列转行(展开),然后再进行 count 即可。

(4)最后按照热度排序,显示前 10 条。

SELECT t1.category_name,COUNT(t1.videoId) hot
FROM(
SELECT videoId,category_name
FROM gulivideo_orc
lateral VIEW explode(category) gulivideo_orc_tmp AS category_name
) t1
GROUP BY t1.category_name
ORDER BY hot DESC LIMIT 10;

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-yVDg5XJ6-1643005625661)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124141107934.png)]

统计出视频观看数最高的 20 个视频的所属类别以及类别包含 Top20 视频的个数

(1)先找到观看数最高的 20 个视频所属条目的所有信息,降序排列

(2)把这 20 条信息中的 category 分裂出来(列转行)

(3)最后查询视频分类名称和该分类下有多少个 Top20 的视频

select t2.category_name,count(t2.videoId) video_sum
from(select t1.videoId,category_name from(
select videoId,views,category
from gulivideo_orc
order by views desc limit 20) t1
lateral view explode(t1.category) t1_emp as category_name
) t2
group by t2.category_name;

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-xOPGsKww-1643005625661)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124141013352.png)]

统计视频观看数 Top50 所关联视频的所属类别排序

SELECT t6.category_name,t6.video_sum,
rank() over(ORDER BY t6.video_sum DESC ) rk
FROM(SELECT t5.category_name,COUNT(t5.relatedid_id) video_sum
FROM(SELECT t4.relatedid_id,category_name
FROM(SELECT t2.relatedid_id,t3.category
FROM(SELECT relatedid_id
FROM(SELECT videoId,views,relatedid
FROM gulivideo_orc
ORDER BY views DESC LIMIT 50)t1
lateral VIEW explode(t1.relatedid) t1_tmp AS relatedid_id)t2
JOIN gulivideo_orc t3 ON t2.relatedid_id = t3.videoId)t4
lateral VIEW explode(t4.category) t4_tmp AS category_name)t5
GROUP BY t5.category_name
ORDER BY video_sum DESC)t6

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-gnzXy6yI-1643005625662)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124141620264.png)]

统计每个类别中的视频热度 Top10,以 Music 为例

(1)要想统计 Music 类别中的视频热度 Top10,需要先找到 Music 类别,那么就需要将 category 展开,所以可以创建一张表用于存放 categoryId 展开的数据。

(2)向 category 展开的表中插入数据。

(3)统计对应类别(Music)中的视频热度。

SELECT t1.videoId,t1.views,t1.category_name
FROM(SELECT videoId,views,category_name
FROM gulivideo_orc
lateral VIEW explode(category) gulivideo_orc_tmp AS category_name
)t1
WHERE t1.category_name = "Music"
ORDER BY t1.views DESC LIMIT 10;

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-LBmunbX5-1643005625663)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124141922184.png)]

统计每个类别视频观看数 Top10

SELECT t2.videoId,t2.views,t2.category_name,t2.rk
FROM(SELECT t1.videoId,t1.views,t1.category_name,rank() over(
PARTITION BY t1.category_name ORDER BY t1.views DESC ) rk
FROM(SELECT videoId,views,category_name
FROM gulivideo_orc
lateral VIEW explode(category) gulivideo_orc_tmp AS category_name
)t1
)t2
WHERE t2.rk <=10

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-5N04aYD0-1643005625664)(C:\Users\Admin\AppData\Roaming\Typora\typora-user-images\image-20220124142225046.png)]

统计上传视频最多的用户 Top10 以及他们上传的视频观看次数在前 20 的视频

(1)求出上传视频最多的 10 个用户

(2)关联 gulivideo_orc 表,求出这 10 个用户上传的所有的视频,按照观看数取前 20

SELECT t2.videoId,t2.views,t2.uploader
FROM(SELECT uploader,videos
FROM gulivideo_user_orc
ORDER BY videos DESC LIMIT 10
)t1
JOIN gulivideo_orc t2 ON t1.uploader = t2.uploader
ORDER BY t2.views DESC;

在这里插入图片描述

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值