04具体分析

1.统计视频观看数Top10

思路:
	使用order by按照views字段做一个全局排序即可,同时我们设置只显示前10条。
	
代码:
	select videoId, uploader, age, category, length, views, rate, ratings, comments 
	from gulivideo_orc order by views desc limit 10;

2.统计视频类别热度Top10

思路:
	(1)即统计每个类别有多少个视频,显示出包含视频最多的前10个类别。
	(2)我们需要按照类别group by聚合,然后count组内的videoId个数即可。
	(3)因为当前表结构为:一个视频对应一个或多个类别。所以如果要group by类别,需要先将类别进行列转行(展开),然后再进行count即可。
	(4)最后按照热度排序,显示前10条。
	
代码:
	select category_name as category, count(t1.videoId) as hot from 
	(select videoId,category_name from 
			gulivideo_orc lateral view explode(category) t_catetory as category_name) t1 
	group by t1.category_name order by hot desc limit 10;		

3 统计出视频观看数最高的20个视频的所属类别以及类别包含Top20视频的个数

思路:
	(1)先找到观看数最高的20个视频所属条目的所有信息,降序排列
	(2)把这20条信息中的category分裂出来(列转行)
	(3)最后查询视频分类名称和该分类下有多少个Top20的视频
	
代码:
	select category_name as category,count(t2.videoId) as hot_with_views from 
	(select videoId, category_name from
		(select * from gulivideo_orc order by views desc limit 20) t1
			lateral view explode(category) t_catetory as category_name) t2 
	group by category_name order by hot_with_views desc;

4.统计视频观看数Top50所关联视频的所属类别Rank

思路:
	(1)查询出观看数最多的前50个视频的所有信息(当然包含了每个视频对应的关联视频),记为临时表t1
		t1:观看数前50的视频
			select * from gulivideo_orc order by views desc limit 50;
	(2)将找到的50条视频信息的相关视频relatedId列转行,记为临时表t2
		t2:将相关视频的id进行列转行操作
			select explode(relatedId) as videoId from t1;
	(3)将相关视频的id和gulivideo_orc表进行inner join操作
		t5:得到两列数据,一列是category,一列是之前查询出来的相关视频id
			(select distinct(t2.videoId), t3.category from t2 inner join 
				gulivideo_orc t3 on t2.videoId = t3.videoId) t4 lateral view explode(category) t_catetory as category_name;
	(4)按照视频类别进行分组,统计每组视频个数,然后排行

代码:
	select category_name as category,count(t5.videoId) as hot from 
		(select videoId,category_name from 
			(select distinct(t2.videoId),t3.category from
				(select explode(relatedId) as videoId from 
					(elect * from gulivideo_orc order by views desc limit 50) t1) t2 
				inner join 
			gulivideo_orc t3 on t2.videoId = t3.videoId) t4 lateral view explode(category) t_catetory as category_name) t5
				group by category_name order by hot desc;

5.统计每个类别中的视频热度Top10,以Music为例

思路:
	(1)要想统计Music类别中的视频热度Top10,需要先找到Music类别,那么就需要将category展开,所以可以创建一张表用于存放categoryId展开的数据。
	(2)向category展开的表中插入数据。
	(3)统计对应类别(Music)中的视频热度。

代码:
	创建表类别表:
		create table gulivideo_category(
			videoId string,uploader string,age int,categoryId string,length int,
			views int,rate float,ratings int,comments int,relatedId array<string>)
		row format delimited 
		fields terminated by "\t" 
		collection items terminated by "&" 
		stored as orc;
		
	向类别表中插入数据:
		insert into table gulivideo_category  
			select videoId,uploader,age,categoryId,
				length,views,rate,ratings,comments,relatedId 
			from 
				gulivideo_orc lateral view explode(category) catetory as categoryId;
				
	统计Music类别的Top10(也可以统计其他)
		select videoId, views from gulivideo_category 
		where categoryId = "Music" order by views desc limit 10;

6.统计每个类别中视频流量Top10,以Music为例

思路:
	(1)创建视频类别展开表(categoryId列转行后的表)
	(2)按照ratings排序即可
	
代码:
	select videoId,views,ratings from gulivideo_category 
		where categoryId = "Music" order by ratings desc limit 10;

7.统计上传视频最多的用户Top10以及他们上传的观看次数在前20的视频

思路:
	(1)先找到上传视频最多的10个用户的用户信息
		select * from gulivideo_user_orc order by videos desc limit 10;
	(2)通过uploader字段与gulivideo_orc表进行join,得到的信息按照views观看次数进行排序即可。

代码:
	select t2.videoId,t2.views,t2.ratings,t1.videos,t1.friends 
		from (select * from gulivideo_user_orc order by videos desc limit 10) t1 
			join gulivideo_orc t2 
		on t1.uploader = t2.uploader order by views desc limit 20;

8.统计每个类别视频观看数Top10

思路:
	(1)先得到categoryId展开的表数据
	(2)子查询按照categoryId进行分区,然后分区内排序,并生成递增数字,该递增数字这一列起名为rank列
	(3)通过子查询产生的临时表,查询rank值小于等于10的数据行即可。

代码:
	select t1.* from (select videoId,categoryId,views,
		row_number() over(partition by categoryId order by views desc) 
			rank from gulivideo_category) t1 where rank <= 10;	
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

hao难懂

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值