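Contents
I. Project Requirements
1. Top 10 videos by view count
2. Top 10 video categories by popularity
3. Categories of the 20 most-viewed videos, and how many of those Top 20 videos each category contains
4. Popularity ranking of the categories of the videos related to the Top 50 most-viewed videos
5. Top 10 videos by popularity within each category, using Music as an example
6. Top 10 videos by traffic within each category, using Music as an example
7. Top 10 users by number of uploaded videos, and the 20 most-viewed videos among their uploads
8. Top 10 videos by view count in each category (grouped top-N)
II. Data Overview
1. Video data table:
2. User table:
III. Creating the Table Schemas
1. Video table: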
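-- Raw video table in the default text format.
create table youtube_ori(
    videoId string,
    uploader string,
    age int,
    -- a video can belong to several categories
    category array<string>,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    -- ids of the videos related to this one
    relatedId array<string>)
row format delimited
-- columns are separated by tabs
fields terminated by "\t"
-- elements of the array columns are separated by "&"
collection items terminated by "&";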
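As a quick sanity check on the delimiters declared above, the following sketch loads a file into youtube_ori and reads back the array columns. The HDFS path is hypothetical, and the sketch assumes the input has already been cleaned so that fields are tab-separated and array elements are joined with "&".

```sql
-- Hypothetical path: replace it with the location of your cleaned video data.
load data inpath '/youtube/video/cleaned' into table youtube_ori;

-- category and relatedId are arrays, so they support indexing and size().
select videoId,
       category[0]     as first_category,
       size(relatedId) as related_count
from youtube_ori
limit 5;
```

If first_category comes back NULL or related_count is always 1, the collection delimiter in the file probably does not match the one in the table definition.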
-- Same columns as youtube_ori, but bucketed and stored as ORC.
create table youtube_orc(
    videoId string,
    uploader string,
    age int,
    category array<string>,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    relatedId array<string>)
-- rows are hashed on uploader into 8 buckets
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
-- ORC is a columnar storage format
stored as orc;
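LOAD DATA only moves files without converting them, so an ORC table is typically populated by inserting from the text-format table instead. A minimal sketch, assuming youtube_ori has already been loaded:

```sql
-- On Hive 1.x, uncomment the next line so that inserts honor the bucket definition.
-- (The property was removed in Hive 2.0, where bucketing is always enforced.)
-- set hive.enforce.bucketing=true;

-- Rewrite the text rows as ORC; both tables have the same column order.
insert into table youtube_orc
select * from youtube_ori;

-- Check the storage format and the bucketing metadata.
describe formatted youtube_orc;
```

The output of describe formatted should list an ORC input format, 8 buckets, and uploader as the bucket column.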
2. User table:
create table youtube_user_ori(
    uploader string,
    videos int,
    friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t";
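One practical use of the bucket definition is sampling. The sketch below is only an illustration, assuming the table has been populated: it inspects the bucketing metadata and then samples one of the 24 buckets on the clustering column.

```sql
-- Confirm the bucket count and bucket column recorded in the metastore.
describe formatted youtube_user_ori;

-- Sample bucket 1 of 24, i.e. roughly 1/24 of the rows if uploaders hash evenly.
select uploader, videos, friends
from youtube_user_ori
tablesample (bucket 1 out of 24 on uploader)
limit 5;
```

Because the sampling column and bucket count match the clustered by clause, Hive can in principle read a single bucket file rather than scanning the whole table, provided the data was actually written in bucketed form.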