3.5 Preparation
3.5.1 Creating the tables
Create four tables: youtube_ori and youtube_user_ori (stored as text files), and youtube_orc and youtube_user_orc (stored as ORC).
youtube_ori:
create table youtube_ori(
    videoId string,
    uploader string,
    age int,
    category array<string>,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    relatedId array<string>)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as textfile;
youtube_user_ori:
create table youtube_user_ori(
    uploader string,
    videos int,
    friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as textfile;
The raw data is then inserted into the ORC tables, which are created as follows.
youtube_orc:
create table youtube_orc(
    videoId string,
    uploader string,
    age int,
    category array<string>,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    relatedId array<string>)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;
youtube_user_orc:
create table youtube_user_orc(
    uploader string,
    videos int,
    friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as orc;
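As a quick check that the four tables were created as intended, something like the following could be run in the Hive CLI. This is a minimal sketch, not one of the original steps; it uses only the table names defined above.

```sql
-- List the tables whose names start with "youtube".
show tables like 'youtube*';

-- Inspect the storage format and bucketing of the two ORC tables.
describe formatted youtube_orc;
describe formatted youtube_user_orc;
```

In the `describe formatted` output, the bucket count and bucket columns should match the `clustered by` clause of each table, and the input and output formats should refer to ORC.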
3.5.2 Loading the ETL-processed data
youtube_ori:
load da
youtube_user_ori:
load da
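For reference, a `load data` statement for these two tables generally takes the shape below. This is a sketch under assumptions: both HDFS paths are hypothetical placeholders, not the locations used in this project, and the final query is only an illustrative spot check.

```sql
-- Hypothetical HDFS paths; replace them with the real ETL output locations.
load data inpath '/path/to/etl/video/output' into table youtube_ori;
load data inpath '/path/to/etl/user/output' into table youtube_user_ori;

-- Spot-check that the tab and "&" delimiters were parsed as expected:
-- category[0] is the first category, size(relatedId) counts related videos.
select videoId, category[0], size(relatedId) from youtube_ori limit 5;
```

Note that `load data inpath` moves the files from their HDFS location into the table's directory, whereas `load data local inpath` copies them from the local file system.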
3.5.3 Inserting data into the ORC tables
youtube_orc: