collect_set函数
1. 建表语句
drop table if exists stud;
create table stud (name string,area string ,course string,score int);
2.插入数据
insert into table stud values ('zhangsan','baijin','math',88);
insert into table stud values ('lisi','shanghai','chese',90);
insert into table stud values ('wangwu','tianjin','eningsh',95);
insert into table stud values ('zhaoliu','baijin','eningsh',90);
insert into table stud values ('tan7','shanghai','chese',80);
3. 查看表格
hive (default)> select * from stud
> ;
OK
stud.name stud.area stud.course stud.score
zhangsan baijin math 88
lisi shanghai Chinese 90
wangwu tianjin English 95
zhaoliu baijin English 90
tan7 shanghai Chinese 80
Time taken: 0.06 seconds, Fetched: 5 row(s)
4.算出各课程的平均数,要附带上地区(这个时候就需要行转列函数了,因为课程平均数只有一行,但是地区有多个,就需要把多个地区放在一起,用到了行转列函数(collect_set))
hive (default)> select course ,collect_set(area) areas ,avg(score) avg_score from stud group by course;
Query ID = atguigu_20191219111821_b9373434-fee3-4083-be12-2ba180a39e99
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1576713221960_0006)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.00 s
--------------------------------------------------------------------------------
OK
course areas avg_score
Chinese ["shanghai"] 85.0
English ["tianjin","baijin"] 92.5
math ["baijin"] 88.0
Time taken: 7.257 seconds, Fetched: 3 row(s)
5.也可以单独的取指定的位置合并后的地区
hive (default)> select course ,collect_set(area)[1] area ,avg(score) avg_score from stud group by course;
Query ID = atguigu_20191219112100_79c34b1d-61a7-4716-9545-05dbe6aa0f9f
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1576713221960_0006)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.71 s
--------------------------------------------------------------------------------
OK
course area avg_score
Chinese NULL 85.0
English baijin 92.5
math NULL 88.0
Time taken: 6.687 seconds, Fetched: 3 row(s)