hive 复杂数据类型map和array 使用案例

有一张hive表,表名stdent_score,包含两列,分别是学生姓名name(类型string),

学生成绩score(类型map<string,int>或者 array<string>) ,
成绩列中key是课程名称,例如语文、数学等,value是对应课程分数(0-100),
请用一个hql求一下每个学生成绩最好的课程及分数、最差的课程及分数、平均分数

表名称:student_score
字段:
name,score
数据格式:
huangbo	yuwen:80,shuxue:89,yingyu:95
xuzheng	yuwen:70,shuxue:65,yingyu:81
wangbaoqiang	yuwen:75,shuxue:100,yingyu:76

map类型做法:

1)建表

create table student_score(name string,score map<string,int>)
row format delimited fields terminated by "\t"
collection items terminated by ","
map keys terminated by ":";

存储结果:

+---------------------+----------------------------------------+
| student_score.name  |          student_score.score           |
+---------------------+----------------------------------------+
| huangbo             | {"yuwen":80,"shuxue":89,"yingyu":95}   |
| xuzheng             | {"yuwen":70,"shuxue":65,"yingyu":81}   |
| wangbaoqiang        | {"yuwen":75,"shuxue":100,"yingyu":76}  |
+---------------------+----------------------------------------+

 2)使用explode炸裂为多行:

set hive.mapred.mode=nonstrict;  笛卡尔积

使用explode炸裂为多行
select name,sc_name,sc_value
from student_score lateral view explode(score) sc_map as sc_name,sc_value;
+---------------+----------+-----------+
|     name      | sc_name  | sc_value  |
+---------------+----------+-----------+
| huangbo       | yuwen    | 80        |
| huangbo       | shuxue   | 89        |
| huangbo       | yingyu   | 95        |
| xuzheng       | yuwen    | 70        |
| xuzheng       | shuxue   | 65        |
| xuzheng       | yingyu   | 81        |
| wangbaoqiang  | yuwen    | 75        |
| wangbaoqiang  | shuxue   | 100       |
| wangbaoqiang  | yingyu   | 76        |
+---------------+----------+-----------+

3)使用开窗函数求最大、最小、平均


select distinct(a.name), 
first_value(a.sc_name) over(partition by name order by sc_value desc) as max_name,
max(a.sc_value) over(partition by name ) as max_v,
first_value(a.sc_name) over(partition by name order by sc_value) as min_name,
min(a.sc_value) over(partition by name ) as min_v,
avg(sc_value) over(partition by name) as avg_v
from(
select name,sc_name,sc_value
from student_score lateral view explode(score) sc_map as sc_name,sc_value
) a;

+---------------+-----------+--------+-----------+--------+--------------------+
|    a.name     | max_name  | max_v  | min_name  | min_v  |       avg_v        |
+---------------+-----------+--------+-----------+--------+--------------------+
| huangbo       | yingyu    | 95     | yuwen     | 80     | 88.0               |
| wangbaoqiang  | shuxue    | 100    | yuwen     | 75     | 83.66666666666667  |
| xuzheng       | yingyu    | 81     | shuxue    | 65     | 72.0               |
+---------------+-----------+--------+-----------+--------+--------------------+

另外见到别人非常取巧的一种做法:

成绩与科目拼接,成绩在前,自然排序;但是成绩中存在100,就不灵了!

select 
  name, 
  split(max(score), '-')[1] max_name,
  split(max(score), '-')[0] max_score,
  split(min(score), '-')[1] min_name,
  split(min(score), '-')[0] min_score,
  avg(split(score, '-')[0]) avg
from (
select name,concat(sc_value, '-', sc_name) score 
from student_score lateral view explode(score) sc_map as sc_name,sc_value
) a
group by name;

100被算为最小值,这是这种写法需要注意的地方。 

+---------------+-----------+------------+-----------+------------+--------------------+
|     name      | max_name  | max_score  | min_name  | min_score  |        avg         |
+---------------+-----------+------------+-----------+------------+--------------------+
| huangbo       | yingyu    | 95         | yuwen     | 80         | 88.0               |
| wangbaoqiang  | yingyu    | 76         | shuxue    | 100        | 83.66666666666667  |
| xuzheng       | yingyu    | 81         | shuxue    | 65         | 72.0               |
+---------------+-----------+------------+-----------+------------+--------------------+

 

array类型做法:

1)建表 (仓库文件路径location与上个表student_score保持一致,不用再导入数据)

create table student_score2(name string,score array<string>)
row format delimited fields terminated by "\t"
collection items terminated by ","
location "/user/hive/warehouse/mianshi.db/student_score";

数据格式:

+----------------------+----------------------------------------+
| student_score2.name  |          student_score2.score          |
+----------------------+----------------------------------------+
| huangbo              | ["yuwen:80","shuxue:89","yingyu:95"]   |
| xuzheng              | ["yuwen:70","shuxue:65","yingyu:81"]   |
| wangbaoqiang         | ["yuwen:75","shuxue:100","yingyu:76"]  |
+----------------------+----------------------------------------+

2)使用explode炸裂为多行:增加sc_value列,方便排序与统计。

set hive.mapred.mode=nonstrict;  笛卡尔积

select name, scores ,cast(split(scores,":")[1] as int) as sc_value
from student_score2 lateral view explode(score) sc_map as scores

效果: 

+---------------+-------------+-----------+
|     name      |   scores    | sc_value  |
+---------------+-------------+-----------+
| huangbo       | yuwen:80    | 80        |
| huangbo       | shuxue:89   | 89        |
| huangbo       | yingyu:95   | 95        |
| xuzheng       | yuwen:70    | 70        |
| xuzheng       | shuxue:65   | 65        |
| xuzheng       | yingyu:81   | 81        |
| wangbaoqiang  | yuwen:75    | 75        |
| wangbaoqiang  | shuxue:100  | 100       |
| wangbaoqiang  | yingyu:76   | 76        |
+---------------+-------------+-----------+

3)使用开窗函数求最大、最小、平均

select distinct(a.name),
first_value(a.scores) over(partition by name order by sc_value desc) as max_v,
first_value(a.scores) over(partition by name order by sc_value) as min_v,
avg(sc_value) over(distribute by name) as avg_v
from(
select name, scores ,cast(split(scores,":")[1] as int) as sc_value
from student_score2 lateral view explode(score) sc_map as scores
) a;

结果:

+---------------+-------------+------------+--------------------+
|    a.name     |    max_v    |   min_v    |       avg_v        |
+---------------+-------------+------------+--------------------+
| huangbo       | yingyu:95   | yuwen:80   | 88.0               |
| wangbaoqiang  | shuxue:100  | yuwen:75   | 83.66666666666667  |
| xuzheng       | yingyu:81   | shuxue:65  | 72.0               |
+---------------+-------------+------------+--------------------+

 

 

 

 

 

 

 

 

 

 

 

 

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值