【Hive】笔试题 04 （求学生选课情况）

最新推荐文章于 2024-05-06 17:26:42 发布

云祁

最新推荐文章于 2024-05-06 17:26:42 发布

阅读量2.5k

点赞数 4

分类专栏： # ---- Hive 文章标签： hive

本文链接：https://blog.csdn.net/BeiisBei/article/details/104879474

版权

---- Hive 专栏收录该内容

37 篇文章 32 订阅

订阅专栏

文章目录

1、数据说明

（1）数据格式

id course 
1,a 
1,b 
1,c 
1,e 
2,a 
2,c 
2,d 
2,f 
3,a 
3,b 
3,c 
3,e

（2）字段含义

表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门。

2、数据准备

（1）建表t_course

create table t_course(id int,course string)
row format delimited fields terminated by ",";

（2）导入数据

load data local inpath "/home/hadoop/course/course.txt" into table t_course;

3、需求

编写Hive的HQL语句来实现以下结果：表中的1表示选修，表中的0表示未选修

id    a    b    c    d    e    f
1     1    1    1    0    1    0
2     1    0    1    1    0    1
3     1    1    1    0    1    0

4、解析

第一步：

select collect_set(course) as courses from id_course;

第二步：

set hive.strict.checks.cartesian.product=false;

create table id_courses as select t1.id as id,t1.course as id_courses,t2.course courses 
from 
( select id as id,collect_set(course) as course from id_course group by id ) t1 
join 
(select collect_set(course) as course from id_course) t2;

启用严格模式：

hive.mapred.mode = strict // Deprecated
hive.strict.checks.large.query = true

该设置会禁用：

不指定分页的orderby
对分区表不指定分区进行查询
和数据量无关，只是一个查询模式

hive.strict.checks.type.safety = true

严格类型安全，该属性不允许以下操作：

bigint和string之间的比较
bigint和double之间的比较

hive.strict.checks.cartesian.product = true

该属性不允许笛卡尔积操作

第三步：得出最终结果：
思路：
拿出course字段中的每一个元素在id_courses中进行判断，看是否存在。

select id,
case when array_contains(id_courses, courses[0]) then 1 else 0 end as a,
case when array_contains(id_courses, courses[1]) then 1 else 0 end as b,
case when array_contains(id_courses, courses[2]) then 1 else 0 end as c,
case when array_contains(id_courses, courses[3]) then 1 else 0 end as d,
case when array_contains(id_courses, courses[4]) then 1 else 0 end as e,
case when array_contains(id_courses, courses[5]) then 1 else 0 end as f 
from id_courses;