03.Hive必刷50题--11-15题

– 11、查询没有学全所有课程的同学的信息:

方式1:

先查询课程总数

   select count(1) from course;

根据得到的课程数,进行查询 

SELECT s.*
FROM student s LEFT JOIN score sc on s.s_id=sc.s_id
GROUP BY s.s_id,s_name,s_birth,s_sex HAVING COUNT(s.s_id)<3

思路:

进行student和score表的连接后,选3课的sid必然出现3次

 

方式2:

SELECT s.*
FROM student s
LEFT JOIN
(SELECT s_id 
FROM score 
GROUP BY s_id HAVING COUNT(s_id)=3
) tmp
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL

思路:

先查询出score中学生选了3门课的s_id,之后和student进行left join,最后为null的就是没有选所有课的学生

方式3:

SELECT *
FROM student s
JOIN (SELECT COUNT(1) count FROM course)tmp1
LEFT JOIN 
(SELECT s_id,COUNT(c_id) count
FROM score
GROUP BY s_id
)tmp2
ON tmp1.count=tmp2.count AND s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL

思路:

进行3个表的连接,先统计出course表有多少课程,之后进行左连接score表,连接条件就是score表中的sid数和统计的课程数tmp1.count一样并通过sid进行连接。最后查询那些tmp2.count为null的就是没有选所有课程的学生。

 

 

– 12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息:

SELECT DISTINCT(s.s_id),s.s_name,s.s_birth,s.s_sex
FROM student s
LEFT JOIN score sc ON s.s_id=sc.s_id
WHERE sc.c_id IN (SELECT c_id FROM score WHERE s_id='01') and s.s_id!='01'

思路:

统计出01学生选的课程有那些,之后查询其他学生的课程是否含括在内

 

– 13、查询和"01"号的同学学习的课程完全相同的其他同学的信息:

group_concat函数:将相同的行组合起来

语法:group_concat([DISTINCT] 要连接的字段 [Order BY ASC/DESC 排序字段] [Separator '分隔符'])

参考网址:https://www.iteye.com/blog/hchmsguo-555543

列子:

SELECT s_id,GROUP_CONCAT(c_id ORDER BY c_id DESC Separator '_') concat
FROM score
GROUP BY s_id

hive不支持group_concat方法,可用 concat_ws(分隔符, collect_set(连接字段)) 实现

SELECT s_id,concat_ws('_',collect_set(c_id)) concat
FROM score
GROUP BY s_id

方式一: 

SELECT *
FROM student s
JOIN 
(SELECT concat_ws(',',collect_set(c_id)) concat 
FROM score 
WHERE s_id='01'
)tmp1
LEFT JOIN
(SELECT s_id,concat_ws(',',collect_set(c_id)) concat 
FROM score 
WHERE s_id!='01' 
GROUP BY s_id
)tmp2
ON tmp2.s_id=s.s_id and tmp2.concat=tmp1.concat
WHERE tmp2.concat IS NOT NULL

思路:

通过concat_ws函数,找到01学过的课程与其他学生的课程进行对比,找到一样的学生。

 方式二:

SELECT * 
FROM student s
JOIN 
(SELECT CONCAT_WS(',',collect_set(c_id)) concat FROM score WHERE s_id='01')tmp1 
JOIN 
(SELECT s_id,CONCAT_WS(',',collect_set(c_id)) concat FROM score WHERE s_id!='01' GROUP BY s_id)tmp2
ON
s.s_id=tmp2.s_id AND tmp1.concat=tmp2.concat

 

– 14、查询没学过"张三"老师讲授的任一门课程的学生姓名:

SELECT *
FROM student s
LEFT JOIN
(SELECT s_id 
FROM score 
WHERE score.c_id 
IN (SELECT c_id 
FROM course 
WHERE course.t_id=(SELECT t_id FROM teacher WHERE t_name='张三')
) 
GROUP BY s_id
)tmp
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL

上面的sql在mysql中可以执行,但是在hive中无法执行

Error: Error while compiling statement: FAILED: ParseException line 10:19 cannot recognize input near 'SELECT' 't_id' 'FROM' in expression specification (state=42000,code=40000)

原因:hive中子查询语句只能出现在from子句中,其他地方目前不允许

错误的修改:

SELECT *
FROM student s
JOIN(SELECT t_id 
FROM teacher 
WHERE t_name='张三'
)tmp1
LEFT JOIN
(SELECT s_id 
FROM score 
WHERE score.c_id 
IN (SELECT c_id 
FROM course 
WHERE course.t_id=tmp1.t_id
) 
GROUP BY s_id
)tmp2
ON s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL

报错:Unknown column 'tmp1.t_id' in 'where clause'

原因:在子查询中是无法查到我们现在关联的表的内容的,只能查到子查询范围内的数据

SELECT *
FROM student s
LEFT JOIN
(SELECT s_id 
FROM score 
JOIN(SELECT t_id 
FROM teacher 
WHERE t_name='张三'
)tmp1
WHERE score.c_id 
IN (SELECT c_id 
FROM course 
WHERE course.t_id=tmp1.t_id
) 
GROUP BY s_id
)tmp2
ON s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL

思路:

score表中的cid,在张三老师授课的cid范围内

上面的sql写的太啰嗦了,优化:

SELECT s.* 
FROM student s
LEFT JOIN 
(SELECT sc.s_id
FROM teacher t 
JOIN course c ON t.t_name='张三' AND t.t_id=c.t_id
JOIN score sc ON sc.c_id = c.c_id
)tmp 
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL

思路:

将teacher表和course进行连接,通过限定条件张三老师,得到cid的信息,在通过与score表的连接得到学过张三老师cid课程学生的sid信息,之后把这个信息连接到student表中,tmp.sid为null的就是没有学过的

 

– 15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩:

一开始错误的写法:

SELECT s.s_id,s.s_name,tmp.avg
FROM student s
JOIN 
(SELECT s_id,AVG(s_score) avg FROM score GROUP BY s_id HAVING COUNT(s_score<60)>1)tmp
on s.s_id=tmp.s_id

原因:这里count内部并没有进行s_score<60的计算,而是直接进行了count(s_score),所有人的score都比1多

修改:

SELECT s.s_id,s.s_name,tmp.avg
FROM student s
JOIN 
(SELECT s_id,AVG(s_score) avg FROM score WHERE s_score<60 GROUP BY s_id HAVING COUNT(s_id)>1)tmp
on s.s_id=tmp.s_id

where筛选

SELECT * FROM score WHERE s_score<60 

聚合函数统计的sid

SELECT *,COUNT(s_id) FROM score WHERE s_score<60 GROUP BY s_id

后续部分参见:

00.Hive必刷50题--建表,插入数据

01.Hive必刷50题--1-5题

02.Hive必刷50题--6-10题

03.Hive必刷50题--11-15题

04.Hive必刷50题--16-20题

05.Hive必刷50题--21-25题

06.Hive必刷50题--26-30题

07.Hive必刷50题--31-35题

08.Hive必刷50题--36-40题

09.Hive必刷50题--41-45题

010.Hive必刷50题--46-50题

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值