– 11、查询没有学全所有课程的同学的信息:
方式1:
先查询课程总数
select count(1) from course;
根据得到的课程数,进行查询
SELECT s.*
FROM student s LEFT JOIN score sc on s.s_id=sc.s_id
GROUP BY s.s_id,s_name,s_birth,s_sex HAVING COUNT(s.s_id)<3
思路:
进行student和score表的连接后,选3课的sid必然出现3次
方式2:
SELECT s.*
FROM student s
LEFT JOIN
(SELECT s_id
FROM score
GROUP BY s_id HAVING COUNT(s_id)=3
) tmp
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL
思路:
先查询出score中学生选了3门课的s_id,之后和student进行left join,最后为null的就是没有选所有课的学生
方式3:
SELECT *
FROM student s
JOIN (SELECT COUNT(1) count FROM course)tmp1
LEFT JOIN
(SELECT s_id,COUNT(c_id) count
FROM score
GROUP BY s_id
)tmp2
ON tmp1.count=tmp2.count AND s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL
思路:
进行3个表的连接,先统计出course表有多少课程,之后进行左连接score表,连接条件就是score表中的sid数和统计的课程数tmp1.count一样并通过sid进行连接。最后查询那些tmp2.count为null的就是没有选所有课程的学生。
– 12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息:
SELECT DISTINCT(s.s_id),s.s_name,s.s_birth,s.s_sex
FROM student s
LEFT JOIN score sc ON s.s_id=sc.s_id
WHERE sc.c_id IN (SELECT c_id FROM score WHERE s_id='01') and s.s_id!='01'
思路:
统计出01学生选的课程有那些,之后查询其他学生的课程是否含括在内
– 13、查询和"01"号的同学学习的课程完全相同的其他同学的信息:
group_concat函数:将相同的行组合起来
语法:group_concat([DISTINCT] 要连接的字段 [Order BY ASC/DESC 排序字段] [Separator '分隔符'])
参考网址:https://www.iteye.com/blog/hchmsguo-555543
列子:
SELECT s_id,GROUP_CONCAT(c_id ORDER BY c_id DESC Separator '_') concat
FROM score
GROUP BY s_id
hive不支持group_concat方法,可用 concat_ws(分隔符, collect_set(连接字段)) 实现
SELECT s_id,concat_ws('_',collect_set(c_id)) concat
FROM score
GROUP BY s_id
方式一:
SELECT *
FROM student s
JOIN
(SELECT concat_ws(',',collect_set(c_id)) concat
FROM score
WHERE s_id='01'
)tmp1
LEFT JOIN
(SELECT s_id,concat_ws(',',collect_set(c_id)) concat
FROM score
WHERE s_id!='01'
GROUP BY s_id
)tmp2
ON tmp2.s_id=s.s_id and tmp2.concat=tmp1.concat
WHERE tmp2.concat IS NOT NULL
思路:
通过concat_ws函数,找到01学过的课程与其他学生的课程进行对比,找到一样的学生。
方式二:
SELECT *
FROM student s
JOIN
(SELECT CONCAT_WS(',',collect_set(c_id)) concat FROM score WHERE s_id='01')tmp1
JOIN
(SELECT s_id,CONCAT_WS(',',collect_set(c_id)) concat FROM score WHERE s_id!='01' GROUP BY s_id)tmp2
ON
s.s_id=tmp2.s_id AND tmp1.concat=tmp2.concat
– 14、查询没学过"张三"老师讲授的任一门课程的学生姓名:
SELECT *
FROM student s
LEFT JOIN
(SELECT s_id
FROM score
WHERE score.c_id
IN (SELECT c_id
FROM course
WHERE course.t_id=(SELECT t_id FROM teacher WHERE t_name='张三')
)
GROUP BY s_id
)tmp
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL
上面的sql在mysql中可以执行,但是在hive中无法执行
Error: Error while compiling statement: FAILED: ParseException line 10:19 cannot recognize input near 'SELECT' 't_id' 'FROM' in expression specification (state=42000,code=40000)
原因:hive中子查询语句只能出现在from子句中,其他地方目前不允许
错误的修改:
SELECT *
FROM student s
JOIN(SELECT t_id
FROM teacher
WHERE t_name='张三'
)tmp1
LEFT JOIN
(SELECT s_id
FROM score
WHERE score.c_id
IN (SELECT c_id
FROM course
WHERE course.t_id=tmp1.t_id
)
GROUP BY s_id
)tmp2
ON s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL
报错:Unknown column 'tmp1.t_id' in 'where clause'
原因:在子查询中是无法查到我们现在关联的表的内容的,只能查到子查询范围内的数据
SELECT *
FROM student s
LEFT JOIN
(SELECT s_id
FROM score
JOIN(SELECT t_id
FROM teacher
WHERE t_name='张三'
)tmp1
WHERE score.c_id
IN (SELECT c_id
FROM course
WHERE course.t_id=tmp1.t_id
)
GROUP BY s_id
)tmp2
ON s.s_id=tmp2.s_id
WHERE tmp2.s_id IS NULL
思路:
score表中的cid,在张三老师授课的cid范围内
上面的sql写的太啰嗦了,优化:
SELECT s.*
FROM student s
LEFT JOIN
(SELECT sc.s_id
FROM teacher t
JOIN course c ON t.t_name='张三' AND t.t_id=c.t_id
JOIN score sc ON sc.c_id = c.c_id
)tmp
ON s.s_id=tmp.s_id
WHERE tmp.s_id IS NULL
思路:
将teacher表和course进行连接,通过限定条件张三老师,得到cid的信息,在通过与score表的连接得到学过张三老师cid课程学生的sid信息,之后把这个信息连接到student表中,tmp.sid为null的就是没有学过的
– 15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩:
一开始错误的写法:
SELECT s.s_id,s.s_name,tmp.avg
FROM student s
JOIN
(SELECT s_id,AVG(s_score) avg FROM score GROUP BY s_id HAVING COUNT(s_score<60)>1)tmp
on s.s_id=tmp.s_id
原因:这里count内部并没有进行s_score<60的计算,而是直接进行了count(s_score),所有人的score都比1多
修改:
SELECT s.s_id,s.s_name,tmp.avg
FROM student s
JOIN
(SELECT s_id,AVG(s_score) avg FROM score WHERE s_score<60 GROUP BY s_id HAVING COUNT(s_id)>1)tmp
on s.s_id=tmp.s_id
where筛选
SELECT * FROM score WHERE s_score<60
聚合函数统计的sid
SELECT *,COUNT(s_id) FROM score WHERE s_score<60 GROUP BY s_id
后续部分参见: