1、DML的高级练习
student.txt
01 赵雷 1990-01-01 男
02 钱电 1990-12-21 男
03 孙风 1990-05-20 男
04 李云 1990-08-06 男
05 周梅 1991-12-01 女
06 吴兰 1992-03-01 女
07 郑竹 1989-07-01 女
08 王菊 1990-01-20 女
course.txt
01 语文 02
02 数学 01
03 英语 03
sc.txt
01 01 80
01 02 90
01 03 99
02 01 70
02 02 60
02 03 80
03 01 80
03 02 80
03 03 80
04 01 50
04 02 30
04 03 20
05 01 76
05 02 87
06 01 31
06 03 34
07 02 89
07 03 98
teacher.txt
01 张三
02 李四
03 王五
建表并导入数据:
0: jdbc:hive2://hadoop:11240> create database school;
0: jdbc:hive2://hadoop:11240> use school;
0: jdbc:hive2://hadoop:11240> create table Student(Sid string,Sname string,Sbirth string,Ssex string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/student.txt'
. . . . . . . . . . . . . . > into table student;
0: jdbc:hive2://hadoop:11240> create table Course(Cid string,Cname string,Tid string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/course.txt'
. . . . . . . . . . . . . . > into table Course;
0: jdbc:hive2://hadoop:11240> create table SC(Sid string,Cid string,score int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/sc.txt'
. . . . . . . . . . . . . . > into table sc;
0: jdbc:hive2://hadoop:11240> create table teacher(Tid string,Tname string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/teacher.txt'
. . . . . . . . . . . . . . > into table teacher;
(1)查询平均成绩大于60分的同学的学号和平均成绩
0: jdbc:hive2://hadoop:11240> select sid,avg(score) as avg_score from sc group by sid having avg(score)>60;
+------+--------------------+
| sid | avg_score |
+------+--------------------+
| 01 | 89.66666666666667 |
| 02 | 75.0 |
| 03 | 80.0 |
| 05 | 81.5 |
| 07 | 93.5 |
+------+--------------------+
(2)查询学过01并且也学过编号02课程的同学的学号、姓名
0: jdbc:hive2://hadoop:11240> select student.sid,student.sname from student
. . . . . . . . . . . . . . > join sc sc1 on sc1.sid=student.sid
. . . . . . . . . . . . . . > join sc sc2 on sc2.sid=sc1.sid
. . . . . . . . . . . . . . > where sc1.cid='01' and sc2.cid='02';
+--------------+----------------+
| student.sid | student.sname |
+--------------+----------------+
| 01 | 赵雷 |
| 02 | 钱电 |
| 03 | 孙风 |
| 04 | 李云 |
| 05 | 周梅 |
+--------------+----------------+
(3)查询张三老师教的课的平均成绩
0: jdbc:hive2://hadoop:11240> select avg(sc.score) as avg_score from sc
. . . . . . . . . . . . . . > join course on course.cid=sc.cid
. . . . . . . . . . . . . . > join teacher on teacher.tid=course.tid
. . . . . . . . . . . . . . > where teacher.tname="张三";
+--------------------+
| avg_score |
+--------------------+
| 72.66666666666667 |
+--------------------+
(4)查询姓“张”的学生名单
0: jdbc:hive2://hadoop:11240> select sname from student where sname like "张%";
+--------------+----------------+-----------------+---------------+
| student.sid | student.sname | student.sbirth | student.ssex |
+--------------+----------------+-----------------+---------------+
+--------------+----------------+-----------------+---------------+
(5)查询1990年出生的学生(substring())
0: jdbc:hive2://hadoop:11240> select sname from student where substr(sbirth,1,4)='1990';
+--------+
| sname |
+--------+
| 赵雷 |
| 钱电 |
| 孙风 |
| 李云 |
| 王菊 |
+--------+
(6)查询课程编号为03且课程成绩在80分以上的学生的学号和姓名
0: jdbc:hive2://hadoop:11240> select student.sid,student.sname from student
. . . . . . . . . . . . . . > join sc on sc.sid=student.sid
. . . . . . . . . . . . . . > where sc.cid="03" and sc.score>80;
+--------------+----------------+
| student.sid | student.sname |
+--------------+----------------+
| 01 | 赵雷 |
| 07 | 郑竹 |
+--------------+----------------+
2、根据data1.txt数据做如下练习:(行转列)
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):
id a b c d e f
1 1 1 1 0 1 0
2 1 0 1 1 0 1
3 1 1 1 0 1 0
建表:
0: jdbc:hive2://hadoop:11240> create table data1(id int,course string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/data1.txt'
. . . . . . . . . . . . . . > into table data1;
0: jdbc:hive2://hadoop:11240> select * from data1;
+-----------+---------------+
| data1.id | data1.course |
+-----------+---------------+
| 1 | a |
| 1 | b |
| 1 | c |
| 1 | e |
| 2 | a |
| 2 | c |
| 2 | d |
| 2 | f |
| 3 | a |
| 3 | b |
| 3 | c |
| 3 | e |
+-----------+---------------+
查询:
0: jdbc:hive2://hadoop:11240> select id,
. . . . . . . . . . . . . . > sum(case when course="a" then 1 else 0 end) a,
. . . . . . . . . . . . . . > sum(case when course="b" then 1 else 0 end) b,
. . . . . . . . . . . . . . > sum(case when course="c" then 1 else 0 end) c,
. . . . . . . . . . . . . . > sum(case when course="d" then 1 else 0 end) d,
. . . . . . . . . . . . . . > sum(case when course="e" then 1 else 0 end) e,
. . . . . . . . . . . . . . > sum(case when course="f" then 1 else 0 end) f
. . . . . . . . . . . . . . > from data1
. . . . . . . . . . . . . . > group by id;
+-----+----+----+----+----+----+----+
| id | a | b | c | d | e | f |
+-----+----+----+----+----+----+----+
| 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 | 1 | 0 | 1 |
| 3 | 1 | 1 | 1 | 0 | 1 | 0 |
+-----+----+----+----+----+----+----+
3、根据data2.txt数据做如下练习:(列转行)
uid tags
1 1,2,3
2 2,3
3 1,2
编写sql实现如下结果:
uid | tag |
1 | 1 |
1 | 2 |
1 | 3 |
2 | 2 |
2 | 3 |
3 | 1 |
3 | 2 |
建表:
0: jdbc:hive2://hadoop:11240> create table data2(uid string,tags string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/data2.txt'
. . . . . . . . . . . . . . > into table data2;
0: jdbc:hive2://hadoop:11240> select * from data2;
+------------+-------------+
| data2.uid | data2.tags |
+------------+-------------+
| 1 | 1,2,3 |
| 2 | 2,3 |
| 3 | 1,2 |
+------------+-------------+
查询:
0: jdbc:hive2://hadoop:11240> select uid,tag from data2
. . . . . . . . . . . . . . > lateral view explode(split(tags,',')) tags as tag;
+------+------+
| uid | tag |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
+------+------+
4、根据data3.txt数据做如下练习:
数据:
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
编写sql实现如下结果:
id | tag | flag |
a | b | 1|2|3 |
c | d | 6|8 |
建表:
0: jdbc:hive2://hadoop:11240> create table data3(id string,tag string,flag string)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by '\t';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/data3.txt'
. . . . . . . . . . . . . . > into table data3;
0: jdbc:hive2://hadoop:11240> select * from data3;
+-----------+------------+-------------+
| data3.id | data3.tag | data3.flag |
+-----------+------------+-------------+
| a | b | 2 |
| a | b | 1 |
| a | b | 3 |
| c | d | 6 |
| c | d | 8 |
| c | d | 8 |
+-----------+------------+-------------+
查询:
0: jdbc:hive2://hadoop:11240> select id,tag,concat_ws('|',collect_set(flag)) as flag from data3
. . . . . . . . . . . . . . > group by id,tag;
+-----+------+--------+
| id | tag | flag |
+-----+------+--------+
| a | b | 2|1|3 |
| c | d | 6|8 |
+-----+------+--------+
5、根据data4.txt数据做如下练习:(case when)
// 字段解释:
//id, 学号, 课程, 成绩
1,1,yuwen,43
2,1,shuxue,55
3,2,yuwen,77
4,2,shuxue,88
5,3,yuwen,98
6,3,shuxue,65
求:所有数学课程成绩 大于 语文课程成绩的学生的学号
建表:
0: jdbc:hive2://hadoop:11240> create table data4(id int,sno int,course string,score int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/data4.txt'
. . . . . . . . . . . . . . > into table data4;
0: jdbc:hive2://hadoop:11240> select * from data4;
+-----------+------------+---------------+--------------+
| data4.id | data4.sno | data4.course | data4.score |
+-----------+------------+---------------+--------------+
| 1 | 1 | yuwen | 43 |
| 2 | 1 | shuxue | 55 |
| 3 | 2 | yuwen | 77 |
| 4 | 2 | shuxue | 88 |
| 5 | 3 | yuwen | 98 |
| 6 | 3 | shuxue | 65 |
+-----------+------------+---------------+--------------+
查询:
0: jdbc:hive2://hadoop:11240> select * from
. . . . . . . . . . . . . . > (select sno,
. . . . . . . . . . . . . . > sum(case course when "yuwen" then score else 0 end) as yuwen,
. . . . . . . . . . . . . . > sum(case course when "shuxue" then score else 0 end) as shuxue
. . . . . . . . . . . . . . > from data4 group by sno) t1
. . . . . . . . . . . . . . > where t1.shuxue > t1.yuwen;
+---------+-----------+------------+
| t1.sno | t1.yuwen | t1.shuxue |
+---------+-----------+------------+
| 1 | 43 | 55 |
| 2 | 77 | 88 |
+---------+-----------+------------+
6、根据data5.txt数据做如下练习:(窗口函数)
data5.txt数据:
店铺,月份,金额
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
建表:
0: jdbc:hive2://hadoop:11240> create table data5(shopname string,mon string,price int)
. . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . > fields terminated by ',';
0: jdbc:hive2://hadoop:11240> load data local inpath '/home/xiaokang/hivedata/data5.txt'
. . . . . . . . . . . . . . > into table data5;
0: jdbc:hive2://hadoop:11240> select * from data5;
+-----------------+------------+--------------+
| data5.shopname | data5.mon | data5.price |
+-----------------+------------+--------------+
| a | 01 | 150 |
| a | 01 | 200 |
| b | 01 | 1000 |
| b | 01 | 800 |
| c | 01 | 250 |
| c | 01 | 220 |
| b | 01 | 6000 |
| a | 02 | 2000 |
| a | 02 | 3000 |
| b | 02 | 1000 |
| b | 02 | 1500 |
| c | 02 | 350 |
| c | 02 | 280 |
| a | 03 | 350 |
| a | 03 | 250 |
+-----------------+------------+--------------+
求出每个店铺的当月销售额和累计到当月的总销售额
0: jdbc:hive2://hadoop:11240> select shopname,mon,sp,
. . . . . . . . . . . . . . > sum(sp) over(distribute by shopname sort by mon) ssp
. . . . . . . . . . . . . . > from(
. . . . . . . . . . . . . . > select shopname,mon,
. . . . . . . . . . . . . . > sum(price) sp
. . . . . . . . . . . . . . > from data5
. . . . . . . . . . . . . . > group by shopname,mon
. . . . . . . . . . . . . . > ) t1;
+-----------+------+-------+--------+
| shopname | mon | sp | ssp |
+-----------+------+-------+--------+
| a | 01 | 350 | 350 |
| a | 02 | 5000 | 5350 |
| a | 03 | 600 | 5950 |
| b | 01 | 7800 | 7800 |
| b | 02 | 2500 | 10300 |
| c | 01 | 470 | 470 |
| c | 02 | 630 | 1100 |
+-----------+------+-------+--------+