hive面试题以及解题思路

题目1:

现有这么一批数据,现要求出:
每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数

三个字段的意思:
用户名,月份,访问次数

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11

最后结果展示:

用户	月份		最大访问次数	总访问次数		当月访问次数
A	2015-01		33			33		  33
A	2015-02		33			43		  10
A	2015-03		38			81		  38
B	2015-01		30			30		  30
B	2015-02		30			45                15
B	2015-03		44			89		  44

解题思路

1、求当月访问次数
select name,datee,sum(times) st from hw1 a  group by name,datee;
+-------+----------+-----+
| name  |  datee   | st  |
+-------+----------+-----+
| A     | 2015-01  | 33  |
| A     | 2015-02  | 10  |
| A     | 2015-03  | 38  |
| B     | 2015-01  | 30  |
| B     | 2015-02  | 15  |
| B     | 2015-03  | 44  |
+-------+----------+-----+

2、求总访问次数
select a.name,a.datee, sum(times)tt from hw1 b,
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)a
where b.datee <= a.datee and a.name = b.name
group by a.name,a.datee;
+---------+----------+-----+
| a.name  | a.datee  | tt  |
+---------+----------+-----+
| A       | 2015-01  | 33  |
| A       | 2015-02  | 43  |
| A       | 2015-03  | 81  |
| B       | 2015-01  | 30  |
| B       | 2015-02  | 45  |
| B       | 2015-03  | 89  |
+---------+----------+-----+

3、求最大访问次数
select a.name,a.datee, max(b.s1)mt from 
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)b,
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)a
where b.datee <= a.datee and a.name = b.name
group by a.name,a.datee;
+---------+----------+-----+
| a.name  | a.datee  | mt  |
+---------+----------+-----+
| A       | 2015-01  | 33  |
| A       | 2015-02  | 33  |
| A       | 2015-03  | 38  |
| B       | 2015-01  | 30  |
| B       | 2015-02  | 30  |
| B       | 2015-03  | 44  |
+---------+----------+-----+

4、连接3张表
select aa.name,aa.datee,cc.mt,aa.tt,bb.st from
(select a.name,a.datee, sum(times)tt from hw1 b,
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)a
where b.datee <= a.datee and a.name = b.name
group by a.name,a.datee) aa
join
(select name,datee,sum(times) st from hw1 a  group by name,datee)bb
on aa.name=bb.name and aa.datee=bb.datee
join
(select a.name,a.datee, max(b.s1)mt from 
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)b,
(select name,datee,sum(times) s1 from hw1 c  group by c.name,c.datee)a
where b.datee <= a.datee and a.name = b.name
group by a.name,a.datee) cc
on aa.name=cc.name and aa.datee=cc.datee;
+----------+-----------+--------+--------+--------+
| aa.name  | aa.datee  | cc.mt  | aa.tt  | bb.st  |
+----------+-----------+--------+--------+--------+
| A        | 2015-01   | 33     | 33     | 33     |
| A        | 2015-02   | 33     | 43     | 10     |
| A        | 2015-03   | 38     | 81     | 38     |
| B        | 2015-01   | 30     | 30     | 30     |
| B        | 2015-02   | 30     | 45     | 15     |
| B        | 2015-03   | 44     | 89     | 44     |
+----------+-----------+--------+--------+--------+

------------------------------------------------------------------------------------------------------------------------------

题目2:

// 建表语句:
CREATE TABLE `course` (
  `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `sid` int(11) DEFAULT NULL,
  `course` varchar(255) DEFAULT NULL,
  `score` int(11) DEFAULT NULL 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

// 插入数据
// 字段解释:id, 学号, 课程, 成绩
INSERT INTO `course` VALUES (1, 1, 'yuwen', 43);
INSERT INTO `course` VALUES (2, 1, 'shuxue', 55);
INSERT INTO `course` VALUES (3, 2, 'yuwen', 77);
INSERT INTO `course` VALUES (4, 2, 'shuxue', 88);
INSERT INTO `course` VALUES (5, 3, 'yuwen', 98);
INSERT INTO `course` VALUES (6, 3, 'shuxue', 65);

求:所有数学课程成绩 大于 语文课程成绩的学生的学号

解题思路:

 1、排序
 select * from hw2 order by sid,course;
+----------+-------------+------------+
| hw2.sid  | hw2.course  | hw2.score  |
+----------+-------------+------------+
| 1        | shuxue      | 55         |
| 1        | yuwen       | 43         |
| 2        | shuxue      | 88         |
| 2        | yuwen       | 77         |
| 3        | shuxue      | 65         |
| 3        | yuwen       | 98         |
+----------+-------------+------------+

 2、 拼接
 select a.sid,concat_ws('-',collect_list(a.course)) course,
  concat_ws('-',collect_list(cast(a.score as string))) score
  from (select * from hw2 order by sid,course)a group by a.sid;

+--------+---------------+--------+
| a.sid  |      _c1      |  _c2   |
+--------+---------------+--------+
| 1      | shuxue-yuwen  | 55-43  |
| 2      | shuxue-yuwen  | 88-77  |
| 3      | shuxue-yuwen  | 65-98  |
+--------+---------------+--------+

  3、比较选择
  select * from 
  (select a.sid,concat_ws('-',collect_list(a.course)) course,
  concat_ws('-',collect_list(cast(a.score as string))) score
  from (select * from hw2 order by sid,course)a group by a.sid)aa
  where substring_index(aa.score,'-',1)> substring_index(aa.score,'-',-1);

+---------+---------------+-----------+
| aa.sid  |   aa.course   | aa.score  |
+---------+---------------+-----------+
| 1       | shuxue-yuwen  | 55-43     |
| 2       | shuxue-yuwen  | 88-77     |
+---------+---------------+-----------+

--------------------------------------------------------------------------------------------------------------------------------

题目3:

2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
比如:2010012325表示在2010年01月23日的气温为25度。现在要求使用hive,计算每一年出现过的最大气温的日期+温度。
要计算出每一年的最大气温对应的日期

解题:

1、拆分;
select substring(data,1,4)y,substring(data,1,8)m,substring(data,9,2)t from
 hw3table;
 +-------+-----------+-----+
|   y   |     m     |  t  |
+-------+-----------+-----+
| 2014  | 20140101  | 14  |
| 2014  | 20140102  | 16  |
| 2014  | 20140103  | 17  |
| 2014  | 20140104  | 10  |
| 2014  | 20140105  | 06  |
| 2012  | 20120106  | 09  |
| 2012  | 20120107  | 32  |
| 2012  | 20120108  | 12  |
| 2012  | 20120109  | 19  |
| 2012  | 20120110  | 23  |
| 2001  | 20010101  | 16  |
| 2001  | 20010102  | 12  |
| 2001  | 20010103  | 10  |
| 2001  | 20010104  | 11  |
| 2001  | 20010105  | 29  |
| 2013  | 20130106  | 19  |
| 2013  | 20130107  | 22  |
| 2013  | 20130108  | 12  |
| 2013  | 20130109  | 29  |
| 2013  | 20130110  | 23  |
| 2008  | 20080101  | 05  |
| 2008  | 20080102  | 16  |
| 2008  | 20080103  | 37  |
| 2008  | 20080104  | 14  |
| 2008  | 20080105  | 16  |
| 2007  | 20070106  | 19  |
| 2007  | 20070107  | 12  |
| 2007  | 20070108  | 12  |
| 2007  | 20070109  | 99  |
| 2007  | 20070110  | 23  |
| 2010  | 20100101  | 14  |
| 2010  | 20100102  | 16  |
| 2010  | 20100103  | 17  |
| 2010  | 20100104  | 10  |
| 2010  | 20100105  | 06  |
| 2015  | 20150106  | 49  |
| 2015  | 20150107  | 22  |
| 2015  | 20150108  | 12  |
| 2015  | 20150109  | 99  |
+-------+-----------+-----+
2、分组求各年最大值
select a.y,max(a.t)t from
(select substring(data,1,4)y,substring(data,1,8)m,substring(data,9,2)t from
 hw3table )a 
group by a.y;
+-------+-----+
|  a.y  |  t  |
+-------+-----+
| 2001  | 29  |
| 2007  | 99  |
| 2008  | 37  |
| 2010  | 17  |
| 2012  | 32  |
| 2013  | 29  |
| 2014  | 17  |
| 2015  | 99  |
+-------+-----+


3、两表联合,求出最高温对应的具体时间
select b.* from 
(select substring(data,1,4)y,substring(data,1,8)m,substring(data,9,2)t from
 hw3table )b 
join
(select a.y,max(a.t)t from
(select substring(data,1,4)y,substring(data,1,8)m,substring(data,9,2)t from
 hw3table )a 
group by a.y)aa
on b.y = aa.y and b.t =aa.t
order by b.y;
+-------+-----------+------+
|  b.y  |    b.m    | b.t  |
+-------+-----------+------+
| 2001  | 20010105  | 29   |
| 2007  | 20070109  | 99   |
| 2008  | 20080103  | 37   |
| 2010  | 20100103  | 17   |
| 2012  | 20120107  | 32   |
| 2013  | 20130109  | 29   |
| 2014  | 20140103  | 17   |
| 2015  | 20150109  | 99   |
+-------+-----------+------+

-----------------------------------------------------------------------------------------------------------------------

题目4:

现有一份以下格式的数据:
表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门:

id course 
1,a 
1,b 
1,c 
1,e 
2,a 
2,c 
2,d 
2,f 
3,a 
3,b 
3,c 
3,e


编写Hive的HQL语句来实现以下结果:
表中的1表示选修,表中的0表示未选修

id	a	b	c	d	e	f
1	1	1	1	0	1	0
2	1	0	1	1	0	1
3	1	1	1	0	1	0

解题:

1、组合
select cid,concat_ws(',',collect_list(course))c from hw4 group by cid;
+------+--------------+
| cid  |      c       |
+------+--------------+
| 1    | a ,b ,c ,e   |
| 2    | a ,c ,d ,f   |
| 3    | a ,b ,c ,e   |
+------+--------------+
2、判断
select a.cid,
(case when a.c like '%a%' then 1 else 0 end)a ,
(case when a.c like '%b%' then 1 else 0 end)b ,
(case when a.c like '%c%' then 1 else 0 end)c ,
(case when a.c like '%d%' then 1 else 0 end)d,
(case when a.c like '%e%' then 1 else 0 end)e ,
(case when a.c like '%f%' then 1 else 0 end)f  
from
 (select cid,concat_ws(',',collect_list(course))c from hw4 group by cid)a;
+--------+----+----+----+----+----+----+
| a.cid  | a  | b  | c  | d  | e  | f  |
+--------+----+----+----+----+----+----+
| 1      | 1  | 1  | 1  | 0  | 1  | 0  |
| 2      | 1  | 0  | 1  | 1  | 0  | 1  |
| 3      | 1  | 1  | 1  | 0  | 1  | 0  |
+--------+----+----+----+----+----+----+

方案2:



create table hw4_1 as
select aa.cid,aa.c,sort_array(bb.cs) cs from
(select a.cid,
collect_list(a.course)c
from hw4 a
group by a.cid)aa
join
(select collect_list(distinct course)cs
from hw4)bb;

select cid,
if (array_contains(a.c,a.cs[0]),1,0) as a,
if (array_contains(a.c,a.cs[1]),1,0) as b,
if (array_contains(a.c,a.cs[2]),1,0) as c,
if (array_contains(a.c,a.cs[3]),1,0) as d,
if (array_contains(a.c,a.cs[4]),1,0) as e,
if (array_contains(a.c,a.cs[5]),1,0) as f
from hw4_1 a;


----------------------------------------------------------------------------------------------------------------------------

题目5:

现有如下格式的一份数据:

店铺,月份,金额
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250

需求:编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额

解题:

1、当月销售额
select name,month,sum(money)s from hw5table group by name,month;
+-------+--------+-------+
| name  | month  |   s   |
+-------+--------+-------+
| a     | 01     | 350   |
| a     | 02     | 5000  |
| a     | 03     | 600   |
| b     | 01     | 7800  |
| b     | 02     | 2500  |
| c     | 01     | 470   |
| c     | 02     | 630   |
+-------+--------+-------+
2、累计到当月销售额
select a.name,a.month, sum(b.money)ss from hw5table b,
(select name,month,sum(money)s from hw5table group by name,month)a
where a.name = b.name and a.month >= b.month
group by a.name,a.month;
+---------+----------+--------+
| a.name  | a.month  |   ss   |
+---------+----------+--------+
| a       | 01       | 350    |
| a       | 02       | 5350   |
| a       | 03       | 5950   |
| b       | 01       | 7800   |
| b       | 02       | 10300  |
| c       | 01       | 470    |
| c       | 02       | 1100   |
+---------+----------+--------+
3、连接
select bb.*,aa.ss from
(select a.name,a.month, sum(b.money)ss from hw5table b,
(select name,month,sum(money)s from hw5table group by name,month)a
where a.name = b.name and a.month >= b.month
group by a.name,a.month)aa,
(select name,month,sum(money)s from hw5table group by name,month)bb
where aa.name = bb.name and aa.month =bb.month;
+----------+-----------+-------+--------+
| bb.name  | bb.month  | bb.s  | aa.ss  |
+----------+-----------+-------+--------+
| a        | 01        | 350   | 350    |
| a        | 02        | 5000  | 5350   |
| a        | 03        | 600   | 5950   |
| b        | 01        | 7800  | 7800   |
| b        | 02        | 2500  | 10300  |
| c        | 01        | 470   | 470    |
| c        | 02        | 630   | 1100   |
+----------+-----------+-------+--------+






  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值