Hive QL的经典题详解

最新推荐文章于 2024-03-12 21:31:16 发布

戴着鸭舌帽上台

最新推荐文章于 2024-03-12 21:31:16 发布

阅读量323

点赞数

分类专栏： BigData_SQL 文章标签： hive sql hadoop

本文链接：https://blog.csdn.net/Dream_aju/article/details/120684664

版权

第一题：

a_test、b_test、c_test三个hive表 每个表中都只有一列int类型且列名相同，求三个表中互不重复的数
a.txt   b.txt   c.txt
1       2       1
2       3       2
3       11      3
4       12      11
5       14      5
6       15      6
7       16      7
8       18      8
9       35      20
        6       30
        7       40
        8
    
create table a_test(
id int
);
load data local inpath './data/a.txt' overwrite into table a_test;

create table b_test(
id int
);
load data local inpath './data/b.txt' overwrite into table b_test;


create table c_test(
id int
);
load data local inpath './data/c.txt' overwrite into table c_test;

答案粘贴处：

分析：求三个表中互不重复的数
     使用的是union all 进行连接变成总再进行查询
     三张表联合成一张总表，在外层再套一张表进行查询，用总表的id分组，再过滤having count(A.id)=1

select A.id
from 
(select id from a_test
union all
select id from b_test
union all
select id from c_test) A
group by A.id
having count(A.id)=1;

result:
OK
4
9
12
14
15
16
18
20
30
35
40

第二题(*)：

需求：求出连续七天登陆的总人数

数据: login_test表
uid  dt  login_status(1登录成功,0异常)
1   2019-07-11  1
1   2019-07-12  1
1   2019-07-13  1
1   2019-07-14  1
1   2019-07-15  1
1   2019-07-16  1
1   2019-07-17  1
1   2019-07-18  1
2   2019-07-11  1
2   2019-07-12  1
2   2019-07-13  0
2   2019-07-14  1
2   2019-07-15  1
2   2019-07-16  0
2   2019-07-17  1
2   2019-07-18  0
3   2019-07-11  1
3   2019-07-12  1
3   2019-07-13  1
3   2019-07-14  1
3   2019-07-15  1
3   2019-07-16  1
3   2019-07-17  1
3   2019-07-18  1

create table login_test(
uid int,
dt string,
login_status int
) 
row format delimited fields terminated by '\t';
load data local inpath './data/login_test' into table login_test;

答案粘贴处：

第一种写法：Me
select count(*)
from
(
select distinct uid 
from(
select uid,dt,lag(dt,6) over(partition by uid order by dt) pre_dt,
sum(login_status) over(partition by uid order by dt rows between 6 preceding and current row) total
from login_test
) t 
where date_sub(dt,6)=pre_dt and t.total=7) t1;

第二种写法：answer
select count(distinct a.uid)
from
(select uid,login_status,
datediff(dt,lag(dt,6) over(distribute by uid sort by dt asc)) b
from login_test
where login_status=1) a
where a.b=6
;

第三题：

2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023

create table if not exists temperature(
data string
)
row format delimited fields terminated by '\t'
lines terminated by '\n';
load data local inpath './data/temperature' into table temperature;

数据解析：比如：2010012325表示在2010年01月23日的气温为25度。

需求：现在要求使用hive，计算每一年的最大气温的日期+温度。效果如下：
20010105        29
20070109        99
20080103        37
20100103        17
20120107        32
20130109        29
20140103        17
20150109        99

答案粘贴处:

分析：计算每一年的最大气温的日期+温度
查询前面八位是年月日，后面两位是气温
按年进行分组通过气温进行排序
只选总表的气温=1的 就是排在第一名的
第一种写法:
select t.dt,t.temp from
(select substr(data,0,8) dt,
substr(data,9,10) temp,
rank() over(partition by substr(data,0,4) 
order by substr(data,9,10) desc) rn
from temperature) t
where t.rn=1;

第二种写法：
select B.dt,B.temp
from (
select A.year,A.temp,A.dt,dense_rank() over(partition by A.year order by A.temp desc) rk
from 
(select substr(data,0,4) year,substr(data,9,2) temp,substr(data,0,8) dt from temperature
) A) B where B.rk=1;
分析：总体思路是一样的
      利用substr前四位是年，后两位是气温进行切割

第四题：

最低0.47元/天解锁文章

戴着鸭舌帽上台

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Hive QL的经典题详解

第一题：a_test、b_test、c_test三个hive表每个表中都只有一列int类型且列名相同，求三个表中互不重复的数a.txt b.txt c.txt1 2 12 3 23 11 34 12 115 14 56 15 67 16 78 18 89 35 20
复制链接

扫一扫