[Hive] Written Test Question 03 (Find the Hottest Day of Each Year and Its Temperature)

云祁

Article link: https://blog.csdn.net/BeiisBei/article/details/104878785

1. Description

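Sample data (the data column of table weather, one record per row):

2014010216
2014010410
2012010609
2012010812
2012011023
2001010212
2001010411
2013010619
2013010812
2013011023
2008010216
2008010414
2007010619
2007010812
2007011023
2010010216
2010010410
2015010649
2015010812
2015011023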

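Data format

Each record is 10 characters long: the first 8 are the date (yyyyMMdd) and the last 2 are the temperature. For example, 2010012325 means that the temperature on January 23, 2010 was 25 degrees.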
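
The fixed-width layout can be decoded purely by position. Below is a minimal Python sketch, assuming every record is exactly 10 digits; `Reading` and `parse_record` are hypothetical names introduced for illustration and do not appear in the original post.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    year: str
    month: str
    day: str
    temp: int


def parse_record(data: str) -> Reading:
    """Split a 10-character record such as '2010012325' into its fields."""
    if len(data) != 10 or not data.isdigit():
        raise ValueError(f"unexpected record format: {data!r}")
    # Python slices are 0-based and end-exclusive, so data[0:4] selects the
    # same characters as Hive's 1-based substr(data, 1, 4).
    return Reading(data[0:4], data[4:6], data[6:8], int(data[8:10]))


print(parse_record("2010012325"))
```

Converting the temperature to an integer here is a design choice: it avoids comparing temperatures as strings later on.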

2. Requirement

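Using Hive, find for each year the date on which the highest temperature occurred, together with that temperature.
I need the maximum temperature of each year. With
select substr(data,1,4),max(substr(data,9,2)) from table2 group by substr(data,1,4);
I get two columns, year and temperature, for example 2015 99.

What I actually want to select is the specific day on which each year's maximum occurred, plus the temperature, for example 20150109 99.
How should the Hive statement be written?
The group by only needs substr(data,1,4),
but select substr(data,1,8) is then outside the group by columns.
I am stuck and cannot see the way out. Any pointers would be appreciated.
How should one reason about a query in which a selected column is not part of the group by?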

3. Solution

(1) Create an intermediate table tmp_weather that splits each record into its fields

create table tmp_weather as 
select substr(data,1,4) years,substr(data,5,2) months,substr(data,7,2) days,substr(data,9,2) temp from weather;

select * from tmp_weather;

+--------------------+---------------------+-------------------+-------------------+--+
| tmp_weather.years  | tmp_weather.months  | tmp_weather.days  | tmp_weather.temp  |
+--------------------+---------------------+-------------------+-------------------+--+
| 2014               | 01                  | 02                | 16                |
| 2014               | 01                  | 04                | 10                |
| 2012               | 01                  | 06                | 09                |
| 2012               | 01                  | 08                | 12                |
| 2012               | 01                  | 10                | 23                |
| 2001               | 01                  | 02                | 12                |
| 2001               | 01                  | 04                | 11                |
| 2013               | 01                  | 06                | 19                |
| 2013               | 01                  | 08                | 12                |
| 2013               | 01                  | 10                | 23                |
| 2008               | 01                  | 02                | 16                |
| 2008               | 01                  | 04                | 14                |
| 2007               | 01                  | 06                | 19                |
| 2007               | 01                  | 08                | 12                |
| 2007               | 01                  | 10                | 23                |
| 2010               | 01                  | 02                | 16                |
| 2010               | 01                  | 04                | 10                |
| 2015               | 01                  | 06                | 49                |
| 2015               | 01                  | 08                | 12                |
| 2015               | 01                  | 10                | 23                |
+--------------------+---------------------+-------------------+-------------------+--+

(2) Create an intermediate table tmp_year_weather that holds the maximum temperature of each year

create table tmp_year_weather as 
select substr(data,1,4) years,max(substr(data,9,2)) max_temp from weather group by substr(data,1,4);

select * from tmp_year_weather;

+-------------------------+----------------------------+--+
| tmp_year_weather.years  | tmp_year_weather.max_temp  |
+-------------------------+----------------------------+--+
| 2001                    | 12                         |
| 2007                    | 23                         |
| 2008                    | 16                         |
| 2010                    | 16                         |
| 2012                    | 23                         |
| 2013                    | 23                         |
| 2014                    | 16                         |
| 2015                    | 49                         |
+-------------------------+----------------------------+--+

(3) Join the two intermediate tables on year and temperature

select * from tmp_year_weather a join tmp_weather b on a.years=b.years and a.max_temp=b.temp;

+----------+-------------+----------+-----------+---------+---------+--+
| a.years  | a.max_temp  | b.years  | b.months  | b.days  | b.temp  |
+----------+-------------+----------+-----------+---------+---------+--+
| 2014     | 16          | 2014     | 01        | 02      | 16      |
| 2012     | 23          | 2012     | 01        | 10      | 23      |
| 2001     | 12          | 2001     | 01        | 02      | 12      |
| 2013     | 23          | 2013     | 01        | 10      | 23      |
| 2008     | 16          | 2008     | 01        | 02      | 16      |
| 2007     | 23          | 2007     | 01        | 10      | 23      |
| 2010     | 16          | 2010     | 01        | 02      | 16      |
| 2015     | 49          | 2015     | 01        | 06      | 49      |
+----------+-------------+----------+-----------+---------+---------+--+
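
The three steps above can also be folded into one statement by using the per-year maximum as a subquery and joining it back to the raw rows. The sketch below is an illustration only: it uses SQLite through Python's standard `sqlite3` module as a stand-in for Hive so that it runs without a cluster, it loads just a subset of the sample rows, and it casts the temperature to a number (an addition not present in the original queries; in Hive the cast would be `cast(... as int)`).

```python
import sqlite3

# A subset of the sample records from this post.
ROWS = ["2014010216", "2014010410", "2012010609", "2012010812",
        "2012011023", "2015010649", "2015010812", "2015011023"]

conn = sqlite3.connect(":memory:")
conn.execute("create table weather (data text)")
conn.executemany("insert into weather values (?)", [(r,) for r in ROWS])

# Inner query: the maximum temperature per year.
# Join: recover the full record(s) that reach that maximum.
QUERY = """
select b.data
from (
    select substr(data, 1, 4) as years,
           max(cast(substr(data, 9, 2) as integer)) as max_temp
    from weather
    group by substr(data, 1, 4)
) a
join weather b
  on a.years = substr(b.data, 1, 4)
 and a.max_temp = cast(substr(b.data, 9, 2) as integer)
order by b.data
"""

result = [row[0] for row in conn.execute(QUERY)]
print(result)
```

This pattern answers the question from section 2: the group by produces only the key and the aggregate, and the join brings back the non-grouped columns. If two days in the same year share the maximum, the join returns both of them.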

4. Other approaches

A window function produces the same result in a single query, with no intermediate tables and no join:

-- Rank the records within each year by temperature, highest first, then keep rank 1.
select data from (
select data, rank() over (partition by substr(data,1,4) order by cast(substr(data,9,2) as int) desc) as rn from weather) as temp
where temp.rn = 1
order by data;

+-------------+--+
|    data     |
+-------------+--+
| 2001010212  |
| 2007011023  |
| 2008010216  |
| 2010010216  |
| 2012011023  |
| 2013011023  |
| 2014010216  |
| 2015010649  |
+-------------+--+
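
To make explicit what the window query computes, here is a plain Python sketch of the same logic, run against the 20 sample records shown in the tmp_weather output above. `hottest_per_year` is a hypothetical helper name; the sketch mirrors the semantics of filtering on rank() = 1 and is not how Hive executes the query.

```python
from collections import defaultdict

ROWS = [
    "2014010216", "2014010410", "2012010609", "2012010812", "2012011023",
    "2001010212", "2001010411", "2013010619", "2013010812", "2013011023",
    "2008010216", "2008010414", "2007010619", "2007010812", "2007011023",
    "2010010216", "2010010410", "2015010649", "2015010812", "2015011023",
]


def hottest_per_year(rows):
    """Return every record whose temperature equals its year's maximum."""
    by_year = defaultdict(list)              # partition by the year prefix
    for data in rows:
        by_year[data[:4]].append(data)

    winners = []
    for year_rows in by_year.values():
        top = max(int(d[8:10]) for d in year_rows)
        # Like rank() = 1, keep all rows tied for the maximum, not just one.
        winners.extend(d for d in year_rows if int(d[8:10]) == top)
    return sorted(winners)                   # final ordering by record


print(hottest_per_year(ROWS))
```

The printed list should match the query result above, one record per year, because no year in the sample data has a tie for first place.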

Other readers have posted similar solutions that differ only in the details. Here is one of them:

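Store the raw data:
create table tempertrue(
data string comment 'row data'
);
Create a view that splits each record into date and temperature:
create view temp as select substr(data,1,8) as time,substr(data,9) as tep from tempertrue;
Partition by year, rank by temperature, then filter:
select time,tep from (select time,tep,rank() over(partition by substr(time,1,4) order by cast(tep as int) desc) as num from temp) as t1 where t1.num=1;
Similarly, num=2 returns the second-highest temperature, provided no two days tie for first place. After a tie rank() skips rank 2, so dense_rank() is the safer choice in that case.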
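
The behaviour of the two ranking functions on ties is easy to check on a small list. The following Python sketch re-implements the numbering rules of SQL rank() and dense_rank() for illustration; the function names mirror the SQL ones, and the input is a made-up year in which two days are equally hot.

```python
def rank(values):
    """SQL rank(): tied values share a rank and the following ranks are skipped."""
    ordered = sorted(values, reverse=True)
    # index() returns the first position of v, so tied values share it.
    return [(v, ordered.index(v) + 1) for v in ordered]


def dense_rank(values):
    """SQL dense_rank(): tied values share a rank and no rank is skipped."""
    distinct = sorted(set(values), reverse=True)
    return [(v, distinct.index(v) + 1) for v in sorted(values, reverse=True)]


temps = [23, 23, 19, 12]   # hypothetical year with two equally hot days
print(rank(temps))         # [(23, 1), (23, 1), (19, 3), (12, 4)]
print(dense_rank(temps))   # [(23, 1), (23, 1), (19, 2), (12, 3)]
```

With rank(), filtering on 2 returns nothing for this input, while dense_rank() assigns 2 to the value 19.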
