Hive Common Exercises Explained

我玩的很开心

Last modified on 2024-02-10 21:00:01


First published on 2020-07-28 18:59:37

Article link: https://blog.csdn.net/and52696686/article/details/107642628


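Given three columns shop_id, item_id, and num, use HiveQL to compute columns a and b. (Hint: group by shop_id; a is num divided by the sum of num within each group; b is the rank of a within the group.)

Solution:

Create the table: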

create table shops(
shop_id char(1),
item_id char(1), 
num int
);

Insert the data:

Insert into shops values
("A","a",10),
("A","b",12),
("B","a",8),
("A","c",5),
("B","c",8),
("C","b",9);

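Query:

select shop_id, item_id, num,
-- a: this row's share of the shop's total num
num/sum(num) over (distribute by shop_id) a,
-- b: rank within the shop, largest num (and therefore largest a) first
rank() over (distribute by shop_id sort by num desc) b
from shops;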
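
The query above uses distribute by / sort by inside the window specification. As a sketch, the same calculation can also be written with partition by / order by. Ranking by num gives the same order as ranking by a, because every row of a shop is divided by the same total. The expected ranks in the comments assume the six rows inserted above.

```sql
-- Same calculation with PARTITION BY / ORDER BY in the window specification.
select
  shop_id,
  item_id,
  num,
  num / sum(num) over (partition by shop_id) as a,
  rank() over (partition by shop_id order by num desc) as b
from shops;
-- Shop A: item b (12) ranks 1, item a (10) ranks 2, item c (5) ranks 3.
-- Shop B: items a and c both have num = 8, so rank() gives both of them 1.
-- Shop C: its single item has a = 1 and rank 1.
```

If tied rows should get distinct positions, row_number() can replace rank(); dense_rank() keeps ties but leaves no gaps after them.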

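Use Hive to compute the sum of the values in the num column, where each num value is a comma-separated list of numbers.

Solution: combine a lateral view with the explode function, which expands the split values into one row each.

Create the table: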

create table test2(
item char(1),
-- comma-separated list of numbers, e.g. '1,2,3,4'
num string
)
stored as textfile;

Insert the data:

insert into test2 values
("A","1,2,3,4"),
("B","2,5,1");

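Query:

-- split turns num into an array, explode emits one row per element,
-- and the cast makes the sum an integer instead of relying on implicit conversion
select item, num, sum(cast(tmp.c as int)) as total from test2
lateral view explode(split(num,",")) tmp as c
group by item, num;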
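
To see what the lateral view contributes before the aggregation, the intermediate rows can be inspected on their own. This is a minimal sketch that assumes num holds a comma-separated string such as '1,2,3,4' for item A and '2,5,1' for item B.

```sql
-- One output row per element of the split array; c is still a string here.
select item, num, tmp.c
from test2
lateral view explode(split(num, ',')) tmp as c;
-- Item A contributes four rows (c = '1', '2', '3', '4'),
-- item B contributes three rows (c = '2', '5', '1').
-- Grouping by item and summing the values then gives 10 for A and 8 for B.
```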

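Huolala's order income for 2004-2006 is stored in the table hll_city_income, shown below (华南 = South China, 华北 = North China; the cities are 深圳 Shenzhen, 广州 Guangzhou, and 北京 Beijing):

Year	Region	City	Income (unit: ten million)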
t_year	t_region	t_city	t_money
2004	华南	深圳	70
2005	华南	深圳	80
2006	华南	深圳	100
2004	华南	广州	40
2005	华南	广州	90
2006	华南	广州	110
2004	华北	北京	60
2005	华北	北京	80
2006	华北	北京	120

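Write a SQL query that produces the following result, keeping two decimal places:

Year	South China total income	North China total income	South China average income	North China average income	Top-income city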
t_year	t_south_money	t_north_money	t_south_avg_money	t_north_avg_money	t_first_city
2004	110.00	60.00	55.00	60.00	深圳
2005	170.00	80.00	85.00	80.00	广州
2006	210.00	120.00	105.00	120.00	北京

Solution:

Create a file hll.txt with the following content:

2004,华南,深圳,70
2005,华南,深圳,80
2006,华南,深圳,100
2004,华南,广州,40
2005,华南,广州,90
2006,华南,广州,110
2004,华北,北京,60
2005,华北,北京,80
2006,华北,北京,120

Upload hll.txt to the HDFS directory /data/hll0902/.

Create the table:

CREATE EXTERNAL TABLE `hll_city_income`(
`t_year` string,
`t_region` string,
`t_city` string,
`t_money` int)
COMMENT 'huolala interview'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile
location '/data/hll0902';
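
Before writing the main query, it can help to confirm that the external table actually sees the uploaded file. The checks below are a sketch that assumes the Hive CLI (for the dfs command) and that /data/hll0902 contains only hll.txt with the nine rows listed above.

```sql
-- List the files under the table location (Hive CLI dfs command).
dfs -ls /data/hll0902/;

-- Should return 9 if hll.txt holds exactly the nine rows above.
select count(*) from hll_city_income;

-- Spot-check that the comma-delimited fields land in the right columns.
select * from hll_city_income limit 3;
```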

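Query:

Method 1: use round. Note that round does not always yield two visible decimal places (it does not pad trailing zeros, so a whole number such as 110 is not shown as 110.00), so Method 2 is recommended.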

with t as
(
select * ,
row_number() over(distribute by t_year sort by t_money desc) drn
from hll_city_income
)
select 
t_year,
round(sum(case when t_region='华南' then t_money else 0 end),2) t_south_money,
round(sum(case when t_region='华北' then t_money else 0 end),2) t_north_money,
round(sum(case when t_region='华南' then t_money else 0 end)/sum(case when t_region='华南' then 1 else 0 end),2) t_south_avg_money,
round(sum(case when t_region='华北' then t_money else 0 end)/sum(case when t_region='华北' then 1 else 0 end),2) t_north_avg_money,
-- the city with drn = 1 is the top earner of the year; other rows yield NULL, which max ignores
max(case when t.drn=1 then t_city end) t_first_city
from t group by t.t_year;
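
The averages above are computed by dividing a conditional sum by a conditional count. As an alternative sketch, avg() can do both in one step, because it skips NULL values: a case expression without an else branch returns NULL for rows of the other region. With the sample data this should give 55, 85, and 105 for South China and 60, 80, and 120 for North China.

```sql
-- Per-year regional averages using avg(), which ignores NULLs.
-- Rows from the other region evaluate to NULL and do not affect the average.
select
  t_year,
  cast(avg(case when t_region = '华南' then t_money end) as decimal(10,2)) as t_south_avg_money,
  cast(avg(case when t_region = '华北' then t_money end) as decimal(10,2)) as t_north_avg_money
from hll_city_income
group by t_year;
```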


Method 2: use cast to convert the values to the decimal type

select 
t.t_year,
cast(sum(case when t.t_region='华南' then t.t_money else 0 end ) as decimal(10,2)) t_south_money,
cast(sum(case when t.t_region='华北' then t.t_money else 0 end ) as decimal(10,2)) t_north_money,
cast(sum(case when t.t_region='华南' then t.t_money else 0 end )/sum(case when t.t_region='华南' then 1 else 0 end )as decimal(10,2)) t_south_avg_money,
cast(sum(case when t.t_region='华北' then t.t_money else 0 end )/sum(case when t.t_region='华北' then 1 else 0 end )as decimal(10,2)) t_north_avg_money,
-- the city with drn = 1 is the top earner of the year; other rows yield NULL, which max ignores
max(case when t.drn=1 then t.t_city end) t_first_city
from 
(
select 
tci.*,
row_number() over(distribute by tci.t_year sort by tci.t_money desc ) drn 
from hll_city_income tci
) t
group by t.t_year
;
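
The difference between the two methods comes down to how the value is typed and displayed. The sketch below puts three options side by side for the value 110. How each column is actually rendered can depend on the Hive version and client, so treat it as something to try rather than a guaranteed output.

```sql
-- round: rounds the value but does not pad trailing zeros.
-- cast to decimal(10,2): fixes the scale at 2, which is the approach of Method 2.
-- format_number: returns a formatted string with two decimal places ('110.00').
select
  round(110, 2)              as rounded,
  cast(110 as decimal(10,2)) as as_decimal,
  format_number(110, 2)      as formatted;
```

format_number returns a string and inserts thousands separators for large values (for example '1,234.50'), so it suits final presentation better than further arithmetic.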

