Hive窗口函数的使用场景

RANK() 并列 排名不连续, 1 1 1 4
DENSE_RANK() 并列 排名连续,1 1 1 2
ROW_NUMBER() 顺序排名 1 2 3 4

select tt.vin, tt.insurance_source, tt.business_end
from (select t.vin,t.insurance_source,
cast(cast(t.business_end as TIMESTAMP) as string) business_end,
RANK() over(PARTITION by t.vin order by t.business_end desc,t.insurance_item_id desc) as x
from sbpopt.tt_insurance_item t
where t.business_end is not null
and substr(net_code, -1) = 0
and t.vin is not null
and cast(t.create_time as TIMESTAMP) >= date_add(CURRENT_DATE,-1)
and cast(t.create_time as TIMESTAMP) < CURRENT_DATE) tt
where x = 1

(1)oracle使用max() keep和row_number() over转换

select dept_id
,max(sale_num)keep ( dense_rank first order by sale_date) sale_num
,max(sale_cnt)keep ( dense_rank first order by sale_date) sale_cnt
from criss_sales
group by dept_id

改成HIVE语句后:

select aa.dept_id,aa.sale_num,aa.sale_cnt from
(
select dept_id
,max(sale_num) over (partition by dept_id order by sale_date) sale_num
,max(sale_cnt) over (partition by dept_id order by sale_date) sale_cnt
from criss_sales
) aa
group by aa.dept_id,aa.sale_num,aa.sale_cnt

然后发现并不是分组取最大值,而是先取最新的日期,如果最新的日期中有多条再取较大的值!!!
最终HIVE实现为:

select a.dept_id,max(a.sale_num),max(a.sale_cnt) from
(
select dept_id,sale_num,sale_cnt,
dense_rank() over (partition by dept_id order by sale_date) rn
from criss_sales
) a where a.rn=1
group by a.dept_id

(2)group by和max函数一起使用的坑:
一张剧目演出表,表中有(剧目ID,剧目名称,演出单位,票房)
问:查询出每个演出单位票房最高的剧目名称。(听起来很简单吧)

现在我们看看数据:
(name是剧目名称,desp是演出单位,price是票房)
在这里插入图片描述
错误的使用方式:
在这里插入图片描述
对比后发现name字段信息不对。。。
虽然找到的是最大的price,但是与之匹配的剧目名称name却不对应,而是group by分组后的第一条记录的基本信息。

最终正确Hive SQL方式:

select a.*
from (select name,desp,price,
row_number() over(partiton by desp order by price desc) rm from show_plan) a
where a.rm=1

窗口函数测试案例:
建表语句

drop table asmp.window_function_temp;
create table if not exists asmp.window_function_temp
(
uname string,
create_time string,
pv string
) 
row format delimited fields terminated by ',' lines terminated by '\n';

insert overwrite table asmp.window_function_temp
select 
split(detail,',')[0] as uname
,split(detail,',')[1] as create_time
,split(detail,',')[2] as pv
from
(select concat('测试用户,2019-10-02,7#测试用户,2019-10-05,4#测试用户,2019-10-07,5#测试客户,2019-10-03,6#测试用户,2019-10-04,3#测试用户,2019-10-01,3#测试客户,2019-10-06,4') as ct_str
) t
lateral view explode(split(ct_str,'#')) t2 as detail;

select * from asmp.window_function_temp;

测试语句

select 
uname
,create_time
,pv
,MAX(pv) over (partition by uname order by create_time) as sum_pv_1 --默认情况
,MAX(pv) over (partition by uname order by create_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_pv_2 --表示从起点到当前行
,MAX(pv) over (partition by uname) as sum_pv_3 --表示窗口内所有行
,MAX(pv) over (partition by uname order by create_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_pv_4 --表示起点到终点
,MAX(pv) over (partition by uname order by create_time ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) as sum_pv_5 --表示前2行到后面1行
from asmp.window_function_temp;

结果:
在这里插入图片描述

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值