关于窗口函数的使用

Hive 窗口函数在计算部分与整体之间的关系中常用到,可以达到简化SQL,减少Hive 执行计划产生的MR数。

实列:1

需求:统计各城市用户在总体的占比

数据结构:「user_city」表中有两个字段「 user_name」,「city」,表示用户姓名与所在城市

分析:上述需求中明显需要统计每个「城市」的用户数与「所有用户数」,然后做商得到最终结果

实现1:不适用窗口函数

select city,users/total as percentage
(
-- 计算每个城市的用户数
(select 
    count(user_name) as users,
    city,
from user_city
group by city) city
left join 
-- 计算用户总数
(select
    count(user_name) as total
from user_city) total
on 1=1
) a

实现2:使用窗口函数

select city,
    count(user_name)/sum(count(user_name)) over() as percentage
from user_city
group by city

实例:2

需求:统计每个城市的人数,且将用户按用户姓名排序编号

数据结构:同上

分析:最终需要的字段是「user_name,city,user_couty,rank」

实现1:分开使用窗口函数

select
    user.user_name,
    city.city,
    city.user_county,
    user.rank
from
 (select 
    city,
    count(user_name) as user_county
  from user_city
  group by city) city 
  left join 
 (select
    user_name,
    city,
    row_number() over(partition by city order by user_name) as rank
  from user_city
  ) user on user.city = city.city

实现2:使用两个窗口函数

select 
    user_name,
    city,
    count(user_name) over(partition by city) as user_county,
    row_number() over(partition by city order by user_name) as rank
from user_city

补充:

-- 窗口函数中的 partition by 与 order by
partition by作用是限定逻辑计算的边界,border by是排序

user_name    income    month
a            10        1
a            10        2
a            10        3
b            20        1
b            20        2
b            20        3

需求:结果展示 user_name,income,month,累计收入(截止到当前月份的收入总额)

select
     user_name,
    income,month,
    sum(income) over(partition name,month order by month) 
from t_test;

**order by 在这里面起到排序并使计算从起始行到当前行的作用


select name,orderdate,cost,
sum(cost) over() as sample1,--所有行相加
sum(cost) over(partition by name) as sample2,--按name分组,组内数据相加
sum(cost) over(partition by name order by orderdate) as sample3,--按name分组,组内数据累加
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )  as sample4 ,--和sample3一样,由起点到当前行的聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING   and current row) as sample5, --当前行和前面一行做聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING   AND 1 FOLLOWING  ) as sample6,--当前行和前边一行及后面一行
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行
from t_window;

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值