Hive——Hive窗口分析函数总结

最新推荐文章于 2020-12-21 08:55:30 发布

Southwest-

最新推荐文章于 2020-12-21 08:55:30 发布

阅读量147

点赞数

分类专栏： Hive 文章标签： Hive 窗口函数

本文链接：https://blog.csdn.net/lovetechlovelife/article/details/110247159

版权

Hive 专栏收录该内容

5 篇文章 0 订阅

订阅专栏

文章目录

窗口分析函数

窗口分析函数

函数中用到的表数据如下图：
在这里插入图片描述

1. 分析函数

row_number()
rank()
dense_rank()

这3个函数通常用在组内排序中，但实现的效果却不相同，用法如下：

select 
    name,subject,score,
    row_number() over(partition by name order by score) rn, 
    rank() over(partition by name order by score) rk,
    dense_rank() over(partition by name order by score) dr
from grade;

排序之后编号对比，如下图：
在这里插入图片描述

2. 聚合函数

count()组内计数
sum()组内求和
avg()组内求平均值
max()&min()组内求最大最小值

下面SQL以sum函数为例展示聚合函数的用法，其他函数的用法类似。

select 
	name,subject,score,
	sum(score) over(partition by name order by subject) as sum1, -- 分组内起始行到当前行的和
	sum(score) over(partition by name order by subject rows between 1 preceding and current row) as sum2, --分组内当前行与前一行的和
	sum(score) over(partition by name order by subject rows between 1 preceding and 2 following) as sum3, --分组内当前行与后两行的和
	sum(score) over(partition by name order by subject rows between unbounded preceding and current row) as sum4, --分组内起始行到当前行的和
	sum(score) over(partition by name order by subject rows between current row and unbounded following) as sum5 ---分组内当前行到终止行的和
from grade;

计算结果如下图：
聚合函数

3. 窗口函数

lag(col, n, default) 表示分组内列(col)的当前行之前的第n行, default为默认值
lead(col, n, default) 表示分组内列(col)的当前行之后的第n行, default为默认值

用法如下图：

select 
    name,subject,score,
    lag(score, 1, 0) over(partition by name order by score) lag, -- lag(score, 1, 0)表示分组内列(score)的当前行之前的第1行, 0为默认值
    lead(score, 1, 0) over(partition by name order by score) lead -- lead(score, 1, 0)表示分组内列(score)的当前行之后的第1行, 0为默认值
from grade;

结果如下图：
在这里插入图片描述

first_value(col) 组内排序第一个值
last_value(col) 组内排序最后一个值

用法如下图：

select 
    name,subject,score,
    first_value(score) over(partition by name order by score) f
from grade;

在这里插入图片描述

Southwest-

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Hive——Hive窗口分析函数总结

文章目录窗口分析函数1. 分析函数2. 聚合函数3. 窗口函数窗口分析函数函数中用到的表数据如下图：1. 分析函数row_number()rank()dense_rank()这3个函数通常用在组内排序中，但实现的效果却不相同，用法如下：select name,subject,score, row_number() over(partition by name order by score) rn, rank() over(partition by name
复制链接

扫一扫