Hive窗口函数之preceding and following

最新推荐文章于 2023-05-25 13:39:07 发布

HappyRocking

最新推荐文章于 2023-05-25 13:39:07 发布

阅读量5.1k

点赞数 4

分类专栏：大数据/hive 文章标签： hive

本文链接：https://blog.csdn.net/HappyRocking/article/details/105369558

版权

大数据/hive 专栏收录该内容

9 篇文章 6 订阅

订阅专栏

Hive窗口函数中，有一个功能是统计当前行之前或之后指定行作为一个聚合，关键字是 preceding 和 following，举例说明其使用方法。

一、加载测试数据

在 hive 环境中创建临时表：

create table tmp_student
(
   name           string,
   class          tinyint,
   cooperator_name   string,
   score          tinyint
)
row format delimited fields terminated by '|';

然后加载测试数据：

load data local inpath 'text.txt' into table tmp_student;

其中，text.txt 内容如下：

adf|3|测试公司1|45
asdf|3|测试公司2|55
cfe|2|测试公司2|74
3dd|3|测试公司5|n
fda|1|测试公司7|80
gds|2|测试公司9|92
ffd|1|测试公司10|95
dss|1|测试公司4|95
ddd|3|测试公司3|99
gf|3|测试公司9|99

查看是否加载成功：

hive> select * from tmp_student;
OK
adf	3	测试公司1	45
asdf	3	测试公司2	55
cfe	2	测试公司2	74
3dd	3	测试公司5	NULL
fda	1	测试公司7	80
gds	2	测试公司9	92
ffd	1	测试公司10	95
dss	1	测试公司4	95
ddd	3	测试公司3	99
gf	3	测试公司9	99
Time taken: 1.314 seconds, Fetched: 10 row(s)

二、测试窗口函数

执行sql：

select
    name,
    score,
    sum(score) over(order by score range between 2 preceding and 2 following) s1, -- 当前行的score值加减2的范围内的所有行
    sum(score) over(order by score rows between 2 preceding and 2 following) s2, -- 当前行+前后2行，一共5行
    sum(score) over(order by score range between unbounded preceding and unbounded following) s3, -- 全部行，不做限制
    sum(score) over(order by score rows between unbounded preceding and unbounded following) s4, -- 全部行，不做限制
    sum(score) over(order by score) s5, -- 第一行到当前行（和当前行相同score值的所有行都会包含进去）
    sum(score) over(order by score rows between unbounded preceding and current row) s6, -- 第一行到当前行（和当前行相同score值的其他行不会包含进去，这是和上面的区别）
    sum(score) over(order by score rows between 3 preceding and current row) s7, -- 当前行+往前3行
    sum(score) over(order by score rows between 3 preceding and 1 following) s8, --当前行+往前3行+往后1行
    sum(score) over(order by score rows between current row and unbounded following) s9 --当前行+往后所有行
from
    tmp.tmp_student
order by 
    score;

结果如下：

name	score	s1	s2	s3	s4	s5	s6	s7	s8	s9
3dd	NULL	NULL	100	734	734	NULL	NULL	NULL	45	734
adf	45	45	174	734	734	45	45	45	100	734
asdf	55	55	254	734	734	100	100	100	174	689
cfe	74	74	346	734	734	174	174	174	254	634
fda	80	80	396	734	734	254	254	254	346	560
gds	92	92	436	734	734	346	346	301	396	480
ffd	95	190	480	734	734	536	536	362	461	293
dss	95	190	461	734	734	536	441	341	436	388
ddd	99	198	293	734	734	734	734	388	388	99
gf	99	198	388	734	734	734	635	381	480	198
说明	score升序排列	95加减2，包括95，所以需要加起来，等于190	当前行+前后2行	全部行	全部行	第一行到当前行（和当前行相同score值的所有行都会包含进去）	第一行到当前行	当前行+往前3行	当前行+往前3行+往后1行	当前行+往后所有行

备注：
1、对于 score 相同的行，其order by之后的顺序会不确定，因此这两行的窗口函数结果可以互换，比如 s9 的倒数两行，按照内置计算逻辑，应该是倒数第二行为 198，倒数第一行为 99。

HappyRocking

关注

4
点赞
踩
22

收藏

觉得还不错? 一键收藏
2
评论
Hive窗口函数之preceding and following

Hive窗口函数中，有一个功能是统计当前行之前或之后指定行作为一个聚合，关键字是 preceding 和 following，举例说明其使用方法。一、加载测试数据在 hive 环境中创建临时表：create table tmp_student( name string, class tinyint, cooperator_name ...
复制链接

扫一扫