hive窗口函数

建立一张测试表

CREATE TABLE test (cookieid STRING, create_time STRING, pv INT) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

查看表数据

select * from test;

+----------------+-------------------+----------+--+
| test.cookieid  | test.create_time  | test.pv  |
+----------------+-------------------+----------+--+
| cookieid1      | 2019-01-01        | 1        |
| cookieid1      | 2019-01-02        | 2        |
| cookieid1      | 2019-01-03        | 3        |
| cookieid1      | 2019-01-03        | 3        |
| cookieid1      | 2019-01-04        | 4        |
| cookieid2      | 2019-01-01        | 1        |
| cookieid2      | 2019-01-02        | 2        |
| cookieid2      | 2019-01-03        | 3        |
| cookieid2      | 2019-01-03        | 3        |
| cookieid2      | 2019-01-04        | 4        |
+----------------+-------------------+----------+--+
SUM(), MIN(),MAX(),AVG()等聚合函数
  • SUM(col) 对窗口期的指定列进行求和
  • MIN(col) 取窗口期的指令列的最小值
  • MAX(col) 取窗口期的指定列的最大值
  • AVG(col) 取窗口期的指定列的平均值

查询示例

select *,
SUM(pv) over (partition by cookieid order by create_time ROWS between 2 preceding and current row),
SUM(pv) over (partition by cookieid order by create_time ROWS between 2 preceding and 1 following),
SUM(pv) over (partition by cookieid order by create_time),
MIN(pv) over (partition by cookieid order by create_time),
MAX(pv) over (partition by cookieid order by create_time),
AVG(pv) over (partition by cookieid order by create_time)
 from test;

+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  | _wcol4  | _wcol5  |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | 1       | 3       | 1       | 1       | 1       | 1.0     |
| cookieid1      | 2019-01-02        | 2        | 3       | 6       | 3       | 1       | 2       | 1.5     |
| cookieid1      | 2019-01-03        | 3        | 6       | 9       | 9       | 1       | 3       | 2.25    |
| cookieid1      | 2019-01-03        | 3        | 8       | 12      | 9       | 1       | 3       | 2.25    |
| cookieid1      | 2019-01-04        | 4        | 10      | 10      | 13      | 1       | 4       | 2.6     |
| cookieid2      | 2019-01-01        | 1        | 1       | 3       | 1       | 1       | 1       | 1.0     |
| cookieid2      | 2019-01-02        | 2        | 3       | 6       | 3       | 1       | 2       | 1.5     |
| cookieid2      | 2019-01-03        | 3        | 6       | 9       | 9       | 1       | 3       | 2.25    |
| cookieid2      | 2019-01-03        | 3        | 8       | 12      | 9       | 1       | 3       | 2.25    |
| cookieid2      | 2019-01-04        | 4        | 10      | 10      | 13      | 1       | 4       | 2.6     |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+---------+--+
NTILE(), ROW_NUMBER(), RANK(), DENSE_RANK()

NTILE(num) 根据over里的partition by 来切分分区,并将分区切分为num类
ROW_NUMBER() 根据over里的partition by 来切分分区,并为每个分区数据添加行号[不对重复数据做特殊处理]
DENSE_RANK() 根据over里的partition by 来切分分区,并为每个分区数据添加行号[重复数据取相同行号,但保证行号连续]
RANK()根据over里的partition by 来切分分区,并为每个分区数据添加行号[重复数据取相同行号,如果有n行数据重复,则后一行数据行号为 x+n ]

查询示例

select *,
NTILE(2) over (partition by cookieid order by create_time),
ROW_NUMBER() over (partition by cookieid order by create_time),
RANK() over (partition by cookieid order by create_time),
DENSE_RANK() over (partition by cookieid order by create_time)
from test;

+----------------+-------------------+----------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  |
+----------------+-------------------+----------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | 1       | 1       | 1       | 1       |
| cookieid1      | 2019-01-02        | 2        | 1       | 2       | 2       | 2       |
| cookieid1      | 2019-01-03        | 3        | 1       | 3       | 3       | 3       |
| cookieid1      | 2019-01-03        | 3        | 2       | 4       | 3       | 3       |
| cookieid1      | 2019-01-04        | 4        | 2       | 5       | 5       | 4       |
| cookieid2      | 2019-01-01        | 1        | 1       | 1       | 1       | 1       |
| cookieid2      | 2019-01-02        | 2        | 1       | 2       | 2       | 2       |
| cookieid2      | 2019-01-03        | 3        | 1       | 3       | 3       | 3       |
| cookieid2      | 2019-01-03        | 3        | 2       | 4       | 3       | 3       |
| cookieid2      | 2019-01-04        | 4        | 2       | 5       | 5       | 4       |
+----------------+-------------------+----------+---------+---------+---------+---------+--+
LAG, LEAD, FIRST_VALUE, LAST_VALUE

LAG(col,num) 对某一行进行向后错行
LEAD(col,num)对某一行进行向前错行
FIRST_VALUE(col) 取窗口期内第一次看到该列的值
LAST_VALUE(col) 取窗口期内最后一次看到该列的值

查询示例

select *,
LAG(pv,2) over (partition by cookieid order by create_time ),
LEAD(pv,2) over (partition by cookieid order by create_time),
FIRST_VALUE(pv) over (partition by cookieid order by create_time),
FIRST_VALUE(pv) over (partition by cookieid order by create_time rows between 1 preceding and current row),
LAST_VALUE(pv) over (partition by cookieid order by create_time)
from test;

+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+
| test.cookieid  | test.create_time  | test.pv  | _wcol0  | _wcol1  | _wcol2  | _wcol3  | _wcol4  |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+
| cookieid1      | 2019-01-01        | 1        | NULL    | 3       | 1       | 1       | 1       |
| cookieid1      | 2019-01-02        | 2        | NULL    | 3       | 1       | 1       | 2       |
| cookieid1      | 2019-01-03        | 3        | 1       | 4       | 1       | 2       | 3       |
| cookieid1      | 2019-01-03        | 3        | 2       | NULL    | 1       | 3       | 3       |
| cookieid1      | 2019-01-04        | 4        | 3       | NULL    | 1       | 3       | 4       |
| cookieid2      | 2019-01-01        | 1        | NULL    | 3       | 1       | 1       | 1       |
| cookieid2      | 2019-01-02        | 2        | NULL    | 3       | 1       | 1       | 2       |
| cookieid2      | 2019-01-03        | 3        | 1       | 4       | 1       | 2       | 3       |
| cookieid2      | 2019-01-03        | 3        | 2       | NULL    | 1       | 3       | 3       |
| cookieid2      | 2019-01-04        | 4        | 3       | NULL    | 1       | 3       | 4       |
+----------------+-------------------+----------+---------+---------+---------+---------+---------+--+
  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值