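Preface
Goal: compute the fluctuation value of row n in a table, that is, how far the value of row n deviates from the average of the first n rows, relative to that average:
fluctuation = (An - avg) / avg, where An is the value of row n and avg is the average of rows 1 through n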
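Written out in full, the formula can be read as follows. This is a sketch under one assumption: the average is taken over rows 1 through n, including row n itself, which matches the `unbounded preceding and current row` frame used in the queries of this article. The symbol $\overline{A}_n$ is introduced here only for illustration.

```latex
\[
\text{fluctuation}_n = \frac{A_n - \overline{A}_n}{\overline{A}_n},
\qquad
\overline{A}_n = \frac{1}{n}\sum_{i=1}^{n} A_i
\]
% Worked example with A = (10, 20, 30):
% \overline{A}_3 = (10 + 20 + 30) / 3 = 20
% fluctuation_3  = (30 - 20) / 20 = 0.5
```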
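I. Using rows between in Hive window functions
The window clause rows between defines the frame of rows that a window function sees. Its keywords are:
- preceding: rows before the current row
- following: rows after the current row
- current row: the current row
- unbounded: no limit, i.e. the frame extends to the edge of the partition
- unbounded preceding: the frame starts at the first row of the partition
- unbounded following: the frame ends at the last row of the partition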
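To make the keywords concrete, here is a minimal Python sketch that works out which rows a frame covers. It does not show how Hive implements frames; the function name `frame_rows` and the convention of encoding bounds as offsets (with `None` for unbounded) are assumptions made for this illustration.

```python
def frame_rows(n_rows, current, start, end):
    """Return the indices of the rows inside a ROWS BETWEEN frame.

    start and end are offsets relative to the current row:
    -2 means "2 preceding", 0 means "current row", 1 means "1 following".
    None means "unbounded" (preceding for start, following for end).
    """
    lo = 0 if start is None else max(0, current + start)
    hi = n_rows - 1 if end is None else min(n_rows - 1, current + end)
    return list(range(lo, hi + 1))


# Five rows (indices 0..4); the current row is index 2.
print(frame_rows(5, 2, None, 0))  # unbounded preceding .. current row   -> [0, 1, 2]
print(frame_rows(5, 2, 0, None))  # current row .. unbounded following   -> [2, 3, 4]
print(frame_rows(5, 2, -1, 1))    # 1 preceding .. 1 following           -> [1, 2, 3]
print(frame_rows(5, 0, -2, 0))    # 2 preceding .. current row, at top   -> [0]
```

The last call shows that a frame is simply cut off at the edge of the partition: the first row has no preceding rows, so its frame contains only itself.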
II. Steps
1. Data preparation
(1) Create test data
-- Note: this DDL is MySQL syntax (ENGINE=InnoDB, AUTO_INCREMENT); adapt it before running on Hive.
CREATE TABLE `test_product` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`save_year` char(4) NOT NULL,
`save_month` char(1) NOT NULL,
`product_num` int NOT NULL,
`day_num` int DEFAULT NULL,
`save_date` datetime DEFAULT NULL,
-- The primary key already enforces uniqueness, so no separate UNIQUE KEY on id is needed.
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
INSERT INTO test_product (save_year,save_month,product_num,day_num,save_date)
VALUES
( '2020', '1', 3200, 3210, '2020-01-11 08:00:00' ),
( '2020', '2', 3210, 3220, '2020-02-11 09:00:00' ),
( '2020', '3', 3220, 3230, '2020-03-11 09:00:00' ),
( '2020', '4', 3230, 3240, '2020-04-11 09:00:00' ),
( '2021', '1', 3240, 3250, '2021-01-11 09:00:00' ),
( '2021', '2', 3250, 3260, '2021-02-11 00:00:00' ),
( '2021', '3', 3260, 3270, '2021-03-11 09:00:00' ),
( '2021', '4', 3270, 3280, '2021-04-11 00:00:00' );
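2. Basic use of rows between
(1) Average of the first n rows, from the first row through the current row. Rows are ordered by save_date descending, so the "first" rows are the most recent ones.
select *,
       avg(product_num) over (order by save_date desc rows between unbounded preceding and current row) as avg_n
from test_product;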
(2) Sum from the 2 rows before the current row through the current row (at most 3 rows)
select *,
       sum(product_num) over (order by save_date desc rows between 2 preceding and current row) as sum_n
from test_product;
(3) Sum of the current row and all rows after it
select *,
       sum(product_num) over (order by save_date desc rows between current row and unbounded following) as sum_n
from test_product;
(4) Sum of the previous row, the current row, and the next row (at most 3 rows)
select *,
       sum(product_num) over (order by save_date desc rows between 1 preceding and 1 following) as sum_n
from test_product;
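The four frames above can be checked by hand against the test data. The following Python sketch recomputes them outside the database, assuming the rows are ordered by save_date descending as in the queries. The helper `window` and the result names are hypothetical and exist only for this illustration.

```python
# product_num values from the test data, ordered by save_date descending
product_num = [3270, 3260, 3250, 3240, 3230, 3220, 3210, 3200]


def window(values, i, start, end):
    """Values covered by the frame of row i.

    start/end are offsets relative to row i; None means unbounded.
    """
    lo = 0 if start is None else max(0, i + start)
    hi = len(values) - 1 if end is None else min(len(values) - 1, i + end)
    return values[lo:hi + 1]


rows = range(len(product_num))

# (1) unbounded preceding .. current row: running average
running_avg = [sum(w) / len(w) for w in (window(product_num, i, None, 0) for i in rows)]
# (2) 2 preceding .. current row
sum_prev2 = [sum(window(product_num, i, -2, 0)) for i in rows]
# (3) current row .. unbounded following
sum_rest = [sum(window(product_num, i, 0, None)) for i in rows]
# (4) 1 preceding .. 1 following
sum_neighbors = [sum(window(product_num, i, -1, 1)) for i in rows]

print(running_avg)    # starts at 3270.0 and ends at 3235.0
print(sum_prev2)
print(sum_rest)       # starts at 25880, the total of all eight rows
print(sum_neighbors)
```

The first and last rows of `sum_neighbors` add up only two values, because one side of the frame falls outside the table.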
3. Querying the fluctuation value
(1) Query row n together with the sum and the average of the first n rows. avg_n is the average of the first n rows and sum_n is their sum.
select *,
       avg(product_num) over (order by save_date desc rows between unbounded preceding and current row) as avg_n,
       sum(product_num) over (order by save_date desc rows between unbounded preceding and current row) as sum_n
from test_product;
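(2) Query the fluctuation value. The difference is divided by avg_n so that the result matches the formula (An - avg) / avg.
select save_date, product_num, day_num,
       (product_num - avg_n) / avg_n as bdz  -- bdz: fluctuation value
from (
    select *,
           avg(product_num) over (order by save_date desc rows between unbounded preceding and current row) as avg_n
    from test_product
) t;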
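As a cross-check of the fluctuation query, the same calculation can be sketched in plain Python. It assumes the formula (An - avg) / avg, with avg being the running average over rows 1 through n in save_date descending order. The function name `fluctuations` is hypothetical.

```python
# product_num values from the test data, ordered by save_date descending
product_num = [3270, 3260, 3250, 3240, 3230, 3220, 3210, 3200]


def fluctuations(values):
    """Return (A_n - avg_n) / avg_n for every row, where avg_n is the mean of rows 1..n."""
    result = []
    running_sum = 0
    for n, value in enumerate(values, start=1):
        running_sum += value
        avg_n = running_sum / n  # assumes avg_n is never zero
        result.append((value - avg_n) / avg_n)
    return result


for value, bdz in zip(product_num, fluctuations(product_num)):
    print(value, round(bdz, 6))
```

The first row always yields 0, because its running average equals its own value. The values in this data set fall steadily in descending date order, so every later row lies below its running average and the fluctuation is negative.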