一 分析函数定义
在数据库中,分析函数是指对一组行计算聚合值的函数。与对一组行返回单个聚合值的聚合函数不同,分析函数对一组输入行计算函数,从而对每一行返回一个值。
简化语法:
analytic_function_name ( [ argument_list ] )
OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, …] ]
[ window_frame_clause ]
)
分析函数需使用 OVER 子句,后者定义分析函数要评估的 window frame。OVER 子句包含下面三个可选子句。
PARTITION BY 子句将输入行划分成多个分区。该子句与 GROUP BY 类似,但并不会实际合并具有相同键的行。
ORDER BY 子句指定每个分区中的排序方式。
window_frame_clause 定义当前分区中的 window frame。
OVER 子句还可为空 (OVER());此情况下,window frame 包括所有输入行。
系统会先进行聚合(GROUP BY 和非分析聚合函数),再评估分析函数。
二 示例数据库
先创建一个简单数据库:
CREATE TABLE [dbo].[order_header] (
[ORDER_NO] INT NOT NULL,
[ORDER_TYPE] NVARCHAR (50) NULL,
[ORDER_TEXT] NVARCHAR (50) NULL,
[ORDER_QTY] INT NULL
);
插入数据:
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (1, N’A’, N’Apple’, 1)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (2, N’C’, N’Pear’, 20)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (3, N’A’, N’Kiwi’, 30)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (4, N’B’, N’Watermelon’, 40)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (5, N’D’, N’Banana’, 100)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (6, N’C’, N’Apple’, 50)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (7, N’A’, N’Pear’, 62)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (8, N’B’, N’Pear’, 73)
INSERT INTO [dbo].[order_header] ([ORDER_NO], [ORDER_TYPE], [ORDER_TEXT], [ORDER_QTY]) VALUES (9, N’A’, N’Apple’, 84)
数据如下:
三 分析函数汇总
-
count() over() :统计分区中各组的行数,partition by 可选,order by 可选
-
sum() over() :统计分区中记录的总和,partition by 可选,order by 可选
select order_no,order_type,order_text,order_qty ,sum(order_qty) over() as allquanty from order_header ; --总累计求和
select order_no,order_type,order_text,order_qty ,sum(order_qty) over(order by order_no) as allquanty from order_header ; --递加累计求和
select ename,esex,eage,sum(salary) over(partition by esex) from emp; --分组累计求和
select order_no,order_type,order_text,order_qty ,sum(order_qty) over(partition by order_type order by order_no) as allquanty from order_header --分组递加累计求和
-
avg() over() :统计分区中记录的平均值,partition by 可选,order by 可选
select order_no,order_type,order_text,order_qty ,avg(order_qty) over() as allquanty from order_header; --总平均值
select order_no,order_type,order_text,order_qty ,avg(order_qty) over(order by order_no) as allquanty from order_header; --递加求平均值
select order_no,order_type,order_text,order_qty ,avg(order_qty) over(partition by order_type) as allquanty from order_header; --分组求平均值
select order_no,order_type,order_text,order_qty ,avg(order_qty) over(partition by order_type order by order_no) as allquanty from order_header --分组递加求平均值 -
min() over() :统计分区中记录的最小值,partition by 可选,order by 可选
max() over() :统计分区中记录的最大值,partition by 可选,order by 可选
select order_no,order_type,order_text,order_qty ,max(order_qty) over(partition by order_type) as quantity from order_header–分组求最大值
select order_no,order_type,order_text,order_qty ,max(order_qty) over(partition by order_type order by order_no) as quantity from order_header–分组累计求最大值 -
rank() over() 和dense_rank() over,row_number(): 一个是跳跃排序一个是连续排序(可以有重复值),row_number则是无重复值的排序
select order_no,order_type,order_text,order_qty ,rank() over(partition by order_type order by order_no) as quantity from order_header -
first_value() over() :取出分区中第一条记录的字段值,partition by 可选,order by 可选
last_value() over() :取出分区中最后一条记录的字段值,partition by 可选,order by 可选
select order_no,order_type,order_text,order_qty ,first_value(ORDER_QTY) over(partition by order_type order by order_no) as quantity from order_header
也可以选择加入一个窗口来求窗口中第一行或者最后一行的值:
select order_no,order_type,order_text,order_qty ,first_value(ORDER_QTY) over(order by order_no ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as quantity from order_header
-
lag() over() :取出前n行数据,partition by 可选,order by 必选
lead() over() :取出后n行数据,partition by 可选,order by 必选
select order_no,order_type,order_text,order_qty ,lag(ORDER_QTY,1) over(order by order_no) as quantity from order_header
-
ntile() over():NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值,一般可以用来取经常用来取带有百分之多少比例的记录。
select order_no,order_type,order_text,order_qty ,ntile(2)
over(partition by order_type order by order_qty desc) as nt
from order_header
总结
借助这些函数,OVER 子句只是附加到聚合函数调用上;函数调用语法保持不变。与其对应的聚合函数一样,这些分析函数执行聚合操作,特别是针对每一行的相关窗口框架执行聚合操作。而且,这些分析函数的结果数据类型与其对应的聚合函数相同。