SQL Server中的窗口函数 - CareySon - 博客园 http://www.cnblogs.com/CareySon/p/3411176.html
1. 窗口计算简介
为了支持窗口计算,SQL server提供了OVER子句和窗口函数。窗口计算的两个主要应用就是对每组内的数据进行排序和聚合计算。
因此,开窗函数也分为排名开窗函数与聚合开窗函数。排名开窗函数如ROW_NUMBER, RANK; 聚合开窗函数如AVG和SUM。
开窗函数支持分区、排序和框架三种元素,其语法格式如下:
函数名称(<参数>)OVER (
[ <PARTITION BY clause> ]
[ <ORDER BY clause> ]
[ <ROW or RANGE clause> ]
OVER Clause (Transact-SQL) https://msdn.microsoft.com/en-us/library/ms189461.aspx)
(1)窗口的分区
通过分区,可以将窗口限定为与当前行的分区列具有相同值的那些行。分区与分组比较类似,但是分组指定后对于整个SELECT语句只能按照这个分组,但是
分区可以在一条语句中指定不同的分区。
<PARTITION BY clause> ::=
PARTITION BY value_expression , ... [ n ]
(2)窗口的排序
分区之后可以指定排序列,那么在窗口计算之前,各个窗口的行的逻辑顺序将确定。
<ORDER BY clause> ::= ORDER BY order_by_expression [ COLLATE collation_name ] [ ASC | DESC ] [ ,...n ](3)窗口的框架
框架是对窗口进行进一步的分区,框架有两种范围限定方式:一种是使用ROWS子句,通过指定当前行之前或之后的固定数目的行来限制分区中的行数;
另一种是RANGE子句,按照排序列的当前值,根据相同值来确定分区中的行数。
<ROW or RANGE clause> ::= { ROWS | RANGE } <window frame extent>
<window frame extent> ::=
{ <window frame preceding>
| <window frame between>
}
<window frame between> ::=BETWEEN <window frame bound> AND <window frame bound>
<window frame bound> ::=
{ <window frame preceding>
| <window frame following>
}
<window frame preceding> ::=
{
UNBOUNDED PRECEDING
| <unsigned_value_specification> PRECEDING
| CURRENT ROW
}
<window frame following> ::=
{
UNBOUNDED FOLLOWING
| <unsigned_value_specification> FOLLOWING
| CURRENT ROW
}
<unsigned value specification> ::={ <unsigned integer literal>
其中UNBOUNDED PRECEDING指的是窗口分区后的第一行,<unsigned_value_specification> PRECEDING指的是窗口当前行之前的第n行,
CURRENT ROW指的是当前行。
相反地,
UNBOUNDED FOLLOWING指的是窗口分区后的最后一行,<unsigned_value_specification> FOLLOWING指的是窗口当前行之后的第n行。
注意:
- 如果没有指定分区(PARTITION BY),那么窗口函数将对查询所得的所有行进行计算;
- 如果没有指定排序(ORDER BY),那么整个分区将作为窗口框架,但窗口函数只能使用不需要ORDER BY的函数,如SUM等
- 当使用框架时,必须要有ORDER BY子句,如果仅指定了ORDER BY子句而未指定框架,那么默认框架将采用 RANGE UNBOUNDED PRECEDING AND CURRENT ROW,但是这仅适用于能够使用ROWS/RANGE子句的窗口函数,例如排序窗口函数不能包含ROWS/RANGE子句,那么它将忽视这种默认框架。
- RANGE 只支持使用 UNBOUNDED 和 CURRENT ROW 窗口框架分隔符。
- 如果ROWS/RANGE子句采用 <window frame preceding>,那么CURRENT ROW 作为框架的默认结束行,例如:“ROWS 5 PRECEDING” 等价于 “ROWS BETWEEN 5 PRECEDING AND CURRENT ROW”.
2. 基于窗口的排名计算
对于排名函数来说,OVER子句中可以包含PARTITION BY与ORDER BY子句,其中必须包含ORDER BY子句,显然,对于排名而言,没有顺序的排名没有任何意义。
排名函数包含4个:ROW_NUMBER(),RANK(),DENSE_RANK(),NTILE()
(1)ROW_NUMBER()
返回分区内行的序列号,分区内的每行的序列号是唯一的,且从1开始。
(2)
RANK(),DENSE_RANK()
ROW_NUMBER()用于编号,它与排名具有不同的概念,排名具有确定性,相同的排序值总会被分配到相同的排名值。其中RANK()在排名时是有间隔的,比如两个第一名,那么第二名排名值是3,
而DENSE_RANK()则为密集排民,名次之间是无间隔的,比如两个第一名,那么第二名排名值是2.
USE AdventureWorks2012; GO SELECT i.ProductID, p.Name, i.LocationID, i.Quantity ,RANK() OVER (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank FROM Production.ProductInventory AS i INNER JOIN Production.Product AS p ON i.ProductID = p.ProductID WHERE i.LocationID BETWEEN 3 AND 4 ORDER BY i.LocationID; GO
ProductID Name LocationID Quantity Rank ----------- ---------------------- ------------ -------- ---- 494 Paint - Silver 3 49 1 495 Paint - Blue 3 49 1 493 Paint - Red 3 41 3 496 Paint - Yellow 3 30 4 492 Paint - Black 3 17 5 495 Paint - Blue 4 35 1 496 Paint - Yellow 4 25 2 493 Paint - Red 4 24 3 492 Paint - Black 4 14 4 494 Paint - Silver 4 12 5 (10 row(s) affected)
USE AdventureWorks2012; GO SELECT i.ProductID, p.Name, i.LocationID, i.Quantity ,DENSE_RANK() OVER (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank FROM Production.ProductInventory AS i INNER JOIN Production.Product AS p ON i.ProductID = p.ProductID WHERE i.LocationID BETWEEN 3 AND 4 ORDER BY i.LocationID; GO
ProductID Name LocationID Quantity Rank ----------- ---------------------------------- ---------- -------- ----- 494 Paint - Silver 3 49 1 495 Paint - Blue 3 49 1 493 Paint - Red 3 41 2 496 Paint - Yellow 3 30 3 492 Paint - Black 3 17 4 495 Paint - Blue 4 35 1 496 Paint - Yellow 4 25 2 493 Paint - Red 4 24 3 492 Paint - Black 4 14 4 494 Paint - Silver 4 12 5 (10 row(s) affected)
(3)NTILE()
用于把行分发到指定数目的组中,各个组有编号,编号从1开始,对于每一行,将返回此行所属组的编号。NTILE函数可以接受一个代表组数量的参数,分组的方式为“均分”原则,
比如:假设一个分区有10行,那么需要分成2组,那么每个组有5行,其编号分别为1,2;如果表有11行,需要分成3个组,这时候是无法均分的,此时的分配方法为先得到一个能整除的基组
大小(11/3=3),每组应当分配3行,剩余的2行会被再次均分到前面的2组中。
USE AdventureWorks2012; GO SELECT p.FirstName, p.LastName ,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile ,CONVERT(nvarchar(20),s.SalesYTD,1) AS SalesYTD , a.PostalCode FROM Sales.SalesPerson AS s INNER JOIN Person.Person AS p ON s.BusinessEntityID = p.BusinessEntityID INNER JOIN Person.Address AS a ON a.AddressID = p.BusinessEntityID WHERE TerritoryID IS NOT NULL AND SalesYTD <> 0; GO
3. 基于窗口的分布计算
从SQLServer 2012开始,提供了四个排名分布函数,包括PERCENT_RANK(),CUME_DIST(),PERCENTILE_CONT()和PERCENTILE_DISC().
其中PERCENT_RANK()用于计算某行的相对排名,CUME_DIST()用于计算行的累积分布(即相对位置),PERCENTILE_CONT()和PERCENTILE_DISC()
用于根据指定的比例返回组中相应的数值,如中位值。换句话说,PERCENT_RANK()与CUME_DIST()是根据数值计算比例,而PERCENTILE_CONT()和PERCENTILE_DISC()
则是根据比例计算数值,类似于PERCENT_RANK()与CUME_DIST()的逆运算。
(1)PERCENT_RANK()
用于计算某行的相对排名,返回一个0~1之间的一个小数值,其语法为:
PERCENT_RANK( ) OVER ( [ partition_by_clause ] order_by_clause )
USE AdventureWorks2012; GO SELECT Department, LastName, Rate, CUME_DIST () OVER (PARTITION BY Department ORDER BY Rate) AS CumeDist, PERCENT_RANK() OVER (PARTITION BY Department ORDER BY Rate ) AS PctRank FROM HumanResources.vEmployeeDepartmentHistory AS edh INNER JOIN HumanResources.EmployeePayHistory AS e ON e.BusinessEntityID = edh.BusinessEntityID WHERE Department IN (N'Information Services',N'Document Control') ORDER BY Department, Rate DESC;
(2)CUME_DIST()
用于计算某个值在一组值内的累积分布,也就是说该值在一组值中的相对位置。函数返回一个0~1之间的一个小数值。
CUME_DIST( ) OVER ( [ partition_by_clause ] order_by_clause )
(2)PERCENTILE_CONT()和PERCENTILE_DISC()
用于根据指定的比例返回组中相应的数值,所指定的比例应当是一个个0~1之间的小数值。其中的WITHIN GROUP子句用于指定要排序的数值列表,
OVER子句则可以指定一个分区。
PERCENTILE_CONT/PERCENTILE_DISC() ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ <partition_by_clause> ] )
USE AdventureWorks2012; SELECT DISTINCT Name AS DepartmentName ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ph.Rate) OVER (PARTITION BY Name) AS MedianCont ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ph.Rate) OVER (PARTITION BY Name) AS MedianDisc FROM HumanResources.Department AS d INNER JOIN HumanResources.EmployeeDepartmentHistory AS dh ON dh.DepartmentID = d.DepartmentID INNER JOIN HumanResources.EmployeePayHistory AS ph ON ph.BusinessEntityID = dh.BusinessEntityID WHERE dh.EndDate IS NULL;
值得注意的是,
PERCENTILE_CONT()对于偶数行,将计算平均值,而
PERCENTILE_DISC()返回的永远是已知值。
4. 基于窗口的偏移计算
SQL Server 2012 引入了4个偏移函数:LAG和LEAD,FIRST_VALUE和LAST_VALUE,分别用于从当前行的某个偏移量或者一个框架的开头或者结尾的行返回一个元素。
其中LAG和LEAD支持窗口分区和窗口排序子句,FIRST_VALUE和LAST_VALUE在支持窗口分区和排序子句的基础上,还支持窗口框架子句。
(1)LAG和LEAD
LAG函数用于在当前行之前查找,而LEAD函数用于在当前行之后查找。函数的使用语法为:
LAG / LEAD (scalar_expression [,offset] [,default]) OVER ( [ partition_by_clause ] order_by_clause )
默认值(如果未指定,默认为NULL)
USE AdventureWorks2012; GO SELECT BusinessEntityID, YEAR(QuotaDate) AS SalesYear, SalesQuota AS CurrentQuota, LAG(SalesQuota, 1,0) OVER (ORDER BY YEAR(QuotaDate)) AS PreviousQuota FROM Sales.SalesPersonQuotaHistory WHERE BusinessEntityID = 275 and YEAR(QuotaDate) IN ('2005','2006');
(2)FIRST_VALUE和LAST_VALUE
用于从窗口框架返回第一行和最后一行的值。这些函数支持窗口分区、排序和框架子句。
FIRST_VALUE ( [scalar_expression ] ) OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
USE AdventureWorks2012; GO SELECT JobTitle, LastName, VacationHours, FIRST_VALUE(LastName) OVER (PARTITION BY JobTitle ORDER BY VacationHours ASC ROWS UNBOUNDED PRECEDING ) AS FewestVacationHours FROM HumanResources.Employee AS e INNER JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID ORDER BY JobTitle;