SQL基础知识-Window Function

1 Window Function

1.1 Window Function Definition

In SQL, a window function is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.) Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.

Window functions apply to aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. OVER clause does two things :

  • Partitions rows to form a set of rows. (PARTITION BY clause is used)
  • Orders rows within those partitions into a particular order. (ORDER BY clause is used)

1.2 Window Function Sample

As an example, here is a query which uses a window function to compare the salary of each employee with the average salary of their department (example from the PostgreSQL documentation):

Consider the following employee table :
在这里插入图片描述

SELECT Name, Age, Department, Salary, 
 AVG(Salary) OVER( PARTITION BY Department) AS Avg_Salary
 FROM employee

Output:
在这里插入图片描述

The PARTITION BY clause groups rows into partitions, and the function is applied to each partition separately. If the PARTITION BY clause is omitted (such as with an empty OVER() clause), then the entire result set is treated as a single partition. For this query, the average salary reported would be the average taken over all rows.

Window functions are evaluated after aggregation (after the GROUP BY clause and non-window aggregate functions, for example).

1.3 Types of Window Functions

1.3.1 Aggregate Window Functions

Various aggregate functions such as SUM(), COUNT(), AVERAGE(), MAX(), and MIN() applied over a particular window (set of rows) are called aggregate window functions.

We should be careful while adding order by clauses to window functions with aggregates. Let’s consider another case:

SELECT Name, Age, Department, Salary, 
 AVG(Salary) OVER( PARTITION BY Department ORDER BY Age) AS Avg_Salary
 FROM employee

Here we order the records within the partition as per age values and hence the average values change as per the sorted order. The output of above query will be :
在这里插入图片描述

1.3.2 Ranking Window Functions

Ranking functions are: RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE() .

  • RANK(): As the name suggests, the rank function assigns rank to all the rows within every partition. Rank is assigned such that rank 1 given to the first row and rows having same value are assigned same rank. For the next rank after two same rank values, one rank value will be skipped. For instance, if two rows share­ rank 1, the next row gets rank 3, not 2.
  • DENSE_RANK() : It assigns rank to each row within partition. Just like rank function first row is assigned rank 1 and rows having same value have same rank. The difference between RANK() and DENSE_RANK() is that in DENSE_RANK(), for the next rank after two same rank, consecutive integer is used, no rank is skipped.
  • ROW_NUMBER() : gives e­ach row a unique number. It numbers rows from one­ to the total rows. The rows are put into groups base­d on their values. Each group is called a partition. In e­ach partition, rows get numbers one afte­r another. No two rows have the same­ number in a partition. This makes ROW_NUMBER() differe­nt from RANK() and DENSE_RANK(). ROW_NUMBER() uniquely identifies e­very row with a sequential inte­ger number. This helps with diffe­rent kinds of data analysis.
  • NTILE(n): 将分组数据按照顺序切成n片,为每一行分配一个片num。

Sample:

SELECT 
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS emp_row_no, 
    Name,  
    Department, 
    Salary,
    RANK() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_rank,
    DENSE_RANK() OVER(PARTITION BY Department ORDER BY Salary DESC) AS emp_dense_rank
FROM 
    employee;

Output:
在这里插入图片描述

1.3.3 Time-Series Window Functions

Time-Series Window Functions are: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

Time-Series Window Functions are used to access data from rows that come after, or before the current row within a result set based on a specific column order.

Think of LEAD() as a function that lets you peek into the future , and LAG() as a way to glance into the past. They’re like time-travel functions for your data! We frequently use this to create a 7-day moving average metric, or a 28-day rolling count metric.

FIRST_VALUE(): 取分组内排序后,截止到当前行,第一个值;
LAST_VALUE(): 取分组内排序后,截止到当前行,最后一个值;

Consider the following stock_prices table :
在这里插入图片描述

SELECT
  date, close,
  LEAD(close) OVER (ORDER BY date) AS next_month_close,
  LEAD(close) OVER (ORDER BY date) - close as difference
FROM stock_prices
WHERE EXTRACT(YEAR FROM date) = 2023  AND ticker = 'GOOG';

Output:
在这里插入图片描述

References

【1】https://en.wikipedia.org/wiki/Window_function_(SQL)
【2】https://www.geeksforgeeks.org/window-functions-in-sql/
【3】https://datalemur.com/sql-tutorial/sql-time-series-window-function-lead-lag

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值