Oracle 开窗函数(Window Functions)允许在不合并行的前提下对数据进行复杂分析,常用于排名、累计计算、前后行对比等场景。
一、核心语法结构
函数名() OVER (
[PARTITION BY 分区列]
[ORDER BY 排序列 [ASC|DESC]]
[窗口帧子句 (ROWS | RANGE BETWEEN ... AND ...)]
)
PARTITION BY: 将数据划分为独立的分区,函数在每个分区内单独计算。
ORDER BY: 定义分区内的排序规则。
窗口帧子句: 指定计算范围(如当前行前 N 行、后 N 行等)。
二、常用开窗函数分类
1. 排名函数
(1) ROW_NUMBER()
功能: 为每行分配唯一的连续序号(相同值序号不同)。
-- 语法:
ROW_NUMBER() OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例:
SELECT
name, score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM students;
结果:
name score rank
Alice 95 1
Bob 90 2
Carol 90 3
(2) RANK()
**功能:**允许并列排名,后续序号跳跃(如 1, 2, 2, 4)。
-- 语法:
RANK() OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例:
SELECT
name, score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM students;
结果:
name score rank
Alice 95 1
Bob 90 2
Carol 90 2
Dave 85 4
(3) DENSE_RANK()
**功能:**允许并列排名,后续序号连续(如 1, 2, 2, 3)。
-- 语法:
DENSE_RANK() OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例:
SELECT
name, score,
DENSE_RANK() OVER (ORDER BY score DESC) AS rank
FROM students;
结果:
name score rank
Alice 95 1
Bob 90 2
Carol 90 2
Dave 85 3
2. 聚合函数 + 窗口
(1) SUM()
功能: 计算分区内的总和。
-- 语法:
SUM(col) OVER ([PARTITION BY col1] [ORDER BY col2] [窗口帧子句])
示例(累计销售额):
SELECT
month, sales,
SUM(sales) OVER (ORDER BY month) AS cumulative_sales
FROM monthly_sales;
结果:
month sales cumulative_sales
2023-01 100 100
2023-02 200 300
2023-03 150 450
(2) AVG()
功能: 计算分区内的平均值。
-- 语法:
AVG(col) OVER ([PARTITION BY col1] [ORDER BY col2] [窗口帧子句])
-- 示例(部门平均工资):
SELECT
department, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;
(3) COUNT()
功能: 计算分区内的行数。
-- 语法:
COUNT(col) OVER ([PARTITION BY col1] [ORDER BY col2] [窗口帧子句])
-- 示例(统计每个部门的员工数):
SELECT
department, employee_id,
COUNT(*) OVER (PARTITION BY department) AS dept_emp_count
FROM employees;
3. 前后行分析函数
(1) LAG(col, n, default)
功能: 获取当前行前第 n 行的值(默认 n=1)。
-- 语法:
LAG(col, n, default) OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例(计算环比增长):
SELECT
month, sales,
LAG(sales, 1, 0) OVER (ORDER BY month) AS prev_sales,
sales - LAG(sales, 1, 0) OVER (ORDER BY month) AS growth
FROM monthly_sales;
(2) LEAD(col, n, default)
功能: 获取当前行后第 n 行的值(默认 n=1)。
-- 语法:
LEAD(col, n, default) OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例(预测下月销售额):
SELECT
month, sales,
LEAD(sales, 1, 0) OVER (ORDER BY month) AS next_sales
FROM monthly_sales;
4. 首尾值函数
(1) FIRST_VALUE(col)
功能: 返回窗口内的第一个值。
-- 语法:
FIRST_VALUE(col) OVER ([PARTITION BY col1] ORDER BY col2 [窗口帧子句])
-- 示例(获取部门最高工资):
SELECT
department, salary,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_salary
FROM employees;
(2) LAST_VALUE(col)
功能: 返回窗口内的最后一个值(需结合窗口帧子句)。
-- 语法:
LAST_VALUE(col) OVER ([PARTITION BY col1] ORDER BY col2 [窗口帧子句])
-- 示例(默认范围需扩展至末尾):
SELECT
month, sales,
LAST_VALUE(sales) OVER (
ORDER BY month
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS last_sales
FROM monthly_sales;
(3) NTH_VALUE(col, n)
功能: 返回窗口内的第 n 个值。
-- 语法:
NTH_VALUE(col, n) OVER ([PARTITION BY col1] ORDER BY col2 [窗口帧子句])
-- 示例(获取每个部门第二高的工资):
SELECT
department, salary,
NTH_VALUE(salary, 2) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_highest
FROM employees;
5. 分布函数
(1) NTILE(n)
功能: 将数据分为 n 个桶,并为每行分配桶号。
-- 语法:
NTILE(n) OVER ([PARTITION BY col1] ORDER BY col2)
--示例(将学生分为三等):
SELECT
student_id, score,
NTILE(3) OVER (ORDER BY score DESC) AS tier
FROM students;
(2) CUME_DIST()
功能: 计算累积分布值(当前行值在分区内的相对位置,范围 [0,1])。
-- 语法:
CUME_DIST() OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例(计算工资累积分布):
SELECT
employee_id, salary,
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;
(3) PERCENT_RANK()
功能: 计算百分比排名(基于 RANK(),范围 [0,1])。
-- 语法:
PERCENT_RANK() OVER ([PARTITION BY col1] ORDER BY col2)
-- 示例:
SELECT
student_id, score,
PERCENT_RANK() OVER (ORDER BY score) AS percent_rank
FROM students;
6. 窗口帧子句详解
(1) ROWS BETWEEN
范围定义:基于物理行偏移。
常用选项:
UNBOUNDED PRECEDING: 分区第一行。
n PRECEDING: 当前行前 n 行。
CURRENT ROW: 当前行。
n FOLLOWING: 当前行后 n 行。
UNBOUNDED FOLLOWING: 分区最后一行。
示例(计算近3个月移动平均):
sql
SELECT
month, sales,
AVG(sales) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM monthly_sales;
(2) RANGE BETWEEN
范围定义:基于逻辑值范围(如日期差值)。
-- 常用选项:
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
-- 示例(计算近7天累计销售额):
SELECT
sale_date, sales,
SUM(sales) OVER (
ORDER BY sale_date
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
) AS weekly_sales
FROM daily_sales;
三、总结
需求 | 推荐函数 |
---|---|
唯一连续序号 | ROW_NUMBER() |
允许并列的排名 | RANK() 或 DENSE_RANK() |
累计计算(总和/平均) | SUM() OVER() 或 AVG() OVER() |
前后行数据对比 | LAG() 或 LEAD() |
分组首尾值 | FIRST_VALUE() 或 LAST_VALUE() |
数据分桶 | NTILE(n) |
累积分布分析 | CUME_DIST() 或 PERCENT_RANK() |
四、注意事项
性能优化:
避免过度分区(PARTITION BY 列过多)。
结合索引优化排序字段(ORDER BY)。
窗口帧范围:
默认范围:RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW。
显式指定 ROWS 或 RANGE 以控制计算逻辑。
NULL 处理:
多数函数忽略 NULL,但需注意 ORDER BY 中 NULL 的排序位置。
通过灵活组合开窗函数,可以高效实现复杂分析需求,提升数据处理能力!