1.概述
语法:
window_function (expression) OVER (
[ PARTITION BY part_list ]
[ ORDER BY order_list ]
[ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] )
window_function(expression) 可以是下面的函数:
-
Aggregate Functions: 常用聚合函数,比如:count(*)、sum(...)、min(...)、avg(...) 等
-
Sort Functions: 数据排序函数, 比如 :dense_rank()、rank(...)、row_number(...)等
-
Analytics Functions: 取值函数, 比如:lead(...)、lag(...)、 first_value(...)等
具体还要结合实际情况使用。
注意:
1. 窗口子句必须和order by 子句同时使用,且如果指定了order by 子句未指定窗口子句,则默认为RANGE BETWEEN unbounded preceding AND CURRENT ROW (有order by 作用域:第一行~当前行,如果order by这个字段的值相同,那汇总值也是相同的)
2. 当ORDER BY和窗口从句都缺失,窗口规范默认是:ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (无order by 作用域:整个partition下)
注:默认框架为 range BETWEEN UNBOUNDED PRECEDING and current row,row和range的区别是rows按照行进行计算,如当求第一行的时候,求和为第一行-第一行,当求第二行的时候,求和为第一行-第二行;而range是按照值进行计算,如sale_price, 当sale_price=100,求和范围为100-100,当sale_price=500,求和范围为100-500。
3. range是逻辑窗口,是指定当前行对应值的范围取值,列数不固定,只要行值在范围内,对应列都包含在内。注意:range的值是根据order by的字段来确定的
4. rows是物理窗口,即根据order by 子句排序后,取的前N行及后N行的数据计算(与当前行的值无关,只与排序后的行号相关)
5. 聚合函数可以和开窗结合使用,例如求 每天活跃用户量的环比值
6.
rows between ... and ...控制窗口函数的范围
unbounded 无边界
preceding 往前
following 往后
unbounded preceding 往前所有行
2 preceding 往前2行
unbounded following 往后所有行
2 following 往后2行
current row 当前行
————————————————
-- rows(考虑的是行)
--从(无边界)第一行到当前行进行累加
select score,sum(score) over(order by score rows between unbounded preceding and current row) as s from tmp;
--从前2行到当前行进行累加
select score,sum(score) over(order by score rows between 2 preceding and current row) as s from tmp;
--前2行+当前行+后面所有行进行累加
select score,sum(score) over(order by score rows between 2 preceding and unbounded following) as s from tmp;
--当前行到后一行进行累加
select score,sum(score) over(order by score rows between current row and 1 following) as s from tmp;
--第一行到当前行进行累加,如果当前行的值和后一行的值相同的话,当前行的累加结果取的是后一行的累加结果(和当前行相同score值的所有行都会包含进去)
select score,sum(score) over(order by score) as s from tmp;
-- range(考虑的是值)
-- 统计近某用户近三个月的工资,例如 1,2,4,5中,3月份缺失,则5月只统计4+5月的工资,如果需要前三行,请使用rows
sum(salary) over(partition by id order by month range 2 preceding ) as salary
SELECT
date
, count(user_id) as a -- 活跃用户量
, lag(count(user_id),1) over(partition by date order by date) as b -- 昨天活跃用户量,这里partiton by 可以省略
, count(user_id)/lag(count(user_id),1) over(partition by date order by date) as yoy -- 环比
FROM user_active_table
GROUP BY date -- 如果开窗函数中含有聚合函数,需要接group by
2.窗口函数应用(举栗子)
今天接触到一个sql题目,也是对ROW_NUMBER() OVER PARTITION BY 的使用,下面实战一下吧!
题目:统计各个班级中学生各个科目总成绩排名第一的情况
建表
CREATE TABLE tmp_20191101( --创建一个学生表
CLASS INTEGER, --学生id
SNAME VARCHAR2(20), --学生姓名
SSEX VARCHAR2(10), --学生性别
SUBJECT VARCHAR2(20), --科目名称
SCORE INTEGER --对应科目成绩
);
插入数据
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小朱','男','英语',97);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','英语',91);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小王','男','英语',80);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小朱','男','语文',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小王','男','数学',76);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小陶','男','数学',89);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小薛','男','语文',94);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小王','女','语文',93);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小朱','女','数学',61);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小苏','女','语文',99);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小龚','女','数学',92);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小刘','女','英语',81);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','英语',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','英语',96);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','英语',88);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','语文',99);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','数学',73);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','数学',85);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','语文',93);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','语文',95);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','数学',63);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','语文',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','数学',92);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','英语',86);
tmp_20191101 全部数据截图:
3. 执行sql
select * from tmp_20191101
select * from(
select
t.sname, t.class, t.SumScore,
ROW_NUMBER() OVER (PARTITION BY t.class order by t.SumScore desc) rn --根据class分区,sumscore排序
from
(
select sname,class,sum(score) as SumScore --每班同学的各自总分
from tmp_20191101
group by class,sname) t) RN
where RN.rn <= 1
执行sql 结果截图:
4.思考
如果有并列的情况,怎么看处理呢? ROW_NUMBER()是不考虑并列的,你看
相同总分,排名没有并列,那该怎么办呢?Oracle中提供了Rank()可以用 ,如图
如果 想要名次之间没有“间隔”, 可以使用 dense_rank()函数,例如: