窗口函数的使用

1.概述

    语法:

window_function (expression) OVER (
   [ PARTITION BY part_list ]
   [ ORDER BY order_list ]
   [ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] )

   window_function(expression) 可以是下面的函数:

  • Aggregate Functions: 常用聚合函数,比如:count(*)、sum(...)、min(...)、avg(...) 等

  • Sort Functions: 数据排序函数, 比如 :dense_rank()、rank(...)、row_number(...)等

  • Analytics Functions: 取值函数, 比如:lead(...)、lag(...)、 first_value(...)等

     具体还要结合实际情况使用。

注意:  

1. 窗口子句必须和order by 子句同时使用,且如果指定了order by 子句未指定窗口子句,则默认为RANGE BETWEEN unbounded preceding AND CURRENT ROW    (有order by  作用域:第一行~当前行,如果order by这个字段的值相同,那汇总值也是相同的)

2. 当ORDER BY和窗口从句都缺失,窗口规范默认是:ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (无order by 作用域:整个partition下)

注:默认框架为 range BETWEEN UNBOUNDED PRECEDING and current row,row和range的区别是rows按照进行计算,如当求第一行的时候,求和为第一行-第一行,当求第二行的时候,求和为第一行-第二行;而range是按照进行计算,如sale_price, 当sale_price=100,求和范围为100-100,当sale_price=500,求和范围为100-500。

3. range是逻辑窗口,是指定当前行对应的范围取值,列数不固定,只要行值在范围内,对应列都包含在内。注意:range的值是根据order by的字段来确定的

4. rows是物理窗口,即根据order by 子句排序后,取的前N行及后N行的数据计算(与当前行的值无关,只与排序后的行号相关)

5. 聚合函数可以和开窗结合使用,例如求 每天活跃用户量的环比值

6. 

rows between ... and ...控制窗口函数的范围
unbounded 无边界
preceding 往前
following 往后
unbounded preceding 往前所有行
2 preceding 往前2行
unbounded following 往后所有行
2 following 往后2行
current row 当前行
————————————————

-- rows(考虑的是行)
--从(无边界)第一行到当前行进行累加
select score,sum(score) over(order by score rows between unbounded preceding and current row) as s from tmp; 
--从前2行到当前行进行累加
select score,sum(score) over(order by score rows between 2 preceding and current row) as s from tmp; 
--前2行+当前行+后面所有行进行累加
select score,sum(score) over(order by score rows between 2 preceding and unbounded following) as s from tmp; 
--当前行到后一行进行累加
select score,sum(score) over(order by score rows between current row and 1 following) as s from tmp;
--第一行到当前行进行累加,如果当前行的值和后一行的值相同的话,当前行的累加结果取的是后一行的累加结果(和当前行相同score值的所有行都会包含进去)
select score,sum(score) over(order by score) as s from tmp;

-- range(考虑的是值)
-- 统计近某用户近三个月的工资,例如 1,2,4,5中,3月份缺失,则5月只统计4+5月的工资,如果需要前三行,请使用rows
sum(salary) over(partition by id order by month  range  2 preceding ) as salary

SELECT
  date
  , count(user_id) as a    -- 活跃用户量
  , lag(count(user_id),1) over(partition by date order by date) as b    -- 昨天活跃用户量,这里partiton by 可以省略
  , count(user_id)/lag(count(user_id),1) over(partition by date order by date) as yoy    -- 环比
  FROM user_active_table
  GROUP BY date    -- 如果开窗函数中含有聚合函数,需要接group by

2.窗口函数应用(举栗子)

Image

    

     今天接触到一个sql题目,也是对ROW_NUMBER() OVER PARTITION BY 的使用,下面实战一下吧!

 题目:统计各个班级中学生各个科目总成绩排名第一的情况

  建表

CREATE TABLE tmp_20191101(  --创建一个学生表
  CLASS INTEGER,          --学生id
  SNAME VARCHAR2(20),   --学生姓名
  SSEX VARCHAR2(10),    --学生性别
  SUBJECT VARCHAR2(20), --科目名称
  SCORE INTEGER         --对应科目成绩
);

  插入数据

INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小朱','男','英语',97);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','英语',91);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小王','男','英语',80);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小朱','男','语文',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小王','男','数学',76);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小陶','男','数学',89);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小薛','男','语文',94);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小王','女','语文',93);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小朱','女','数学',61);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小苏','女','语文',99);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小龚','女','数学',92);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小刘','女','英语',81);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','英语',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','英语',96);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','英语',88);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','语文',99);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(12,'小张','男','数学',73);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','数学',85);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(10,'小李','男','语文',93);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','语文',95);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(11,'小欣','女','数学',63);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','语文',90);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','数学',92);
INSERT INTO tmp_20191101(CLASS,SNAME,SSEX,SUBJECT,SCORE)VALUES(13,'小萌','女','英语',86);

tmp_20191101 全部数据截图:

3. 执行sql 

select * from tmp_20191101

select * from(
 select 
       t.sname, t.class, t.SumScore,
       ROW_NUMBER() OVER (PARTITION BY t.class order by t.SumScore desc) rn   --根据class分区,sumscore排序
       from
        (
          select sname,class,sum(score) as  SumScore    --每班同学的各自总分
          from tmp_20191101 
          group by class,sname) t) RN
where RN.rn <= 1

执行sql 结果截图:

4.思考

 如果有并列的情况,怎么看处理呢?  ROW_NUMBER()是不考虑并列的,你看

相同总分,排名没有并列,那该怎么办呢?Oracle中提供了Rank()可以用 ,如图

如果  想要名次之间没有“间隔”, 可以使用 dense_rank()函数,例如:

  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值