window frames自定义窗口:
基本语法:
select id,count(id)over(order by id rows between upper_bound and lower_bound)from table;
例如:
SELECT
id,
total_price,
SUM(total_price) OVER(
ORDER BY placed
ROWS UNBOUNDED PRECEDING) as `sum`
FROM single_order
在between…and之间上限(upper_bound)和下限(lower_bound)的取值情况:
- unbounded preceding–对上限无限制
- preceding–当前行之前的第n行(n,如:5 preceding)
- current row–仅当前行
- following–当前之后的第n行(n,如:5 following)
- unbounded following–对下无限制
注:当只有上限时,between…and可以省略
- ROWS UNBOUNDED PRECEDING 等价于
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- ROWS n PRECEDING 等价于
BETWEEN n PRECEDING AND CURRENT ROW
- ROWS CURRENT ROW 等价于
BETWEEN CURRENT ROW AND CURRENT ROW
这种简略写法不适合following
案例:
SELECT
id,
total_price,
SUM(total_price) OVER(ORDER BY placed ROWS UNBOUNDED PRECEDING) AS running_total,
SUM(total_price) OVER(ORDER BY placed ROWS between 3 PRECEDING and 3 FOLLOWING) AS sum_3_before_after
FROM single_order
rows与range的区别:
rows(范围)考虑的是具体的行:
SELECT
id,
placed,
total_price,
SUM(total_price) OVER (ORDER BY placed ROWS UNBOUNDED PRECEDING) AS 'running_'
FROM single_order;
range(范围)考虑的是具体的值:
SELECT
id,
placed,
total_price,
SUM(total_price) OVER (ORDER BY placed range UNBOUNDED PRECEDING) AS 'running_'
FROM single_order;
比较:
rows:物理窗口,只考虑行号本身,已当前行号作为参考,和order by字段没有关系
range:逻辑窗体,已当前行中order by 字段值相关,通过这个值来确定窗口的大小
SELECT
id,
placed,
total_price,
SUM(total_price) OVER(ORDER BY placed rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS `sum1`,
SUM(total_price) OVER(ORDER BY placed RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `sum2`
FROM single_order;
默认的window frames:
- 如果在OVER(…)中没有ORDER BY子句,则所有行视为一个window frames
- 如果在OVER(…)中指定了ORDER BY子句,则会默认添加
RANGE UNBOUNDED PRECEDING
作为window frames
lead()函数:
lead(x):
SELECT
name,
opened,
LEAD(name) OVER(ORDER BY opened)
FROM website;
LEAD中传入name列作为参数,将以 ORDER BY
排序后的顺序,返回当前行的下一行name
列所对应的值,并在新列中显示
注意:
-
最后一列没有下一列结果所以这里显示NULL
-
LEAD() 中传入的列名与排序的列可以不同
lead(x,y):
LEAD函数还可以传入两个参数:
-
参数1 跟传入一个参数时的情况一样:一列的列名
-
参数2 代表了偏移量,如果传入2 就说明要以当前行为基准,向前移动两行作为返回值
SELECT
name,
opened,
LEAD(opened,2) OVER(ORDER BY opened)
FROM website;
lead(x,y,z):
lead函数也可以接收三个参数,第三个参数用来传入默认值,应用场景是当使用lead函数返回null的时候,可以用第三个参数传入的默认值进行填充
SELECT
day,
users,
LEAD(users, 7, -1) OVER(ORDER BY day) AS `lead`
FROM statistics
WHERE website_id = 2
AND day BETWEEN '2016-05-01' AND '2016-05-14';
lag(x)函数:
LAG(x)函数与LEAD(x)用法类似,区别是,LEAD返回当前行后面的值(前进),LAG返回当前行之前的值(滞后)
SELECT
day,
clicks,
impressions,
clicks / impressions * 100 AS conversion,
LAG(clicks) OVER(ORDER BY day) / LAG(impressions) OVER(ORDER BY day) * 100 AS previous_conversion
FROM statistics
WHERE website_id = 1
AND day BETWEEN '2016-05-15' AND '2016-05-31';
first_value()和last_value():
first_value():
返回指定列的第一个值:
SELECT
name,
opened,
budget,
FIRST_VALUE(budget) OVER(ORDER BY opened)
FROM website;
last_value():
返回指定列的最后一个值
-
当OVER子句中包含ORDER BY时,如果我们不显式定义window frame,SQL会自动带上默认的window frame语句:
-
RANGE UNBOUNDED PRECEDING, 意味着我们的查询范围被限定在第一行到当前行(current row)
-
如果想通过LAST_VALUE 与ORDER BY配合得到所有数据排序后的最后一个值,需要吧window frame语句RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING或者ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
SELECT
name,
opened,
LAST_VALUE(opened) OVER(
ORDER BY opened
RANGE BETWEEN UNBOUNDED PRECEDINGAND AND UNBOUNDED FOLLOWING
) AS `last_value`
FROM website;
nth_value(x,n):
NTH_VALUE(x,n) 函数返回 x列,按指定顺序的第n个值
SELECT
name,
opened,
NTH_VALUE(opened, 2) OVER(
ORDER BY opened
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM website;