函数 + over()
over() :开窗口
1.Row_Number,Rank,Dense_Rank 这三个窗口函数的使用场景非常多
ntile, row_number(), rank(), dense_rank() 这几个函数会新增加一列数据,可以由此进行操作
1.1 row_number()
对分区内的数据排序,多出来一列数据,从1到分区结尾
多用来求 连续天数,会用到date_sub函数
例子:
– 求有连续4天销售记录的店铺
1)
select
*,
row_number() over(partition by name order by ctime)
from
shop ;
结果:
+------------+-------------+------------+----------------------+
| shop.name | shop.ctime | shop.cost | row_number_window_0 |
+------------+-------------+------------+----------------------+
| a | 2017-02-05 | 200.0 | 1 |
| a | 2017-02-06 | 300.0 | 2 |
| a | 2017-02-07 | 200.0 | 3 |
| a | 2017-02-08 | 400.0 | 4 |
| a | 2017-02-10 | 600.0 | 5 |
| a | 2017-03-01 | 200.0 | 6 |
| a | 2017-03-02 | 300.0 | 7 |
| a | 2017-03-03 | 200.0 | 8 |
| a | 2017-03-04 | 400.0 | 9 |
| a | 2017-03-05 | 600.0 | 10 |
| b | 2017-02-05 | 200.0 | 1 |
| b | 2017-02-06 | 300.0 | 2 |
| b | 2017-02-08 | 200.0 | 3 |
| b | 2017-02-09 | 400.0 | 4 |
| b | 2017-02-10 | 600.0 | 5 |
| c | 2017-01-31 | 200.0 | 1 |
| c | 2017-02-01 | 300.0 | 2 |
| c | 2017-02-02 | 200.0 | 3 |
| c | 2017-02-03 | 400.0 | 4 |
| c | 2017-02-10 | 600.0 | 5 |
+------------+-------------+------------+----------------------+
select
name ,
diff,
count(*) cc
from
(
select
*,
**date_sub(ctime , n) diff**
from
(
select
*,
**row_number() over(partition by name order by ctime) n**
from
shop
)t1
)t2
group by name , diff
having cc >=4 ;
结果:
+-------+-------------+-----+
| name | diff | cc |
+-------+-------------+-----+
| a | 2017-02-04 | 4 |
| a | 2017-02-23 | 5 |
| c | 2017-01-30 | 4 |
+-------+-------------+-----+
select
distinct name
from
(
select
name ,
diff,
count(*) cc
from
(
select
*,
date_sub(ctime , n) diff
from
(
select
*,
row_number() over(partition by name order by ctime) n
from
shop
)t1
)t2
group by name , diff
having cc >=4
)t3 ;
结果:
+-------+
| name |
+-------+
| a |
| c |
+-------+
1.2 Rank , Dense_Rank
三个函数比较:
SQL:
select
name,
subject,
score,
rank() over(partition by subject order by score desc) rank,
dense_rank() over(partition by subject order by score desc) dense_rank,
row_number() over(partition by subject order by score desc) row_number
from score;
结果:
name subject score rank dense_rank row_number
孙悟空 数学 95 1 1 1
宋宋 数学 86 2 2 2
婷婷 数学 85 3 3 3
娜娜 数学 56 4 4 4
宋宋 英语 84 1 1 1
娜娜 英语 84 1 1 2
婷婷 英语 78 3 2 3
孙悟空 英语 68 4 3 4
娜娜 语文 94 1 1 1
孙悟空 语文 87 2 2 2
婷婷 语文 65 3 3 3
宋宋 语文 64 4 4 4
总结:
1.row_number()的值不会存在重复,只排序
2.rank():并列第一,则都为第一,没有第二名,直接第三名
3.dense_rank(): 并列第一,则都为第一,有第二名
2.SUM(), MIN(),MAX(),AVG()等聚合函数
窗口函数主要是控制运算范围
SQL:
select
* ,
sum(cost) over() , -- 起点到终点 所有的行累加
sum(cost) over(partition by name), --每个人累加
sum(cost) over(partition by name order by createtime desc) -- 每个人按照时间累加
from
orders ;
3.ntile(n)
ntile函数可以用发来求 数据的前n% 数据,ntile(n)是把数据分成n等分,查询时用where n = ? 就可以取出对用的那份数据–
SQL:
-- 查询前20%时间的订单信息
select
*
from
(
select
*,
ntile(5) over(order by cday) n
from orders
)t
where t.n = 1;
4.lag(col,n) 和 lead(col,n)
lag(column , n , “默认值”) 参数一 数据 参数二 取前n行 参数三 没有数据时的默认值
lead(column , n , “默认值”) 后面的数据
lag函数可以用来求顾客上次购买时间,上次登录时间的问题
SQL:
select
name,
orderdate,
cost,
lag(orderdate,1,'1900-01-01') over(partition by name order by orderdate ) as time1,
lag(orderdate,2) over (partition by name order by orderdate) as time2
from business;