mysql在8.0之后版本支持窗口函数。
使用的数据集为python中自带的红酒数据集。红酒数据集包括13个特征,178个样本,最后将红酒分为三类。
#导入数据
USE mydb;
DROP TABLE IF EXISTS wine
CREATE TABLE wine(alcohol FLOAT, malic_acid FLOAT, ash FLOAT, alcalinity_of_ash FLOAT, magnesium INT,
total_phenols FLOAT, flavanoids FLOAT, nonflavanoid_phenols FLOAT,
proanthocyanins FLOAT, color FLOAT, hue FLOAT, od280 FLOAT, proline INT, target INT)
LOAD DATA LOCAL INFILE "C://wine.csv" INTO TABLE wine
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\n';
窗口函数大体可以分为以下两种:
1.能够作为窗口函数的聚合函数(sum,avg,count,max,min)
2.rank,dense_rank。row_number等专用窗口函数
语法格式
<窗口函数> over ([partition by <列清单>]
order by <排序用列清单>)
一、sum
partition by 根据某些条件分开求和,order by 根据某条件是累加求和。
条件可以为一个也可以为多个。
#以taeget作为分类标准,将酒精度数求和
SELECT alcohol,target,
SUM(alcohol) over(PARTITION BY target)AS al_sum
FROM wine
SELECT alcohol,target,
SUM(alcohol) over(ORDER BY target)AS al_sum
FROM wine
SELECT alcohol,target,ash,
SUM(alcohol) over(PARTITION BY target,ash )AS al_sum
FROM wine
SELECT alcohol,target,ash,
SUM(alcohol) over(PARTITION BY target ORDER BY ash )AS al_sum
FROM wine
二、avg
partition by 分组后的均值,order by 是累加求和后计算得到的均值。
count,max,min同理。
SELECT alcohol,target,ash,
AVG(alcohol) over(PARTITION BY target,ash )AS al_avg
FROM wine
SELECT alcohol,target,ash,
AVG(alcohol) over(ORDER BY target,ash)AS al_avg
FROM wine
三、rank
1.rank函数:计算排序时,如果存在相同位次的记录,则会跳过之后的位次。
2.dense_rank函数:同样是计算排序,即使存在相同位次的记录,也不会跳过之后的位次。
3.row_number函数:赋予唯一的连续位次。
SELECT ash,
rank() over(ORDER BY ash)AS ash_rank,
dense_rank() over(ORDER BY ash)AS ash_dense_rank,
row_number() over (ORDER BY ash) AS ash_row_num
FROM wine
四、其他
unbounded:无界限
preceding:从分区第一行头开始,则为 unbounded。 N为:相对当前行向前的偏移量
following :与preceding相反,到该分区结束,则为 unbounded。N为:相对当前行向后的偏移量
current row:顾名思义,当前行,偏移量为0
SELECT alcohol,ash,
COUNT(alcohol) over(PARTITION BY ash ROWS BETWEEN unbounded preceding AND current ROW)AS pv1,
COUNT(alcohol) over(PARTITION BY ash ROWS BETWEEN 3 preceding AND current ROW) AS pv2,
COUNT(alcohol) over(PARTITION BY ash ROWS BETWEEN 3 preceding AND 1 following) AS pv3,
COUNT(alcohol) over(PARTITION BY ash ROWS BETWEEN current ROW AND unbounded following) AS pv4
FROM wine
我把代码都写在上面了,如果有看不太懂的可以自己运行试一下,改改语法什么的,很容易就可以理解了。