1.分析函数的语法
function_name() over(partition by(分组子句) order by(排序子句) rows/range/(窗口子句))
over( ):开窗函数
partition by:分组子句,数据分组
order by:排序子句,数据排序
窗口子句:定义分析函数在操作行的集合,三种开窗方式:rows、range、Specifying
注意:有窗口子句一定要有排序子句
2.以sum(字段名)为例,max,min,count,avg含义一致
案例1:
select t.first_name,t.salary,t.hire_date,
sum(t.salary) over() a1,--整张表求和
sum(t.salary) over(order by t.hire_date) a2--整张表递加累计求和,窗口范围:第一行到当前行
from emp_tmp t;
案例2:
select t.first_name,t.salary,t.hire_date,
sum(t.salary) over(partition by t.first_name ) a3,--分组求和
sum(t.salary) over(partition by t.first_name order by t.hire_date) a4--分组递加累计求和,窗口范围:组内第一行到当前行
from emp_tmp t;
3.排序函数:rank(),row_number(),dense_rank()
3.1 row_number() over():排序,无重复值,partition by 可选,order by 必选
案例一:
select t.first_name,t.salary,t.hire_date,
row_number() over(order by t.hire_date) a1--整张表排序
from emp_tmp t;
案例二:
select t.first_name,t.salary,t.hire_date,
row_number() over(partition by t.first_name order by t.hire_date) a2--分组内部排序
from emp_tmp t;
3.2 rank():跳跃排序,partition by 可选,order by 必选
案例:
select t.first_name,t.salary,t.hire_date,
rank() over(partition by t.first_name order by t.hire_date) a2--分组内部排序
from emp_tmp t;
3.3 dense_rank():连续排序,partition by 可选,order by 必选
select t.first_name,t.salary,t.hire_date,
dense_rank() over(partition by t.first_name order by t.hire_date) a2--分组内部排序
from emp_tmp t;
4.窗口子句
4.1窗口子句语法
格式:
{rows|range}
{between
{unbounded preceding|current row |<value_expr>{preceding|following}
}and
{unbounded preceding|current row |<value_expr>{preceding|following}
}}
(1)rows|range:此关键字定义了一个window。
(2)between…and…:为窗品指定一个起点和终点。
(3)unbounded preceding:指明窗口是从分区(partition)的第一行开始。
(4)current row:指明窗口是从当前行开始。
注:order by为窗口子句的必须条件,没有order by将会报错:
4.2 rows between XX and XX窗口子句,理解比较简单,rows属于物理窗口,窗口大小固定
有一张月销售额表的结构跟数据如下:
4.2.1 rows between unbounded preceding and unbounded following:窗口指向整张表
在月销售额表中查询出每个月的销售额和所有月份的销售额之和:
select month,sales,
sum(sales) over( order by sales rows between unbounded preceding and unbounded following) total_sale
from month_sale;
4.2.2 rows between 1 preceding and 1 following:窗口范围:当前行和当前行的前一行和当前行的后一行
查询月销售额表中的每个月及前一个月和后一个月的销售额之和
select month,sales,
sum(sales) over( order by month rows between 1 preceding and 1 following) total_sale
from month_sale;
4.2.3 rows between unbonded preceding and current row:窗口范围:当前行及当前行之前的所有行
查询每月及之前月份的销售额总和
select month,sales,
sum(sales) over( order by month rows between unbounded preceding and current row) total_sale
from month_sale;
4.3 range between xx and xx,逻辑窗口,范围是活动的(真是看的头疼)
窗口范围是活动的:与rank()排完序的值有关
select month,sales,
rank() over( order by month ) rn,
sum(sales) over( order by month range between 1 preceding and current row) total_sale1
from month_sale_tmp;
range是逻辑窗口,是指定当前行对应值的范围取值,列数不固定,只要行值在范围内,对应列都包含在内,如上例中total_sale1(即range 1 preceing and current row)例的分析结果:
当rownun=1时,是sum为1-1<=rn<=1 的sales和,即sum=1111+222+33=1366(取rn为1的sales列的值之和);
当rownun=4时,是sum为4-1<=rn<=4 的和,即sum=55(取rn为4对应的sales列的值之和);
当rownun=6时,是sum为6-1<=id<=6 的和,即sum=666(取id为6);