目录
1.Order by 注意点
当ORDER BY后面缺少窗口从句条件,窗口规范默认是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
当ORDER BY和窗口从句都缺失, 窗口规范默认是 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING。
查询的结果和ORDER BY相关,默认为升序
2.sum函数的作用范围
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
3.ROWS BETWEEN
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:无界限(起点或终点)
UNBOUNDED PRECEDING:表示从前面的起点
UNBOUNDED FOLLOWING:表示到后面的终点
4.常用的分析函数
4.1 RANk、DENSE_RANK
RANK()在出现等级相同的元素时预留为空,DENSE_RANK()不会。
某产品类型有两个并列第一
RANK():第一二为1,第三位3
DENSE_RANK():第一二为1,第三位2
4.2 ROW_NUMBER
ROW_NUMBER()从1开始,为每条记录返回一个数字
select a,b,row_number()over(partition by a desc a) as rn from t
4.3 SUM
例如,按照部门ID大小排升序,依次统计所有工资部门的累计工资支出。
SELECT deptid,SUM(salary) salary_amount,
SUM(SUM(salary)) OVER( ORDER BY deptid ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_amount
FROM employee
GROUP BY deptid
ORDER BY deptid;
例如,按照部门ID大小排升序,依次统计当前部门和之前1个部门的工资部门的累计工资支出。
SELECT deptid,SUM(salary) salary_amount,SUM(SUM(salary)) OVER( ORDER BY deptid ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS cumulative_amount FROM employee GROUP BY deptid ORDER BY deptid;
计算前3个月之间的和
SUM( SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cumulative_amount
前后一个月之间的和
SUM( SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS cumulative_amount
4.4 FIRST_VALUE、LAST_VALUE
FIRST_VALUE:取分组内排序后,窗口中第一个值
LAST_VALUE: 取分组内排序后,窗口中最后一个值
例如:
前一行和后一行之间的第一个值
FIRST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS xxxx;
前一行和后一行之间的最后一个值
LAST_VALUE(SUM(amount)) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS xxxx;
4.4 LAG、LEAD
LEAD(col,n,DEFAULT) :用于统计窗口内往下第n行值。第一个参数为列名,第二个参数为往下第n行(可选,默认为1),
第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)
LAG(col,n,DEFAULT) :与lead相反,用于统计窗口内往上第n行值。第一个参数为列名,
第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)
LAG(column_name1,1) OVER(ORDER BY column_name2)
LEAD(column_name1,1) OVER(ORDER BY column_name2)
这样就获得前一条、后一条的数据
select *,LAG(empid,1) over (order by empid asc) from employee;
5.GROUPING SETS、GROUPING__ID
在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL,其中的GROUPING__ID,表示结果属于哪一个分组集合。
select
user_type,
sales,
count(user_id) as pv,
GROUPING__ID
from
order_detail
group by
user_type,sales
GROUPING SETS(user_type,sales)
ORDER BY
GROUPING__ID;
+------------+--------+-----+---------------+--+
| user_type | sales | pv | grouping__id |
+------------+--------+-----+---------------+--+
| old | NULL | 3 | 1 |
| new | NULL | 7 | 1 |
| NULL | 6 | 1 | 2 |
| NULL | 5 | 2 | 2 |
| NULL | 3 | 2 | 2 |
| NULL | 2 | 2 | 2 |
| NULL | 1 | 3 | 2 |
+------------+--------+-----+---------------+--+
6. CUBE
根据GROUP BY的维度的所有组合进行聚合。
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM lxw1234
GROUP BY month,day
WITH CUBE
ORDER BY GROUPING__ID;
等价于
SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM lxw1234
UNION ALL
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day
结果:
month day uv GROUPING__ID
--------------------------------------------
NULL NULL 7 0
2020-03 NULL 5 1
2020-04 NULL 6 1
NULL 2020-04-12 2 2
NULL 2020-04-13 3 2
NULL 2020-04-15 2 2
NULL 2020-04-16 2 2
NULL 2020-03-10 4 2
NULL 2020-03-12 1 2
2020-03 2020-03-10 4 3
2020-03 2020-03-12 1 3
2020-04 2020-04-16 2 3
2020-04 2020-04-12 2 3
2020-04 2020-04-13 3 3
2020-04 2020-04-15 2 3
7.ROLLUP
是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。
可以通过ROLLU实现这上钻过程,如:月天的UV->月的UV->总UV。
比如,以month维度进行层级聚合:
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM lxw1234
GROUP BY month,day
WITH ROLLUP
ORDER BY GROUPING__ID;
等价于
SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM lxw1234
UNION ALL
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day;
结果:
month day uv GROUPING__ID
---------------------------------------------------
NULL NULL 7 0
2020-03 NULL 5 1
2020-04 NULL 6 1
2020-03 2020-03-10 4 3
2020-03 2020-03-12 1 3
2020-04 2020-04-12 2 3
2020-04 2020-04-13 3 3
2020-04 2020-04-15 2 3
2020-04 2020-04-16 2 3