sql 聚合技巧总汇

最新推荐文章于 2024-07-24 14:28:41 发布

sukeee

最新推荐文章于 2024-07-24 14:28:41 发布

阅读量505

点赞数 1

分类专栏：实践小笔记文章标签： sql

本文链接：https://blog.csdn.net/sukeee/article/details/103001242

版权

实践小笔记专栏收录该内容

18 篇文章 0 订阅

订阅专栏

参考资料：《Effective SQL 编写高质量sql语句的61个有效方法》进度：3/61

聚合使用说明

1 GROUP BY 后面跟随的字段不可以用别名

2 GROUP BY 后select字段必须是GROUP BY可以表达的

可以：select A+B from T group by A，B
可以：select A+B+constant from T group by A+B
不可以：select A，B from T group by A+B
不可以：select A+constant+B from T group by A+B

3 聚合函数
count/sum/avg/min/max／var_pop 总体方差／var_samp 样本方差／stddev-pop 总体标准差／stddev_samp 样本标准差（总体还是样本是看平均时候除以N还是N-1，样本）

4 在使用GROUP BY之前应该用where把尽可能多的数据过滤掉，减少聚合过程中需要计算的内容

5 HAVING用于筛选聚合后的结果，使用时不能用select 中写的字段别名，需要把聚合的表达式重新写一遍

6 sql-92标准中需要把所有列都加在group by 后面，sql-99标准可以支持不用加在后面（mysql支持，其他支持的很少）
ps：听说sql-99有一千页，非常大块头。

7 建议不要把相关列放在groupby里，建议放在子查询里。提高聚合效率，同时也便于他人理解sql

select a.studentID，b.studentName,b.studentAge,sum(a.score)
from T1 as a 
group by a.studentID
left join T2 as b on a.studentID=b.studentID

其他聚合方式

原始表格：

维度1	维度2	指标
A	a	9
A	b	10
B	a	4

ROLLUP 上卷

维度1	维度2	指标
A	a	9
A	b	10
B	a	4
A	null	19
B	null	4
null	null	23

CUBE

维度1	维度2	指标
A	a	9
A	b	10
B	a	4
A	null	19
B	null	4
null	a	13
null	b	10
null	null	23

GROUPING SET

维度1	维度2	指标
A	null	19
B	null	4
null	a	13
null	b	10
null	null	23

使用方式

select 维度1，维度2，sum（指标）
from table
group by ROLLUP （维度1，维度2）／CUBE（维度1，维度2）／GROUPING SET （（维度1），（维度2），（））

mysql 不支持rollup和cube

使用HAVING的四个例题

第一题

找到平均缴获时间超过所有供应商的平均缴获时间的供应商（挑选低效率的供应商）
数据源表Orders结构：OrderID订单，VendorID供应商，OrderDate订单时间，DeliveryDate缴获时间

select  VendorID，avg(datadiff(OrderDate-DeliveryDate))as DeliveryTime
from Orders
where DeliveryDate is not Null
and OrderDate between A and B
group by VendorID
having sum(datadiff(OrderTime-DeliveryTime))< (
select avg(datadiff(OrderDate-DeliveryDate)) from Orders
where DeliveryDate is not Null and OrderDate between A and B
)

第二题

列出在一定时间内的总销售额大于某一品类别下的所有产品的平均销售额的产品（安好啊类别查找畅销产品）
数据源表Orders结构：OrderID订单，CategoryID品类，ProductID产品，Price销售价，OrderDate 日期

select  CategoryID，ProductID，sum(Price)
from Orders AS O1
where OrderDate between A and B
group by CategoryID，ProductID
having sum(Price)>(
	select  avg(sumPrice)
	from(
		select CategoryID，ProductID,sum(Price) as sumPrice 
		from Orders AS O2
		where OrderDate between A and B 
		and O2.CategoryID=O1.CategoryID
		group by CategoryID，ProductID)
	group by CategoryID
)

可以用CTE 可以作为临时表，简化代码。注意，使用CTE允许你定义一次复杂的链接和日期过滤，然后在外部子查询中用。

第三题

显示在任何一天订单总额超过1000美元的客户（列出每天的大额消费客户）
数据源表Orders结构：OrderID订单，CustomerID客户，Price消费金额，OrderDate 日期

select  OrderDate，CustomID，sum(Price)
from Orders 
where OrderDate between A and B
group by OrderDate，CustomerID
having sum(Price)>1000

第四题

计算最后一个季度单个订单的销售百分比
数据源表Orders结构：OrderID订单，ProductID产品，Price消费金额，OrderDate 日期

select  
	OrderID，
	sum(Price)／(select sum(Price)from Orders where OrderDate between A and B)
from Orders 
where OrderDate between A and B
group by OrderID

存在疑问的点：
？为什么不建议用select distinct
A：因为distinct效率比group by低
？ hive，sparksql，presto用的是sql92的标准么？
？什么情况下可以用CTE？
？什么范围的sql引擎可以用？
？CTE只能用一次还是可以使用多次？
? 第二题为什么这样写就可以对齐

sukeee

关注

1
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
sql 聚合技巧总汇

参考资料：《Effective SQL 编写高质量sql语句的61个有效方法》进度：3/61？为什么不建议用select distinct？可以sum(case a=1 then b else 0 end)么？？ hive，sparksql，presto用的是sql92的标准么？？什么情况下可以用CTE？？什么范围的sql引擎可以用？？CTE只能用一次还是可以使用多次？? 第二题为什...
复制链接

扫一扫

专栏目录