group by和distinct 统计比较效率

最新推荐文章于 2023-04-27 18:26:35 发布

liuxianwen1990

最新推荐文章于 2023-04-27 18:26:35 发布

阅读量800

点赞数

分类专栏： postgresql mysql 文章标签： postgresql

本文链接：https://blog.csdn.net/liuxianwen1990/article/details/123006930

版权

postgresql 同时被 2 个专栏收录

10 篇文章 1 订阅

订阅专栏

mysql

4 篇文章 0 订阅

订阅专栏

分析对比，测试说明：device_id是varchar类型，current_user_id 是int类型，测试数据4千万条，device_id和device_id只有2万多，其他都是重复
第一：用explain分析执行数据
explain
select count(DISTINCT device_id) from device_status_log_bak
Aggregate (cost=14128.71..14128.72 rows=1 width=8)
-> Seq Scan on device_status_log (cost=0.00..12821.77 rows=522777 width=17)
--时间 31.95
explain
select count(0) from (select DISTINCT device_id from device_status_log_bak ) a
Aggregate (cost=910444.03..910444.04 rows=1 width=8)
-> Unique (cost=0.56..910361.13 rows=6632 width=20)
-> Index Only Scan using index_device_id on device_status_log_bak (cost=0.56..800011.17 rows=44139984 width=20)
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming true

第二：varchar比较sql
select count(DISTINCT device_id) from device_status_log_bak
--时间 31.95
select count(0) from (select DISTINCT device_id from device_status_log_bak ) a
-- 时间 7.57
select count(0) from (
select count(0) from device_status_log_bak GROUP BY device_id) b
-- 时间2.469

第三：int比较sql
select count(DISTINCT current_user_id) from device_status_log_bak
--时间 12.26
select count(0) from (select DISTINCT current_user_id from device_status_log_bak ) a
-- 时间 6.32
select count(0) from (
select count(0) from device_status_log_bak GROUP BY current_user_id) b
-- 时间2.17
总结：用DISTINCT统计，int比varchar要快一些
用 GROUP BY和DISTINCT统计，不管是int还是varchar都要快

liuxianwen1990

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
group by和distinct 统计比较效率

分析对比，测试说明：device_id是varchar类型，current_user_id 是int类型，测试数据4千万条，device_id和device_id只有2万多，其他都是重复第一：用explain分析执行数据explainselect count(DISTINCT device_id) from device_status_log_bakAggregate (cost=14128.71..14128.72 rows=1 width=8) -> Seq Scan on d...
复制链接

扫一扫