分析对比,测试说明:device_id是varchar类型 ,current_user_id 是int类型 ,测试数据4千万条,device_id和device_id只有2万多,其他都是重复
第一:用explain分析执行数据
explain
select count(DISTINCT device_id) from device_status_log_bak
Aggregate (cost=14128.71..14128.72 rows=1 width=8)
-> Seq Scan on device_status_log (cost=0.00..12821.77 rows=522777 width=17)
--时间 31.95
explain
select count(0) from (select DISTINCT device_id from device_status_log_bak ) a
Aggregate (cost=910444.03..910444.04 rows=1 width=8)
-> Unique (cost=0.56..910361.13 rows=6632 width=20)
-> Index Only Scan using index_device_id on device_status_log_bak (cost=0.56..800011.17 rows=44139984 width=20)
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming true
第二:varchar比较sql
select count(DISTINCT device_id) from device_status_log_bak
--时间 31.95
select count(0) from (select DISTINCT device_id from device_status_log_bak ) a
-- 时间 7.57
select count(0) from (
select count(0) from device_status_log_bak GROUP BY device_id) b
-- 时间2.469
第三:int比较sql
select count(DISTINCT current_user_id) from device_status_log_bak
--时间 12.26
select count(0) from (select DISTINCT current_user_id from device_status_log_bak ) a
-- 时间 6.32
select count(0) from (
select count(0) from device_status_log_bak GROUP BY current_user_id) b
-- 时间2.17
总结:用DISTINCT统计,int比varchar要快一些
用 GROUP BY和DISTINCT统计,不管是int还是varchar都要快