一、背景
需求:求【昨日触达】、【本周拜访】、【本月活动数】。
背景:三个维度的数据都按日统计在一张表里。
首先想到的就是left join +group by。
1.1 优化前语句
select
user_id,
user_name,
org_id,
org_name,
sum(reach_num) dayReachNum,
sum(visit_num) weekVisitNum,
sum(activity_num) mouthActivityNum
from
(
select
srsm.user_id,
srsm.user_name,
srsm.org_id,
srsm.org_name,
srsd.reach_num,
srsw.visit_num,
srsm.activity_num
from
standard_reach_statistics srsm
left join standard_reach_statistics srsd on
srsd.user_id = srsm.user_id
and srsd.statistical_date between '2024-01-01' and '2024-01-01'
left join standard_reach_statistics srsw on
srsm.user_id = srsw.user_id
and srsw.statistical_date between '2024-01-01' and '2024-01-07'
where
srsm.statistical_date between '2024-01-01' and '2024-01-31'
) tt
group by
user_id,
user_name,
org_id,
org_name
速度太慢,standard_reach_statistics表总的数据量也不过是3w条,需要50s才能跑完,这在系统中肯定是不被允许的。
分析这条语句存在的问题:
- 大表驱动小表;
- 三张3w的表进行left join,产生的连接次数为3w*3w,已经接近9亿了,这是万万不能接受的;
- 连表字段的索引已经建,这条没有问题;
二、优化后语句
思路:
- 解决后两个left join表数据过大问题:可以先把后两个left join语句sum出来,在left join;
select
srsm.user_id,
reach_num_sum dayReachNum,
visit_num_sum weekVisitNum ,
sum(srsm.activity_num) mouthActivityNum
from
standard_reach_statistics srsm
left join (
select
srsd.user_id,
sum(srsd.reach_num) reach_num_sum
from
standard_reach_statistics srsd
where
srsd.statistical_date between '2024-01-01' and '2024-01-01'
group by
user_id ) sum_day on
srsm.user_id = sum_day.user_id
left join (
select
srsw.user_id,
sum(srsw.visit_num)visit_num_sum
from
standard_reach_statistics srsw
where
srsw.statistical_date between '2024-01-01' and '2024-01-07'
group by
user_id ) sum_month on
srsm.user_id = sum_month.user_id
where
srsm.statistical_date between '2024-01-01' and '2024-01-31'
group by
user_id;
优化后效率由原来的50s,变成了86ms,效率提升明显。