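Join queries between large tables
If the business allows the number of returned rows to be limited,
the join logic should first fetch the limited result set from the main table and only then join to look up the related data.
The pre-optimization query does the opposite: it completes the join first and only then applies GROUP BY, ORDER BY and LIMIT.
When the matched data reaches millions of rows, a business-level row limit can be used to improve query efficiency.
When the data volume is large we shard the table, splitting the main table by a sharding condition such as time.
If we also need result data from several shards of rpt_visit_page,
we can skip joining inside each shard. Instead, sort the combined results from all shards, determine the final 2000 rows to return, and only then fetch the data from the related table.
Whether this applies depends on the specific business.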
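The multi-shard case described above could be sketched roughly as follows. This is only an illustration under assumptions: the shard names rpt_visit_page_202005, rpt_visit_page_202006 and rpt_visit_page_202007 are hypothetical (one shard per month), and each shard is assumed to have the same columns as rpt_visit_page.

```sql
-- Hypothetical monthly shards; the real shard names and sharding rule may differ.
SELECT
  t.url_id,
  t.page_views,
  t.unique_visitors,
  t.bounce,
  t.total_duration,
  r.url AS page_url,
  r.title AS page_title
FROM (
  -- Merge the per-shard partial sums and pick the global top 2000
  SELECT
    url_id,
    SUM(page_views) AS page_views,
    SUM(unique_visitors) AS unique_visitors,
    SUM(bounce) AS bounce,
    SUM(total_duration) AS total_duration
  FROM (
    -- Each shard is pre-aggregated by url_id, with no join and no LIMIT
    SELECT url_id, SUM(page_views) AS page_views, SUM(unique_visitors) AS unique_visitors,
           SUM(bounce) AS bounce, SUM(total_duration) AS total_duration
    FROM rpt_visit_page_202005
    WHERE channel_id = 1 AND report_type = 1
      AND report_date >= '2020-05-17 00:00:00'
    GROUP BY url_id
    UNION ALL
    SELECT url_id, SUM(page_views), SUM(unique_visitors), SUM(bounce), SUM(total_duration)
    FROM rpt_visit_page_202006
    WHERE channel_id = 1 AND report_type = 1
    GROUP BY url_id
    UNION ALL
    SELECT url_id, SUM(page_views), SUM(unique_visitors), SUM(bounce), SUM(total_duration)
    FROM rpt_visit_page_202007
    WHERE channel_id = 1 AND report_type = 1
      AND report_date <= '2020-07-17 00:00:00'
    GROUP BY url_id
  ) s
  GROUP BY url_id
  ORDER BY page_views DESC
  LIMIT 2000
) t
-- Only the final 2000 rows are joined to the related table
LEFT JOIN url_reference r ON t.url_id = r.hashcode
ORDER BY t.page_views DESC;
```

Note that the LIMIT is applied only after the shards are merged. Taking the top 2000 from each shard separately would not necessarily give the overall top 2000, because a URL can rank low in every shard yet high in total.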
Before optimization
SELECT
a.url_id,
SUM( a.page_views ) AS page_views,
SUM( a.unique_visitors ) AS unique_visitors,
SUM( a.bounce ) AS bounce,
SUM( a.total_duration ) AS total_duration,
url_reference.url AS page_url,
url_reference.title AS page_title
FROM
rpt_visit_page a
LEFT JOIN url_reference ON a.url_id = url_reference.hashcode
WHERE
a.channel_id = 1
AND a.report_type = 1
AND a.report_date >= '2020-05-17 00:00:00'
AND a.report_date <= '2020-07-17 00:00:00'
GROUP BY
a.url_id
ORDER BY
page_views DESC
LIMIT 2000
After optimization
EXPLAIN SELECT
a.url_id,
a.page_views,
a.unique_visitors,
a.bounce,
a.total_duration,
url_reference.url AS page_url,
url_reference.title AS page_title
FROM (
SELECT
url_id,
SUM( page_views ) AS page_views,
SUM( unique_visitors ) AS unique_visitors,
SUM( bounce ) AS bounce,
SUM( total_duration ) AS total_duration
FROM
rpt_visit_page
WHERE
channel_id = 1
AND report_type = 1
AND report_date >= '2020-05-17 00:00:00'
AND report_date <= '2020-07-17 00:00:00'
GROUP BY
url_id
ORDER BY
page_views DESC
LIMIT 2000
) a
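LEFT JOIN url_reference ON a.url_id = url_reference.hashcode
-- the derived table's row order is not guaranteed to survive the join, so sort again
ORDER BY a.page_views DESC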