项目中碰到一个慢查询,里面既有COUNT(DISTINCT),又有GROUP BY,查询性能很差,26万条数据查询下来需要18秒,sql如下
SELECT
a.channel_code AS channelCode,
a.channel_name AS channelName,
DATE_FORMAT(a.create_date, '%Y') AS orderDate,
COUNT(DISTINCT a.order_no) AS orderCount,
COUNT(DISTINCT a.user_id) AS userCount,
SUM(a.payment) AS totalAmount,
SUM(a.content_rate_fee) AS rateAmount
FROM
user_order a
WHERE
a.del_flag = '0'
AND a.create_date >= '2017'
AND a.create_date <= '2018'
GROUP BY
a.channel_code,
DATE_FORMAT(a.create_date, '%Y') DESC
LIMIT 80,20
以上sql运行时间18秒
网上找到的优化方案都是简单的COUNT(DISTINCT),没有GROUP BY子句,然后让优化成GROUP BY的形式。但是上面的例子比较复杂,既有COUNT(DISTINCT)又有GROUP BY,这样就不好优化了。有些人建议后台写个定时任务,查出来存起来,这样虽然可行,但是并不是直接解决方案,还需要额外的存储,不灵活。
分析慢查询发现主要因为group by形成了很多分组,然后COUNT(DISTINCT)对每个分组进行虑重,最后只取分页的那20条,其他都丢弃了,这个很浪费性能
经过探索,发现了如下写法,可以极大的优化性能
SELECT
b.channel_code AS channelCode,
b.channel_name AS channelName,
DATE_FORMAT(b.create_date, '%Y') AS orderDate,
COUNT(DISTINCT b.order_no) AS orderCount,
COUNT(DISTINCT b.user_id) AS userCount,
SUM(b.payment) AS totalAmount,
SUM(b.content_rate_fee) AS rateAmount
FROM
user_order b
JOIN (
SELECT
a.channel_code AS channelCode,
DATE_FORMAT(a.create_date, '%Y') AS orderDate
FROM
user_order a
WHERE
a.del_flag = '0'
AND a.create_date >= '2017'
AND a.create_date <= '2018'
GROUP BY
a.channel_code,
DATE_FORMAT(a.create_date, '%Y') DESC
LIMIT 80,20
) aa ON b.channel_code = aa.channelCode
AND DATE_FORMAT(b.create_date, '%Y') = aa.orderDate
AND b.del_flag = '0'
GROUP BY
b.channel_code,
DATE_FORMAT(b.create_date, '%Y') DESC
以上sql直行时间3.5秒,比老方案快了将近15秒。
分析新方案,先不进行COUNT(DISTINCT),而是将数据范围先缩小,就是先只取分页需要的那20个数据,然后进行join,这样查出来的分组就很少了,再进行COUNT(DISTINCT),对很少量的分组进行DISTINCT,性能大大的提升了。