记录一次sql优化

文章讲述了SQL查询中的优化技巧,重点讨论了如何通过减少JOIN操作、预过滤数据、避免子查询嵌套和使用DISTINCT来提高效率。作者强调了在处理大数据量时,子查询有时优于JOIN,特别是在条件筛选方面,以及合理利用分组聚合和排序的重要性。
摘要由CSDN通过智能技术生成

一定要看到最后!!!首先贴出原sql

SELECT
	tag.id,
	tag.item_id,
	tag.table_question AS tableQuestion,
	tag.subject_name AS subjectName,
	tag.subject_id,
	tag.question_type AS subjectType,
	tag.question_type AS questionType,
	GROUP_CONCAT( CASE tag.type WHEN 'Unit' THEN tag.title END ) AS tableUnit,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.title END ) AS tableTopic,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.label END ) AS TopicNumber,
	GROUP_CONCAT( CASE tag.type WHEN 'Skill' THEN tag.title END ) AS tableSkill,
	GROUP_CONCAT( CASE tag.type WHEN 'source' THEN tag.title END ) AS tableSource 
FROM
	(
	SELECT
		q.id,
		q.item_id,
		q.table_question,
		q.subject_name,
		q.subject_id,
		q.question_type,
		t.type,
		GROUP_CONCAT( t.title ORDER BY t.id ASC SEPARATOR ', ' ) AS title,
		GROUP_CONCAT( t.label ORDER BY t.id ASC SEPARATOR ', ' ) AS label 
	FROM
		td_ap_manage_class_room_question q
		LEFT JOIN td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id 
		AND q.item_id = t.item_id 
	WHERE
		q.id IN (
		SELECT
			q.id 
		FROM
			td_ap_manage_class_room_question q
			LEFT JOIN td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id 
			AND q.item_id = t.item_id
			LEFT JOIN td_ap_manage_question_label ql ON q.id = ql.question_id
			LEFT JOIN td_ap_manage_label l ON l.id = ql.label_id 
		WHERE q.subject_id = 1 AND ql.question_source = 1 AND ......
		) 
	GROUP BY
		q.id,
		t.type  
	ORDER BY
		t.id 
	) tag 
GROUP BY
	id;

分析最初的这段sql,主要涉及到的操作包括连接(join)、条件筛选、分组聚合以及条件嵌套查询。所以可优化的点初步分析为简化JOIN操作,避免或优化子查询

  • 预过滤数据:在JOIN之前根据需要过滤不必要的数据,以减少需要处理的数据量。
  • 将子查询转换为JOIN:在某些情况下,可以将子查询重写为JOIN操作,这样可以提高效率,因为数据库优化器通常更擅长优化JOIN操作。

说干就干

SELECT
	tag.id,
	tag.item_id,
	tag.table_question AS tableQuestion,
	tag.subject_name AS subjectName,
	tag.subject_id,
	tag.question_type AS subjectType,
	tag.question_type AS questionType,
	GROUP_CONCAT( CASE tag.type WHEN 'Unit' THEN tag.title END ) AS tableUnit,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.title END ) AS tableTopic,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.label END ) AS TopicNumber,
	GROUP_CONCAT( CASE tag.type WHEN 'Skill' THEN tag.title END ) AS tableSkill,
	GROUP_CONCAT( CASE tag.type WHEN 'source' THEN tag.title END ) AS tableSource 
FROM
	(
	SELECT
		q.id,
		q.item_id,
		q.table_question,
		q.subject_name,
		q.subject_id,
		q.question_type,
		t.type,
		GROUP_CONCAT( t.title ORDER BY t.id ASC SEPARATOR ', ' ) AS title,
		GROUP_CONCAT( t.label ORDER BY t.id ASC SEPARATOR ', ' ) AS label 
	FROM
		(
		SELECT
			q.id 
		FROM
			td_ap_manage_class_room_question q
			JOIN td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id 
			AND q.item_id = t.item_id
			LEFT JOIN td_ap_manage_question_label ql ON q.id = ql.question_id 
			AND ql.question_source = 1
			LEFT JOIN td_ap_manage_label l ON l.id = ql.label_id 
		WHERE
			q.subject_id = 1 AND ......
		) m
		JOIN td_ap_manage_class_room_question q ON m.id = q.id
		JOIN td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id 
		AND q.item_id = t.item_id 
	GROUP BY
		q.id,
		t.type 
	ORDER BY
		t.id 
	) tag 
GROUP BY
	id;

发现运行后更慢了,奇怪啊,明明感觉解开了一个子查询,而且使用join连接,左连接又提前筛选了数据,部分左连接也换成了内连接,不确定,explain一下 ! ! !

不看不知道,一看吓一跳,上面这段明显就是数据库没学明白,子查询非但没有解开,甚至改为了子查询之后,又多过了一遍join,经过教训,于是尝试彻底解开子查询和join

  1. 避免重复JOIN:SQL中有对td_ap_manage_class_room_question和td_ap_manage_class_room_question_tags表的重复JOIN操作。可以尝试合并这些操作,减少不必要的重复处理
  2. 减少子查询:SQL使用了多层子查询,导致数据库执行效率降低。

说干就干!!

SELECT
		q.id,
		q.item_id,
		q.table_question AS tableQuestion,
		q.subject_name AS subjectName,
		q.subject_id,
		q.question_type AS subjectType,
		q.question_type AS questionType,
		GROUP_CONCAT(DISTINCT CASE t.type WHEN 'Unit' THEN t.title END ORDER BY t.id) AS tableUnit,
		GROUP_CONCAT(DISTINCT CASE t.type WHEN 'Topic' THEN t.title END ORDER BY t.id) AS tableTopic,
		GROUP_CONCAT(DISTINCT CASE t.type WHEN 'Topic' THEN t.label END ORDER BY t.id) AS TopicNumber,
		GROUP_CONCAT(DISTINCT CASE t.type WHEN 'Skill' THEN t.title END ORDER BY t.id) AS tableSkill,
		GROUP_CONCAT(DISTINCT CASE t.type WHEN 'source' THEN t.title END ORDER BY t.id) AS tableSource
FROM
    td_ap_manage_class_room_question q
JOIN
    td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id AND q.item_id = t.item_id
LEFT JOIN
    td_ap_manage_question_label ql ON q.id = ql.question_id AND ql.question_source = 1
LEFT JOIN
    td_ap_manage_label l ON l.id = ql.label_id
WHERE
    q.subject_id = 1 AND ......
GROUP BY
    q.id,
    q.item_id,
    q.table_question,
    q.subject_name,
    q.subject_id,
    q.question_type
ORDER BY
    q.id, t.id;

这一次就清晰很多了,通过GROUP BY把该查的都用GROUP_CONCAT处理一下,每个表只用join连接一次,并且左连接提前筛选

注意:

  • DISTINCT去重影响效率:DISTINCT关键字在GROUP_CONCAT中避免重复值的产生,因为我的需求是这样的,如果这不是必要的,可以去除以提高性能。
  • *适当的添加索引:*比如q.subject_id、q.item_id、t.type上,来增加查询效率
  • *排序也会影响效率:*我这里都是主键索引,必须order by的情况下也可以通过索引优化

最逆天的来了: 因为我的数据量大概前两张表在20w左右,所以会发现优化后的速度竟然没有第一个快!?没错,测试了n多次,就是要慢个0.04-0.1s,explain了一下确实也是优化后的优势大,并且很简洁,但是忽略了GROUP_CONCAT,tags表被关联的数据非常多,一次性将如此多的数据聚合排序,开销比预聚合和排序大得多,所以看似简洁的优化后版本,比优化前还慢!

最终优化!!!

SELECT
	tag.id,
	tag.item_id,
	tag.table_question AS tableQuestion,
	tag.subject_name AS subjectName,
	tag.subject_id,
	tag.question_type AS subjectType,
	tag.question_type AS questionType,
	GROUP_CONCAT( CASE tag.type WHEN 'Unit' THEN tag.title END ) AS tableUnit,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.title END ) AS tableTopic,
	GROUP_CONCAT( CASE tag.type WHEN 'Topic' THEN tag.label END ) AS TopicNumber,
	GROUP_CONCAT( CASE tag.type WHEN 'Skill' THEN tag.title END ) AS tableSkill,
	GROUP_CONCAT( CASE tag.type WHEN 'source' THEN tag.title END ) AS tableSource 
FROM
	(
	SELECT
		q.id,
		q.item_id,
		q.table_question,
		q.subject_name,
		q.subject_id,
		q.question_type,
		t.type,
		GROUP_CONCAT( t.title ORDER BY t.id ASC SEPARATOR ', ' ) AS title,
		GROUP_CONCAT( t.label ORDER BY t.id ASC SEPARATOR ', ' ) AS label 
	FROM
		td_ap_manage_class_room_question q
		LEFT JOIN td_ap_manage_class_room_question_tags t ON q.subject_id = t.subject_id 
		AND q.item_id = t.item_id 
	WHERE
		q.id IN (
		SELECT
			q.id 
		FROM
			td_ap_manage_class_room_question q
			LEFT JOIN td_ap_manage_question_label ql ON q.id = ql.question_id AND ql.question_source = 1
			LEFT JOIN td_ap_manage_label l ON l.id = ql.label_id 
		WHERE q.subject_id = 1 AND 。。。。
		) 
	GROUP BY
		q.id,
		t.type  
	ORDER BY
		t.id 
	) tag 
GROUP BY
	id;

没错,是的,很眼熟,这不就是没有优化的时候少join了一个表吗?是这样的,下面直接给出结论,以此篇文章来纪念我又混了一天工资

结论:SQL优化要看联表、分组聚合、或者其他操作产生的数据量。子查询有的时候比联表产生的关联数据更少,分组聚合时预聚合比全部聚合更快,其他操作亦是如此。

  1. join固然好,但是有的时候用子查询反而少关联一张表,就比如q.id IN (子查询),这里就很关键!如果不用这个子查询,而是直接四张表关联,数据量一下就上来了。打个比方,用了where(子查询)就相当于:联表(表A*表B)+子查询(表A*表C*表D);而全部联表相当于:(表A*表B*表C*表D);因为我表A为主表,且表A表B数据量异常大,在没办法小表在前的情况下,子查询做条件真是不错的选择!(20w*30w*5*10 > 20w*30w+20w*5*10),简单的算数罢了,岂能拿下我大学生,虽说真实联表不是这么算的,但是话粗理不粗,大差不差就完了
  2. 分组聚合排序注意尺度,以上操作都会影响速度,预处理一下让数据量少下来才是王道,而不是一味追求简洁!!
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值