文章目录
单表查询优化
建表
CREATE TABLE IF NOT EXISTS `article` (
`id` INT(10) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
`author_id` INT(10) UNSIGNED NOT NULL,
`category_id` INT(10) UNSIGNED NOT NULL,
`views` INT(10) UNSIGNED NOT NULL,
`comments` INT(10) UNSIGNED NOT NULL,
`title` VARBINARY(255) NOT NULL,
`content` TEXT NOT NULL
);
INSERT INTO `article`(`author_id`, `category_id`, `views`, `comments`, `title`, `content`) VALUES
(1, 1, 1, 1, '1', '1'),
(2, 2, 2, 2, '2', '2'),
(1, 1, 3, 3, '3', '3');
SELECT * FROM article;
没建索引的情况下分析该sql语句:
EXPLAIN SELECT id,author_id FROM `article` WHERE category_id = 1
AND comments >1 ORDER BY views DESC LIMIT 1;
由comments可知,范围后面索引失效,所以索引这样建:
CREATE INDEX idx_article_cv ON article(category_id,views);
执行explain:
EXPLAIN SELECT id,author_id FROM `article` WHERE category_id = 1
AND comments >1 ORDER BY views DESC LIMIT 1;
特殊情况
该表的字段在这边文章:https://blog.csdn.net/sinat_28317385/article/details/109577550
EXPLAIN select * from staffs where name > '33'
where 第一个字段如果用">“或者”<“号,则第一个字段不能走索引,如果where第一个字段使用”=“号,第二个字段使用”>"号,那第二个字段会走索引
关联查询优化
建表
CREATE TABLE IF NOT EXISTS `class` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `book` (
`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`bookid`)
);
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
没建索引的情况下分析该sql语句:
EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card;
左连接,一般在从表建索引合适
ALTER TABLE class ADD INDEX X (card);
执行explain:
EXPLAIN SELECT * FROM class LEFT JOIN book ON class.card = book.card;
JOIN语句的优化:
- 尽可能减少JOIN语句中的NestedLoop(嵌套循环)的总次数:永远都是小的结果集驱动大的结果集。
- 优先优化NestedLoop的内层循环。
- 保证JOIN语句中被驱动表上JOIN条件字段已经被索引。
- 当无法保证被驱动表的JOIN条件字段被索引且内存资源充足的前提下,不要太吝惜Join Buffer 的设置。
子查询优化
资料来源:https://blog.csdn.net/qq_27409289/article/details/85963089
用 exists 是否存在,存在返回一条记录,exists 是作为一个查询判断用,所以 select 后返回什么不重要。
EXISTS(subquery)子查询只返回true或者false,因此子查询中的SELECT *可以是SELECT 1 OR SELECT X,它们并没有区别。
exist:适合子查询中表数据大于外查询表中数据的业务场景
in:适合外部表数据大于子查询的表数据的业务场景
in 和 exists的区别:
如果子查询得出的结果集记录较少,主查询中的表较大且又有索引时应该用in, 反之如果外层的主查询记录较少,子查询中的表大,又有索引时使用exists。
其实我们区分in和exists主要是造成了驱动顺序的改变(这是性能变化的关键),如果是exists,那么以外层表为驱动表,先被访问,如果是IN,那么先执行子查询,所以我们会以驱动表的快速返回为目标,那么就会考虑到索引及结果集的关系了 ,另外IN时不对NULL进行处理。
in 是把外表和内表作hash 连接,而exists是对外表作loop循环,每次loop循环再对内表进行查询。一直以来认为exists比in效率高的说法是不准确的
无索引的情况下
大表驱动小表
in 和 exists 的性能应该是接近的,都比较糟糕,exists稍微好一点,不超过5% ,但是inner join 优于使用了 join buffer 所以快很多,第二次运行inner join语句的话时间会下降
如果left join 则最慢
案例1
EXPLAIN select * from sb_boss where job in (select job from sb_boss where age = 3) # 表数据order by关键字优化有
案例2
EXPLAIN select sb.* from sb_boss sb INNER JOIN sb_boss boss on sb.job = boss.job where boss.age=3
有索引的情况下
用inner join是最好的,其次是in,exists最糟糕
创建索引
create index idx_age on sb_boss(age)
案例1
EXPLAIN select * from sb_boss where job in (select job from sb_boss where age = 3)
案例2
EXPLAIN select sb.* from sb_boss sb INNER JOIN sb_boss boss on sb.job = boss.job where boss.age=3
通常情况
案例1
EXPLAIN select * from sb_boss where age in (1,2,3,44)
没加索引的情况
加索引的情况下
- 结论:in后面为常数,无法使用索引
案例2
EXPLAIN select * from sb_boss where age in (select age from sb_boss where id = 1)
没加索引
加了索引
- 结论:子查询使用主键索引,外层in可以使用到索引
案例3
EXPLAIN select * from sb_boss where age in (select age from sb_boss where name = '123')
没有加索引
age加索引
给name加上索引
删除age索引
- 结论:当子查询使用普通索引的时候,外层in没有使用到索引
order by关键字优化
建表
CREATE TABLE `sb_boss` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`age` int(11) NOT NULL,
`job` varchar(255) NOT NULL,
`screen_name` varchar(255) NOT NULL,
`content` varchar(255) DEFAULT NULL,
`fuck` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_ageJobScreenName` (`age`,`job`,`screen_name`) # 索引
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
INSERT INTO `sb_boss` VALUES ('1', '123', '3', 'ww', 'sds', '32s', '23');
INSERT INTO `sb_boss` VALUES ('2', '33', '5', 'wadw', 'adsa', 'ccc', 'sd');
INSERT INTO `sb_boss` VALUES ('3', '3das', '6', 'saf', 'xzcv', 'dsfa', 'wewe');
INSERT INTO `sb_boss` VALUES ('4', 'ewew', '44', 'ad', 'xzdsfcv', 'asdf', 'wewadsfe');
案例1:
EXPLAIN select * from sb_boss where age > 20 order by age
案例2:
EXPLAIN select * from sb_boss where age > 20 order by age,job
案例3:
EXPLAIN select * from sb_boss where age > 20 order by job
案例4
EXPLAIN select * from sb_boss where age > 20 order by job,age
案例5
EXPLAIN select * from sb_boss order by age
案例6
EXPLAIN select * from sb_boss where job > '' order by job
案例7
EXPLAIN select * from sb_boss where job > '' order by age
案例8
EXPLAIN select age,job from sb_boss where job > '' ORDER BY age
案例9
EXPLAIN select age,job from sb_boss order by age asc ,job asc
案例10
EXPLAIN select age,job from sb_boss order by age asc ,job desc
结论
order by语句使用索引进行排序,并没有用来查找数据,所以key_len字段的长度没有变化,从上面案例可看出,需要搭配where语句来使用,
而且覆盖索引可以一定程序上消除FileSort,但是,如果sql语句使用了混合排序,还是会出现FileSort
排序算法
如果不在索引列上,File Sort有两种算法:MySQL就要启动双路排序算法和单路排序算法
双路排序算法
MySQL4.1之前使用双路排序,字面意思就是两次扫描磁盘,最终得到数据,读取行指针和ORDER BY列,対他们进行排序,然后扫描已经排序好的列表,按照列表中的值重新从列表中读取对应的数据输出。一句话,从磁盘取排序字段,在buffer中进行排序,再从磁盘取其他字段。
取一批数据,要对磁盘进行两次扫描,众所周知,IO是很耗时的,所以在MySQL4.1之后,出现了改进的算法,就是单路排序算法。
单路排序算法
从磁盘读取查询需要的所有列,按照ORDER BY列在buffer対它们进行排序,然后扫描排序后的列表进行输出,它的效率更快一些,避免了第二次读取数据。并且把随机IO变成了顺序IO,但是它会使用更多的空间,因为它把每一行都保存在内存中了。
由于单路排序算法是后出的,总体而言效率好过双路排序算法。
但是单路排序算法有问题:如果SortBuffer缓冲区太小,导致从磁盘中读取所有的列不能完全保存在SortBuffer缓冲区中,这时候单路复用算法就会出现问题,会导致了大量的I/O操作,反而性能不如双路复用算法。
单路复用算法的优化策略:
- 增大sort_buffer_size参数的设置,用于单路排序的内存大小
- 增大max_length_for_sort_data参数的设置,单次排序字段大小。(单次排序请求)
- 去掉select后面不需要的字段
提高ORDER BY排序的速度:
- ORDER BY时使用SELECT *是大忌,查什么字段就写什么字段,这点非常重要。
在这里的影响是:
- 当查询的字段大小总和小于max_length_for_sort_data而且排序字段不是TEXT|BLOB类型时,会使用单路排序算法,否则使用多路排序算法。
- 两种排序算法的数据都有可能超出sort_buffer缓冲区的容量,超出之后,会创建tmp临时文件进行合并排序,导致多次IO,但是单路排序算法的风险会更大一些,所以要增大sort_buffer_size参数的设置。
-
尝试提高sort_buffer_size:不管使用哪种算法,提高这个参数都会提高效率,当然,要根据系统的能力去提高,因为这个参数是针对每个进程的。
-
尝试提高max_length_for_sort_data:提高这个参数,会增加用单路排序算法的概率。但是如果设置的太高,数据总容量sort_buffer_size的概率就增大,明显症状是高的磁盘IO活动和低的处理器使用率。
group by关键字优化
建表
create table if not exists employee(eid int auto_increment primary key,name varchar(5),age int,salary int,depart varchar(5),workage int);
insert into employee (name,age,salary,depart,workage)
values ('崔铭','25',1500,'研发部',3),
('佳伟','23',1000,'市场部',2),
('刘涵','30',10600,'人事部',6),
('孙铭泽','25',2000,'运营部',5),
('张吉龙','21',15000,'生产部',12),
('从好平','22',1500,'质量部',3),
('杨忠','22',5000,'财务部',4),
('芦淞','24',6000,'采购部',7),
('马玉','25',450000,'销售部',29),
('成林','21',12000,'安全部',10),
('张龙','32',17000,'研发部',21),
('王建业','25',11000,'研发部',7),
('王佳敏','22',10000,'市场部',9),
('姜佳伟','27',10000,'人事部',13),
('王国栋','20',10012,'研发部',2),
('周昌洋','38',10560,'研发部',1),
('刘鑫鑫','18',1900,'人事部',6),
('刘博','21',2000,'研发部',11),
('乔鑫','19',13000,'运营部',12),
('宇航','20',10500,'生产部',21),
('赵浩然','33',10400,'总经理',21),
('常盛','24',1000,'生产部',2),
('刘麟','25',3000,'武装部',8);
案例1
select depart,SUM(salary) from employee
group by depart;
加上索引
create index idx_depart on employee(depart)
- 索引起效
结论
-
GROUP BY实质是先排序后进行分组,遵照索引建的最佳左前缀。
-
当无法使用索引列时,会使用Using filesort进行排序,增大max_length_for_sort_data参数的设置和增大 sort_buffer_size参数的设置,会提高性能。
-
WHERE执行顺序高于HAVING,能写在WHERE限定条件里的就不要写在HAVING中了。
聚合函数的优化
案例1
EXPLAIN select MIN(salary) from employee
给salary加上索引
create index idx_salary on employee(salary)
- select tables optimized away说明通过索引而直接一次定位到所需的数据行完成整个查询
案例2
EXPLAIN select min(salary) from employee where workage=3
给salary加上索引
create index idx_salary on employee(salary)
删除索引,给workage加上索引
drop index idx_salary on employee
create index idx_salary on employee(workage)
注意“type”为ref
删除索引,给salary,workage加上索引
drop index idx_salary on employee
create index idx_salary on employee(salary,workage)
- 一对比,可看出直接给where后面字段加索引效果更好
案例3
EXPLAIN select min(salary) from employee where workage>3
给where加上索引
create index idx_salary on employee(workage)
运行explain
这种情况是因为单表查询优化中所说的特殊情况,给字段加索引,首个字段使用">“或”<",首字段的索引也会失效,使用"="就会起效
删除索引,salary字段加上索引
create index idx_salary on employee(salary,workage)
SELECT COUNT(*)会不会造成全表扫描呢?
EXPLAIN select count(*) from class
EXPLAIN select count(*) from emp
- 结论:
select count(*)如果在该表有索引的情况下并不会造成全表扫描,没有普通索引的情况下mysql会选择主键索引,如果有普通索引,mysql会选择成本最小的索引来使用
分页查询的优化-limit的优化
建表
/* 1.dept表 */
CREATE TABLE `dept` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`deptno` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '部门id',
`dname` varchar(20) NOT NULL DEFAULT '' COMMENT '部门名字',
`loc` varchar(13) NOT NULL DEFAULT '' COMMENT '部门地址',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='部门表'
/* 2.emp表 */
CREATE TABLE `emp` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`empno` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '员工编号',
`ename` varchar(20) NOT NULL DEFAULT '' COMMENT '员工名字',
`job` varchar(9) NOT NULL DEFAULT '' COMMENT '职位',
`mgr` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '上级编号',
`hiredata` date NOT NULL COMMENT '入职时间',
`sal` decimal(7,2) NOT NULL COMMENT '薪水',
`comm` decimal(7,2) NOT NULL COMMENT '分红',
`deptno` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '部门id',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='员工表'
- 创建函数
# 1、函数:随机产生字符串
DELIMITER $$
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwsyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE return_str VARCHAR(255) DEFAULT '';
DECLARE i INT DEFAULT 0;
WHILE i < n DO
SET return_str = CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
SET i = i + 1;
END WHILE;
RETURN return_str;
END $$
# 2、函数:随机产生部门编号
DELIMITER $$
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
DECLARE i INT DEFAULT 0;
SET i = FLOOR(100 + RAND() * 10);
RETURN i;
END $$
- 创建存储过程
# 1、函数:向dept表批量插入
DELIMITER $$
CREATE PROCEDURE insert_dept(IN START INT(10),IN max_num INT(10))
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO dept(deptno,dname,loc) VALUES((START + i),rand_string(10),rand_string(8));
UNTIL i = max_num
END REPEAT;
COMMIT;
END $$
# 2、函数:向emp表批量插入
DELIMITER $$
CREATE PROCEDURE insert_emp(IN START INT(10),IN max_num INT(10))
BEGIN
DECLARE i INT DEFAULT 0;
SET autocommit = 0;
REPEAT
SET i = i + 1;
INSERT INTO emp(empno,ename,job,mgr,hiredata,sal,comm,deptno) VALUES((START + i),rand_string(6),'SALESMAN',0001,CURDATE(),2000,400,rand_num());
UNTIL i = max_num
END REPEAT;
COMMIT;
END $$
- 调用存储过程
# 1、调用存储过程向dept表插入10个部门。
DELIMITER ;
CALL insert_dept(100,10);
# 2、调用存储过程向emp表插入50万条数据。
DELIMITER ;
CALL insert_emp(100001,500000); # 这里电脑卡的可以调成10001,50000,一次性生成5w条,可重复运行,直至50w
列较少或列宽较小的情况
没有加索引,直接运行
- 没有使用inner join
SELECT SQL_NO_CACHE ename FROM emp ORDER BY deptno LIMIT 10000,40 # SQL_NO_CACHE 表明不使用缓存
- 使用inner join
SELECT SQL_NO_CACHE ename FROM emp INNER JOIN
(SELECT id FROM emp e ORDER BY deptno LIMIT 10000,40) a ON a.id=emp.id
创建索引
CREATE index idx_deptno_ename on emp(deptno,ename)
* 或者列较多的情况
SELECT SQL_NO_CACHE * FROM emp ORDER BY deptno LIMIT 10000,40
加上索引
CREATE index idx_deptno on emp(deptno)
修改语句为inner join
SELECT SQL_NO_CACHE * FROM emp INNER JOIN
(SELECT id FROM emp e ORDER BY deptno LIMIT 10000,40) a ON a.id=emp.id
结论:
在select返回列较少或列宽较小的时候,我们可以通过建立复合索引的方式优化分页查询,效果更佳,因为它不需要回表,但是列较多只能使用inner join优化了
优化技巧
该资料来源于:https://blog.csdn.net/qq_42046105/article/details/102813512
关联更新、删除
虽然 MySQL5.6 引入了物化特性,但需要特别注意它目前仅仅针对查询语句的优化。对于更新或删除需要手工重写成 JOIN。
比如下面 UPDATE 语句,MySQL 实际执行的是循环/嵌套子查询(DEPENDENT SUBQUERY),其执行时间可想而知。
UPDATE operation o
SET status = 'applying'
WHERE o.id IN (SELECT id
FROM (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t);
执行计划:
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary |
| 2 | DEPENDENT SUBQUERY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
重写为 JOIN 之后,子查询的选择模式从 DEPENDENT SUBQUERY 变成 DERIVED,执行速度大大加快,从7秒降低到2毫秒。
UPDATE operation o
JOIN (SELECT o.id,
o.status
FROM operation o
WHERE o.group = 123
AND o.status NOT IN ( 'done' )
ORDER BY o.parent,
o.id
LIMIT 1) t
ON o.id = t.id
SET status = 'applying'
执行计划简化为:
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY | | | | | | | | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
EXISTS语句
MySQL 对待 EXISTS 子句时,仍然采用嵌套子查询的执行方式。如下面的 SQL 语句:
SELECT *
FROM my_neighbor n
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND EXISTS(SELECT 1
FROM message_info m
WHERE n.id = m.neighbor_id
AND m.inuser = 'xxx')
AND n.topic_type <> 5
执行计划为:
+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
| 1 | PRIMARY | n | ALL | | NULL | NULL | NULL | 1086041 | Using where |
| 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
去掉 exists 更改为 join,能够避免嵌套子查询,将执行时间从1.93秒降低为1毫秒。
SELECT *
FROM my_neighbor n
INNER JOIN message_info m
ON n.id = m.neighbor_id
AND m.inuser = 'xxx'
LEFT JOIN my_neighbor_apply sra
ON n.id = sra.neighbor_id
AND sra.user_id = 'xxx'
WHERE n.topic_status < 4
AND n.topic_type <> 5
新的执行计划:
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |
| 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |
| 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
条件下推
外部查询条件不能够下推到复杂的视图或子查询的情况有:
-
聚合子查询;
-
含有 LIMIT 的子查询;
-
UNION 或 UNION ALL 子查询;
-
输出字段中的子查询;
如下面的语句,从执行计划可以看出其条件作用于聚合子查询之后:
SELECT *
FROM (SELECT target,
Count(*)
FROM operation
GROUP BY target) t
WHERE target = 'rm-xxxx'
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 514 | const | 2 | Using where |
| 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
确定从语义上查询条件可以直接下推后,重写如下:
SELECT target,
Count(*)
FROM operation
WHERE target = 'rm-xxxx'
GROUP BY target
执行计划变为:
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
关于 MySQL 外部条件不能下推的详细解释说明请参考文章:
http://mysql.taobao.org/monthly/2016/07/08
提前缩小范围
先上初始 SQL 语句:
SELECT *
FROM my_order o
LEFT JOIN my_userinfo u
ON o.uid = u.uid
LEFT JOIN my_productinfo p
ON o.pid = p.pid
WHERE ( o.display = 0 )
AND ( o.ostaus = 1 )
ORDER BY o.selltime DESC
LIMIT 0, 15
该SQL语句原意是:先做一系列的左连接,然后排序取前15条记录。从执行计划也可以看出,最后一步估算排序记录数为90万,时间消耗为12秒。
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
| 1 | SIMPLE | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
由于最后 WHERE 条件以及排序均针对最左主表,因此可以先对 my_order 排序提前缩小数据量再做左连接。SQL 重写后如下,执行时间缩小为1毫秒左右。
SELECT *
FROM (
SELECT *
FROM my_order o
WHERE ( o.display = 0 )
AND ( o.ostaus = 1 )
ORDER BY o.selltime DESC
LIMIT 0, 15
) o
LEFT JOIN my_userinfo u
ON o.uid = u.uid
LEFT JOIN my_productinfo p
ON o.pid = p.pid
ORDER BY o.selltime DESC
limit 0, 15
再检查执行计划:子查询物化后(select_type=DERIVED)参与 JOIN。虽然估算行扫描仍然为90万,但是利用了索引以及 LIMIT 子句后,实际执行时间变得很小。
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL |
| 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
中间结果集下推
再来看下面这个已经初步优化过的例子(左连接中的主表优先作用查询条件):
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
那么该语句还存在其它问题吗?不难看出子查询 c 是全表聚合查询,在表数量特别大的情况下会导致整个语句的性能下降。
其实对于子查询 c,左连接最后结果集只关心能和主表 resourceid 能匹配的数据。因此我们可以重写语句如下,执行时间从原来的2秒下降到2毫秒。
SELECT a.*,
c.allocated
FROM (
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20) a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
但是子查询 a 在我们的SQL语句中出现了多次。这种写法不仅存在额外的开销,还使得整个语句显的繁杂。使用 WITH 语句再次重写:
WITH a AS
(
SELECT resourceid
FROM my_distribute d
WHERE isdelete = 0
AND cusmanagercode = '1234567'
ORDER BY salecode limit 20)
SELECT a.*,
c.allocated
FROM a
LEFT JOIN
(
SELECT resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
FROM my_resources r,
a
WHERE r.resourcesid = a.resourcesid
GROUP BY resourcesid) c
ON a.resourceid = c.resourcesid
总结
数据库编译器产生执行计划,决定着SQL的实际执行方式。但是编译器只是尽力服务,所有数据库的编译器都不是尽善尽美的。
上述提到的多数场景,在其它数据库中也存在性能问题。了解数据库编译器的特性,才能避规其短处,写出高性能的SQL语句。
程序员在设计数据模型以及编写SQL语句时,要把算法的思想或意识带进来。
编写复杂SQL语句要养成使用 WITH 语句的习惯。简洁且思路清晰的SQL语句也能减小数据库的负担 。