SQL Optimization (MySQL as the Example; Oracle Users Can Refer to It Too)

键盘歌唱家

Category: oracle/mysql. Tags: database, mysql, index, sql

Article link: https://blog.csdn.net/MoastAll/article/details/117708694

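This article examines SQL optimization for MySQL 5.x in detail: why optimization is needed, how a statement is parsed, choosing a storage engine (InnoDB vs. MyISAM), how indexes work and how to manage them, and how to analyze execution plans. It focuses on creating and managing indexes to speed up queries, and on using EXPLAIN to understand which indexes a query uses and how it accesses data. Several worked examples show how to optimize SQL statements to improve database performance.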

This post mainly targets MySQL 5.x.
If something is missing here, see the official documentation: https://dev.mysql.com/doc/refman/8.0/en/optimization.html

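I. Why SQL needs optimizing

Poor performance, long execution time, long wait time, poorly written SQL (for example join queries), indexes that are not used, and unsuitable server settings (buffer sizes, thread counts).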

II. Analyzing a SQL statement

A query is usually written in this order:

select distinct .. from .. join .. on .. where .. group by .. having .. order by .. limit ..

It is actually parsed in this order:

from .. on .. join .. where .. group by .. having .. select distinct .. order by .. limit ..
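
The parse order explains a common surprise: a column alias defined in select cannot be used in where, because where is evaluated first, while order by runs after select and can see the alias. A minimal sketch, assuming a made-up table demo_orders that is not part of the original article:

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE demo_orders (
  id INT PRIMARY KEY,
  customer VARCHAR(20),
  amount INT
);

-- Works: ORDER BY is processed after SELECT, so the alias "total" is visible.
SELECT customer, SUM(amount) AS total
FROM demo_orders
WHERE amount > 0
GROUP BY customer
HAVING SUM(amount) > 100
ORDER BY total DESC
LIMIT 5;

-- Fails with "Unknown column 'amt' in 'where clause'":
-- WHERE is processed before SELECT, so the alias does not exist yet.
-- SELECT amount AS amt FROM demo_orders WHERE amt > 0;
```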

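III. SQL optimization: storage engines

The following commands show which engines MySQL provides:

# Which engines are supported?
show engines ;
# Which engine is currently in use?
show variables like '%storage_engine%' ;

We look at two engines: InnoDB (the default) and MyISAM.

InnoDB (default): transactions first (suited to highly concurrent workloads; row-level locking)
MyISAM: performance first (table-level locking, no transactions)

Pick the engine according to business needs. The engine of a table is specified when the table is created: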

	create table tb(
		id int(4) auto_increment ,
		name varchar(5),
		dept varchar(5) ,
		primary key(id)		
	)ENGINE=MyISAM AUTO_INCREMENT=1
	 DEFAULT CHARSET=utf8   ;
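
To check which engine an existing table uses, or to switch it later, something like the following should work on MySQL 5.x. It reuses the tb table defined above. Converting a table rebuilds it, so treat this as a sketch rather than a production procedure.

```sql
-- Show the engine of table tb in the current database.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'tb';

-- Convert tb from MyISAM to InnoDB (this rebuilds the table).
ALTER TABLE tb ENGINE = InnoDB;
```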

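IV. SQL optimization: indexes

Most SQL optimization is index optimization.

1. What an index is

Index:
An index is like the table of contents of a book.
An index is a data structure that helps MySQL fetch data efficiently (B-tree, the default, or hash, ...).
Drawbacks of indexes:
1. An index takes up considerable space; it can be held in memory or on disk (usually on disk).
2. Indexes do not suit every case: a. small amounts of data b. frequently updated columns c. rarely used columns
3. Indexes slow down inserts, updates, and deletes, because the index must be maintained as well; only queries get faster.
Advantages of indexes:
1. Faster queries (less I/O)
2. Lower CPU usage (for example ... order by age desc: a B-tree index is already kept in sorted order, so the sort can use it directly)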

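2. Types of index

Primary key index: values cannot repeat and cannot be null (for example id).
Unique index: values cannot repeat but can be null.
Single-column index: built on one column, for example age; a table can have several single-column indexes, for example another one on name.
Composite index: built on several columns (like a two-level table of contents: z, then zhao), for example (name,age) or (a,b,c,d,...,n).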
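
The difference between a primary key index and a unique index can be tried out directly. The table demo_idx and its values are invented for this illustration:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo_idx (
  id INT,
  email VARCHAR(50),
  PRIMARY KEY (id),              -- primary key index: unique and NOT NULL
  UNIQUE INDEX uk_email (email)  -- unique index: unique, NULL allowed
);

INSERT INTO demo_idx VALUES (1, NULL);
INSERT INTO demo_idx VALUES (2, NULL);      -- accepted: NULLs do not collide in a unique index
INSERT INTO demo_idx VALUES (3, 'a@x.com');

-- The next two statements are rejected:
-- INSERT INTO demo_idx VALUES (4, 'a@x.com');     -- duplicate entry for uk_email
-- INSERT INTO demo_idx VALUES (NULL, 'b@x.com');  -- id cannot be NULL
```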

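3. Creating an index

Note: a column declared as primary key automatically gets the primary key index.

Method 1: create [index type] index index_name on table_name(columns)

# Single-column:
create index   dept_index on  tb(dept);
# Unique:
create unique index  name_index on tb(name) ;
# Composite:
create index dept_name_index on tb(dept,name);

Method 2: alter table table_name add [index type] index index_name(columns). This is an alternative to method 1; if the indexes above already exist, drop them first, or MySQL reports a duplicate key name.

# Single-column:
alter table tb add index dept_index(dept) ;
# Unique:
alter table tb add unique index name_index(name);
# Composite:
alter table tb add index dept_name_index(dept,name);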

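4. Dropping an index

drop index index_name on table_name ;
drop index name_index on tb ;

5. Listing indexes

# List the indexes of a table (\G is a mysql command-line client feature that prints rows vertically):
show index from table_name ;
show index from table_name \G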

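V. Analyzing SQL performance

Use explain to inspect the execution plan. It shows how the optimizer intends to execute a statement, so developers can see how well their SQL is written.

 explain <SQL query>

Columns of the output:

id: identifier of the select
select_type: type of query
table: the table the row refers to
type: access type
possible_keys: indexes that might be used
key: index actually used
key_len: length in bytes of the index actually used
ref: columns or constants that are compared against the index
rows: estimated number of rows to examine
Extra: additional information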
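
A hedged example of running explain against the tb table from section III. The dept value 'dev' is made up:

```sql
-- Plain tabular plan with the columns listed above.
EXPLAIN SELECT * FROM tb WHERE dept = 'dev';

-- The same plan as a JSON document, available from MySQL 5.6 on.
EXPLAIN FORMAT=JSON SELECT * FROM tb WHERE dept = 'dev';
```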

1. id and table: the order in which tables are read in a multi-table query

Create the following three tables:

create table course
(
cid int(3),
cname varchar(20),
tid int(3)
);
create table teacher
(
tid int(3),
tname varchar(20),
tcid int(3)
);

create table teacherCard
(
tcid int(3),
tcdesc varchar(200)
);

insert into course values(1,'java',1);
insert into course values(2,'html',1);
insert into course values(3,'sql',2);
insert into course values(4,'web',3);

insert into teacher values(1,'tz',1);
insert into teacher values(2,'tw',2);
insert into teacher values(3,'tl',3);

insert into teacherCard values(2,'twdesc') ;
insert into teacherCard values(3,'tldesc') ;

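Now course has 4 rows, teacher has 3 rows, and teacherCard has 2 rows.

(1) Same id (the table with less data is read first)

# Multi-table join
explain select t.* from teacher t,course c,teacherCard tc where t.tid=c.tid and t.tcid=tc.tcid and (c.cid=2 or tc.tcid=3 );

When the ids are equal, read the table column from top to bottom: that is the execution order.

(2) Different ids: the larger the id, the earlier it runs (with nested subqueries, the inner query runs before the outer one)

# Subquery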
explain select tc.tcdesc from teacherCard tc where tc.tcid = 
(select t.tcid from teacher t where  t.tid =  
	(select c.tid from course c where c.cname = 'sql')
);

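The plan shows that course has the largest id, so it is executed first.

(3) Some ids equal, some different: larger ids run first; rows with the same id run from top to bottom

# Subquery plus multi-table join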
explain select t.tname ,tc.tcdesc from teacher t,teacherCard tc where t.tcid= tc.tcid and t.tid = (select c.tid from course c where cname = 'sql') ;

Execution order: c -> t -> tc

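2. select_type: type of query

PRIMARY: the main (outermost) query of a statement that contains subqueries
SUBQUERY: a subquery (not the outermost query) of a statement that contains subqueries
SIMPLE: a simple query (no subquery, no union)
DERIVED: a derived query (a temporary table is used), for example: explain select cr.cname from ( select * from course where tid in (1,2) ) cr ; (MySQL 5.7 and later may merge such a derived table into the outer query, in which case the row shows as SIMPLE)
UNION: in a from-clause subquery of the form table1 union table2, table1 is DERIVED and table2 is UNION, for example: explain select cr.cname from ( select * from course where tid = 1 union select * from course where tid = 2 ) cr ;
UNION RESULT: tells the developer which tables are combined by a union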

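3. type: access type, from best to worst

system > const > eq_ref > ref > (fulltext > ref_or_null > index_merge > unique_subquery > index_subquery >) range > index > ALL

(1) system (can be ignored): a system table with a single row, or a main query over a derived table with a single row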

create table test01
(
	tid int(3),
	tname varchar(20)
);

insert into test01 values(1,'a') ;
alter table test01 add constraint tid_pk primary key(tid) ;

Note: some MySQL versions report const here.
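
The statement that produced this plan is not in the text. A query of the kind the heading describes, a main query over a single-row derived table, could look like this. It is an assumption, not necessarily what the author ran:

```sql
-- The derived table t holds exactly one row, because test01 has one row.
EXPLAIN SELECT * FROM (SELECT * FROM test01) t WHERE tid = 1;
```

On versions that merge derived tables into the outer query (the default from MySQL 5.7 on), the plan becomes a primary key lookup on test01, which would fit the note that some versions report const.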

(2) const: a query that can match only one row, using a primary key or unique index (the type depends on the kind of index)

explain select tid from test01 where tid =1 ;

(3) eq_ref: unique index; each index key lookup returns exactly one matching row (exactly one, not more and not zero)

alter table teacherCard add constraint pk_tcid primary key(tcid);
alter table teacher add constraint uk_tcid unique index(tcid) ;

explain select t.tcid from teacher t,teacherCard tc where t.tcid = tc.tcid ;

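Here type comes out as ref, not eq_ref, because teacher has one more row than teacherCard: the teacher row with tcid=1 has no matching card. (eq_ref also requires the index used for the lookup to be a primary key or a unique not-null index.) Delete that row and check again: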

delete from teacher where tcid=1;

(4) ref: non-unique index; each index key lookup returns all matching rows (zero or many)

For an example, see the failed attempt in (3).

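(5) range: retrieves rows in a given range; the where clause holds a range condition (between, >, <, >=; a special case: in sometimes does not use the index and falls back to ALL, a scan without an index)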
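
No statement is shown for range. The sketch below assumes teacher.tid is indexed; the index name idx_teacher_tid is invented here. With tables this small the optimizer may still prefer a full scan, so the type column is best checked on your own server.

```sql
-- Assumed index, so that the range conditions have something to use.
ALTER TABLE teacher ADD INDEX idx_teacher_tid (tid);

EXPLAIN SELECT t.* FROM teacher t WHERE t.tid BETWEEN 1 AND 2;
EXPLAIN SELECT t.* FROM teacher t WHERE t.tid < 3;

-- IN may be planned as range or as a full scan (ALL), depending on version and data.
EXPLAIN SELECT t.* FROM teacher t WHERE t.tid IN (1, 2);
```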

(6) index: scans the whole index

explain select tid from teacher ; -- tid is indexed: only the index is scanned, not all the data in the table

(7) ALL: scans the whole table

explain select cid from course ;  -- cid is not indexed, so the full table is scanned

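Summary:

system/const: the result is a single row
eq_ref: the result has many rows, but each key matches exactly one row
ref: the result has many rows, and each key matches zero or more rows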

4. possible_keys: indexes that might be used; only a prediction, not always accurate.

5. key: the index actually used

alter table  course add index cname_index (cname);

explain select tc.tcdesc from teacherCard tc,course c,teacher t 
where c.tid = t.tid and t.tcid = tc.tcid and c.cname = 'sql' ;

6. key_len: length of the index used

Purpose: tells whether a composite index (a,b,c) is fully used.

create table test_kl
(
	name char(20) not null default ''
);
alter table test_kl add index index_name(name) ;
explain select * from test_kl where name ='' ;   -- key_len :60

In utf8 one character takes 3 bytes, so 20*3 = 60.
The name column above cannot be null. What happens when a column can be null?

alter table test_kl add column name1 char(20) ;  -- name1 can be null
alter table test_kl add index index_name1(name1) ;
explain select * from test_kl where name1 ='' ;  -- key_len: 61

If an indexed column can be null, one extra byte is used as the null flag: 60 + 1 = 61.
What about a composite index?

drop index index_name on test_kl ;
drop index index_name1 on test_kl ;
# Add a composite index
alter table test_kl add index name_name1_index (name,name1) ; 

explain select * from test_kl where name1 = '' ; -- key_len: 121 (60 + 61)
explain select * from test_kl where name = '' ; -- key_len: 60

What happens with a varchar column?

alter table test_kl add column name2 varchar(20) ; -- can be null
alter table test_kl add index name2_index (name2) ;

explain select * from test_kl where name2 = '' ;  -- key_len: 63

20*3 = 60, + 1 (null flag) + 2 (two bytes that store the variable length) = 63
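
The key_len rules from this subsection can be collected into one piece of arithmetic. The column aliases are made up for readability, and the factor 3 is the utf8 byte count used in this article; another character set changes it (utf8mb4, for example, uses up to 4 bytes per character).

```sql
-- key_len arithmetic for utf8 (3 bytes per character), written as plain expressions.
SELECT
  20 * 3                  AS char20_not_null,     -- 60
  20 * 3 + 1              AS char20_nullable,     -- 61: +1 byte for the null flag
  20 * 3 + 1 + 2          AS varchar20_nullable,  -- 63: +2 bytes for the length
  (20 * 3) + (20 * 3 + 1) AS name_plus_name1;     -- 121: composite index (name,name1)
```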

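7. ref: not to be confused with the ref value of type

Purpose: shows which column the current table's index is compared against.
select ... where a.c = b.x ; (b.x can also be a constant, shown as const)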


alter table course  add index tid_index (tid) ;
explain select * from course c,teacher t where c.tid = t.tid  and t.tname ='tw' ;

8. rows: the estimated number of rows MySQL expects to examine with the chosen index (an estimate, not an exact count)

9. Extra

(i) using filesort: expensive; an extra sorting pass is needed. Common with order by.

create table test02
(
	a1 char(3),
	a2 char(3),
	a3 char(3),
	index idx_a1(a1),
	index idx_a2(a2),
	index idx_a3(a3)
);
explain select * from test02 where a1 ='' order by a1 ;  -- no filesort
explain select * from test02 where a1 ='' order by a2 ;  -- using filesort

Summary: with single-column indexes, sorting on the same column that is filtered does not cause using filesort; sorting on a different column does.
To avoid it: order by the same columns that appear in where.

Composite index (best left prefix)
Creating a composite index (a,b,c) is like creating three indexes: (a), (a,b), and (a,b,c).

drop index idx_a1 on test02;
drop index idx_a2 on test02;
drop index idx_a3 on test02;

alter table test02 add index idx_a1_a2_a3 (a1,a2,a3) ;
explain select *from test02 where a1='' order by a3 ;  -- using filesort
explain select *from test02 where a2='' order by a3 ; -- using filesort
explain select *from test02 where a1='' order by a2 ;
explain select *from test02 where a2='' order by a1 ; -- using filesort

Summary: make where and order by use the columns of a composite index in order; do not skip columns or use them out of order.
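
As an illustration of this summary, here are two more statements against idx_a1_a2_a3. They are additions for comparison, not taken from the original, and the Extra column is best checked on your own server:

```sql
-- where uses a1 and a2, order by continues with a3:
-- the index order is followed without a gap.
EXPLAIN SELECT * FROM test02 WHERE a1 = '' AND a2 = '' ORDER BY a3;

-- where uses a1, order by continues with a2 and a3:
-- still the index order, nothing skipped.
EXPLAIN SELECT * FROM test02 WHERE a1 = '' ORDER BY a2, a3;
```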

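(ii) using temporary: expensive; a temporary table is used. Usually appears with group by.

	explain select a1 from test02 where a1 in ('1','2','3') group by a1 ;
	explain select a1 from test02 where a1 in ('1','2','3') group by a2 ; -- using temporary

Note: MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default and reject the second statement, because a1 is not in the group by.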

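(iii) using index: a performance gain; the index covers the query (covering index).

Reason: the data file is not read; everything comes from the index (no lookup back to the table).
Whenever all the columns a query uses are in the index, the query is covered and shows using index.

explain select a1,a2 from test02 where a1='' or a2= '' ; -- using index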

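(iv) using where (rows must be looked up in the table)

Suppose age is an indexed column,
but the query is select age,name from ... where age = ...; name has to be read from the table itself, so using where is shown. Strictly, using where means that the server applies the where condition to the rows it receives; this article uses it as a sign that the index alone did not settle the query.
This can be used to check the best left prefix:
by the left-prefix property of idx_a1_a2_a3 (a1,a2,a3), a query that follows the prefix does not show using where (the screenshot of this case is not available).

But once a query does not use the left prefix, or something makes the index ineffective (explained later), using where appears; for example a statement that filters only on a2: the table has no index on a2 alone, so using where is shown.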
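
The screenshots for this part are missing, so here is a hedged pair of statements in the same spirit, using idx_a1_a2_a3 (a1,a2,a3). They are assumptions about what the figures showed, and the exact Extra text varies between MySQL versions:

```sql
-- Filters on the leftmost column a1: the index can locate the rows directly.
EXPLAIN SELECT a1 FROM test02 WHERE a1 = '';

-- Filters only on a2, which is not a left prefix: the index cannot locate
-- rows by a2, so the condition is checked row by row (using where).
EXPLAIN SELECT a1, a2 FROM test02 WHERE a2 = '';
```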

(v) impossible where: the where clause is always false

explain select * from test02 where a1='x' and a1='y' ;

VI. Optimization examples

(1) Single-table optimization

create table book
(
	bid int(4) primary key,
	name varchar(20) not null,
	authorid int(4) not null,
	publicid int(4) not null,
	typeid int(4) not null 
);

insert into book values(1,'tjava',1,1,2) ;
insert into book values(2,'tc',2,1,2) ;
insert into book values(3,'wx',3,2,1) ;
insert into book values(4,'math',4,2,3) ;

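Find the bid of books with authorid=1 and typeid 2 or 3, sorted by typeid in descending order:

select bid from book where typeid in(2,3) and authorid=1  order by typeid desc ;

1. Optimization: add an index

alter table book add index idx_tab (typeid,authorid,bid); -- bid could be fetched from the table, but putting it in the index lets the query reach using index

2. Further optimization (on some versions the statement above runs with type index). Idea: a range condition with in sometimes makes the index ineffective, so reorder the index columns and move typeid in(2,3) to the end.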

	drop index idx_tab on book;
	alter table book add index idx_atb (authorid,typeid,bid);
	explain select bid from book where  authorid=1 and  typeid in(2,3) order by typeid desc ;

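Summary: a. best left prefix: keep the order in which columns are used consistent with the order in the index definition b. indexes are tuned step by step c. put range conditions containing in at the end of the where clause, so that they do not disable the columns after them.
In this example Using where (table lookup needed) and Using index (no table lookup) appear together. The reason: in where authorid=1 and typeid in(2,3), authorid is in the index (authorid,typeid,bid), so it is resolved from the index alone; typeid is in the index too, but the in range condition makes that part of the index ineffective, as if there were no index on typeid, so a table lookup is needed (using where).
Without the in, using where does not appear:
explain select bid from book where authorid=1 and typeid =3 order by typeid desc ;

key_len can also be used to show that in can make an index part ineffective.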

(2) Two-table optimization

create table teacher2
(
	tid int(4) primary key,
	cid int(4) not null
);

insert into teacher2 values(1,2);
insert into teacher2 values(2,1);
insert into teacher2 values(3,3);

create table course2
(
	cid int(4) ,
	cname varchar(20)
);

insert into course2 values(1,'java');
insert into course2 values(2,'python');
insert into course2 values(3,'kotlin');

Suppose we want to optimize this statement:

select *from teacher2 t left outer join course2 c on t.cid=c.cid where c.cname='java';

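Which table should get the index?

Follow the rule that the small table drives the large table, and index the columns that are used often. Here t.cid=c.cid shows that t.cid is used often, so that column gets an index. [As a general rule in this article: for a left outer join index the left table, for a right outer join index the right table; confirm with explain, since the optimizer picks the join order.]
When writing ... on t.cid=c.cid, put the table with less data on the left (assume here that t is the smaller one).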

	alter table teacher2 add index index_teacher2_cid(cid) ;
	alter table course2 add index index_course2_cname(cname);
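
To see whether the two indexes are picked up, rerun explain on the statement being optimized. A small sketch; with only three rows per table the optimizer may still choose full scans, so the plan on real data can differ:

```sql
EXPLAIN SELECT * FROM teacher2 t
LEFT OUTER JOIN course2 c ON t.cid = c.cid
WHERE c.cname = 'java';

-- List the indexes that now exist on both tables.
SHOW INDEX FROM teacher2;
SHOW INDEX FROM course2;
```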


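(3) Three-table optimization (A, B, C)

a. The small table drives the large table  b. Index the columns that are queried often

VII. Composite index summary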

create table test03
(
  a1 int(4) not null,
  a2 int(4) not null,
  a3 int(4) not null,
  a4 int(4) not null
);

alter table test03 add index idx_a1_a2_a3_4(a1,a2,a3,a4) ;

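Analysis of the results:

	explain select a1,a2,a3,a4 from test03 where a1=1 and a2=2 and a3=3 and a4 =4 ; -- recommended: the columns are used (in where) in the same order as in the composite index

	explain select a1,a2,a3,a4 from test03 where a4=1 and a3=2 and a2=3 and a1 =4 ; -- the written order differs from the index order, but the optimizer reorders the conditions before execution, so the plan is the same as for the previous statement
	-- both statements above use the whole composite index

	explain select a1,a2,a3,a4 from test03 where a1=1 and a2=2 and a4=4 order by a3; 
	-- this statement uses the a1 and a2 parts of the index, which need no table lookup (using index); a4 skips a column, so that part of the index is ineffective and needs a lookup (using where); key_len confirms this

	explain select a1,a2,a3,a4 from test03 where a1=1 and a4=4 order by a3; 
	-- this statement shows using filesort (an extra sorting pass): do not skip columns (read where and order by together; they must not skip a column)


	explain select a1,a2,a3,a4 from test03 where a1=1 and a4=4 order by a2 , a3; -- no using filesort

Summary: i. If the columns are used in exactly the order of the composite index (a,b,c,d), with no column skipped, the whole index is used. If only a leading part matches (again with no column skipped), only that part of the index is used.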

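VIII. Principles for avoiding index invalidation

(1) Composite indexes

b. With a composite index, try to match all of its columns

(2) Do not perform any operation (calculation, function, type conversion) on an indexed column, or the index becomes ineffective

select ... where A.x = ... ; -- assume A.x is indexed
Do not write: select ... where A.x*3 = ... ;
explain select * from book where authorid = 1 and typeid = 2 ; -- uses both index parts (authorid and typeid)
explain select * from book where authorid = 1 and typeid*2 = 2 ; -- uses one index part (authorid)
explain select * from book where authorid*2 = 1 and typeid*2 = 2 ; -- uses no index part
explain select * from book where authorid*2 = 1 and typeid = 2 ; -- uses no index part. Reason: in a composite index, when a column on the left is ineffective, everything to its right is ineffective too. For (a,b,c): if b is ineffective, b and c are both ineffective.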

 		drop index idx_atb on book ; 
		alter table book add index idx_authroid (authorid) ;
		alter table book add index idx_typeid (typeid) ;
		explain select * from book where authorid*2 = 1 and typeid = 2 ;
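
The same rule covers functions and implicit type conversion, which the heading mentions but the examples do not show. A sketch with an assumed index idx_book_name on book.name (not created in the original); compare the key column of each pair:

```sql
-- Assumed index for this illustration.
ALTER TABLE book ADD INDEX idx_book_name (name);

-- Calculation: do the arithmetic on the constant side so the column stays bare.
EXPLAIN SELECT * FROM book WHERE authorid * 2 = 2;  -- column wrapped in an expression
EXPLAIN SELECT * FROM book WHERE authorid = 1;      -- same condition, bare column

-- Function: LEFT(name, 1) hides the column; a prefix LIKE keeps it bare.
EXPLAIN SELECT * FROM book WHERE LEFT(name, 1) = 't';
EXPLAIN SELECT * FROM book WHERE name LIKE 't%';

-- Type conversion: name is a string column, so comparing it with a number
-- forces a conversion on every row; quote the value instead.
EXPLAIN SELECT * FROM book WHERE name = 123;
EXPLAIN SELECT * FROM book WHERE name = '123';
```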


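(3) In a composite index, avoid not-equal comparisons (!= <>) and is null (is not null); otherwise that column and every column to its right become ineffective.

If a composite index column is compared with >, that column and the columns to its right become ineffective.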

explain select * from book where authorid = 1 and typeid =2 ;
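
To observe rule (3), compare key and key_len across the statements below. The composite index idx_book_at is an assumption added here, since idx_atb was dropped earlier; the two single-column indexes created above are removed first so that they do not compete. The optimizer's choice can vary with version and data.

```sql
-- Remove the single-column indexes created earlier, then add an assumed composite index.
DROP INDEX idx_authroid ON book;
DROP INDEX idx_typeid ON book;
ALTER TABLE book ADD INDEX idx_book_at (authorid, typeid);

EXPLAIN SELECT * FROM book WHERE authorid = 1 AND typeid = 2;   -- equality on both columns
EXPLAIN SELECT * FROM book WHERE authorid != 1 AND typeid = 2;  -- not-equal on the left column
EXPLAIN SELECT * FROM book WHERE authorid > 1 AND typeid = 2;   -- range on the left column
EXPLAIN SELECT * FROM book WHERE authorid = 1 AND typeid > 2;   -- range on the right column
```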
