索引及执行计划管理

最新推荐文章于 2023-05-06 09:37:44 发布

可惜我不是圣人

最新推荐文章于 2023-05-06 09:37:44 发布

阅读量149

点赞数

本文链接：https://blog.csdn.net/weixin_44695383/article/details/103742359

版权

索引及执行计划管理

1. 索引的作用

类似于一本书的目录,起到优化查询的功能.

2. 索引类型 (笔试)

BTREE索引   *****
RTREE索引
HASH 索引
全文 索引

3. BTREE的细分类(算法)

B-TREE
B+TREE
B*TREE

4. Btree索引的功能分类

辅助索引(二级索引) *****
聚集索引(集群索引)

5. B树是如何构建的

辅助索引

alter table t1 add index idx_name(name);
1. 将name列的所有值取出来,进行自动排序
2. 将排完序的值均匀的落到16KB叶子节点数据页中,
并将索引键值所对应的数据行的聚集索引列值
3. 向上生成枝节点和根节点

聚集索引

1. 默认是按照主键生成聚集索引.没有主键,存储引擎会使用唯一键.
   如果都没有,会自动生成隐藏的聚集索引.
2. 数据在存储是,就会按照聚集索引的顺序存储到磁盘的数据页.
3. 由于本身数据就是有序的,所以聚集索引构建时,不需要进行排序.
4. 聚集索引直接将磁盘的数据页,作为叶子节点.
5. 枝节点和根节点只会调取下层节点主键的最小值

6. 辅助索引细分

6.1 单列辅助索引

select * from t1  where name=''

6.2 联合索引

select * from t1  where  a   and    b  and c

6.2 唯一索引

唯一索引要求比较高，值必须唯一才可以创建。
如何检查值是否是唯一的？对要检查的列是用group by 分组，使用count检查出现次数；

7. 索引树的高度(越低越好)

7.1 表的数据量级大

分区表(自己了解下)
分库分表(分布式架构)

7.2 索引键值的长度

(1) 尽可能选择列值短的列创建索引
(2) 采用前缀索引

7.3 数据类型选择(选择合适)

varchar 和 char 
enum

8. 索引管理

8.0 压力测试准备

mysqlslap --defaults-file=/etc/my.cnf --concurrency=100 --iterations=1 --create-schema='test' --query="select * from test.t100w where k2='VWtu'" engine=innodb --number-of-queries=2000 -uroot -p123456 -verbose

8.1 索引命令操作

-- 查询索引

oldguo[school]>use school
oldguo[school]>desc student;
Key : PRI (主键) ,UNI(唯一索引),MUL(辅助索引)
oldguo[school]>show index from student\G

-- 创建单列索引

oldguo[school]>alter table student add index idx_name(sname);

-- 创建联合索引

oldguo[school]>alter table student add index idx_sname_sage_ssex(sname,sage,ssex); 
idx_a_b_c -----> idx_a   idx_a_b   idx_a_b_c

-- 创建前缀索引

oldguo[school]>alter table student add index idx(sname(5));

-- 创建唯一索引

oldguo[school]>alter table student add telnum char(11) not null ;
oldguo[school]>alter table student add unique index idx_tel(telnum);

-- 删除索引

oldguo[school]>alter table student drop index idx;
oldguo[school]>alter table student drop index idx_name;
oldguo[school]>alter table student drop index idx_sname_sage_ssex;

9. explain(desc) *****

explain select * from test.t100w where k2='VWtu'
或者:
desc select * from test.t100w where k2='VWtu'
作用: 抓取优化器优化过的执行计划

9.1 执行计划的分析

oldguo[test]>explain  select * from test.t100w where k2='VWtu';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t100w | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 997511 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

重点关注:

table : 以上SQL语句设计到的表 ***
type : 查询的类型(全表扫描(ALL),索引扫描,查不到数据(NULL)) *****
possible_keys : 可能会用到的索引 ***
key : 使用到的索引 ****
key_len : 索引的覆盖长度 *****
Extra : 额外的信息 ****

9.2 type 详细说明 *****

9.2.1 ALL : 全表扫描 , 不会走任何索引

(1) 查询条件,没建索引

oldguo[test]>explain  select * from test.t100w where k2='VWtu'
oldguo[test]>desc t100w;
+-------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type      | Null | Key | Default           | Extra                       |
+-------+-----------+------+-----+-------------------+-----------------------------+
| id    | int(11)   | YES  |     | NULL              |                             |
| num   | int(11)   | YES  |     | NULL              |                             |
| k1    | char(2)   | YES  |     | NULL              |                             |
| k2    | char(4)   | YES  |     | NULL              |                             |
| dt    | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+-----------+------+-----+-------------------+-----------------------------+

(2) 有索引不走

desc select * from t100w where k2 != 'asdf';
desc select * from t100w where k2 like '%aa%';
desc select * from t100w where k2 not in ('asda','asas');
desc select * from t100w;
注意: !=和not in 如果是主键列,是走range.

9.2.2 index 全索引扫描

oldguo[test]>desc select k2 from t100w;

================从range开始,我们才认为索引是有价值的==============

9.2.3 range 索引范围查询

所有索引:

> ,<, >=, <= ,like , between and
oldguo[world]>desc select * from city where id<10;
oldguo[world]>desc select * from city where countrycode like 'CH%'
in , or 
oldguo[world]>desc select * from city where countrycode in ('CHN','USA)

聚集索引:

!= ,not in 
oldguo[test]>desc select * from world.city where id != 10;
oldguo[test]>desc select * from world.city where id not in (10,20);
说明:
B+tree 索引能额外优化到:> ,<, >=, <= ,like , between and
in 和 or 享受不到b+tree额外的优化效果的,所以我一般情况会将in , or 进行改写
desc select * from city where countrycode = 'CHN' union all  select * from city where countrycode = 'USA';

ref : 辅助索引等值查询

oldguo[world]>desc select * from city where countrycode ='CHN';

eq_ref :

多表连接查询中,非驱动表on的条件列是主键或者唯一键
oldguo[world]>desc select a.name,b.name from city as a join country as b on a.countrycode=b.code where a.population<100;

const(system) 主键或唯一键的等值
oldguo[world]>desc select * from city where id=10;

NULL : 获取不到数据

oldguo[world]>desc select * from city where id=100000000000000;

9.3 possible_keys : 可能会用到的索引 ***

NULL : 没有和查询条件匹配的索引条目
有值 : 有和查询条件匹配的索引条目,但是没走,大部分原因是语句查询方式不符合索引应用条件

9.4 key : 使用到的索引 ****

最终使用的索引,可以帮助我们判断是否走了合适的索引.

9.5 key_len : 索引的覆盖长度 *****

在联合索引应用的判断时,会经常看
对于单列索引:

字符集 
                     not null        没有指定not null
int     :  4个字节   4                 4+1

tinyint :  1个字节   1                2

utf8mb4 : 一个字符最大是4个字节
char(2)              2*4               2*4+1
varchar(2)           2*4+2             2*4+2+1

说明:

有非空约束时,key_length就是最大字节长度
在没有非空约束时: 字符最大长度+1
varchar类型,需要额外在最大字符长度+2(存储字符长度的最长度占位)

create table t1 (
id int ,
num int not null, 
k1 char(2),
k2 char(2) not null,
k3 varchar(2),
k4 varchar(2) not null)charset utf8mb4;

oldguo[world]>alater table t1 add index idx_id(id);
oldguo[world]>desc select * from t1 where id=1;

oldguo[world]>desc select * from t1 where num=1;
oldguo[world]>alter table t1 add index idx_num(num);

oldguo[world]>alter table t1 add index idx_k1(k1);
oldguo[world]>alter table t1 add index idx_k2(k2);
oldguo[world]>alter table t1 add index idx_k3(k3);
oldguo[world]>alter table t1 add index idx_k4(k4);

联合索引优化细节:

oldguo[world]>desc t1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | int(11)    | YES  |     | NULL    |       |
| num   | int(11)    | NO   |     | NULL    |       |
| k1    | char(2)    | YES  |     | NULL    |       |
| k2    | char(2)    | NO   |     | NULL    |       |
| k3    | varchar(2) | YES  |     | NULL    |       |
| k4    | varchar(2) | NO   |     | NULL    |       |
-----------------------------------------------------
oldguo[world]>alter table t1 add index idx(id,num,k1,k2,k3,k4);
oldguo[world]>select 5+4+9+8+11+10 ;

(1) 最完美的查询情况

idx(id,num,k1,k2,k3,k4)
oldguo[world]>desc select * from t1 where id=1 and num=1 and k1='a' and k2='a' and k3='a' and k4='a';
oldguo[world]>desc select * from t1 where num=1  and id=1 and k1='a' and k2='a' and k3='a' and k4='a';

结论:
当查询条件中,包含了索引列中所有的列条件,并且都是等值的查询,那么无关排列顺序,都可以走全联合索引优化;
原因是优化器会自动调整顺序,达到最佳的优化效果.
所以,我们重点需要关注的是联合索引建立的顺序,从左到右,唯一值越多的列放在最左边

(2)查询条件中,哪些因素会key_len长度
-- 按照索引的建立顺序,在查询条件中,少了任意一个中间列,后续列都无法走索引

oldguo[world]>desc select * from t1 where  id=1 and k1='a' and k2='a' and k3='a' and k4='a';

-- 在条件查询中间,出现不等值查询时,从不等值列开始,所有列都无法使用联合索引 (暂存)

oldguo[world]>desc select * from t1 where  id=1 and num>10 and k1='a' and k2='a' and k3='a' and k4='a';

优化方法:
将不等值列放在最后.

oldguo[world]>alter table t1 add index idx1(id,k1,k2,k3,k4,num);
oldguo[world]>alter table t1 drop index idx;

(3) 如果有多子句的条件查询(必须是联合索引)
按照子句的执行顺序,建立联合索引.

9.6 Extra : 额外的信息 ****

Using filesort : 原因是在 group by ,order by,distinct等.
一般优化的方法是,和where条件的列建立联合索引

建立索引的原则（DBA运维规范）

1.1 建表时一定要有主键,一般是个无关列

略.回顾一下,聚集索引结构.

1.2 选择唯一性索引

唯一性索引的值是唯一的，可以更快速的通过该索引来确定某条记录。
例如，学生表中学号是具有唯一性的字段。为该字段建立唯一性索引可以很快的确定某个学生的信息。
如果使用姓名的话，可能存在同名现象，从而降低查询速度。

优化方案:

(1) 如果非得使用重复值较多的列作为查询条件(例如:男女),可以将表逻辑拆分

(2) 可以将此列和其他的查询类,做联和索引

select count(*) from world.city;
select count(distinct countrycode) from world.city;
select count(distinct countrycode,population ) from world.city;

1.3(必须的) 为经常需要where 、ORDER BY、GROUP BY,join on等操作的字段

排序操作会浪费很多时间。

where  A B C      ----》 A  B  C
in 
where A  group by B    
where A  order by C

如果为其建立索引，优化查询
注：如果经常作为条件的列，重复值特别多，可以建立联合索引。

1.4 尽量使用前缀来索引

如果索引字段的值很长，最好使用值的前缀来索引。

1.5 限制索引的数目

索引的数目不是越多越好。

可能会产生的问题:

(1) 每个索引都需要占用磁盘空间，索引越多，需要的磁盘空间就越大。
(2) 修改表时，对索引的重构和更新很麻烦。越多的索引，会使更新表变得很浪费时间。
(3) 优化器的负担会很重,有可能会影响到优化器的选择.
percona-toolkit中有个工具,专门分析索引是否有用

1.6 删除不再使用或者很少使用的索引(percona toolkit)

pt-duplicate-key-checker
表中的数据被大量更新，或者数据的使用方式被改变后，原有的一些索引可能不再需要。数据库管理
员应当定期找出这些索引，将它们删除，从而减少索引对更新操作的影响。

1.7 大表加索引,要在业务不繁忙期间操

pt-osc (了解一下)

1.8 尽量少在经常更新值的列上建索引

1.9 建索引原则

(1) 必须要有主键,如果没有可以做为主键条件的列,创建无关列
(2) 经常做为where条件列  order by  group by  join on, distinct 的条件(业务:产品功能+用户行为)
(3) 最好使用唯一值多的列作为索引,如果索引列重复值较多,可以考虑使用联合索引
(4) 列值长度较长的索引列,我们建议使用前缀索引.
(5) 降低索引条目,一方面不要创建没用索引,不常使用的索引清理,percona toolkit(xxxxx)
(6) 索引维护要避开业务繁忙期,pt-osc

面试题:有索引,为什么查询效率还很低?

1. 有索引不走
2. 联合索引没有完全覆盖
3. 索引失效或者统计信息不真实
4. 索引重复值太多

我一般是这样来拍查和解决此类问题的:

1）. 先看这个语句的执行计划

table 
type 
key_len  判断 联合索引应用长度 
extra    额外排序
rows     重复值 ------>

2 没索引建索引,改语句

1. 主键
2. where group by  order by  distinct join on 
3. 联合最左侧,唯一值
4. 尽量使用前缀索引 
5. 索引条目
6. 频繁更新的列,不适合做索引列,
7. 避开业务繁忙期,pt-tools

2 不走索引的情况（开发规范）

2.1 没有查询条件，或者查询条件没有建立索引

select * from tab;       全表扫描。
select  * from tab where 1=1;

在业务数据库中，特别是数据量比较大的表。
是没有全表扫描这种需求。
1、对用户查看是非常痛苦的。
2、对服务器来讲毁灭性的。

（1）select * from tab;SQL改写成以下语句：

select  * from  tab  order by  price  limit 10 ;    需要在price列上建立索引

（2）

select  * from  tab where name='zhangsan'          name列没有索引

改：

1、换成有索引的列作为查询条件
2、将name列建立索引

2.2 查询结果集是原表中的大部分数据，应该是25％以上。

查询的结果集，超过了总数行数25%，优化器觉得就没有必要走索引了。

假如：tab表 id，name id:1-100w ，id列有(辅助)索引

select * from tab  where id>500000 and id<500100;

如果业务允许，可以使用limit控制。
怎么改写？
结合业务判断，有没有更好的方式。如果没有更好的改写方案
尽量不要在mysql存放这个数据了。放到redis里面。

2.3 索引本身失效，统计数据不真实

索引有自我维护的能力。
对于表内容变化比较频繁的情况下，有可能会出现索引失效。
一般是删除重建
现象:
有一条select语句平常查询时很快,突然有一天很慢,会是什么原因
select?  --->索引失效,，统计数据不真实
DML ?   --->锁冲突
oldguo[world]>optimize table city;

optimize table city 重建索引

2.4 查询条件使用函数在索引列上，或者对索引列进行运算，运算包括(+，-，*，/，! 等)

例子：
错误的例子：select * from test where id-1=9;
正确的例子：select * from test where id=10;
算术运算
函数运算
子查询

2.5 隐式转换导致索引失效.这一点应当引起重视.也是开发中经常会犯的错误.

这样会导致索引失效. 错误的例子：

mysql> alter table tab add index inx_tel(telnum);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql>
mysql> desc tab;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id    | int(11)    | YES  |    | NULL    |      |
| name  | varchar(20) | YES  |    | NULL    |      |
| telnum | varchar(20) | YES  | MUL | NULL    |      |
+--------+-------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
mysql> select * from tab where telnum='1333333';
+------+------+---------+
| id  | name | telnum  |
+------+------+---------+
|    1 | a    | 1333333 |
+------+------+---------+
1 row in set (0.00 sec)
mysql> select * from tab where telnum=1333333;
+------+------+---------+
| id  | name | telnum  |
+------+------+---------+
|    1 | a    | 1333333 |
+------+------+---------+
1 row in set (0.00 sec)
mysql> explain  select * from tab where telnum='1333333';
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref  | rows | Extra                |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+

|  1 | SIMPLE      | tab  | ref  | inx_tel      | inx_tel | 63      | const |    1 | Using index condition |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)
mysql> explain  select * from tab where telnum=1333333;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra      |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tab  | ALL  | inx_tel      | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain  select * from tab where telnum=1555555;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra      |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | tab  | ALL  | inx_tel      | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain  select * from tab where telnum='1555555';
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref  | rows | Extra                |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | tab  | ref  | inx_tel      | inx_tel | 63      | const |    1 | Using index condition |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-----------------------+
1 row in set (0.00 sec)
mysql>

建表定义的数据类型varchar(20) 那么查询时就要使用字符串的条件查询，如where telnum='1111'，如果不加引号会导致不走索引

2.6 <> ，not in 不走索引（辅助索引）

EXPLAIN  SELECT * FROM teltab WHERE telnum  <> '110';
EXPLAIN  SELECT * FROM teltab WHERE telnum  NOT IN ('110','119');

mysql> select * from tab where telnum <> '1555555';
+------+------+---------+
| id  | name | telnum  |
+------+------+---------+
|    1 | a    | 1333333 |
+------+------+---------+
1 row in set (0.00 sec)
mysql> explain select * from tab where telnum <> '1555555';

单独的>,<,in 有可能走，也有可能不走，和结果集有关，尽量结合业务添加limit

or或in 尽量改成union

EXPLAIN  SELECT * FROM teltab WHERE telnum  IN ('110','119');
改写成：
EXPLAIN SELECT * FROM teltab WHERE telnum='110'
UNION ALL
SELECT * FROM teltab WHERE telnum='119'

2.7 like "%_" 百分号在最前面不走索引

EXPLAIN SELECT * FROM teltab WHERE telnum LIKE '31%'  走range索引扫描
EXPLAIN SELECT * FROM teltab WHERE telnum LIKE '%110'  不走索引

%linux%类的搜索需求，可以使用elasticsearch+mongodb 专门做搜索服务的数据库产品

3. 小结:

建索引规范:

1. 主键
2. where group by  order by  distinct join on 
3. 联合最左侧,唯一值
4. 尽量使用前缀索引 
5. 索引条目
6. 频繁更新的列,不适合做索引列,
7. 避开业务繁忙期,pt-tools

不走索引:

1. 本来全表扫描
2. 25% 
3. 计算或函数
4. 隐式转换
5. like %xx%
6. != ,not in 
7. 索引失效,统计信息不真实

可惜我不是圣人

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫