第四章节索引及执行计划

1. 介绍

相当于一本书中的目录,可以加速查询(select ,update,delete ).

2. 种类

Btree (平衡多叉树): b-tree b+tree(b*tree),优点:范围查找
HASH : 优点,比较适合随机的等值.
Rtree

3. Btree的细分

聚簇索引 : 主键索引
辅助索引 :
单列
联合
唯一
前缀

4. 索引的管理

4.0 索引建立之前压测:

mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='test' \
--query="select * from test.t100w where k2='VWlm'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

4.1 查询索引

a. desc city;
PRI : 主键索引 
MUL : 普通索引
UNI : 唯一索引

b. show index from city;
Table         表名
Key_name      索引名
Column_name   列名
Cardinality   基数(选择度),位置值的多少

Cardinality   建立索引之前,基数如何计算的?  
select count(distinct countrycode) from city;

4.2 创建

alter table city add index i_name(name);
alter table city add index i_d_p(distinct,population);
alter table city add index i_x(name(10));
alter table t1 add unique index i_a(a);

4.3 删除

alter table city  drop index i_name;

4.4 8.0新特性–> invisible index 不可见索引

mysql> alter table city alter index idx_name invisible;

SELECT INDEX_NAME, IS_VISIBLE
       FROM INFORMATION_SCHEMA.STATISTICS
       WHERE TABLE_SCHEMA = 'world' AND table_name='city';

5. B+tree查找算法介绍

5.1 平衡

不管查找哪个数,需要查找次数理论上是相同的.对于一个三层b树来讲,理论上查找每个值都是三次IO.

5.2 擅长范围查找

讲究快速锁定范围.
B+tree,加入了双向指针(头尾相接),进一步增强范围查找.减少对于ROOT和NON-LEAF的访问次数.

5.3 构建过程

叶子: 先将数据排序,生成叶子节点.
枝 : 保存叶子节点的范围(>=1 <5)+指针(→)
根 : 保存枝节点范围+指针
叶子节点和枝节点都有双向指针.

6. MySQL中如何应用B+TREE ******

6.1 名词解释

区(簇)extent : 连续的64pages ,默认是1M存储空间.
page页 : 16KB大小,MySQL中最小的IO单元.

6.2 IOT 组织表

数据应该按照索引结构有序(顺序)组织和存储数据.
MySQL使用聚簇索引组织存储数据.

6.3 聚簇(区)索引

6.3.1 构建条件

a. 如果表中有主键,主键就被作为聚簇索引.
b. 没有主键,第一个不为空的唯一键.
c. 什么都没有,自动生成一个6字节的隐藏列,作为聚簇索引.
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

6.3.2 如何形成B树结构.

叶子节点: 聚簇索引组织表,存数据时,已经是按照ID列有序存储数据到各个连续的数据页中.原表数据存储结构就是叶子节点.
枝节点 : 叶子节点中ID范围+指针
根节点 : 枝节点的ID范围+指针

6.3.3 优化了哪些查询?

只能优化基于ID作为条件.索引单纯使用ID列查询,很局限.

6.4 辅助索引

6.4.1 构建条件

需要人为按照需求创建辅助索引.

6.4.2 如何形成B树结构

alter table t1 add index idx(name);

叶子节点 : 将辅助索引列值(name)+ID提取出来,按照辅助索引列值从小到大排序,存储到各个page中,生成叶子节点.
枝节点 : 存储了叶子节点中,name列范围+指针.
根节点 : 枝节点的name的范围+指针.

6.4.3 优化了哪些查询?

如果查询条件使用了name列,都会先扫描辅助索引,获得ID,再回到聚簇索引(回表),按照ID进行聚簇索引扫描,最终获取到数据行.
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html

6.5 联合辅助索引结构

6.5.1 构建过程

alter  table t1 add index idx_n_g(a,b)

叶子节点 : 提取a+b+id列值,按照a,b联合排序(从小到大),生成叶子节点.
枝节点 : 叶子节点最左列范围+指针
根节点 : 枝节点的范围+指针.

6.5.2 优化了哪些查询?

查询条件中必须包含最左列条件(a),先通过 a条件扫描联合索引的根节点和枝节点,从而得到叶子节点范围.再拿b作为条件过滤一次.
最终目的,得到更精确的ID .理论上减少回表的次数.

6.5.3 最左原则

建立联合索引时,选择基数大(重复值少)作为最左列.
查询条件中必须要包含最左列条件.

7. 索引树高度影响因素

一般建议3-4层为佳,3层b树,2000w+.

a. 数据行多

分区表.
定期归档: 一般按照时间字段,定期归档到历史库中. pt-archiver.
分库分表:分布式

b. 索引列长度过长

前缀索引.

c. 数据类型

足够
简短
合适

8. 回表问题

8.1 回表是什么?

辅助索引扫描之后,得到ID,再回到聚簇索引查找的过程.

8.2 回表会带来什么问题?

IO : 次数和量会增加.
IOPS : 1000次/s
吞吐量 : 300M/s

8.3 怎么减少回表

a. 建索引使用联合索引(覆盖),尽可能多将查询条件的数据包含联合索引中.
b. 精细查询条件(业务方面,> and < ,limit)
c. 查询条件要符合联合索引规则,覆盖的列越多越好.

9. 扩展项: 索引自优化AHI(自适应hash索引)\change buffer

AHI : 索引的索引. 为内存中的热点索引页,做了一个HASH索引表,能够快速找到需要的索引页地址.
https://dev.mysql.com/doc/refman/8.0/en/innodb-adaptive-hash.html
https://dev.mysql.com/doc/refman/8.0/en/innodb-architecture.html

change buffer :
对于辅助索引的变化,不会立即更新到索引中.暂存至change buffer .
https://dev.mysql.com/doc/refman/8.0/en/innodb-change-buffer.html

10. 分析执行计划

10.1 是什么?

优化器(算法)最终得出的,代价最低的,SQL语句的执行方案.

10.2 为什么要分析执行计划?

场景一: 分析比较慢的语句.
场景二: 上线新业务,可能会包含很多select update delete…,提前发现问题.

10.3 如何抓取执行计划

a. 抓取目标

select  update  delete

b. 方法

mysql> desc   select * from world.city where countrycode='CHN';
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | city  | NULL       | ref  | CountryCode   | CountryCode | 3       | const |  363 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+-------+

10.4 如何分析执行计划

table : 操作的表
type : 操作类型(全表\索引) ,ALL index range ref eq_ref const(system)
possible_keys : 有可能用的索引
key : 真正要用是哪索引
key_len: 索引覆盖长度(联合索引)
rows : 预估需要扫描的行数
Extra : using where using index using index condition using filesort sort using temp

10.5 type 详解

a. ALL 全表扫描

mysql> explain   select * from world.city ;
mysql> explain   select * from world.city where countrycode !='chn';
mysql> explain   select * from world.city where countrycode like '%hn%';
mysql> explain   select * from world.city where countrycode not in ('chn','usa');

b. index 全索引扫描

需要扫描整颗索引树,才能得到想要的结果. 
desc select id ,countrycode from world.city;

c. range 索引范围

是我们应用索引优化的底线,也是应用最多的.
mysql> desc select * from city where id<10;
mysql> desc select * from city where countrycode like 'ch%';
mysql> desc select * from city where countrycode in ('CHN','USA');

SQL 改写为:  
desc 
select * from city where countrycode='CHN' union  all  select * from city where countrycode='USA'


mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='world' \
--query=" select * from city where countrycode in ('CHN','USA')" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='world' \
--query="select * from city where countrycode='CHN' union  all  select * from city where countrycode='USA'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

小经验:   
    索引列基数多少 +  压测结果,最终评估是否需要使用union .

d. ref : 辅助索引等值查询

mysql> desc  select * from city where countrycode='CHN';

e. eq_ref : 非驱动表,连接条件是主键或唯一键.

mysql> desc select a.name, b.name,a.countrycode,a.population  
from city  as a 
join country as b 
on a.countrycode=b.code where a.population<100;


补充: 
    多表连接时,小结果集的表驱动大表.
    优化会自动判断查询语句中的谁做为驱动表更合适.有可能会出现选择错误.
    我们可以通过left join 强制驱动表干预执行计划.

彩蛋：如何判断一条连接语句中,谁是驱动表?

1. 优化器的判断 
mysql> desc  select * from city  join country on city.countrycode=country.code where city.population<100000   and country.SurfaceArea>10000000;
    
2. 人为判断 
mysql> select count(*) from  country where country.SurfaceArea>10000000;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from city where city.population<100000 ;
+----------+
| count(*) |
+----------+
|      517 |
+----------+    
补充: 如果 where后的列中都有索引,会选择结果集小的作为驱动表.
3. 压测.

f. const(system)

mysql> desc select * from city where id=1;

10.6 key_len详解

a. 介绍

(联合)索引覆盖长度 
idx(a,b,c)   ----> a (10)  b(20)  c(30)

b. 如何计算索引列的key_len

key_len 和每个列的最大预留长度(字节)有关.

数据类型    utf8mb4       没有 not null 
tinyint      1              1
int          4              1
char(10)     4*10           1
varchar(10)  4*10+2         1

c. 联合索引应用细节

idx(a,b,c)    ---->   a  ab   abc 
完全覆盖 : 
    a=  and b=   and c=
    a=  and c=  and  b=  等值打乱顺序的
    a=  and b=  and  c范围 
    a=  anb b字符范围 and c=
        
部分覆盖 
    a=  and b= 
    a= 
    a=  and c=
    a=  anb b数字范围 and c=
完全不覆盖  bc  --> bc  b   
    b  
    c
    bc
    cb 

优化案例: 
    idx(k1,num,k2)
    1. mysql> desc select * from t100w where k1='Vs' and num<27779  and k2='mnij' 
    
优化方案: 修改索引为idx(k1,k2,num)

10.7 extra

using index 使用了索引覆盖扫描
using where 使用where回表扫描数据行,说明目标表的索引没有设计好.
a. table ----> 获取到出问题的表
b. 看原始查询语句中的where条件列
c. 查询列的索引情况-----> show index from t1;
d. 按需优化索引.
using filesort 使用了额外排序.
a. table ---->获取到出问题的表
b. 查看原始语句中的: order by group by distinct
c. 查看列的索引情况
d. 按需优化索引.

优化案例: 
mysql> desc select *from city where countrycode='CHN' order by population;
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+
|  1 | SIMPLE      | city  | NULL       | ref  | CountryCode   | CountryCode | 3       | const |  363 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+
mysql> show index from city;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| city  |          0 | PRIMARY     |            1 | ID          | A         |        4188 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| city  |          1 | CountryCode |            1 | CountryCode | A         |         232 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+    
mysql> alter table city add index idx(population);
    
mysql> desc select *from city where countrycode='CHN' order by population;
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+
|  1 | SIMPLE      | city  | NULL       | ref  | CountryCode   | CountryCode | 3       | const |  363 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+----------------+ 
mysql> alter table city add index idx_c_p(countrycode,population);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc select *from city where countrycode='CHN' order by population;
+----+-------------+-------+------------+------+---------------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys       | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | city  | NULL       | ref  | CountryCode,idx_c_p | idx_c_p | 3       | const |  363 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------------+---------+---------+-------+------+----------+-------+    

查看冗余索引
mysql> select table_schema,table_name , redundant_index_name , redundant_index_columns  from sys.schema_redundant_indexes;  
    
    
using temp   --->  
a. 条件范围是不是过大. 
b. having  order by 额外排序
c. 子查询

大几率开发需要改写语句了.

11. 扩展项: 关于索引的优化器算法:ICP \ MRR

11.1 ICP : Index Condition Pushdown

优化器算法: 
    a. 查询优化器算法:
    mysql> select @@optimizer_switch;
    b. 设置优化器算法: 
    mysql> set global optimizer_switch='index_condition_pushdown=off';
    hits方式: 
    https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html
    配置文件: 
    my.cnf
        
例子 : 
mysql> set global optimizer_switch='index_condition_pushdown=off';

mysql> desc select * from t100w where k1='Vs'  and num<27779 and k2='mnij';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t100w | NULL       | range | idx           | idx  | 14      | NULL |   29 |    10.00 | Using where |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> set global optimizer_switch='index_condition_pushdown=on';

mysql> desc select * from t100w where k1='Vs'  and num<27779 and k2='mnij';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | t100w | NULL       | range | idx           | idx  | 14      | NULL |   29 |    10.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

压测:
    a.  开ICP 2000次语句压测  索引顺序不调整
mysqlslap --defaults-file=/etc/my.cnf \
--concurrency=100 --iterations=1 --create-schema='test' \
--query=" select * from t100w where k1='Vs'  and num<27779 and k2='mnij'" engine=innodb \
--number-of-queries=2000 -uroot -p123 -verbose

4.580 seconds
4.569 seconds
4.431 seconds
4.433 seconds
4.391 seconds

b.  关 ICP 2000次语句压测  索引顺序不调整
5.327 
5.516
5.267
5.330
5.293 seconds

c. 索引顺序优化 压测 
4.251
4.143

11.2 MRR

https://dev.mysql.com/doc/refman/8.0/en/mrr-optimization.html

12. 索引应用规范

12.1 建立索引的原则（DBA运维规范）

(1) 必须要有主键,业务无关列。
(2) 经常做为where条件列 order by group by join on, distinct 的条件(业务:产品功能+用户行为)
(3) 最好使用唯一值多的列作为索引列,如果索引列重复值较多,可以考虑使用联合索引
(4) 列值长度较长的索引列,我们建议使用前缀索引.
(5) 降低索引条目,一方面不要创建没用索引,不常使用的索引清理,percona toolkit(xxxxx)
(6) 索引维护要避开业务繁忙期，建议用pt-osc。
(7) 联合索引最左原则

12.2 不走索引的情况（开发规范）

12.2.1 没有查询条件，或者查询条件没有建立索引

select * from t1  ;
select * from t1 where id=1001 or 1=1;
作业: 
SQL审核和审计. yearning.io  github, inception

12.2.2 查询结果集是原表中的大部分数据，应该是15-25%以上。

查询的结果集，超过了总数行数25%，优化器觉得就没有必要走索引了。
MySQL的预读功能有关。

可以通过精确查找范围，达到优化的效果。
1000000

>500000 and

12.2.3 索引本身失效，统计信息不真实（过旧）

索引有自我维护的能力。
对于表内容变化比较频繁的情况下，有可能会出现索引失效。
一般是删除重建

现象:
有一条select语句平常查询时很快,突然有一天很慢,会是什么原因
select?  --->索引失效,统计数据不真实

innodb_index_stats  
innodb_table_stats  

mysql> ANALYZE TABLE world.city;

12.2.4 查询条件使用函数在索引列上，或者对索引列进行运算，运算包括(+，-，*，/，! 等)

例子：
错误的例子：select * from test where id-1=9;
正确的例子：select * from test where id=10;
算术运算
函数运算
子查询

12.2.5 隐式转换导致索引失效.这一点应当引起重视.也是开发中经常会犯的错误.

mysql> desc select * from b where telnum=110;
mysql> desc select * from b where telnum='110';

第四章节 索引及执行计划

第四章节 索引及执行计划

1. 介绍

2. 种类

3. Btree的细分

4. 索引的管理

4.0 索引建立之前压测:

4.1 查询索引

4.2 创建

4.3 删除

4.4 8.0新特性–> invisible index 不可见索引

5. B+tree查找算法介绍

5.1 平衡

5.2 擅长范围查找

5.3 构建过程

6. MySQL中如何应用B+TREE ******

6.1 名词解释

6.2 IOT 组织表

6.3 聚簇(区)索引

6.3.1 构建条件

6.3.2 如何形成B树结构.

6.3.3 优化了哪些查询?

6.4 辅助索引

6.4.1 构建条件

6.4.2 如何形成B树结构

6.4.3 优化了哪些查询?

6.5 联合辅助索引结构

6.5.1 构建过程

6.5.2 优化了哪些查询?

6.5.3 最左原则

7. 索引树高度影响因素

8. 回表问题

8.1 回表是什么?

8.2 回表会带来什么问题?

8.3 怎么减少回表

9. 扩展项: 索引自优化AHI(自适应hash索引)\change buffer

10. 分析执行计划

10.1 是什么?

10.2 为什么要分析执行计划?

10.3 如何抓取执行计划

a. 抓取目标

b. 方法

10.4 如何分析执行计划

10.5 type 详解

a. ALL 全表扫描

b. index 全索引扫描

c. range 索引范围

d. ref : 辅助索引等值查询

e. eq_ref : 非驱动表,连接条件是主键或唯一键.

彩蛋 ： 如何判断一条连接语句中,谁是驱动表?

f. const(system)

10.6 key_len详解

a. 介绍

b. 如何计算索引列的key_len

c. 联合索引应用细节

10.7 extra

11. 扩展项: 关于索引的优化器算法:ICP \ MRR

11.1 ICP : Index Condition Pushdown

11.2 MRR

12. 索引应用规范

12.1 建立索引的原则（DBA运维规范）

12.2 不走索引的情况（开发规范）

12.2.1 没有查询条件，或者查询条件没有建立索引

12.2.2 查询结果集是原表中的大部分数据，应该是15-25%以上。

12.2.3 索引本身失效，统计信息不真实（过旧）

12.2.4 查询条件使用函数在索引列上，或者对索引列进行运算，运算包括(+，-，*，/，! 等)

12.2.5 隐式转换导致索引失效.这一点应当引起重视.也是开发中经常会犯的错误.

12.2.6 <> ，not in 不走索引（辅助索引）

12.2.7 like “%_” 百分号在最前面不走

第四章节索引及执行计划

第四章节索引及执行计划

彩蛋：如何判断一条连接语句中,谁是驱动表?