MySQL-如何合理正确使用索引

前言:

我们经常通过创建索引进行sql语句的优化,特别是联机交易(OLTP)场景下的数据库,通过创建索引可以使语句的执行效率提供10倍、100倍,但有些时候索引的使用却没有达到我们的预期,出现索引失效或者索引扫描效率低下的问题,这是因为我们没有掌握正确的索引使用方法所导致。

本文主要讲述如何合理正确的使用索引,索引使用原则。

索引使用原则:

1 避免列索引过多

2 索引列的区分度要高

3 区分度高的列放左边

4 匹配左边列

5 匹配列左前缀

6 范围扫描后的数据是无序的

7 避免左边列函数运算

8 避免隐式转化

测试表TEST(MySQL 5.7.32):

 CREATE TABLE `test` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(200) DEFAULT NULL,
  `net_address` varchar(50) DEFAULT NULL,
  `age` bigint(20) DEFAULT NULL,
  `name1` varchar(20) DEFAULT NULL,
  `insert_date` datetime DEFAULT NULL,
  `name3` varchar(20) DEFAULT NULL,
  `insert_date1` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=724285 DEFAULT CHARSET=utf8mb4 |

表避免列索引过多

创建测试表TEST

create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));

只有主键的索引8个并行插入100W数据需要266秒

[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
  Average number of seconds to run all queries: 266.121 seconds
  Minimum number of seconds to run all queries: 266.121 seconds
  Maximum number of seconds to run all queries: 266.121 seconds
  Number of clients running queries: 8
  Average number of queries per client: 125000

接下来,在表上创建6个索引,重新进行8个并行插入100w数据需要282秒

drop table test;
create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20)); 
alter table test add index(age);
alter table test add index(net_address);
alter table test add index(name);
alter table test add index(name1);
alter table test add index(name2);
alter table test add index(name3);
​
mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=16 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1  --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
​
Benchmark
  Average number of seconds to run all queries: 282.414 seconds
  Minimum number of seconds to run all queries: 282.414 seconds
  Maximum number of seconds to run all queries: 282.414 seconds
  Number of clients running queries: 8
  Average number of queries per client: 125000

总结:表上索引过多会影响表的insert,delete,update性能,因为这些DML操作会维护索引

2 索引列的区分度要高

表列name1的区分度很低,只有2个不同值

root@mysql.sock 22:42:  [db1]>select name1,count(*) from test group by name1;
+-------+----------+
| name1 | count(*) |
+-------+----------+
| a     |  1000000 |
| b     |  1000000 |
+-------+----------+

在name1上创建索引ind_name1

alter table test add index ind_name1(name1);

按name1条件执行查询sql,索引扫描的效率很低,执行时间为1.33秒,EXTRA为null,表示扫描索引后需要回表

root@mysql.sock 22:45:  [db1]>explain select sum(age) from test where name1='a';
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref   | rows    | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
|  1 | SIMPLE      | test  | NULL       | ref  | ind_name1     | ind_name1 | 83      | const | 1003085 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
​
root@mysql.sock 22:45:  [db1]>select sum(age) from test where name1='a';
+----------+
| sum(age) |
+----------+
|  1000000 |
+----------+
1 row in set (1.33 sec)

忽略索引,让sql走全表扫描,执行时间0.60秒,执行效率高于走索引

root@mysql.sock 22:50:  [db1]>explain select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2006170 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
​
root@mysql.sock 22:50:  [db1]>select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----------+
| sum(age) |
+----------+
|  1000000 |
+----------+
1 row in set (0.60 sec)

 总结:创建索引的列区分度要高,这样才能避免出现高消耗的回表

3 区分度高的列放左边

表TEST列age的区分度为51个值,列insert_date的区分度为316

root@mysql.sock 21:06:  [db1]>select count( distinct age) from test;
+----------------------+
| count( distinct age) |
+----------------------+
|                   51 |
+----------------------+
​
​
root@mysql.sock 21:00:  [db1]>select count(distinct insert_date) from test;
+-----------------------------+
| count(distinct insert_date) |
+-----------------------------+
|                         316 |
+-----------------------------+
1 row in set (0.00 sec)

将区分度低的age列放在左边第一列,区分度高的insert_date放在第二列

root@mysql.sock 21:11:  [db1]>alter table test add index ind_test_2(age,insert_date);

测试查询索引的访问效率,执行时间为0.11秒,filtered的效率很低只有11.11

root@mysql.sock 21:11:  [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys         | key        | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_5,ind_test_2 | ind_test_2 | 9       | NULL | 308779 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
​
​
root@mysql.sock 21:11:  [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.11 sec)

将区分度低的insert_date列放在左边第一列,区分度高的age放在第二列

root@mysql.sock 21:11:  [db1]>alter table test drop index ind_test_2;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0
​
root@mysql.sock 21:11:  [db1]>alter table test add index ind_test_2(insert_date,age);
Query OK, 0 rows affected (1.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

测试查询索引的访问效率,执行时间为0.001秒,filtered的效率有50

root@mysql.sock 21:12:  [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys         | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_5,ind_test_2 | ind_test_2 | 6       | NULL |    8 |    50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
​
root@mysql.sock 21:12:  [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (0.00 sec)
​
root@mysql.sock 21:12:  [db1]>

总结:创建索引区分度高的列要放在左边,这样才能最大划的过滤数据,提示索引的访问效率

4 匹配左边列

创建索引ind_test_2,age在左边第一列,insert_date在第二列

alter table test add key ind_test_2(age,insert_date);

 使用列age进行查询,可以正常使用到索引

root@mysql.sock 15:51:  [db1]>explain select sum(length(name3)) from test where age<2;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2    | ind_test_2 | 9       | NULL | 2000 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+

使用insert_date进行查询,没有使用上索引,因为insert_date在索引的第二列

root@mysql.sock 11:23:  [db1]>explain select sum(length(name3)) from test where insert_date>'2022-05-16 15:44:30' and insert_date<'2022-05-20 15:44:30';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199121 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

总结:在创建索引或者写sql语句的时候,需要在使用到的列放在索引的左边,MySQL5.7没有跳跃索引访问的功能,在MySQL 8.0实现了跳跃索引访问功能,但跳跃索引的性能肯定不及范围索引扫描

5 匹配列左前缀

创建索引ind_test_4

alter table test add key ind_test_4(net_address);

使用左边模糊匹配,可以使用上索引,并且过滤很高

root@mysql.sock 11:37:  [db1]>explain select sum(length(name3)) from test where net_address like 'aaa.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_4    | ind_test_4 | 203     | NULL |    4 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

使用右边模糊匹配,没有使用索引,过滤只有11.11,执行效率下降

root@mysql.sock 11:37:  [db1]>explain select sum(length(name3)) from test where net_address like '%.cn';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199125 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

总结:索引是按顺序存放的,如果只有列的右边进行模糊匹配,会导致访问效率降低,对于需要进行右边匹配的查询,建议将值倒序存放

倒序存放方法

---倒序存放
update test set net_address=reverse(net_address);
select name3,reverse(net_address) from test where net_address like 'nc.%';
​
root@mysql.sock 11:43:  [db1]>select name3,reverse(net_address) from test where net_address like 'nc.%';
+-------+----------------------+
| name3 | reverse(net_address) |
+-------+----------------------+
| xx    | aaa.xxxxx.cn         |
| xx    | aaa.xxxxx.cn         |
+-------+----------------------+
2 rows in set (0.00 sec)
​
root@mysql.sock 11:43:  [db1]>explain select name3,reverse(net_address) from test where net_address like 'nc.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_4    | ind_test_4 | 203     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

6 范围扫描后的数据是无序的

创建组合索引ind_test_5,有列age,net_address,insert_date组成

alter table test add key ind_test_5(age,net_address,insert_date);

执行以下sql,等值查询之后按单个条件进行范围扫描,通过索引可以消除order by的排序没有使用filesort排序,说明索引等值查询之后,索引其他列的数据依然是有序的

root@mysql.sock 11:54:  [db1]>explain select insert_date 
  from test 
  where age=20  and net_address ='aaa.xxxxx.cn' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29' 
  order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                    | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218     | NULL |    2 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
​

执行以下sql,等值查询之后进行两个条件的范围扫描,可以发现使用了filesort文件排序,说明范围扫描之后,索引其他列的数据是乱序的

root@mysql.sock 20:17:  [db1]>explain select insert_date    from test    where age=20  and net_address like 'aaa.xxxxx.cn%' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29'    order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys                    | key        | key_len | ref  | rows | filtered | Extra                                    |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218     | NULL |    2 |     2.50 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
​
root@mysql.sock 20:21:  [db1]>

总结:索引进行范围扫描之后的数据是乱序的,这时候的索引查询结果没法消除排序,需要进行filesort文件排序

7 避免左边列函数运算

创建索引ind_test_6

alter table test add key ind_test_6(age);

对age在左边进行运算age-2,可以发现sql没有使用上索引,索引失效

root@mysql.sock 20:28:  [db1]>explain select name    from test    where age-50=2;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 199121 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+

对age在右边进行运算50+2,可以发现sql正常使用上索引ind_test_6

root@mysql.sock 20:28:  [db1]>explain select name    from test    where age=50+2;
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys         | key        | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | ind_test_6 | ind_test_6 | 9       | const | 2000 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+

总结:避免对左边列进行函数或者算术运算,会导致索引失效

8 避免隐式转化

创建索引ind_test_7

alter table test add key ind_test_7(insert_date1);

按条件insert_date1进行范围查询,可以使用索引

root@mysql.sock 20:56:  [db1]>explain select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | test  | NULL       | range | ind_test_1    | ind_test_1 | 803     | NULL | 4000 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
​
root@mysql.sock 20:56:  [db1]>select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----------+
| count(*) |
+----------+
|     4000 |
+----------+
1 row in set (0.00 sec)

按条件insert_date1进行范围查询,将输入的number类型的值,可以使用索引,但执行时间增加为0.05秒,过滤下降到11.11,并且type类似为index,全索引扫描

root@mysql.sock 20:56:  [db1]>explain select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key        | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE        | test  | NULL       | index | ind_test_1    | ind_test_1 | 803     | NULL | 199100 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
​
​
root@mysql.sock 20:57:  [db1]>select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----------+
| count(*) |
+----------+
|     4000 |
+----------+
1 row in set (0.05 sec)

总结:过滤条件要避免发生隐式转化,隐式转化会引发索引数据访问低效以及索引失效问题

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值