前言:
我们经常通过创建索引进行sql语句的优化,特别是联机交易(OLTP)场景下的数据库,通过创建索引可以使语句的执行效率提供10倍、100倍,但有些时候索引的使用却没有达到我们的预期,出现索引失效或者索引扫描效率低下的问题,这是因为我们没有掌握正确的索引使用方法所导致。
本文主要讲述如何合理正确的使用索引,索引使用原则。
索引使用原则:
1 避免列索引过多
2 索引列的区分度要高
3 区分度高的列放左边
4 匹配左边列
5 匹配列左前缀
6 范围扫描后的数据是无序的
7 避免左边列函数运算
8 避免隐式转化
测试表TEST(MySQL 5.7.32):
CREATE TABLE `test` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(200) DEFAULT NULL,
`net_address` varchar(50) DEFAULT NULL,
`age` bigint(20) DEFAULT NULL,
`name1` varchar(20) DEFAULT NULL,
`insert_date` datetime DEFAULT NULL,
`name3` varchar(20) DEFAULT NULL,
`insert_date1` varchar(200) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=724285 DEFAULT CHARSET=utf8mb4 |
1 表避免列索引过多
创建测试表TEST
create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));
只有主键的索引8个并行插入100W数据需要266秒
[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 266.121 seconds
Minimum number of seconds to run all queries: 266.121 seconds
Maximum number of seconds to run all queries: 266.121 seconds
Number of clients running queries: 8
Average number of queries per client: 125000
接下来,在表上创建6个索引,重新进行8个并行插入100w数据需要282秒
drop table test;
create table test(id bigint auto_increment primary key,name varchar(200),net_address varchar(50),age bigint,name1 varchar(20),name2 varchar(20),name3 VARCHAR(20));
alter table test add index(age);
alter table test add index(net_address);
alter table test add index(name);
alter table test add index(name1);
alter table test add index(name2);
alter table test add index(name3);
mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=16 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
[root@rac19a ~]# mysqlslap -uchenlz -pchenlz -P3306 -h127.0.0.1 --concurrency=8 --number-of-queries=1000000 --create-schema=db1 --query="insert into test(name,net_address,age,name1,name2,name3) values('aaaaaaaa','www.xxxxx.com',FLOOR(RAND() * 100000),'xxxxxxaaa','xxxxxxxxxx','xxxxxxxxx')"
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 282.414 seconds
Minimum number of seconds to run all queries: 282.414 seconds
Maximum number of seconds to run all queries: 282.414 seconds
Number of clients running queries: 8
Average number of queries per client: 125000
总结:表上索引过多会影响表的insert,delete,update性能,因为这些DML操作会维护索引
2 索引列的区分度要高
表列name1的区分度很低,只有2个不同值
root@mysql.sock 22:42: [db1]>select name1,count(*) from test group by name1;
+-------+----------+
| name1 | count(*) |
+-------+----------+
| a | 1000000 |
| b | 1000000 |
+-------+----------+
在name1上创建索引ind_name1
alter table test add index ind_name1(name1);
按name1条件执行查询sql,索引扫描的效率很低,执行时间为1.33秒,EXTRA为null,表示扫描索引后需要回表
root@mysql.sock 22:45: [db1]>explain select sum(age) from test where name1='a';
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
| 1 | SIMPLE | test | NULL | ref | ind_name1 | ind_name1 | 83 | const | 1003085 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 22:45: [db1]>select sum(age) from test where name1='a';
+----------+
| sum(age) |
+----------+
| 1000000 |
+----------+
1 row in set (1.33 sec)
忽略索引,让sql走全表扫描,执行时间0.60秒,执行效率高于走索引
root@mysql.sock 22:50: [db1]>explain select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 2006170 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 22:50: [db1]>select sum(age) from test IGNORE INDEX(ind_name1) where name1='a';
+----------+
| sum(age) |
+----------+
| 1000000 |
+----------+
1 row in set (0.60 sec)
总结:创建索引的列区分度要高,这样才能避免出现高消耗的回表
3 区分度高的列放左边
表TEST列age的区分度为51个值,列insert_date的区分度为316
root@mysql.sock 21:06: [db1]>select count( distinct age) from test;
+----------------------+
| count( distinct age) |
+----------------------+
| 51 |
+----------------------+
root@mysql.sock 21:00: [db1]>select count(distinct insert_date) from test;
+-----------------------------+
| count(distinct insert_date) |
+-----------------------------+
| 316 |
+-----------------------------+
1 row in set (0.00 sec)
将区分度低的age列放在左边第一列,区分度高的insert_date放在第二列
root@mysql.sock 21:11: [db1]>alter table test add index ind_test_2(age,insert_date);
测试查询索引的访问效率,执行时间为0.11秒,filtered的效率很低只有11.11
root@mysql.sock 21:11: [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_5,ind_test_2 | ind_test_2 | 9 | NULL | 308779 | 11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 21:11: [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.11 sec)
将区分度低的insert_date列放在左边第一列,区分度高的age放在第二列
root@mysql.sock 21:11: [db1]>alter table test drop index ind_test_2;
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
root@mysql.sock 21:11: [db1]>alter table test add index ind_test_2(insert_date,age);
Query OK, 0 rows affected (1.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
测试查询索引的访问效率,执行时间为0.001秒,filtered的效率有50
root@mysql.sock 21:12: [db1]>explain select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_5,ind_test_2 | ind_test_2 | 6 | NULL | 8 | 50.00 | Using where; Using index |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 21:12: [db1]>select count(*) from test where age>60 and age <80 and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29';
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (0.00 sec)
root@mysql.sock 21:12: [db1]>
总结:创建索引区分度高的列要放在左边,这样才能最大划的过滤数据,提示索引的访问效率
4 匹配左边列
创建索引ind_test_2,age在左边第一列,insert_date在第二列
alter table test add key ind_test_2(age,insert_date);
使用列age进行查询,可以正常使用到索引
root@mysql.sock 15:51: [db1]>explain select sum(length(name3)) from test where age<2;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_2 | ind_test_2 | 9 | NULL | 2000 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
使用insert_date进行查询,没有使用上索引,因为insert_date在索引的第二列
root@mysql.sock 11:23: [db1]>explain select sum(length(name3)) from test where insert_date>'2022-05-16 15:44:30' and insert_date<'2022-05-20 15:44:30';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199121 | 11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
总结:在创建索引或者写sql语句的时候,需要在使用到的列放在索引的左边,MySQL5.7没有跳跃索引访问的功能,在MySQL 8.0实现了跳跃索引访问功能,但跳跃索引的性能肯定不及范围索引扫描
5 匹配列左前缀
创建索引ind_test_4
alter table test add key ind_test_4(net_address);
使用左边模糊匹配,可以使用上索引,并且过滤很高
root@mysql.sock 11:37: [db1]>explain select sum(length(name3)) from test where net_address like 'aaa.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_4 | ind_test_4 | 203 | NULL | 4 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
使用右边模糊匹配,没有使用索引,过滤只有11.11,执行效率下降
root@mysql.sock 11:37: [db1]>explain select sum(length(name3)) from test where net_address like '%.cn';
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199125 | 11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
总结:索引是按顺序存放的,如果只有列的右边进行模糊匹配,会导致访问效率降低,对于需要进行右边匹配的查询,建议将值倒序存放
倒序存放方法
---倒序存放
update test set net_address=reverse(net_address);
select name3,reverse(net_address) from test where net_address like 'nc.%';
root@mysql.sock 11:43: [db1]>select name3,reverse(net_address) from test where net_address like 'nc.%';
+-------+----------------------+
| name3 | reverse(net_address) |
+-------+----------------------+
| xx | aaa.xxxxx.cn |
| xx | aaa.xxxxx.cn |
+-------+----------------------+
2 rows in set (0.00 sec)
root@mysql.sock 11:43: [db1]>explain select name3,reverse(net_address) from test where net_address like 'nc.%';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_4 | ind_test_4 | 203 | NULL | 2 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
6 范围扫描后的数据是无序的
创建组合索引ind_test_5,有列age,net_address,insert_date组成
alter table test add key ind_test_5(age,net_address,insert_date);
执行以下sql,等值查询之后按单个条件进行范围扫描,通过索引可以消除order by的排序没有使用filesort排序,说明索引等值查询之后,索引其他列的数据依然是有序的
root@mysql.sock 11:54: [db1]>explain select insert_date
from test
where age=20 and net_address ='aaa.xxxxx.cn' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29'
order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218 | NULL | 2 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+--------------------------+
执行以下sql,等值查询之后进行两个条件的范围扫描,可以发现使用了filesort文件排序,说明范围扫描之后,索引其他列的数据是乱序的
root@mysql.sock 20:17: [db1]>explain select insert_date from test where age=20 and net_address like 'aaa.xxxxx.cn%' and insert_date>'2022-08-24 11:35:29' and insert_date<'2022-08-26 11:35:29' order by insert_date;
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_2,ind_test_3,ind_test_5 | ind_test_5 | 218 | NULL | 2 | 2.50 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+----------------------------------+------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)
root@mysql.sock 20:21: [db1]>
总结:索引进行范围扫描之后的数据是乱序的,这时候的索引查询结果没法消除排序,需要进行filesort文件排序
7 避免左边列函数运算
创建索引ind_test_6
alter table test add key ind_test_6(age);
对age在左边进行运算age-2,可以发现sql没有使用上索引,索引失效
root@mysql.sock 20:28: [db1]>explain select name from test where age-50=2;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | test | NULL | ALL | NULL | NULL | NULL | NULL | 199121 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
对age在右边进行运算50+2,可以发现sql正常使用上索引ind_test_6
root@mysql.sock 20:28: [db1]>explain select name from test where age=50+2;
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | test | NULL | ref | ind_test_6 | ind_test_6 | 9 | const | 2000 | 100.00 | Using index condition |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+------+----------+-----------------------+
总结:避免对左边列进行函数或者算术运算,会导致索引失效
8 避免隐式转化
创建索引ind_test_7
alter table test add key ind_test_7(insert_date1);
按条件insert_date1进行范围查询,可以使用索引
root@mysql.sock 20:56: [db1]>explain select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | test | NULL | range | ind_test_1 | ind_test_1 | 803 | NULL | 4000 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+
root@mysql.sock 20:56: [db1]>select count(*) from test where insert_date1 >'20220516160000' and insert_date1<'20220518160000';
+----------+
| count(*) |
+----------+
| 4000 |
+----------+
1 row in set (0.00 sec)
按条件insert_date1进行范围查询,将输入的number类型的值,可以使用索引,但执行时间增加为0.05秒,过滤下降到11.11,并且type类似为index,全索引扫描
root@mysql.sock 20:56: [db1]>explain select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | test | NULL | index | ind_test_1 | ind_test_1 | 803 | NULL | 199100 | 11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+
root@mysql.sock 20:57: [db1]>select count(*) from test where insert_date1 >20220516160000 and insert_date1<20220518160000;
+----------+
| count(*) |
+----------+
| 4000 |
+----------+
1 row in set (0.05 sec)
总结:过滤条件要避免发生隐式转化,隐式转化会引发索引数据访问低效以及索引失效问题