MySQL 数据库增加索引和不加索引查询效率对比及分析
分析什么时候使用组合索引,什么时候使用单独索引分析
一.前期数据准备
1.建表
CREATE TABLE `user` (
`uid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) DEFAULT NULL,
`pwd` varchar(50) DEFAULT NULL,
`create_time` datetime DEFAULT NULL,
`modify_time` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`rids` varchar(15) DEFAULT NULL,
`nickname` varchar(45) DEFAULT NULL,
`company` varchar(15) DEFAULT NULL,
PRIMARY KEY (`uid`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
2.插入数据
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (1, 'rocker', 'rocker', NULL, '2021-02-26 11:05:02', '1', 'rocker', 'rocker');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (2, 'danny', 'danny', NULL, '2021-02-26 11:31:36', '2', 'rocker', 'danny');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (3, 'tom', 'tom', NULL, '2021-02-26 11:31:39', '1', 'tom', 'rocker');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (4, 'messi', 'messi', NULL, '2021-02-26 11:31:21', '2', 'messi', 'messi');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (5, 'wenger', 'wenger', NULL, '2021-02-26 11:29:38', '1', 'wenger', 'rocker');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (6, 'henry', 'henry', NULL, '2021-02-26 11:30:46', '2', 'henry', 'henry');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (7, 'ronaldo', 'ronaldo', NULL, '2021-02-26 11:30:49', '1', 'ronaldo', 'ronaldo');
INSERT INTO `test`.`user`(`uid`, `name`, `pwd`, `create_time`, `modify_time`, `rids`, `nickname`, `company`) VALUES (8, 'kaka', 'kaka', NULL, '2021-02-26 11:29:45', '2', 'kaka', 'rocker');
二.分析
1.不加索引
首先在’nickname’和‘company’这俩字段不加索引的情况下执行一个查询语句,并分析
mysql> explain select * from user where nickname = 'rocker' and company = 'rocker';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 12.50 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
可以看到,没有走索引,总共查询了8条数据,而表中总共也是8条数据,相当于全表扫描了。
mysql> explain select * from user where company = 'rocker' or nickname = 'rocker';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 23.44 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
可以看到:不加任何索引的情况下,不管是and还是or,都是全表扫描,没有索引。
2.单独索引
给nickname和company分别加上索引,再执行and和or的sql查询
alter table user add index `idx_nickname` (`nickname`);
alter table user add index `idx_company` (`company`);
执行查询语句and
mysql> explain select * from user where nickname = 'rocker' and company = 'rocker';
+----+-------------+-------+------+--------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | user | ref | idx_nickname,idx_company | idx_nickname | 138 | const | 2 | Using where |
+----+-------------+-------+------+--------------------------+--------------+---------+-------+------+-------------+
1 row in set (0.05 sec)
执行查询语句or
mysql> explain select * from user where company = 'rocker' or nickname = 'rocker';
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | user | ALL | idx_nickname,idx_company | NULL | NULL | NULL | 8 | Using where |
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
可以看到:加上索引后and查询是可以走索引的,但是只有一个索引起作用,对于另一个索引字段还是要进行遍历,而且and查询会根据关联性高(符合该条件的行数少)选择具体走哪个索引
or查询不走索引
3.组合索引
删除原先的单独索引,新增组合索引
alter table user drop index `idx_nickname`;
alter table user drop index `idx_company`;
alter table user add index `idx_composition` (`nickname`,`company`);
执行查询语句and
mysql> explain select * from user where nickname = 'rocker' and company = 'rocker';
+----+-------------+-------+------+-----------------+-----------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------------+------+-------------+
| 1 | SIMPLE | user | ref | idx_composition | idx_composition | 186 | const,const | 1 | Using where |
+----+-------------+-------+------+-----------------+-----------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
执行查询语句or
mysql> explain select * from user where company = 'rocker' or nickname = 'rocker';
+----+-------------+-------+------+-----------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+------+---------+------+------+-------------+
| 1 | SIMPLE | user | ALL | idx_composition | NULL | NULL | NULL | 8 | Using where |
+----+-------------+-------+------+-----------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
可以看到:加上组合索引后,组合索引起作用,只需查询一条符合结果的数据,效率要比单独索引高,
但是复合索引对于or查询不起作用
4.组合索引查询单个索引列
对于组合索引为(nickname
,company
)这个顺序的情况
alter table user drop index `idx_composition`;
alter table user add index `idx_composition` (`nickname`,`company`);
mysql> explain select * from user where nickname = 'rocker';
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
| 1 | SIMPLE | user | ref | idx_composition | idx_composition | 138 | const | 2 | Using where |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql> explain select * from user where company = 'rocker';
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | user | ALL | NULL | NULL | NULL | NULL | 8 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
可以看到:组合索引中nickname在前时,单独查询nickname会走索引,单独查询company不会走索引
对于组合索引为(company
,nickname
)这个顺序的情况
alter table user drop index `idx_composition`;
alter table user add index `idx_composition` (`company`,`nickname`);
mysql> explain select * from user where nickname = 'rocker';
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | user | ALL | NULL | NULL | NULL | NULL | 8 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain select * from user where company = 'rocker';
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
| 1 | SIMPLE | user | ref | idx_composition | idx_composition | 48 | const | 4 | Using where |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
可以看到:组合索引中company在前时,单独查询company会走索引,单独查询nickname不会走索引
如果组合索引是(A,B),则对于条件where A=a,是可以用上这个组合索引的,因为组合索引是先按照第一列进行排序的,所以没必要对A单独建立一个索引,但是对于条件where B=b就用不上了,因为只有在第一列相同的情况下,才比较第二列,因为第二列相同时,可以分布在不同的节点上,没办法快速定位
三.组合索引的优势
1.减少开销
建一个联合索引 (a,b,c) ,实际相当于建了 (a)、(a,b)、(a,b,c) 三个索引。每多一个索引,都会增加写操作的开销和磁盘空间的开销。对于大量数据的表,使用联合索引会大大的减少开销!
2.覆盖索引
对联合索引 (a,b,c),如果有如下 sql 的,
SELECT a,b,c from table where a='xx' and b = 'xx';
那么 MySQL 可以直接通过遍历索引取得数据,而无需回表,这减少了很多的随机 io 操作。减少 io 操作,特别是随机 io 其实是 DBA 主要的优化策略。所以,在真正的实际应用中,覆盖索引是主要的提升性能的优化手段之一。
3.效率高
索引列多,通过联合索引筛选出的数据越少。比如有 1000W 条数据的表,有如下SQL:
select col1,col2,col3 from table where col1=1 and col2=2 and col3=3;
假设:假设每个条件可以筛选出 10% 的数据。
- A. 如果只有单列索引,那么通过该索引能筛选出 1000W * 10%=100w 条数据,然后再回表从 100w 条数据中找到符合 col2=2 and col3= 3 的数据,然后再排序,再分页,以此类推(递归);
- B. 如果是(col1,col2,col3)联合索引,通过三列索引筛选出 1000w * 10% * 10% *10%=1w,效率提升可想而知!
四.索引是建的越多越好吗
答案自然是否定的
- 数据量小的表不需要建立索引,建立会增加额外的索引开销
- 不经常引用的列不要建立索引,因为不常用,即使建立了索引也没有多大意义
- 经常频繁更新的列不要建立索引,因为肯定会影响插入或更新的效率
- 数据重复且分布平均的字段,因此他建立索引就没有太大的效果(例如性别字段,只有男女,不适合建立索引)
- 数据变更需要维护索引,意味着索引越多维护成本越高。
- 更多的索引也需要更多的存储空间