Loose index scan
各种资料关于loose index scan的解释很拗口,其实等同于oracle的index skip scan。
对于复合索引(sex, id),前导列sex的值只有两个:M和F,
(`F',98)
(`F',100)
(`F',102)
(`F',104)
(`M',101)
(`M',103)
(`M',105)
则索引可以逻辑的分成两个子索引
当执行select* from t where id = N时,等价于
select* from t where id = N and sex =‘M’
union
select* from t where id = N and sex = 'F'
进而有效利用到索引进行查找,如果换成mysql,则只能执行全表扫描;
不考虑字段选择性的前提下,即便将sql写为select* from t where id = N and sex in(‘M’,'F'),pre-5.6在索引扫描时也不会使用到id列过滤,而是在读取出行记录后再进行id=N的筛选。
的案例。
CREATE TABLE log_table (
id INT NOT NULL PRIMARY KEY,
log_machine VARCHAR(20) NOT NULL,
log_time DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE INDEX ix_log_machine_time ON log_table (log_machine, log_time);
1
SELECTMAX(log_time) FROM log_table;
SELECT MAX(log_time) FROM log_table WHERE log_machine IN ('Machine 1');
这两条sql都只需一次index seek便可返回,源于索引的有序排序,优化器意识到min/max位于最左/右块,从而避免范围扫描;
extra显示Select tables optimized away;
2
SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’,’Machine 2’,’Machine 3’,’Machine 4’);
执行计划type为range(extra显示using where; using index),即执行索引范围扫描,先读取所有满足log_machine约束的记录,然后对其遍历找出max value;
改进
SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’,’Machine 2’,’Machine 3’,’Machine 4’)group by log_machineorder by 1 desc limit 1;
这满足group by选择loose index scan的要求,执行计划的extra显示using index for group-by,执行效果等值于
SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’)
Union
SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 2’)
…..
即对每个log_machine执行loose index scan,rows从原来的82636下降为16(该表总共1,000,000条记录)。
Group by何时使用loose index scan?
适用条件:
1针对单表操作
2Group by使用索引的最左前缀列
3只支持聚集函数min()/max()
4Where条件出现的列必须为=constant操作 , 没出现在group by中的索引列必须使用constant
5不支持前缀索引,即部分列索引 ,如index(c1(10))
执行计划的extra应该显示using index for group-by
假定表t1有个索引idx(c1,c2,c3)
SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2
SELECT c1, c3 FROM t1 GROUP BY c1, c2;--无法使用松散索引
而SELECT c1, c3 FROM t1where c3= const GROUP BY c1, c2;则可以
紧凑索引扫描tight index scan
Group by在无法使用loose index scan,还可以选择tight,若两者都不可选,则只能借助临时表;
扫描索引时,须读取所有满足条件的索引键,要么是全索引扫描,要么是范围索引扫描;
Group by的索引列不连续;或者不是从最左前缀开始,但是where条件里出现最左列;
SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;
SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;
5.6的改进
事实上,5.6的index condition push down可以弥补loose index scan缺失带来的性能损失。
KEY(age,zip)
mysql> explain SELECTname FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| id | select_type | table| type| possible_keys | key| key_len | ref| rows| Extra|
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
|1 | SIMPLE| people | range | age| age|4| NULL | 90556 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
1 row in set (0.01 sec)
根据key_len=4可以推测出sql只用到索引的第一列,即先通过索引查出满足age (18,20)的行记录,然后从server层筛选出满足zip约束的行;
pre-5.6,对于复合索引,只有当引导列使用"="时才有机会在索引扫描时使用到后面的索引列。
mysql> explain SELECTname FROM people WHERE age=18 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table| type| possible_keys | key| key_len | ref| rows | Extra|
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
|1 | SIMPLE| people | range | age| age| 8| NULL |3 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
对比一下查询效率
mysql> SELECTsql_no_cache name FROM people WHERE age=19 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name|
+----------------------------------+
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
1 row in set (0.06 sec)
mysql> SELECTSQL_NO_CACHE name FROM people WHERE age BETWEEN 18 AND 22 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name|
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in set (1 min 56.09 sec)
对于第二条sql,可以使用union改写,
mysql> SELECTname FROM people WHERE age=18 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECTname FROM people WHERE age=19 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECTname FROM people WHERE age=20 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECTname FROM people WHERE age=21 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECTname FROM people WHERE age=22 AND zip IN (12345,12346, 12347);
而mysql5.6引入了index condition pushdown,从优化器层面解决了此类问题。