Forward-Full table scan vs full index scan performance

最新推荐文章于 2024-07-27 13:12:49 发布

dianyi8080

最新推荐文章于 2024-07-27 13:12:49 发布

阅读量94

点赞数

文章标签： shell 数据库

原文链接：http://www.cnblogs.com/TerryChan/p/3338420.html

版权

Earlier this week, Cédric blogged about how easy we can get confused between a covering index and a full index scan in the EXPLAIN output. While a covering index (seen with EXPLAIN as Extra: Using index) is a very interesting performance optimization, a full index scan (type: index) is according to the documentation the 2nd worst possible execution plan after a full table scan.
If it is obvious that a full table scan is not good for performance, how much can we expect if we can switch to a full index scan? In other terms, is a full table scan always the worst possible execution and should it be avoided at all costs?

Let’s take the employees database, and slightly modify the employees tables:

mysql> ALTER TABLE employees ADD INDEX idx_first (first_name),ENGINE=InnoDB;

1	mysql > ALTER TABLE employees ADD INDEX idx_first ( first_name ) , ENGINE = InnoDB ;

And then let’s consider this query:

SELECT * FROM employees ORDER BY first_name;

1	SELECT * FROM employees ORDER BY first_name ;

This query can of course by executed by running a full table scan, but we could also take advantage of the idx_first index, which will translate to a full index scan.
Let’s see how the optimizer will execute it:

mysql> EXPLAIN SELECT * FROM employees ORDER BY first_name\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 300363 Extra: Using filesort

mysql > EXPLAIN SELECT * FROM employees ORDER BY first_name \ G

* * * * * * * * * * * * * * * * * * * * * * * * * * * 1. row * * * * * * * * * * * * * * * * * * * * * * * * * * *

id : 1

select_type : SIMPLE

table : employees

type : ALL

possible_keys : NULL

key : NULL

key_len : NULL

ref : NULL

rows : 300363

Extra : Using filesort

Surprising? The optimizer preferred a full table scan, and it did not even consider scanning the idx_first index as a relevant choice (possible_keys: NULL).

What do we get if we force the optimizer to use the index?

mysql> EXPLAIN SELECT * FROM employees FORCE INDEX(idx_first) ORDER BY first_name\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees type: index possible_keys: NULL key: idx_first key_len: 16 ref: NULL rows: 300363 Extra:

mysql > EXPLAIN SELECT * FROM employees FORCE INDEX ( idx_first ) ORDER BY first_name \ G

* * * * * * * * * * * * * * * * * * * * * * * * * * * 1. row * * * * * * * * * * * * * * * * * * * * * * * * * * *

id : 1

select_type : SIMPLE

table : employees

type : index

possible_keys : NULL

key : idx_first

key_len : 16

ref : NULL

rows : 300363

Extra :

Honestly, it looks better: the number of rows is the same, but a full index scan instead of a full table scan and no filesort. But predicting how a query will perform by only looking at the execution plan is not enough, we must run both queries and compare execution time.

First case: the employees table does not fit in memory
With the full table scan, the query runs in about 4s.
With the full index scan, it runs in about 30s.

So the optimizer was right after all. But why? Simply because all access patterns are not equal. When we are doing a full table scan, we are doing sequential reads, which are quite fast even with slow disks. But when we are using the index, we first have to do a full index scan (fast sequential reads, no problem) and then lots of random reads to fetch rows by index value. And random reads are orders of magnitude slower than sequential reads.
The optimizer has this knowledge, so it is able to pick the right execution plan.

Second case: the employees table fits in memory
With the full table scan, the query runs in about 3.3s.
With the full index scan, the query runs in about 2.6s.

We can see here a limitation of the optimizer: it does not know on which kind of media data is stored. If it is stored on spinning disks, the assumption that random reads are much slower than sequential reads is correct, but it is not the case anymore if data is stored in memory. That’s why when execution plans look similar, you should always execute the query to really see which execution plan should be chosen. Here if we know that the table will always be in memory, we should add the FORCE INDEX hint to ensure optimal response time.

Now let’s modify the query by selecting only the first_name field instead of selecting all the fields:

mysql> explain SELECT first_name FROM employees ORDER BY first_name\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees type: index possible_keys: NULL key: idx_first key_len: 16 ref: NULL rows: 300584 Extra: Using index

mysql > explain SELECT first_name FROM employees ORDER BY first_name \ G

* * * * * * * * * * * * * * * * * * * * * * * * * * * 1. row * * * * * * * * * * * * * * * * * * * * * * * * * * *

id : 1

select_type : SIMPLE

table : employees

type : index

possible_keys : NULL

key : idx_first

key_len : 16

ref : NULL

rows : 300584

Extra : Using index

The optimizer chooses the full index scan. It is a good choice because the index now covers the query, which means that reading the index is enough to get the results.

Conclusions:

For a non covering index, the different between a full table scan and an execution plan based on a full index scan is basically the difference between sequential reads and random reads: it can be close if you have fast storage or it can be very different if you have slow storage.
A full index scan can become interesting when the index is covering.
Don’t forget to measure response time when you are trying different execution plans. It is too easy to get focused on having a good-looking execution, but the end user only cares about response time!

转载于:https://www.cnblogs.com/TerryChan/p/3338420.html

dianyi8080

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Forward-Full table scan vs full index scan performance

Earlier this week, Cédric blogged about how easy we can get confused between a covering index and a full index scan in the EXPLAIN output. While a covering index (seen with EXPLAIN as Extra: Us...
复制链接

扫一扫