Forward-Full table scan vs full index scan performance

Earlier this week, Cédric blogged about how easy we can get confused between a covering index and a full index scan in the EXPLAIN output. While a covering index (seen with EXPLAIN as Extra: Using index) is a very interesting performance optimization, a full index scan (type: index) is according to the documentation the 2nd worst possible execution plan after a full table scan.
If it is obvious that a full table scan is not good for performance, how much can we expect if we can switch to a full index scan? In other terms, is a full table scan always the worst possible execution and should it be avoided at all costs?

Let’s take the employees database, and slightly modify the employees tables:

And then let’s consider this query:

This query can of course by executed by running a full table scan, but we could also take advantage of the idx_first index, which will translate to a full index scan.
Let’s see how the optimizer will execute it:

Surprising? The optimizer preferred a full table scan, and it did not even consider scanning the idx_first index as a relevant choice (possible_keys: NULL).

What do we get if we force the optimizer to use the index?

Honestly, it looks better: the number of rows is the same, but a full index scan instead of a full table scan and no filesort. But predicting how a query will perform by only looking at the execution plan is not enough, we must run both queries and compare execution time.

First case: the employees table does not fit in memory
With the full table scan, the query runs in about 4s.
With the full index scan, it runs in about 30s.

So the optimizer was right after all. But why? Simply because all access patterns are not equal. When we are doing a full table scan, we are doing sequential reads, which are quite fast even with slow disks. But when we are using the index, we first have to do a full index scan (fast sequential reads, no problem) and then lots of random reads to fetch rows by index value. And random reads are orders of magnitude slower than sequential reads.
The optimizer has this knowledge, so it is able to pick the right execution plan.

Second case: the employees table fits in memory
With the full table scan, the query runs in about 3.3s.
With the full index scan, the query runs in about 2.6s.

We can see here a limitation of the optimizer: it does not know on which kind of media data is stored. If it is stored on spinning disks, the assumption that random reads are much slower than sequential reads is correct, but it is not the case anymore if data is stored in memory. That’s why when execution plans look similar, you should always execute the query to really see which execution plan should be chosen. Here if we know that the table will always be in memory, we should add the FORCE INDEX hint to ensure optimal response time.

Now let’s modify the query by selecting only the first_name field instead of selecting all the fields:

The optimizer chooses the full index scan. It is a good choice because the index now covers the query, which means that reading the index is enough to get the results.

Conclusions:

  • For a non covering index, the different between a full table scan and an execution plan based on a full index scan is basically the difference between sequential reads and random reads: it can be close if you have fast storage or it can be very different if you have slow storage.
  • A full index scan can become interesting when the index is covering.
  • Don’t forget to measure response time when you are trying different execution plans. It is too easy to get focused on having a good-looking execution, but the end user only cares about response time!

转载于:https://www.cnblogs.com/TerryChan/p/3338420.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值