MySQL ORDER BY / LIMIT performance: late row lookups

Reposted from: http://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/

From Stack Overflow:

When I run an SQL command like the one below, it takes more than 15 seconds:

    SELECT  *
    FROM    news
    WHERE   cat_id = 4
    ORDER BY
            id DESC
    LIMIT   150000, 10

EXPLAIN shows that it's using where and the index on (cat_id, id).

LIMIT 20, 10 on the same query only takes several milliseconds.

This task can be reformulated like this: take the last 150,010 rows in id order and return the first 10 of them.

It means that though we only need 10 records we still need to count off the first 150,000.

The table has an index which keeps the records ordered. This allows us not to use a filesort. However, the query is still far from being efficient: 15 seconds for 150,000 records (which are already ordered) is way too much.

To better understand the reason behind the low performance let’s look into this picture:

[Figure: Structure]

As we already said before, there is an index created on the table. Logically, an index is a part of a table which is not even visible from the SQL side: all queries are issued against the table, not the index, and the optimizer decides whether to use the index or not.

However, physically, an index is a separate object in the database.

An index is a shadow copy of the table which stores some subset of the table’s data:

  1. Index key, i.e. the columns which the index is created on.
  2. Table pointer, that is some value that uniquely identifies a row the record reflects. In InnoDB, it is the value of the PRIMARY KEY; in MyISAM, it is an offset in the .MYD file.

The index records are stored in a B-Tree structure, which makes ref and range searches on them very fast.

However, the index itself does not contain all of the table’s data, only the subset we described above. To find the actual table values while preserving order, one needs to join the index and the table. That is, for each index record the engine should find the corresponding table record (using the row pointer) and return all non-indexed values from the table itself.

Here’s how it looks:

[Figure: All]

The process of fetching the table records corresponding to the index records is called row lookup. It is pictured by the curvy arrows connecting the index and the table.

Since the index records and the table records are located far away from each other in memory and on disk, row lookups lead to many more page misses, cache misses and disk seeks than sequential access does, and are therefore quite costly. It takes a lot of time to traverse all the connectors in the picture above.

If we do a plain query which returns all the records, we of course need to fetch all the records, and each row lookup is necessary.

But do we really need it when we use a LIMIT clause with the offset greater than 0?

If we did something like LIMIT 8, 2, we could just skip the first 8 values using the index and then return the remaining two:

[Figure: Late row lookup]

We see that this is a much more efficient algorithm that will save us lots of row lookups.

This is called late row lookup: the engine should look a row up only if there is no way to avoid it. If there is a chance that a row will be filtered out by using the indexed fields only, it should be done before the rows are looked up in the actual MySQL table. There is no point in fetching the records out of the table just to discard them.

However, MySQL always does early row lookup: it looks the row up before checking the values in the index, even if the optimizer decided to use the index.

Let’s create a sample table and try to reproduce this behavior.

This MyISAM table contains 200,000 records and has a PRIMARY KEY index on id. Each record is filled with 200 bytes of stuffing data.
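The original creation script is not included in this copy of the article. As a rough stand-in, a table matching the description could be built as sketched below. The column names come from the queries in this article; the column types, the `Value N` pattern and the `*` filler are assumptions, and the recursive CTE requires MySQL 8.0 or later, so this is not the author's script.

```sql
-- Hypothetical reconstruction: column types and fill values are assumptions.
CREATE TABLE t_limit (
        id INT NOT NULL PRIMARY KEY,
        value VARCHAR(20) NOT NULL,
        stuffing VARCHAR(200) NOT NULL
) ENGINE=MyISAM;

-- Recursive CTEs stop after 1000 iterations by default; raise the limit.
SET SESSION cte_max_recursion_depth = 1000000;

-- Generate ids 1..200000, each row padded with 200 characters of filler.
INSERT INTO t_limit (id, value, stuffing)
WITH RECURSIVE seq (n) AS (
        SELECT  1
        UNION ALL
        SELECT  n + 1
        FROM    seq
        WHERE   n < 200000
)
SELECT  n, CONCAT('Value ', n), RPAD('', 200, '*')
FROM    seq;
```

Timings on such a table will differ from the figures quoted in this article, since they depend on the hardware, the server version and the cache state.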

Here’s the query to select values from 150,001 to 150,010:

    SELECT  id, value, LENGTH(stuffing) AS len
    FROM    t_limit
    ORDER BY
            id
    LIMIT   150000, 10

This query runs for almost 6 seconds, which is way too long.

It, however, uses a filesort which the optimizer considered more efficient than using the index. This would make sense if not for the stuffing field which is too long to be sorted efficiently. In this very case traversing the index would be faster.

Let’s try to force the index:

    SELECT  id, value, LENGTH(stuffing) AS len
    FROM    t_limit FORCE INDEX (PRIMARY)
    ORDER BY
            id
    LIMIT   150000, 10

Now it is only 1.23 seconds but still too long due to the early row lookups.

We can, however, trick MySQL into using late row lookups.

We will only select the id in the subquery with an ORDER BY and LIMIT and then join the original table back on id.

This will make each individual row lookup less efficient, since each join will require looking up the index value again. However, this is not a big deal, and the total number of lookups will be reduced greatly, so an overall performance increase is expected:

    SELECT  l.id, value, LENGTH(stuffing) AS len
    FROM    (
            SELECT  id
            FROM    t_limit
            ORDER BY
                    id
            LIMIT   150000, 10
            ) o
    JOIN    t_limit l
    ON      l.id = o.id
    ORDER BY
            l.id

This is only 75 ms, or 16 times as fast as the previous query.
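One way to check that the inner query really avoids the table is to run it through EXPLAIN on its own. This is a minimal sketch against the t_limit table used here; when MySQL can answer a query from the index alone, the Extra column of the plan contains `Using index`.

```sql
-- The subquery selects only id, which is the index key itself,
-- so no table row has to be looked up while the first 150,000 records are skipped.
EXPLAIN
SELECT  id
FROM    t_limit
ORDER BY
        id
LIMIT   150000, 10;
```

If `Using index` is missing from the plan, the subquery is still touching the table rows and the rewrite is unlikely to help.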

Note that we put an additional ORDER BY to the end of the query, since the order is not guaranteed to be preserved after the join. This resulted in an extra filesort in the plan. However, the actual plan used outputs the values already sorted and this filesort, therefore, will require only a single pass over 10 values which is instant.
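The same rewrite can be applied to the query from the original question. The sketch below assumes that id is the PRIMARY KEY of news, which the question does not state, and relies on the index on (cat_id, id) mentioned there, so that the subquery can filter, order and skip rows using the index alone.

```sql
SELECT  n.*
FROM    (
        -- Only indexed columns are touched here: cat_id for the filter, id for the order.
        SELECT  id
        FROM    news
        WHERE   cat_id = 4
        ORDER BY
                id DESC
        LIMIT   150000, 10
        ) o
JOIN    news n
ON      n.id = o.id     -- assumption: id uniquely identifies a row in news
ORDER BY
        n.id DESC;
```

Only the 10 rows that survive the LIMIT are looked up in the table; the 150,000 skipped rows are counted off in the index.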

Written by Quassnoi

October 23rd, 2009 at 11:00 pm

Posted in MySQL

4 Responses to 'MySQL ORDER BY / LIMIT performance: late row lookups'

  1. your sql is kind of complex in this case
    you may write you sql in this more effective way:
    SELECT id, value, LENGTH(stuffing) AS len
    FROM t_limit FORCE INDEX (PRIMARY)
    where id>150000
    ORDER BY id
    limit 10

    liuyuxun

    11 Jan 13 at 12:30

  2. @liuyuxun: what if ids are not consecutive?

    Quassnoi

    11 Jan 13 at 15:38

  3. Wow that’s horrible. Do you know if Postgres also suffers from that?

    Martin S.

    2 Mar 13 at 22:23

  4. @Martin: until 9.2, PostgreSQL was not capable of index-only scans at all, since index records do not store information of transaction which created them and hence their visibility could not be determined without examining the table record.

    Quassnoi

    3 Mar 13 at 13:51


