MySQL 最左原则的理解

最新推荐文章于 2024-07-12 15:13:20 发布

bohu83

最新推荐文章于 2024-07-12 15:13:20 发布

阅读量3.6k

点赞数 3

分类专栏： MYSQL 文章标签： MySQL 联合索引最左原则范围查询停止匹配

本文链接：https://blog.csdn.net/bohu83/article/details/108613193

版权

MYSQL 专栏收录该内容

84 篇文章 23 订阅

订阅专栏

先看下规则：一下是美团的技术

建索引的几大原则

1.最左前缀匹配原则，非常重要的原则，mysql会一直向右匹配直到遇到范围查询(>、<、between、like)就停止匹配，比如a = 1 and b = 2 and c > 3 and d = 4 如果建立(a,b,c,d)顺序的索引，d是用不到索引的，如果建立(a,b,d,c)的索引则都可以用到，a,b,d的顺序可以任意调整。

2.=和in可以乱序，比如a = 1 and b = 2 and c = 3 建立(a,b,c)索引可以任意顺序，mysql的查询优化器会帮你优化成索引可以识别的形式。

3.尽量选择区分度高的列作为索引，区分度的公式是count(distinct col)/count(*)，表示字段不重复的比例，比例越大我们扫描的记录数越少，唯一键的区分度是1，而一些状态、性别字段可能在大数据面前区分度就是0，那可能有人会问，这个比例有什么经验值吗？使用场景不同，这个值也很难确定，一般需要join的字段我们都要求是0.1以上，即平均1条扫描10条记录。

4.索引列不能参与计算，保持列“干净”，比如from_unixtime(create_time) = ’2014-05-29’就不能使用到索引，原因很简单，b+树中存的都是数据表中的字段值，但进行检索时，需要把所有元素都应用函数才能比较，显然成本太大。所以语句应该写成create_time = unix_timestamp(’2014-05-29’)。

5.尽量的扩展索引，不要新建索引。比如表中已经有a的索引，现在要加(a,b)的索引，那么只需要修改原来的索引即可。

有人问为啥联合索引最左前缀遇到范围查询后，后面的字段不走索引呢？他搜了半天没搜到满意的答复，问题是我也没自己看过。需要学习（Google 下）。

索引的概念我不在重复了，https://blog.csdn.net/bohu83/article/details/81104432

再看MySQL官网介绍：

MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column (see Section 8.3.4, “Column Indexes”).

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.

Note

As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns. In MySQL, it is very easy to use this extra column:
SELECT * FROM tbl_name
  WHERE hash_col=MD5(CONCAT(val1,val2))
  AND col1=val1 AND col2=val2;
。。。。

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization (see Section 8.2.1.3, “Index Merge Optimization”), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

这里介绍两点：一个是联合索引的大概结构，你可以认为是多个列有hash过的。还有就是联合索引的最左原则。

再找了片老外的：

List
↓
Where A in (0,1) and B=5
A B
Range
↓
Where A beetween 0 and 1 and B=5
A B
There is such a nuance here. For example, A in (0,1) and A between 0 and 1 are equivalent forms, in both cases those are ranges, but in the case when it is A in (0,1) – list, MySQL realizes that it is not range and replaces it by multiple equality conditions. In this case, MySQL will use the index. So one needs to make a right decision – either to write by a list or by putting <> signs.
Let’s look another example:
Where B=5 <=> Where A in (0,1) and B=5

(select .. Where A=0 and B=5 order by C limit 10)

union all

(select .. Where A=1 and B=5 order by C limit 10)

order by C limit 10;
Here if we have the index(A, B), then it will not be used for “where B = 5” condition, but we can make it in an application so that it independently substitutes the missing condition if there is a small possible version of the list. But with this recommendation, you need to be very cautious, because, despite the fact that in the presence of a list, the further use of the index is not discarded, it cannot be used for sorting, i.e. only on equality. Therefore, if we have queries for sorting, then we will have to rebuild the query through “union all” so that there are no lists in order to use sorting. Naturally, this is not always possible and not always convenient.

这一篇讲了in 的改写，也能走索引，而范围查询这里使用了between and. 这跟>,like一个效果。是范围查询range （range condition ）感兴趣的可以搜索：leftmost prefix range condition 。其实会不会Google 关键看你搜索的条件是啥。

继续，而且它解释了原因，大致理解就是无序。

找个例子：

In the "High Performance MySQL" book says,

The ORDER BY clause also has the same limitation as lookup queries: it needs to form a leftmost prefix of the index. In all other cases, MySQL uses a filesort.

看看，高性能MySQL这样宝典都说了。看个stackoverflow案例：

CREATE TABLE rental (
...
PRIMARY KEY (rental_id),
UNIQUE KEY rental_date (rental_date,inventory_id,customer_id),
KEY idx_fk_inventory_id (inventory_id),
KEY idx_fk_customer_id (customer_id),
KEY idx_fk_staff_id (staff_id),
...
);

EXPLAIN 
SELECT `rental_id`, `staff_id` FROM `sakila`.`rental`
WHERE `rental_date` > '2005-05-25'
ORDER BY `inventory_id`, `customer_id`;
*************************** 1. row *************************** 
...
key: NULL Extra: Using where; using filesort
...

当然rental_date='2005-05-25' possible_keys=rental_date

有个答复如下：一目了然

Let's say that the index is like this:
rental_date, inventory_id, customer_id
======================================
...
2005-05-25, 10, 10
2005-05-25, 20, 20
2005-05-25, 30, 30
2005-05-26, 20, 20
2005-05-26, 40, 40
2005-05-27, 10, 10
2005-05-27, 30, 30
...
In the first query mysql uses the index to find first entry '2005-05-25'. Since rental_date is the first field of the index, the other fields (inventory_id, customer_id) for the same date value are in sorted order.

But in the second query imagine what happens with 2005-05-26, 2005-05-27. Suddenly, the inventory_id, customer_id are not sorted. They are sorted only for specific rental_date values. So mysql has to sort them finally.

看明白了嘛？（背景：索引的底层是一棵B+树，联合索引的键值数量不是一个是多个，那么就是MySQL说的类似hash的概念，依据联合索引最左的字段来建立B+树）就是对于联合索引的a.b,c 来说：因为联合索引首先是按a排序的，b是无序的.在a值相等的情况下，b值又是按顺序排列的,在b下c是有序的.以此类推，这个顺序是相对的。对应上面的例子rental_date='2005-05-25'，因为在a值确定的情况下b是相对有序的，而`rental_date` > '2005-05-25' and b=xx，a字段可以匹配上索引，但b值不可以，因为a的值是一个范围，在这个范围中b是无序的。

同理，这样也能解释为什么直接b=3 索引无效，只能扫全表（因为联合索引首先是按a排序的，b是无序的。）

我们要理解索引的目的是为了减少遍历总数，将数据顺序集中减少随机存取，从这个点考虑，MySQL的一些索引优化：索引下推或者索引合并也就容易理解。

索引会失效的多种情况，有大神整理了，了解这些情况，有助于我们建索引及优化SQL。

以下图片来自：https://segmentfault.com/a/1190000021464570

preview

MySQL真是博大精深，吐血看了一晚上，我已经放弃治疗了，希望对你有所收获，喜欢就支持下。

参考：

https://tech.meituan.com/2014/06/30/mysql-index.html

https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html

https://stackoverflow.com/questions/28748724/mysql-range-condition-and-order-by

https://segmentfault.com/a/1190000021464570

bohu83

关注

3
点赞
踩
6

收藏

觉得还不错? 一键收藏
0
评论
MySQL 最左原则的理解

先看下规则：一下是美团的技术建索引的几大原则1.最左前缀匹配原则，非常重要的原则，mysql会一直向右匹配直到遇到范围查询(>、<、between、like)就停止匹配，比如a = 1 and b = 2 and c > 3 and d = 4 如果建立(a,b,c,d)顺序的索引，d是用不到索引的，如果建立(a,b,d,c)的索引则都可以用到，a,b,d的顺序可以任意调整。2.=和in可以乱序，比如a = 1 and b = 2 and c = 3 建立(a,b,c)索引可
复制链接

扫一扫

专栏目录