MySQL Indexes: The MySQL Clustered Index

Latest recommended article published on 2023-04-25 09:09:05

weixin_39693295

Article link: https://blog.csdn.net/weixin_39693295/article/details/111277936

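In Chinese, this index goes by two interchangeable names, 聚集索引 and 聚簇索引. Both render the same English term, clustered ['klʌstəd] index. Here is how the official documentation describes it: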

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

(

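In other words: every InnoDB table has a special index, the clustered index, and the row data is stored in it. The clustered index is usually synonymous with the primary key, and you need to understand how InnoDB uses it to optimize the most common lookups and DML operations.

When you define a primary key on a table, InnoDB uses it as the clustered index. If the table has no logically unique, non-null column or set of columns, you should still define a primary key, for example by adding an auto-increment column.
If you do not define a primary key, MySQL looks for the first UNIQUE index whose key columns are all NOT NULL, and InnoDB uses that as the clustered index.
If the table has neither a primary key nor a suitable UNIQUE index, InnoDB internally generates a synthetic column holding a monotonically increasing row ID, along with a hidden clustered index (GEN_CLUST_INDEX) on that column.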

)
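
The three selection rules above can be expressed as a small decision function. This is only a sketch of the rules as the documentation states them, not InnoDB's actual code; the `Index` and `Table` classes and the function name `choose_clustered_index` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False


@dataclass
class Table:
    not_null_columns: Set[str]
    indexes: List[Index] = field(default_factory=list)  # in definition order


def choose_clustered_index(table: Table) -> str:
    """Return the name of the index the three documented rules would pick."""
    # Rule 1: an explicit PRIMARY KEY always wins.
    for idx in table.indexes:
        if idx.primary:
            return idx.name
    # Rule 2: otherwise, the first UNIQUE index whose key columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(c in table.not_null_columns for c in idx.columns):
            return idx.name
    # Rule 3: otherwise, the hidden index on the synthetic row ID column.
    return "GEN_CLUST_INDEX"


# A table without a primary key: uk_nick is skipped because nick is nullable.
users = Table(
    not_null_columns={"email"},
    indexes=[Index("uk_nick", ["nick"], unique=True),
             Index("uk_email", ["email"], unique=True)],
)
print(choose_clustered_index(users))  # prints uk_email
```

Note that the order of the UNIQUE indexes matters under rule 2: the first qualifying one is chosen, not the shortest or most selective one.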

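As this shows, the clustered index is essential: even if you do not specify one, InnoDB generates one for you by default. So what does the physical structure of a clustered index look like?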

All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB.

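(Whatever its kind, every index in the InnoDB engine is a B-tree in which records are stored only in the leaf pages, which in effect makes it a B+ tree. The default size of an index page is 16KB.)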
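
To get a feel for what a 16KB page means for tree height, here is a rough back-of-the-envelope calculation. Only the 16KB page size comes from the documentation; the key size, pointer size, and average row size below are assumptions chosen for illustration, and the estimate ignores page headers and partially filled pages.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes, per the quote above

# Assumptions for illustration only (not from the documentation):
KEY_BYTES = 8        # e.g. a BIGINT primary key
POINTER_BYTES = 6    # assumed size of a child-page pointer
ROW_BYTES = 1024     # assumed average row size


def rows_for_height(height: int) -> int:
    """Approximate row capacity of a B+ tree with `height` levels (leaf level included)."""
    fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # children per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf


for h in (1, 2, 3):
    print(h, rows_for_height(h))
# prints:
# 1 16
# 2 18720
# 3 21902400
```

Under these assumptions a three-level tree already covers roughly 21.9 million rows, so a point lookup reads only a handful of pages even on a large table.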

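You may have noticed the term index page in the sentence above. In other words, a page is not only used to store row records; it can also store index pointers. Recall the data pages, index pages, data segments, and index segments discussed earlier: in the end these all refer to different uses of pages. The official documentation also describes what a page contains: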

An InnoDB page has seven parts:

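Fil Header: pointers to the previous page and to the next page
Page Header: various flags and status fields
Infimum + Supremum Records: mark the beginning and the end of the records in the page
User Records: the user's records
Free Space: unused space
Page Directory: an ordered set of pointers into User Records, which makes it possible to binary-search the page's contents
Fil Trailer: integrity check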
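
The role of the Page Directory can be sketched as follows. This is a simplified model, not InnoDB's on-disk format: it assumes each directory slot stands for a small group of records and points at the last record of its group, and the group size of 4 is an arbitrary value chosen for illustration. The names `records`, `slots`, and `find_in_page` are hypothetical.

```python
from bisect import bisect_left

# User Records of one page in key order: (key, row payload).
records = [(k, f"row-{k}") for k in (1, 3, 4, 7, 9, 12, 15, 18, 21)]

GROUP_SIZE = 4  # assumed group size, for illustration only

# Page Directory: one slot per group, holding the position of the group's last record.
slots = [min(i + GROUP_SIZE, len(records)) - 1
         for i in range(0, len(records), GROUP_SIZE)]
slot_keys = [records[pos][0] for pos in slots]


def find_in_page(key):
    """Binary-search the directory, then scan the few records of one group."""
    s = bisect_left(slot_keys, key)  # first group whose last key is >= key
    if s == len(slots):
        return None                  # key is larger than every record on the page
    start = slots[s - 1] + 1 if s > 0 else 0
    for k, row in records[start:slots[s] + 1]:
        if k == key:
            return row
    return None


print(find_in_page(9))  # prints row-9
print(find_in_page(5))  # prints None
```

The point of the directory is that the search cost inside a page grows with the logarithm of the number of slots rather than with the number of records.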

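Now we can sketch roughly what a clustered index looks like logically:

P stands for pointer.
The top level has one root index page, the second level has two index pages, and the third level has four data pages (each data page holds two rows).
No page is drawn in full. As noted above, a page has seven parts; only the key pointers and the data are shown here.
Below the top level, the pages of each level form a doubly linked list, so neighboring pages can reach each other.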
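
One thing the doubly linked list at each level makes possible is walking from one page to its neighbor without going back up the tree. The sketch below models only the four data pages of the bottom level; the ids 1 to 8 are an assumption (the figure's actual values are not reproduced here), and `DataPage`, `link`, and `scan_from` are hypothetical names.

```python
class DataPage:
    def __init__(self, rows):
        self.rows = rows   # {id: row payload}, kept in key order
        self.prev = None   # P: pointer to the previous page on this level
        self.next = None   # P: pointer to the next page on this level


def link(pages):
    """Chain the pages of one level into a doubly linked list."""
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages


# Four data pages with two rows each: [1,2] [3,4] [5,6] [7,8].
leaves = link([DataPage({i: f"row-{i}", i + 1: f"row-{i + 1}"}) for i in (1, 3, 5, 7)])


def scan_from(page, low, high):
    """Collect ids in [low, high] by walking right from `page` through next pointers."""
    out = []
    while page is not None:
        for key in page.rows:
            if key > high:
                return out
            if key >= low:
                out.append(key)
        page = page.next
    return out


print(scan_from(leaves[1], 4, 7))  # prints [4, 5, 6, 7]
```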

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

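(Looking up a row through the clustered index is fast because the index search leads straight to the data page that holds the row. On a large table, this often saves a disk I/O compared with storage layouts that keep row data on a different page from the index record.)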

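For example, take the logical diagram of the clustered index sketched above and suppose we want the record with id=5. The root index page tells us to continue in the right-hand index page on the next level. That second-level index page in turn locates the data page holding the record. Once in that data page, a binary search through its Page Directory finds the id=5 record immediately, with no need to walk the rows one by one, which saves many I/O operations.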
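
The id=5 lookup described above can be traced with a toy three-level tree. This is a minimal sketch under assumptions: the ids 1 to 8 and the separator keys are made up to match the shape of the diagram (one root, two index pages, four data pages of two rows each), a sorted list of ids stands in for the Page Directory, and `IndexPage`, `DataPage`, and `lookup` are hypothetical names.

```python
from bisect import bisect_right


class IndexPage:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys: children[i + 1] holds ids >= keys[i]
        self.children = children


class DataPage:
    def __init__(self, rows):
        self.ids = sorted(rows)   # stands in for the Page Directory
        self.rows = rows


def lookup(root, key):
    """Return (row, pages_read) for a point lookup that starts at the root."""
    page, pages_read = root, 1
    while isinstance(page, IndexPage):
        page = page.children[bisect_right(page.keys, key)]
        pages_read += 1
    # Inside the data page: binary search instead of a row-by-row scan.
    pos = bisect_right(page.ids, key) - 1
    found = pos >= 0 and page.ids[pos] == key
    return (page.rows[key] if found else None), pages_read


leaves = [DataPage({i: f"row-{i}", i + 1: f"row-{i + 1}"}) for i in (1, 3, 5, 7)]
root = IndexPage([5], [IndexPage([3], leaves[:2]), IndexPage([7], leaves[2:])])

print(lookup(root, 5))  # prints ('row-5', 3)
```

The lookup reads three pages (root, right-hand index page, data page) regardless of which id is requested, whereas a scan of the data pages from the left would have read three data pages before even reaching id=5.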