Mysql学习_InnoDB On-Disk Structures_INDEX

Indexes

Clustered and Secondary Indexes:

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.
1.When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
2.If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
3.If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

每个InnoDB表都有一个聚簇索引,通常情况下等于表的主键。InnoDB选择聚簇索引如下:
1.如果表上定义了primary key,clustered index=primary key
2.如果没定义primary key,Mysql 定位第一个Unique index(键值非空),将其作为clustered index
3.如果1/2都没有符合的话,InnoDB内部自动生成一个hidden clustered index(GEN_CLUST_INDEX),列为包含row ID值的一个合成列,列长6bytes,列值单调递增

How the Clustered Index Speeds Up Queries:

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. 
If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

clustered index,B+树结构,将主键组织到B+树中,行数据就储存在叶子节点中,遍历索引即可直接获取到行数据

How Secondary Indexes Relate to the Clustered Index

All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

所有的非聚簇索引都叫二级索引。二级索引的叶子节点中存储的是主键的值和二级索引列的值,InnoDB再通过这个主键的值去聚簇索引上检索获取行数据。
如果主键的长度很长的话,二级索引所需要的存储空间也会变得很多。

The Physical Structure of an InnoDB Index:

With the exception of spatial indexes, InnoDB indexes are B-tree data structures. Spatial indexes use R-trees, which are specialized data structures for indexing multi-dimensional data. Index records are stored in the leaf pages of their B-tree or R-tree data structure. The default size of an index page is 16KB.

除了spatial indexes,InnoDB的其他索引都是B-tree 结构,索引记录存放在叶子节点中,默认索引页大小为16K。

When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.
InnoDB performs a bulk load when creating or rebuilding B-tree indexes. This method of index creation is known as a sorted index build. The innodb_fill_factor configuration option defines the percentage of space on each B-tree page that is filled during a sorted index build, with the remaining space reserved for future index growth. Sorted index builds are not supported for spatial indexes. 
An innodb_fill_factor setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth.

当聚簇索引有新记录插入时,InnoDB会尝试保留1/16的pagesize,供以后索引的insert和update使用。如果是顺序写,索引页大约是15/16 full,如果是随机写,索引页是介于1/2 full和15/16 full

If the fill factor of an InnoDB index page drops below the MERGE_THRESHOLD, which is 50% by default.if not specified, InnoDB tries to contract the index tree to free the page. The MERGE_THRESHOLD setting applies to both B-tree and R-tree indexes.

当InnoDB索引页的填充比小于MERGE_THRESHOLD(默认50%),如果没有特殊指定的话,InnoDB会通知index tree 释放该页。

 
相关参数:
innodb_fill_factor:
Dynamic Yes、DEFAULT VALUE 100、Scope Global、Minimum Value 10、Maximum Value 100
it defines the percentage of space on each B-tree page that is filled during a sorted index build, with the remaining space reserved for future index growth.
Actual percentages may vary. The innodb_fill_factor setting is interpreted as a hint rather than a hard limit.
An innodb_fill_factor setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth.
innodb_fill_factor applies to both B-tree leaf and non-leaf pages. It does not apply to external pages used for TEXT or BLOB entries.

定义索引创建时,B树上叶节点和非叶节点的页的保留值(非硬性指标、实际情况可能会变化),参数设置成100时会保留1/16的空间,对外部页无效

Sorted Index Builds:

InnoDB performs a bulk load instead of inserting one index record at a time when creating or rebuilding indexes. This method of index creation is also known as a sorted index build. Sorted index builds are not supported for spatial indexes.

There are three phases to an index build. 
In the first phase, the clustered index is scanned, and index entries are generated and added to the sort buffer. When the sort buffer becomes full, entries are sorted and written out to a temporary intermediate file. This process is also known as a 'run'. 
In the second phase, with one or more runs written to the temporary intermediate file, a merge sort is performed on all entries in the file. 
In the third and final phase, the sorted entries are inserted into the B-tree.

1.扫描聚簇索引,将要创建的索引的条目,先存放在排序缓冲区(sort buffer)。当排序缓冲区写满后,条目将会进行一次排序,并写出到一个临时中间文件
2.经过多次写满排序缓冲区并写出到临时中间文件,在临时中间文件中将会对所有条目进行一次合并排序
3.条目插入到目标B树索引中

Prior to the introduction of sorted index builds, index entries were inserted into the B-tree one record at a time using insert APIs. This method involved opening a B-tree cursor to find the insert position and then inserting entries into a B-tree page using an optimistic insert. 
If an insert failed due to a page being full, a pessimistic insert would be performed, which involves opening a B-tree cursor and splitting and merging B-tree nodes as necessary to find space for the entry. The drawbacks of this “top-down” method of building an index are the cost of searching for an insert position and the constant splitting and merging of B-tree nodes.

以前创建索引时,索引一条条插入B-tree,这个方法涉及到去B树中定位插入位置,进行乐观插入。插入失败的话则进行悲观插入,涉及到拆分和合并B-tree节点,在这方面的耗时是这个方法最大的缺点

Sorted index builds use a “bottom-up” approach to building an index. With this approach, a reference to the right-most leaf page is held at all levels of the B-tree. The right-most leaf page at the necessary B-tree depth is allocated and entries are inserted according to their sorted order. 
Once a leaf page is full,a node pointer is appended to the parent page and a sibling leaf page is allocated for the next insert.
This process continues until all entries are inserted, which may result in inserts up to the root level.
When a sibling page is allocated, the reference to the previously pinned leaf page is released, and thenewly allocated leaf page becomes the right-most leaf page and new default insert location.

Sorted index builds由于事先已经将索引条目排序,插入B树索引时,只需要顺序地往索引页里插入数据,再索引页写满时,再向右分配一个索引页即可,整个索引创建过程中,都是在写入最右的叶子节点,无需去寻找确认插入的位置,也不会引发索引页的分裂合并。

Sorted Index Builds and Redo Logging:

Redo logging is disabled during a sorted index build. Instead, there is a checkpoint to ensure that the index build can withstand a crash or failure. The checkpoint forces a write of all dirty pages to disk.
During a sorted index build, the page cleaner thread is signaled periodically to flush dirty pages to ensure that the checkpoint operation can be processed quickly. Normally, the page cleaner thread flushes dirty pages when the number of clean pages falls below a set threshold. For sorted index builds, dirty pages are flushed promptly to reduce checkpoint overhead and to parallelize I/O and CPU activity.

Sorted Index Builds不会写redo记录,而是用一个检查点来将脏页刷到磁盘。而页清理线程会及时地将索引脏页刷到磁盘,以减少执行检查点所花时间和开销。

Sorted Index Builds and Optimizer Statistics:

Sorted index builds may result in optimizer statistics that differ from those generated by the previous method of index creation. The difference in statistics, which is not expected to affect workload performance, is due to the different algorithm used to populate the index.

Sorted index builds可能会导致优化器统计信息和之前的方法创建的索引的统计信息不一样,但是对性能没影响,因为算法不一样

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值