InnoDB四大特性------Change Buffer、Double Write、Adaptive Hash Index、Read Ahead

最新推荐文章于 2024-08-12 21:16:18 发布

jzzw

最新推荐文章于 2024-08-12 21:16:18 发布

阅读量1.4k

点赞数

分类专栏： Mysql

本文链接：https://blog.csdn.net/u014710633/article/details/89501069

版权

Mysql 专栏收录该内容

44 篇文章 2 订阅

订阅专栏

INNODB特性：

1.Change Buffer：
The change buffer is a special data structure that caches changes to secondary index pages when those pages are not in the buffer pool.The buffered changes, which may result from INSERT, UPDATE or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.
Change Buffer，是In-Memory Structures，Buffer Pool的一部分，用来缓存对那些没在buffer pool中的二级索引页(Secondary Index Pages)做的修改(DML),当这些页被其他读操作加载到buffer pool的时候，change buffer里面的changes会去与其进行merge;
存的是二级索引的改变，有点类似于Oracle 的 redo log buffer，存的是changes不是changed pages

Unlike clustered indexes, secondary indexes are usually nonunique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time,when affected pages are read into the buffer pool by other operations, avoids substantial random access I/O that would be required to read secondary index pages into the buffer pool from disk.
区别于聚簇索引,二级索引通常是不唯一、插入顺序随机的,当被修改的页通过其他读操作被读取到buffer pool后，再对其进行合并修改，可以避免大量的磁盘随机I/O
假设没有change buffer，当修改涉及到二级索引时，我们需要在当前DML操作中把相关页从磁盘读上来，然后在buffer pool中对页进行修改，change buffer类似对二级索引进行了延迟修改，可以减少一次磁盘I/O

Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown,writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.
当系统空闲时或者关闭过程中，purge操作会周期性的把被修改的页写入磁盘，相比于修改后立刻写盘，purge的这种批量索引修改的操作效率更会高

Change buffer merging may take several hours when there are many affected rows and numerous secondary indexes to update. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed, and even after a server shutdown and restart.
如果修改涉及到的数据量很大的话，merge操作可能会持续几个小时，在这期间，磁盘的I/O的增加，从而可能会引起其他需要磁盘I/O的查询的性能的降低。在事务提交后和实例重启后，可能会继续出现merge操作。

In memory, the change buffer occupies part of the buffer pool. On disk, the change buffer is part of the system tablespace, where index changes are buffered when the database server is shut down.
在内存中，change buffer会占用buffer pool的空间; 数据库停止时，change buffer里的数据会保存到system tablespace里，所以Change buffer merging may also continue to occur even after a server shutdown and restart.

提问：如果MySQL异常关闭，change buffer里面的修改是不是就丢失了。
change buffer里的数据会保存到system tablespace里，这种保存应该不是只是在数据库停止的时候才保存，而是和普通的数据文件一样，通过系统进程写入进行持续性保存(类似于Oracle的checkpoint?)

Change Buffer的大小通过参数 innodb_buffer_pool_size 和 innodb_change_buffer_max_size 共同控制;
innodb_buffer_pool_size控制buffer pool的大小，innodb_change_buffer_max_size控制change pool在buffer pool里的最大占比，默认25，最大为50

innodb_change_buffering控制change buffer的影响范围，包括all、none、inserts、deletes、changes、purges，默认 all

对于唯一二级索引(unique key)，由于索引记录具有唯一性，因此无法缓存插入操作，但可以缓存删除操作；

http://mysql.taobao.org/monthly/2015/07/01/

2.Double Write

Doublewrite Buffer 存在的意义：
为了解决页断裂(partial write)的问题，即当数据库异常宕机时，数据库页只有部分写入磁盘，导致页面出现不一致的情况，因为page已经损坏，所以就无法定位到page中的事务号，这个时候redo就无法直接恢复。
数据库，OS和磁盘读写的基本单位是块，也可以称之为(page size)block size。我们知道数据库的块一般为8K，16K；而OS的块则一般为4K；IO块则更小，linux内核要求IO block size<=OS block size。磁盘IO除了IO block size，还有一个概念是扇区(IO sector)，扇区是磁盘物理操作的基本单位，而IO 块是磁盘操作的逻辑单位，一个IO块对应一个或多个扇区，扇区大小一般为512个字节。所以各个块大小的关系可以梳理如下：
DB block > OS block >= IO block > 磁盘 sector，而且他们之间保持了整数倍的关系。

The doublewrite buffer is a storage area located in the system tablespace where InnoDB writes pages that are flushed from the InnoDB buffer pool, before the pages are written to their proper positions in the data file.
Only after flushing and writing pages to the doublewrite buffer, does InnoDB write pages to their proper positions. If there is an operating system, storage subsystem, or mysqld process crash in the middle of a page write, InnoDB can later find a good copy of the page from the doublewrite buffer during crash recovery.
If there is an operating system, storage subsystem, or mysqld process crash in the middle of a page write, InnoDB can later find a good copy of the page from the doublewrite buffer during crash recovery.

Doublewrite Buffer是位于系统表空间里的一块存储区域，分为2个单元，128个页，其中120个用于批量刷脏数据，另外8个用于Single Page Flush，一般是2MB，用于存放那些已经从buffer pool中刷出来，将要写到对应数据文件里面的页。
做区分的原因是批发刷脏是后台线程做的，不影响前台线程。而Single page flush是用户线程发起的，需要尽快的刷脏并替换出一个空闲页出来。

buffer pool里面也有个Doublewrite Buffer

数据流向：buffer pool---> doublewrite buffer ---> datafile，如果出现crash的话，doublewrite buffer可以为crash recovery提供数据的完整性

Although data is always written twice, the doublewrite buffer does not require twice as much I/O overhead or twice as many I/O operations. Data is written to the doublewrite buffer itself as a large sequential chunk, with a single fsync() call to the operating system.
虽然数据是写两次，但是doublewrite buffer不需要2倍的I/O负载或者2倍的I/O次数。数据是作为一个大的连续的块，通过操作系统的一个fsync()请求写入doublewrite buffer

The doublewrite buffer is enabled by default in most cases. To disable the doublewrite buffer, set innodb_doublewrite to 0.
doublewrite buffer大部分情况下默认开始。如果想禁用doublewrite buffer，将innodb_doublewrite设置成0

https://yq.aliyun.com/articles/80423

3.Adaptive Hash Index
Based on the observed pattern of searches, a hash index is built using a prefix of the index key. The prefix can be any length, and it may be that only some values in the B-tree appear in the hash index.Hash indexes are built on demand for the pages of the index that are accessed often.
If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup of any element, turning the index value into a sort of pointer. InnoDB has a mechanism that monitors index searches. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
In MySQL 5.7, the adaptive hash index feature is partitioned. Each index is bound to a specific partition, and each partition is protected by a separate latch. Partitioning is controlled by the innodb_adaptive_hash_index_parts variable.
In earlier releases, the adaptive hash index feature was protected by a single latch which could become a point of contention under heavy workloads. The innodb_adaptive_hash_index_parts variable is set to 8 by default. The maximum setting is 512.

You can monitor adaptive hash index use and contention in the SEMAPHORES section of SHOW ENGINE INNODB STATUS output. If there are numerous threads waiting on RW-latches created in btr0sea.c,consider increasing the number of adaptive hash index partitions or disabling the adaptive hash index feature.

#InnoDB Adaptive Hash Index# 定义
维护索引叶页面中所有记录的索引键值(或键值前缀)到索引叶页面位置的Hash映射关系，能够根据索引键值(前缀)快速定位到叶页面满足条件记录的Offset，减少了B+树Search Path的代价，将B+树从Root页面至Leaf页面的路径定位，优化为Hash Index的快速查询。

#InnoDB Adaptive Hash Index# 使用
Adaptive Hash Index是针对B+树Search Path的优化，因此所有会涉及到Search Path的操作，均可使用此Hash索引进行优化，这些可优化的操作包括：Unique Scan/Range Scan(Locate First Key Page)/Insert/Delete/Purge等等，几乎涵盖InnoDB所有的操作类型。

#InnoDB Adaptive Hash Index# 维护
Adaptive，意味着不是所有的叶页面都会以Hash索引维护，叶页面进入Hash索引的条件是：同种类型的操作(Scan/Insert…)，命中同一叶页面的次数，超过此页面记录数量的1/16，则可将当前叶页面加入Hash索引，用以优化后续可能的相同Search Path。

http://hedengcheng.com/?p=458

4.Read Ahead
A read-ahead request is an I/O request to prefetch multiple pages in the buffer pool asynchronously,in anticipation that these pages will be needed soon. The requests bring in all the pages in one extent.
预读是一个I/O请求，该请求可以异步预取多个页面到缓冲池中，期待这些页即将被用到。这个请求会把一个extent里的所有页都取过来

InnoDB uses two read-ahead algorithms to improve I/O performance:
InnoDB提供两种预读算法来提高I/O效率

Linear read-ahead is a technique that predicts what pages might be needed soon based on pages in the buffer pool being accessed sequentially.
You control when InnoDB performs a read-ahead operation by adjusting the number of sequential page accesses required to trigger an asynchronous read request, using the configuration parameter innodb_read_ahead_threshold.
Before this parameter was added, InnoDB would only calculate whether to issue an asynchronous prefetch request for the entire next extent when it read the last page of the current extent.
The configuration parameter innodb_read_ahead_threshold controls how sensitive InnoDB is in detecting patterns of sequential page access. If the number of pages read sequentially from an extent is greater than or equal to innodb_read_ahead_threshold, InnoDB initiates an asynchronous read-ahead operation of the entire following extent.
innodb_read_ahead_threshold can be set to any value from 0-64. The default value is 56. The higher the value, the more strict the access pattern check.
For example, if you set the value to 48, InnoDB triggers a linear read-ahead request only when 48 pages in the current extent have been accessed sequentially. If the value is 8, InnoDB triggers an asynchronous read-ahead even if as few as 8 pages in the extent are accessed sequentially.
You can set the value of this parameter in the MySQL configuration file, or change it dynamically with InnoDB Buffer Pool Configuration the SET GLOBAL statement, which requires privileges sufficient to set global system variables.
Linear 是基于正在buffer pool种被顺序读取的页来预计接下来可能需要的页
在innodb_read_ahead_threshold这个参数之前，InnoDB的预读仅限于区块间的，当读取当前区段的最后一页时，InnoDB才开始计算是否要为整个下一区段发出异步预取请求
通过innodb_read_ahead_threshold这个参数来设定阈值，以决定何时触发预读操作，如果一个extent里顺序读取的页数超过该阈值，即触发预读操作，启动整个后续区段的异步预读操作。
innodb_read_ahead_threshold取值范围是0-64，默认值56，该参数可以通过Mysql参数文件设置，也可以通过set GLOBAL语句设置

Random read-ahead is a technique that predicts when pages might be needed soon based on pages already in the buffer pool, regardless of the order in which those pages were read. If 13 consecutive pages from the same extent are found in the buffer pool, InnoDB asynchronously issues a request to prefetch the remaining pages of the extent.
To enable this feature, set the configuration variable innodb_random_read_ahead to ON.
Linear 是基于buffer pool中存在的属于同一extent里的连续页的数量来预计接下来可能需要的页，而不管这些页的读取顺序，如果13个连续页存在于buffer pool，InnoDB就异步发出一个请求来预取该extent的剩余页
该特性由innodb_random_read_ahead参数决定

The SHOW ENGINE INNODB STATUS command displays statistics to help you evaluate the effectiveness of the read-ahead algorithm. Statistics include counter information for the following global status variables:
Innodb_buffer_pool_read_ahead、Innodb_buffer_pool_read_ahead_evicted、Innodb_buffer_pool_read_ahead_rnd
This information can be useful when fine-tuning the innodb_random_read_ahead setting