InnoDB Architecture

懒惰的星期六

Last modified: 2023-02-17 23:16:56

First published: 2023-02-17 19:45:09

Original source: https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool.html

MySQL version 5.7

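The following diagram shows the in-memory structures and on-disk structures that make up the InnoDB storage engine architecture.

Figure 1. InnoDB Architecture

1. InnoDB In-Memory Structures

The InnoDB in-memory structures are the buffer pool, the change buffer, the adaptive hash index, and the log buffer.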

1.1 Buffer Pool

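The buffer pool is the area of main memory where InnoDB caches table and index data. It lets frequently used data be accessed directly from memory, which speeds up processing. On dedicated servers, up to 80% of physical memory is often assigned to the buffer pool.

For efficiency of high-volume read operations, the buffer pool is divided into pages that can each hold multiple rows. The buffer pool is implemented as a linked list of pages, and rarely used data is aged out of the cache using a variation of the least recently used (LRU) algorithm.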

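1.1.1 Buffer Pool LRU Algorithm

The buffer pool is managed as a list using a variation of the LRU algorithm. When room is needed to add a new page to the buffer pool, the least recently used page is evicted and the new page is added to the middle of the list. This midpoint insertion strategy treats the list as two sublists:

At the head, a sublist of new (“young”) pages that were accessed recently
At the tail, a sublist of old pages that were accessed less recently

Figure 2. Buffer Pool List

The algorithm keeps frequently used pages in the new sublist. The old sublist contains less frequently used pages; these pages are candidates for eviction.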

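By default, the algorithm operates as follows:

3/8 of the buffer pool is devoted to the old sublist.
The midpoint of the list is the boundary where the tail of the new sublist meets the head of the old sublist.
When InnoDB reads a page into the buffer pool, it initially inserts it at the midpoint (the head of the old sublist). A page can be read because a user-initiated operation such as an SQL query requires it, or as part of a read-ahead operation performed automatically by InnoDB.
Accessing a page in the old sublist makes it “young”, moving it to the head of the new sublist.
If the page was read because a user-initiated operation required it, the first access occurs immediately and the page is made young.
If the page was read due to a read-ahead operation, the first access does not occur immediately and might not occur at all before the page is evicted.
As the database operates, pages in the buffer pool that are not accessed “age” by moving toward the tail of the list. Pages in both the new and old sublists age as other pages are made new. Pages in the old sublist also age as pages are inserted at the midpoint. Eventually, a page that remains unused reaches the tail of the old sublist and is evicted.
(Question: when a page reaches the tail of the new sublist and ages further, does it become the head of the old sublist?)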

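By default, pages read by queries are immediately moved into the new sublist, meaning they stay in the buffer pool longer. A table scan, performed for a mysqldump operation or a SELECT statement with no WHERE clause, for example, can bring a large amount of data into the buffer pool and evict an equivalent amount of older data, even if the new data is never used again. Similarly, pages that are loaded by the read-ahead background thread and accessed only once are moved to the head of the new list. Both patterns can push frequently used pages into the old sublist, where they become candidates for eviction.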
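
The midpoint insertion rules can be illustrated with a small simulation. This is only a toy model under stated assumptions, not InnoDB's implementation: the class name MidpointLRU, its methods, and the use of two separate deques for the new and old sublists are all hypothetical choices made for this sketch. In this model, a page pushed past the tail of the new sublist enters the head of the old sublist.

```python
from collections import deque


class MidpointLRU:
    """Toy model of an LRU list with midpoint insertion (not InnoDB's real code)."""

    def __init__(self, capacity, old_fraction=3 / 8):
        self.capacity = capacity
        # Number of slots reserved for the old sublist (3/8 by default).
        self.old_target = max(1, int(capacity * old_fraction))
        self.new = deque()  # index 0 is the head of the new sublist
        self.old = deque()  # index 0 is the midpoint (head of the old sublist)

    def __contains__(self, page):
        return page in self.new or page in self.old

    def _rebalance(self):
        # Pages pushed past the tail of the new sublist enter the old sublist head.
        while len(self.new) > self.capacity - self.old_target:
            self.old.appendleft(self.new.pop())

    def _insert_at_midpoint(self, page):
        evicted = None
        if len(self.new) + len(self.old) >= self.capacity:
            # Evict from the tail of the old sublist (fallback: tail of new).
            evicted = self.old.pop() if self.old else self.new.pop()
        self.old.appendleft(page)
        return evicted

    def _make_young(self, page):
        if page in self.old:
            self.old.remove(page)
        else:
            self.new.remove(page)
        self.new.appendleft(page)
        self._rebalance()

    def read_ahead(self, page):
        """Load a page without accessing it; it stays at the midpoint."""
        if page in self:
            return None
        return self._insert_at_midpoint(page)

    def access(self, page):
        """User-initiated access: load the page if needed, then make it young."""
        evicted = None
        if page not in self:
            evicted = self._insert_at_midpoint(page)
        self._make_young(page)
        return evicted


pool = MidpointLRU(capacity=8)
for page in range(1, 9):
    pool.access(page)          # user reads: each page is made young
evicted = [pool.read_ahead(page) for page in range(9, 13)]
print(list(pool.new))          # [8, 7, 6, 5, 4]
print(evicted)                 # [1, 2, 3, 9]
```

In this run the four read-ahead pages only churn through the old sublist: the five pages in the new sublist are untouched, and the fourth read-ahead evicts an earlier read-ahead page that was never accessed.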

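1.1.2 Monitoring the Buffer Pool Using the InnoDB Standard Monitor

InnoDB Standard Monitor output, which can be accessed with the SHOW ENGINE INNODB STATUS statement, provides metrics about buffer pool operation.
Buffer pool metrics are located in the BUFFER POOL AND MEMORY section.

Table: InnoDB buffer pool metrics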

Name	Description
Total memory allocated	The total memory allocated for the buffer pool in bytes.
Dictionary memory allocated	The total memory allocated for the `InnoDB` data dictionary in bytes.
Buffer pool size	The total size in pages allocated to the buffer pool.
Free buffers	The total size in pages of the buffer pool free list.
Database pages	The total size in pages of the buffer pool LRU list.
Old database pages	The total size in pages of the buffer pool old LRU sublist.
Modified db pages	The current number of pages modified in the buffer pool.
Pending reads	The number of buffer pool pages waiting to be read into the buffer pool.
Pending writes LRU	The number of old dirty pages within the buffer pool to be written from the bottom of the LRU list.
Pending writes flush list	The number of buffer pool pages to be flushed during checkpointing.
Pending writes single page	The number of pending independent page writes within the buffer pool.
Pages made young	The total number of pages made young in the buffer pool LRU list (moved to the head of sublist of “new” pages).
Pages made not young	The total number of pages not made young in the buffer pool LRU list (pages that have remained in the “old” sublist without being made young).
youngs/s	The per second average of accesses to old pages in the buffer pool LRU list that have resulted in making pages young.
non-youngs/s	The per second average of accesses to old pages in the buffer pool LRU list that have resulted in not making pages young.
Pages read	The total number of pages read from the buffer pool.
Pages created	The total number of pages created within the buffer pool.
Pages written	The total number of pages written from the buffer pool.
reads/s	The average number of buffer pool page reads per second.
creates/s	The average number of buffer pool pages created per second.
writes/s	The average number of buffer pool page writes per second.
Buffer pool hit rate	The buffer pool page hit rate for pages read from the buffer pool vs from disk storage.
young-making rate	The average hit rate at which page accesses have resulted in making pages young.
not (young-making rate)	The average hit rate at which page accesses have not resulted in making pages young.
Pages read ahead	The per second average of read ahead operations.
Pages evicted without access	The per second average of the pages evicted without being accessed from the buffer pool.
Random read ahead	The per second average of random read ahead operations.
LRU len	The total size in pages of the buffer pool LRU list.
unzip_LRU len	The length (in pages) of the buffer pool unzip_LRU list.
I/O sum	The total number of buffer pool LRU list pages accessed.
I/O cur	The total number of buffer pool LRU list pages accessed in the current interval.
I/O unzip sum	The total number of buffer pool unzip_LRU list pages decompressed.
I/O unzip cur	The total number of buffer pool unzip_LRU list pages decompressed in the current interval.
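
As a rough illustration of how a few of these metrics might be read programmatically, the sketch below pulls some counters and the hit rate out of a text excerpt. The excerpt is an illustrative sample written for this example, not output from a real server, and the function name parse_buffer_pool and the hit_rate key are hypothetical. The exact layout of the status text can differ between versions, so treat the regular expressions as assumptions.

```python
import re

# Illustrative excerpt only; the values are sample numbers for this sketch.
SAMPLE = """\
Buffer pool size   131072
Free buffers       124908
Database pages     5720
Old database pages 2071
Buffer pool hit rate 956 / 1000, young-making rate 0 / 1000 not 0 / 1000
"""

COUNTERS = ("Buffer pool size", "Free buffers", "Database pages", "Old database pages")


def parse_buffer_pool(text):
    """Extract a few page counters and the hit rate from status-like text."""
    metrics = {}
    for name in COUNTERS:
        # Anchor at line start so "Database pages" does not match "Old database pages".
        match = re.search(rf"^{re.escape(name)}\s+(\d+)\s*$", text, re.MULTILINE)
        if match:
            metrics[name] = int(match.group(1))
    match = re.search(r"Buffer pool hit rate\s+(\d+)\s*/\s*(\d+)", text)
    if match:
        metrics["hit_rate"] = int(match.group(1)) / int(match.group(2))
    return metrics


metrics = parse_buffer_pool(SAMPLE)
old_share = metrics["Old database pages"] / metrics["Database pages"]
```

With the sample values, old_share comes out at roughly 0.36, close to the 3/8 of the buffer pool that the LRU algorithm devotes to the old sublist by default.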

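1.2 Change Buffer

The change buffer is a special data structure that caches changes to secondary index pages when those pages are not in the buffer pool. The buffered changes, which may result from INSERT, UPDATE, or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.

Figure 3. Change Buffer

Unlike clustered indexes, secondary indexes are usually nonunique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacent in the index tree.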

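1.2.1 When Buffered Changes Are Merged and Written Out

When other operations read the affected pages into the buffer pool. Merging cached changes at that later time avoids the substantial random access I/O that would otherwise be required to read secondary index pages into the buffer pool from disk. (In other words, the merge runs when the page is accessed, and the merged page is written to disk afterwards.)
When the system is mostly idle, in which case a background thread does the work.
During a slow shutdown, when the purge operation writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.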

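Change buffer merging may take several hours when there are many affected rows and numerous secondary indexes to update. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue after a transaction is committed, and even after a server shutdown and restart (see Section 14.22.2, “Forcing InnoDB Recovery”, for more information).

In memory, the change buffer occupies part of the buffer pool. On disk, the change buffer is part of the system tablespace, where index changes are buffered when the database server is shut down.

Because it can result in fewer disk reads and writes, change buffering is most valuable for workloads that are I/O-bound; for example, applications with a high volume of DML operations such as bulk inserts benefit from change buffering.

However, the change buffer occupies part of the buffer pool, reducing the memory available to cache data pages.
If the working set almost fits in the buffer pool, or if your tables have relatively few secondary indexes, it may be useful to disable change buffering.
If the working data set fits entirely within the buffer pool, change buffering does not impose extra overhead, because it only applies to pages that are not in the buffer pool.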
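
The idea of deferring changes until a page is read can be sketched with a toy model. Everything here is hypothetical and heavily simplified: the class ToyChangeBuffer, its dictionaries, and the representation of a secondary index page as a plain list of keys are assumptions made for illustration, not InnoDB data structures.

```python
class ToyChangeBuffer:
    """Toy model: buffer changes for pages not in memory, merge them on read."""

    def __init__(self, pages_on_disk):
        self.disk = {pid: list(keys) for pid, keys in pages_on_disk.items()}
        self.buffer_pool = {}  # page id -> list of index keys cached in memory
        self.pending = {}      # page id -> buffered (operation, key) changes
        self.disk_reads = 0

    @staticmethod
    def _apply(page, operation, key):
        if operation == "insert":
            page.append(key)
        elif operation == "delete" and key in page:
            page.remove(key)

    def change(self, page_id, operation, key):
        if page_id in self.buffer_pool:
            # Page is already cached: apply the change directly.
            self._apply(self.buffer_pool[page_id], operation, key)
        else:
            # Page is not cached: buffer the change instead of reading from disk.
            self.pending.setdefault(page_id, []).append((operation, key))

    def read(self, page_id):
        if page_id not in self.buffer_pool:
            self.disk_reads += 1
            page = list(self.disk[page_id])
            # Merge buffered changes as the page enters the buffer pool.
            for operation, key in self.pending.pop(page_id, []):
                self._apply(page, operation, key)
            self.buffer_pool[page_id] = page
        return self.buffer_pool[page_id]


index = ToyChangeBuffer({1: ["a", "b"], 2: ["m"]})
index.change(1, "insert", "c")
index.change(1, "delete", "a")
index.change(2, "insert", "n")
reads_before = index.disk_reads   # no page has been read yet
page_one = index.read(1)          # the read triggers the merge for page 1
```

Three changes are recorded without a single disk read; only the later read of page 1 costs one read, and the changes for page 2 stay buffered until something needs that page.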

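1.2.2 Related Configuration Parameters

innodb_change_buffering

You can enable or disable buffering for inserts, for delete operations (when index records are initially marked for deletion), and for purge operations (when index records are physically deleted). An update operation is a combination of an insert and a delete. The default innodb_change_buffering value is all.

Permitted innodb_change_buffering values include:

all: The default value. Buffer inserts, delete-marking operations, and purges.
none: Do not buffer any operations.
inserts: Buffer insert operations.
deletes: Buffer delete-marking operations.
changes: Buffer both inserts and delete-marking operations.
purges: Buffer the physical deletion operations that happen in the background.

When to configure: set the parameter in the MySQL option file (my.cnf or my.ini), or change it dynamically with the SET GLOBAL statement.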
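
The permitted values can be read as sets of operation types. The sketch below is a direct transcription of the list above into a lookup table, not a description of InnoDB internals; the names BUFFERED_OPERATIONS and is_buffered and the operation labels "insert", "delete-mark", and "purge" are hypothetical.

```python
# Which operation types each innodb_change_buffering value buffers,
# transcribed from the list of permitted values above.
BUFFERED_OPERATIONS = {
    "none": set(),
    "inserts": {"insert"},
    "deletes": {"delete-mark"},
    "changes": {"insert", "delete-mark"},
    "purges": {"purge"},
    "all": {"insert", "delete-mark", "purge"},
}


def is_buffered(setting, operation):
    """Return True if the given setting buffers the given operation type."""
    if setting not in BUFFERED_OPERATIONS:
        raise ValueError(f"unknown innodb_change_buffering value: {setting!r}")
    return operation in BUFFERED_OPERATIONS[setting]
```

For example, is_buffered("changes", "purge") is False, while is_buffered("all", "purge") is True.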

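innodb_change_buffer_max_size

The innodb_change_buffer_max_size variable configures the maximum size of the change buffer as a percentage of the total size of the buffer pool. By default, innodb_change_buffer_max_size is set to 25. The maximum setting is 50.

Consider increasing innodb_change_buffer_max_size on a MySQL server with heavy insert, update, and delete activity, where change buffer merging does not keep pace with new change buffer entries, causing the change buffer to reach its maximum size limit.

Consider decreasing innodb_change_buffer_max_size on a MySQL server with static data used for reporting, or if the change buffer consumes too much of the memory space shared with the buffer pool, causing pages to age out of the buffer pool sooner than desired.

When to configure: test different settings with a representative workload to determine an optimal configuration. The innodb_change_buffer_max_size variable is dynamic, so the setting can be modified without restarting the server.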
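
Because the limit is a percentage of the buffer pool, the absolute ceiling is simple arithmetic. The helper below is a minimal sketch; the function name max_change_buffer_bytes is hypothetical, and the lower bound of 0 in the range check is an assumption (the text above only states the default of 25 and the maximum of 50).

```python
def max_change_buffer_bytes(buffer_pool_bytes, max_size_percent=25):
    """Upper bound on change buffer size for a given buffer pool size."""
    # The lower bound of 0 is an assumption; the maximum of 50 is from the text.
    if not 0 <= max_size_percent <= 50:
        raise ValueError("innodb_change_buffer_max_size must be between 0 and 50")
    return buffer_pool_bytes * max_size_percent // 100


GIB = 1024 ** 3
default_limit = max_change_buffer_bytes(8 * GIB)      # 2 GiB with the default of 25
largest_limit = max_change_buffer_bytes(8 * GIB, 50)  # 4 GiB at the maximum of 50
```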

1.2.3 Monitoring the Change Buffer

To view monitor data, use the SHOW ENGINE INNODB STATUS statement.

Change buffer status information is located under the INSERT BUFFER AND ADAPTIVE HASH INDEX heading.

1.3 Adaptive Hash Index

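The adaptive hash index is built on demand for index pages that are accessed often.

Queries with LIKE operators and % wildcards tend not to benefit from it. (For workloads that do not benefit from the adaptive hash index, turning it off reduces unnecessary performance overhead. Because it is difficult to predict in advance whether the adaptive hash index is appropriate for a particular system and workload, consider running benchmarks with it enabled and disabled.)

In MySQL 5.7, the adaptive hash index feature is partitioned. Each index is bound to a specific partition, and each partition is protected by a separate latch. Partitioning is controlled by the innodb_adaptive_hash_index_parts variable, which is set to 8 by default. The maximum setting is 512.

You can monitor adaptive hash index use and contention in the SEMAPHORES section of SHOW ENGINE INNODB STATUS output. If there are numerous threads waiting on rw-latches created in btr0sea.c, consider increasing the number of adaptive hash index partitions or disabling the adaptive hash index.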

1.4 Log Buffer

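The log buffer is the memory area that holds data to be written to the log files on disk.
Log buffer size is defined by the innodb_log_buffer_size variable. The default size is 16MB. The contents of the log buffer are periodically flushed to disk.
A large log buffer enables large transactions to run without writing redo log data to disk before the transactions commit. Thus, if you have transactions that update, insert, or delete many rows, increasing the size of the log buffer saves disk I/O.

The innodb_flush_log_at_trx_commit variable controls how the contents of the log buffer are written and flushed to disk. The innodb_flush_log_at_timeout variable controls log flushing frequency.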

2. InnoDB On-Disk Structures

2.1 System Tablespace

The system tablespace is the storage area for the InnoDB data dictionary, the doublewrite buffer, the change buffer, and undo logs. It may also contain table and index data if tables are created in the system tablespace rather than file-per-table or general tablespaces.
The size and number of system tablespace data files is defined by the innodb_data_file_path startup option. For configuration information, see “System Tablespace Data File Configuration”.

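2.1.1 Resizing the System Tablespace

2.1.1.1 Increasing the Size of the System Tablespace

Method 1

The easiest way to increase the size of the system tablespace is to configure it to be auto-extending.
To do so, specify the autoextend attribute for the last data file in the innodb_data_file_path setting, and restart the server. For example: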

innodb_data_file_path=ibdata1:10M:autoextend

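When the autoextend attribute is specified, the data file automatically grows in 8MB increments as space is required. The innodb_autoextend_increment variable controls the increment size.

Method 2

You can also increase the system tablespace size by adding another data file. To do so:

Stop the MySQL server.
If the last data file in the innodb_data_file_path setting is defined with the autoextend attribute, remove that attribute and modify the size attribute to reflect the current data file size. To determine the appropriate data file size to specify, check your file system for the file size, and round that value down to the closest MB value, where a MB is equal to 1024 x 1024 bytes.
Append a new data file to the innodb_data_file_path setting, optionally specifying the autoextend attribute. The autoextend attribute can be specified only for the last data file in the innodb_data_file_path setting.
Start the MySQL server.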

For example, this tablespace has one auto-extending data file:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:10M:autoextend

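Suppose that the data file grows to 988MB over time. This is the innodb_data_file_path setting after modifying the size attribute to reflect the current data file size, and after specifying a new 50MB auto-extending data file: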

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend

When adding a new data file, do not specify an existing file name. InnoDB creates and initializes the new data file when you start the server.
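
The edit to the innodb_data_file_path setting in Method 2 can be sketched as a small string transformation. This is a hypothetical helper for illustration only: the function name add_data_file is made up, it assumes entries are separated by ";" and fields by ":" as in the examples above, and it does not handle paths that themselves contain a colon.

```python
MB = 1024 * 1024


def add_data_file(path_setting, last_file_bytes, new_file):
    """Fix the size of an auto-extending last file, then append a new data file."""
    specs = path_setting.split(";")
    name, _declared_size, *attributes = specs[-1].split(":")
    if "autoextend" in attributes:
        # Drop autoextend and record the current size, rounded down to whole MB.
        specs[-1] = f"{name}:{last_file_bytes // MB}M"
    specs.append(new_file)
    return ";".join(specs)


new_setting = add_data_file(
    "/ibdata/ibdata1:10M:autoextend",
    988 * MB + 12345,                      # file size as reported by the file system
    "/disk2/ibdata2:50M:autoextend",
)
print(new_setting)  # /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend
```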

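Note
You cannot increase the size of an existing system tablespace data file by changing its size attribute. For example, changing the innodb_data_file_path setting from ibdata1:10M:autoextend to ibdata1:12M:autoextend produces the following error when starting the server: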
[ERROR] [MY-012263] [InnoDB] The Auto-extending innodb_system
data file './ibdata1' is of a different size 640 pages (rounded down to MB) than
specified in the .cnf file: initial 768 pages, max 0 (relevant if non-zero) pages!
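The error indicates that the existing data file size (expressed in InnoDB pages) differs from the data file size specified in the configuration file. If you encounter this error, restore the previous innodb_data_file_path setting, and refer to the system tablespace resizing instructions.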
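
The page counts in the error message follow from the two size attributes. The short calculation below assumes the default InnoDB page size of 16KB, which is an assumption here because the text does not state the page size; the function name size_to_pages is hypothetical.

```python
PAGE_SIZE = 16 * 1024  # assumed default innodb_page_size of 16KB


def size_to_pages(size_mb, page_size=PAGE_SIZE):
    """Convert a data file size in MB to a number of InnoDB pages."""
    return size_mb * 1024 * 1024 // page_size


existing_pages = size_to_pages(10)   # 640, the size of the existing 10M file
configured_pages = size_to_pages(12) # 768, the size requested by 12M
```

Under that assumption, 640 pages corresponds to the existing 10M file and 768 pages to the 12M value in the changed setting, which is the mismatch the server reports.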

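2.1.1.2 Decreasing the Size of the InnoDB System Tablespace

You cannot remove a data file from the system tablespace. To decrease the system tablespace size, use this procedure:

Use mysqldump to dump all of your InnoDB tables, including InnoDB tables located in the mysql schema. Identify InnoDB tables in the mysql schema using the following query: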

mysql> SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql' AND ENGINE='InnoDB';
+---------------------------+
| TABLE_NAME                |
+---------------------------+
| engine_cost               |
| gtid_executed             |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| innodb_index_stats        |
| innodb_table_stats        |
| plugin                    |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
+---------------------------+

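Stop the server.
Remove all existing tablespace files (*.ibd), including the ibdata and ib_log files. Do not forget to remove *.ibd files for tables located in the mysql schema.
Remove any .frm files for InnoDB tables.
Configure the data files for the new system tablespace. See “System Tablespace Data File Configuration”.
Restart the server.
Import the dump files.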

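Note

If your databases only use the InnoDB engine, it may be simpler to dump all databases, stop the server, remove all databases and InnoDB log files, restart the server, and import the dump files.

To avoid a large system tablespace, consider using file-per-table tablespaces or general tablespaces for your data. File-per-table tablespaces are the default tablespace type and are used implicitly when creating an InnoDB table. Unlike the system tablespace, file-per-table tablespaces return disk space to the operating system when they are truncated or dropped. For more information, see “File-Per-Table Tablespaces”. General tablespaces are multi-table tablespaces that can also be used as an alternative to the system tablespace. See “General Tablespaces”.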