MySQL的数据存储、索引、碎片整理

最新推荐文章于 2024-01-17 15:07:11 发布

阿湯哥

最新推荐文章于 2024-01-17 15:07:11 发布

阅读量1.4k

点赞数 1

分类专栏： MySQL 文章标签： MySQL 聚簇索引二级索引碎片整理

本文链接：https://blog.csdn.net/ttyy1112/article/details/123853564

版权

MySQL 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

MySQL的数据存储、索引、碎片整理

页（Page）
区（Extent）
段（Segment）
页面与数据行的关系（How Pages Relate to Table Rows）
聚簇索引和二级索引（Clustered and Secondary Indexes）
聚簇索引是如何提升查询性能的？（How the Clustered Index Speeds Up Queries）
聚簇索引与二级索引是如何关联的？（How Secondary Indexes Relate to the Clustered Index）
表碎片整理（Defragmenting a Table）
参考

页（Page）

Each tablespace consists of database pages. Every tablespace in a MySQL instance has the same page size. By default, all tablespaces have a page size of 16KB;

每个表空间都是有数据库页面构成的。一个MySQL实例内的所有表空间的分页大小相等，每个分页的默认大小为16KB。

区（Extent）

The pages are grouped into extents of size 1MB for pages up to 16KB in size (64 consecutive 16KB pages, or 128 8KB pages, or 256 4KB pages). For a page size of 32KB, extent size is 2MB. For page size of 64KB, extent size is 4MB.

连续的页面组成区段。
16KB以下的页面组成的区段大小为1MB（可以是连续64个16KB的页面、连续128个8KB的页面或者连续256个4KB的页面）。
32KB的页面组成的区段大小为2MB（32KB * 64）。
64KB的页面组成的区段大小为4MB（64KB * 64）。

段（Segment）

The “files” inside a tablespace are called segments in InnoDB.
Two segments are allocated for each index in InnoDB. One is for nonleaf nodes of the B-tree, the other is for the leaf nodes. Keeping the leaf nodes contiguous on disk enables better sequential I/O operations, because these leaf nodes contain the actual table data.

InnoDB中表空间中的”文件“被称为段。
InnoDB中为每个索引分配两个段，一个段用于B树的非叶节点，另一个用于叶节点。因为叶子节点包含实际的表数据，保证磁盘上叶子节点的连续性能提供更好的顺序IO操作。

When you delete data from a table, InnoDB contracts the corresponding B-tree indexes. Whether the freed space becomes available for other users depends on whether the pattern of deletes frees individual pages or extents to the tablespace.

当从表中删除数据时，InnoDB会移除其对应的B树索引。释放的空间是否对其他用户可用，取决于删除（数据和索引）时是否释放相应的页和区段给表空间。

Two segments are allocated for each index in InnoDB. One is for nonleaf nodes of the B-tree, the other is for the leaf nodes. Keeping the leaf nodes contiguous on disk enables better sequential I/O operations, because these leaf nodes contain the actual table data.

页面与数据行的关系（How Pages Relate to Table Rows）

The maximum row length is slightly less than half a database page for 4KB, 8KB, 16KB, and 32KB innodb_page_size settings. For example, the maximum row length is slightly less than 8KB for the default 16KB InnoDB page size. For 64KB pages, the maximum row length is slightly less than 16KB.

可以通过 innodb_page_size 设置数据库页面大小。
页面大小为4KB, 8KB, 16KB, and 32KB时，行的最大长度略小于页面大小的一半。例如，如果页面的大小设置为16KB，则行数据的最大长度应略小于8KB。
页面大小为64KB时，行数据的最大长度应略小于16KB。

If a row does not exceed the maximum row length, all of it is stored locally within the page. If a row exceeds the maximum row length, variable-length columns are chosen for external off-page storage until the row fits within the maximum row length limit. External off-page storage for variable-length columns differs by row format:

如果行数据没有超过最大长度，所有的数据都会存储在当前页面内。
如果行数据超过了最大长度，可变长度列的数据会被存储到外部页（external off-page），直到满足行最大长度的限制。
外部页（external off-page）对可变列数据的存储方式取决于行格式：

COMPACT and REDUNDANT Row Formats
When a variable-length column is chosen for external off-page storage, InnoDB stores the first 768 bytes locally in the row, and the rest externally into overflow pages. Each such column has its own list of overflow pages. The 768-byte prefix is accompanied by a 20-byte value that stores the true length of the column and points into the overflow list where the rest of the value is stored.

紧凑和冗余行格式
当一个可变列被选中进行外部页存储时，InnoDB会存储前768个字节在当前页，其余的放入溢出页。
这个列有自己的溢出页列表。
与768字节前缀一同存储的是一个20字节的值，它用来存储该列的实际长度并指向存储剩余数据的溢出页列表。

DYNAMIC and COMPRESSED Row Formats
When a variable-length column is chosen for external off-page storage, InnoDB stores a 20-byte pointer locally in the row, and the rest externally into overflow pages.

动态和压缩行格式
当一个可变列被选中进行外部页存储时，InnoDB会存储前一个20字节的指针在当前页，其余的存储到溢出页中。

LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

聚簇索引和二级索引（Clustered and Secondary Indexes）

Each InnoDB table has a special index called the clustered index that stores row data. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, it is important to understand how InnoDB uses the clustered index to optimize the common lookup and DML operations.

每一个InnoDB表有一个称为聚簇索引的特殊索引用来存储行数据。
通常，聚簇索引等同于主键。
在进行查询、插入及其他数据库操作时，为了获取最好的性能，了解InnoDB利用聚簇索引优化通用查询和DML操作的机制至关重要。

When you define a PRIMARY KEY on a table, InnoDB uses it as the clustered index. A primary key should be defined for each table. If there is no logical unique and non-null column or set of columns to use a the primary key, add an auto-increment column. Auto-increment column values are unique and are added automatically as new rows are inserted.

如果你的表定义了主键，InnoDB会使用主键作为聚簇索引。每张表都应该定义主键。
如果没有逻辑唯一且非空列或列的集合作为主键，则添加一列自增列作为主键。
自增列的值是唯一的，且随着新行的插入自动增加。

If you do not define a PRIMARY KEY for a table, InnoDB uses the first UNIQUE index with all key columns defined as NOT NULL as the clustered index.

如果你没有定义表的主键，InnoDB会使用第一个所有列均为非空的唯一索引作为聚簇索引。

If a table has no PRIMARY KEY or suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains row ID values. The rows are ordered by the row ID that InnoDB assigns. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in order of insertion.

如果表既没有主键也没有合适的唯一索引，InnoDB会在一个包含row ID的合成列上生成一个名为 GEN_CLUST_INDEX 的隐藏聚簇索引。
所有行按照InnoDB赋予的row ID排序。row ID是一个6字节字段，它的值会随着新行的插入单调递增。因此，按row ID排序其实是按行的物理插入顺序排序。

聚簇索引是如何提升查询性能的？（How the Clustered Index Speeds Up Queries）

Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

通过聚簇索引访问行数据特别快是因为索引查询能直接定位到包含数据的页面。
如果表数据量比较大，与使用不同于索引记录页的页面存储行数据的存储方式相比，聚簇索引架构通常能节省IO操作。

聚簇索引与二级索引是如何关联的？（How Secondary Indexes Relate to the Clustered Index）

Indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

非聚簇索引的其他索引被称为二级索引。在InnoDB中，二级索引中的每条记录包含行的主键列及被指定为二级索引的列。
InnoDB使用主键值在聚簇索引中查找行数据。

如果主键过长，二级索引会使用更多的空间，因此使用一个短的主键会更有优势。

表碎片整理（Defragmenting a Table）

Random insertions into or deletions from a secondary index can cause the index to become fragmented. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index.

对二级索引的无序插入或删除操作会导致索引碎片化。
碎片意味着索引页在磁盘上的物理顺序和页面上记录的索引顺序不一致，（如果按索引查询会执行更多的IO操作）
或者存在许多分配给该索引却未被使用的64-页块（64-page blocks）。（索引删除但空间未释放）