| Storage Engine | Permissible Index Types |
|----------------|-------------------------|
| MyISAM | BTREE |
| InnoDB | BTREE |
| MEMORY/HEAP | HASH, BTREE |
| NDB | HASH, BTREE (see note in text) |
13.2.10.2. Physical Structure of an Index
All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB. When new records are inserted, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records.
If index records are inserted in sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full. If the fill factor of an index page drops below 1/2, InnoDB tries to contract the index tree to free the page.
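The fractions above translate into concrete byte counts for the default page size. A minimal sketch of that arithmetic (per-page bookkeeping overhead is ignored here; only the 16KB size and the 1/16 and 1/2 thresholds come from the text):

```python
# Fill-factor arithmetic for the default InnoDB index page.
PAGE_SIZE = 16 * 1024  # default index page size in bytes

reserved = PAGE_SIZE // 16          # space left free for future inserts/updates
target_fill = PAGE_SIZE - reserved  # ~15/16 full after sequential inserts
contract_below = PAGE_SIZE // 2     # below 1/2 full, InnoDB tries to free the page

print(reserved)        # 1024
print(target_fill)     # 15360
print(contract_below)  # 8192
```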
Changing the page size is not a supported operation and there is no guarantee that InnoDB will function normally with a page size other than 16KB. Problems compiling or running InnoDB may occur.
A version of InnoDB built for one page size cannot use data files or log files from a version built for a different page size.
13.2.10.3. Insert Buffering
It is a common situation in database applications that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus, insertions into the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually nonunique, and insertions into secondary indexes happen in a relatively random order. This would cause a lot of random disk I/O operations without the special mechanism InnoDB uses.
If an index record should be inserted into a nonunique secondary index, InnoDB checks whether the secondary index page is in the buffer pool. If so, InnoDB does the insertion directly to the index page. If the index page is not found in the buffer pool, InnoDB inserts the record into a special insert buffer structure. The insert buffer is kept so small that it fits entirely in the buffer pool, and insertions can be done very quickly.
Periodically, the insert buffer is merged into the secondary index trees in the database. Often it is possible to merge several insertions into the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
Insert buffer merging may continue to happen after the inserting transaction has been committed. In fact, it may continue to happen after a server shutdown and restart (see Section 13.2.6.2, “Forcing InnoDB Recovery”).
Insert buffer merging may take many hours when many secondary indexes must be updated and many rows have been inserted. During this time, disk I/O will be increased, which can cause significant slowdowns for disk-bound queries. Another significant background I/O operation is the purge thread (see Section 13.2.9, “InnoDB Multi-Versioning”).
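The buffering logic can be sketched as a toy model. This is only an illustration of the idea; the class and attribute names are invented, and real InnoDB operates on B-tree pages and physical I/O, not Python containers:

```python
# Toy sketch of insert buffering for a nonunique secondary index:
# buffer changes to uncached pages, then merge them page by page.

class SecondaryIndex:
    def __init__(self):
        self.pages = {}            # page_id -> list of records (the "index tree")
        self.buffer_pool = set()   # page_ids currently cached in memory
        self.insert_buffer = []    # (page_id, record) pairs awaiting merge

    def insert(self, page_id, record):
        if page_id in self.buffer_pool:
            # Page is cached: apply the change directly, no random disk read.
            self.pages.setdefault(page_id, []).append(record)
        else:
            # Page not cached: buffer the change instead of reading the page.
            self.insert_buffer.append((page_id, record))

    def merge(self):
        # Periodic merge: buffered records for the same page are applied
        # together, so several insertions cost a single page access.
        merged = 0
        for page_id, record in self.insert_buffer:
            self.pages.setdefault(page_id, []).append(record)
            merged += 1
        self.insert_buffer.clear()
        return merged

idx = SecondaryIndex()
idx.buffer_pool.add(1)
idx.insert(1, "a")   # page 1 is cached -> direct insert
idx.insert(2, "b")   # page 2 not cached -> buffered
idx.insert(2, "c")   # buffered again; merges with "b" in one pass
print(len(idx.insert_buffer))  # 2
print(idx.merge())             # 2
```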
13.2.10.4. Adaptive Hash Indexes
If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes. InnoDB has a mechanism that monitors index searches made to the indexes defined for a table. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
The hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on the pattern of searches that InnoDB observes for the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB builds hash indexes on demand for those pages of the index that are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB tailors itself to ample main memory, coming closer to the architecture of main-memory databases.
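The monitor-then-promote pattern can be sketched as follows. The promotion threshold, the structures, and the class name are all hypothetical; InnoDB's actual heuristics are internal and considerably more involved:

```python
# Toy adaptive hash index: count lookups through an ordered ("B-tree")
# search path and, once a key is hot, serve it from a hash table instead.
import bisect

class AdaptiveIndex:
    HOT_THRESHOLD = 3  # hypothetical: promote after this many lookups

    def __init__(self, sorted_items):
        self.keys = [k for k, _ in sorted_items]   # stand-in for the B-tree
        self.vals = [v for _, v in sorted_items]
        self.hits = {}      # per-key lookup counters
        self.hash_idx = {}  # adaptive hash index, built on demand

    def search(self, key):
        if key in self.hash_idx:                # O(1) hash path for hot keys
            return self.hash_idx[key]
        i = bisect.bisect_left(self.keys, key)  # O(log n) "tree descent"
        if i == len(self.keys) or self.keys[i] != key:
            return None
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.HOT_THRESHOLD:
            self.hash_idx[key] = self.vals[i]   # promote the hot key
        return self.vals[i]

idx = AdaptiveIndex([(1, "a"), (2, "b"), (3, "c")])
for _ in range(3):
    idx.search(2)
print(2 in idx.hash_idx)  # True: key 2 was promoted to the hash index
```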
13.2.10.5. Physical Row Structure
The physical row structure of an InnoDB table depends on the MySQL version and the optional ROW_FORMAT option used when the table was created. For InnoDB tables in MySQL 5.0.3 and earlier, only the REDUNDANT row format was available. For MySQL 5.0.3 and later, the default is to use the COMPACT row format, but you can use the REDUNDANT format to retain compatibility with older versions of InnoDB tables. To check the row format of an InnoDB table, use SHOW TABLE STATUS.
The compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed, compact format is likely to be faster. If the workload is a rare case that is limited by CPU speed, compact format might be slower.
Rows in InnoDB tables that use the REDUNDANT row format have the following characteristics:
- Each index record contains a six-byte header. The header is used to link together consecutive records, and also in row-level locking.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte transaction ID field and a seven-byte roll pointer field.
- If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
- Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
- A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
- Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. Before MySQL 5.0.3, InnoDB truncates trailing spaces from VARCHAR columns.
- An SQL NULL value reserves one or two bytes in the record directory. Besides that, an SQL NULL value reserves zero bytes in the data part of the record if stored in a variable-length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. Reserving the fixed space for NULL values enables an update of the column from NULL to a non-NULL value to be done in place without causing fragmentation of the index page.
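The per-record overheads listed above can be added up in a back-of-the-envelope calculator. This is a sketch only: it counts just the overheads stated in the text for a clustered index record (header, system columns, optional row ID, field pointers for the user-defined columns) and ignores the field contents and page-level bookkeeping:

```python
# Rough REDUNDANT-format clustered index record overhead, in bytes.
def redundant_record_overhead(n_fields, data_len, has_primary_key=True):
    header = 6                        # six-byte record header
    trx_id, roll_ptr = 6, 7           # system columns in the clustered index
    row_id = 0 if has_primary_key else 6
    # One pointer per field: 1 byte if total field data < 128 bytes, else 2.
    ptr_size = 1 if data_len < 128 else 2
    directory = n_fields * ptr_size
    return header + trx_id + roll_ptr + row_id + directory

# e.g. 3 user columns totalling 200 bytes, table with a primary key:
print(redundant_record_overhead(3, 200))  # 6 + 6 + 7 + 0 + 3*2 = 25
```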
Rows in InnoDB tables that use the COMPACT row format have the following characteristics:
- Each index record contains a five-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
- The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 15 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
- For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes are needed only if part of the column is stored externally in overflow pages, or if the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the two-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768 + 20. The 20-byte pointer stores the true length of the column.
- The record header is followed by the data contents of the non-NULL columns.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte transaction ID field and a seven-byte roll pointer field.
- If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
- Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index has a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
- Internally, InnoDB stores fixed-length, fixed-width character columns such as CHAR(10) in a fixed-length format. Before MySQL 5.0.3, InnoDB truncates trailing spaces from VARCHAR columns.
- Internally, InnoDB attempts to store UTF-8 CHAR(N) columns in N bytes by trimming trailing spaces. (With the REDUNDANT row format, such columns occupy 3 × N bytes.) Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page.
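The variable-length part of the COMPACT header can be sized with a small helper. This is a simplified sketch: it applies the CEILING(N/8) bitmap rule and charges two length bytes to any column whose maximum length exceeds 255, which glosses over the actual-length condition and the externally stored case described above:

```python
# Approximate size of a COMPACT record header: fixed 5 bytes, plus a
# NULL bitmap of CEILING(N/8) bytes for N nullable columns, plus 1 or 2
# length bytes per variable-length column.
import math

def compact_header_size(nullable_cols, varlen_max_lens):
    fixed = 5  # fixed five-byte header
    null_bitmap = math.ceil(nullable_cols / 8) if nullable_cols else 0
    # Simplification: 2 length bytes whenever the max length exceeds 255.
    length_bytes = sum(1 if m <= 255 else 2 for m in varlen_max_lens)
    return fixed + null_bitmap + length_bytes

# 10 nullable columns -> 2-byte bitmap; VARCHAR(100) and VARCHAR(1000)
# contribute 1 and 2 length bytes respectively:
print(compact_header_size(10, [100, 1000]))  # 5 + 2 + 1 + 2 = 10
```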
User Comments
If you type SHOW INNODB STATUS\G, it will show, under the section "INSERT BUFFER AND ADAPTIVE HASH INDEX", a line like this:
54737 inserts, 12769 merged recs, 3612 merges
From what I have understood, this basically shows that about 54,000 records have been inserted, but only about 12,000 have been merged into indexes. The trick I have just found is that even after you stop the last query to the database, the database still keeps on doing heavy I/O, and what is happening then is that the remaining records are merged. The procedure continues until "inserts" == "merged recs". Only after that does the I/O really stop.
In my case the merging procedure takes about 3 hrs after the last query. Even if you stop the database with a shutdown command during merging, when you turn it on again it will continue to merge the rows anyway. Only then it will be slower, or so it seems from my experience. Dunno why. Also, if you shut down the database and then start it and it continues to merge records, the "inserts" counter will be zeroed, so you will not have the slightest idea how long to wait until it finishes ;-)
Tip: DO NOT shut down the database until you see that everything is merged. Also, keep an eye on the difference between those values. If it grows constantly during normal operation, you are really missing some resource on your computer (probably I/O). Merging is also usually the reason for big I/O even when the traffic drops and you'd expect the database to perform faster, but it does not ;-) In the worst possible scenario (it happened to me), the server was working at maximum I/O rate but was completely unusable; all I/O went to merging. This "deadlock" could only be resolved manually by stopping all queries for 6 hrs...