| Storage Engine | Permissible Index Types |
|----------------|-------------------------|
| MyISAM | BTREE |
| InnoDB | BTREE |
| MEMORY/HEAP | HASH, BTREE |
| NDB | HASH, BTREE (see note in text) |
13.2.10.2. Physical Structure of an Index
All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB. When new records are inserted, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records.
If index records are inserted in sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full. If the fill factor of an index page drops below 1/2, InnoDB tries to contract the index tree to free the page.
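The fractions above translate into concrete byte counts for the default page size. A minimal sketch of that arithmetic (per-page bookkeeping overhead is ignored here; only the 16KB size and the 1/16 and 1/2 thresholds come from the text):

```python
# Fill-factor arithmetic for the default InnoDB index page.
PAGE_SIZE = 16 * 1024  # default index page size in bytes

reserved = PAGE_SIZE // 16          # space left free for future inserts/updates
target_fill = PAGE_SIZE - reserved  # ~15/16 full after sequential inserts
contract_below = PAGE_SIZE // 2     # below 1/2 full, InnoDB tries to free the page

print(reserved)        # 1024
print(target_fill)     # 15360
print(contract_below)  # 8192
```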
Changing the page size is not a supported operation and there is no guarantee that InnoDB will function normally with a page size other than 16KB. Problems compiling or running InnoDB may occur.
A version of InnoDB built for one page size cannot use data files or log files from a version built for a different page size.
13.2.10.3. Insert Buffering
It is a common situation in database applications that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus, insertions into the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually nonunique, and insertions into secondary indexes happen in a relatively random order. This would cause a lot of random disk I/O operations without the special mechanism InnoDB uses.
If an index record should be inserted into a nonunique secondary index, InnoDB checks whether the secondary index page is in the buffer pool. If so, InnoDB does the insertion directly to the index page. If the index page is not found in the buffer pool, InnoDB inserts the record into a special insert buffer structure. The insert buffer is kept so small that it fits entirely in the buffer pool, and insertions can be done very quickly.
Periodically, the insert buffer is merged into the secondary index trees in the database. Often it is possible to merge several insertions into the same page of the index tree, saving disk I/O operations. It has been measured that the insert buffer can speed up insertions into a table up to 15 times.
Insert buffer merging may continue to happen after the inserting transaction has been committed. In fact, it may continue to happen after a server shutdown and restart (see Section 13.2.6.2, “Forcing InnoDB Recovery”).
Insert buffer merging may take many hours when many secondary indexes must be updated and many rows have been inserted. During this time, disk I/O will be increased, which can cause significant slowdowns for disk-bound queries. Another significant background I/O operation is the purge thread (see Section 13.2.9, “InnoDB Multi-Versioning”).
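The buffering logic can be sketched as a toy model. This is only an illustration of the idea; the class and attribute names are invented, and real InnoDB operates on B-tree pages and physical I/O, not Python containers:

```python
# Toy sketch of insert buffering for a nonunique secondary index:
# buffer changes to uncached pages, then merge them page by page.

class SecondaryIndex:
    def __init__(self):
        self.pages = {}            # page_id -> list of records (the "index tree")
        self.buffer_pool = set()   # page_ids currently cached in memory
        self.insert_buffer = []    # (page_id, record) pairs awaiting merge

    def insert(self, page_id, record):
        if page_id in self.buffer_pool:
            # Page is cached: apply the change directly, no random disk read.
            self.pages.setdefault(page_id, []).append(record)
        else:
            # Page not cached: buffer the change instead of reading the page.
            self.insert_buffer.append((page_id, record))

    def merge(self):
        # Periodic merge: buffered records for the same page are applied
        # together, so several insertions cost a single page access.
        merged = 0
        for page_id, record in self.insert_buffer:
            self.pages.setdefault(page_id, []).append(record)
            merged += 1
        self.insert_buffer.clear()
        return merged

idx = SecondaryIndex()
idx.buffer_pool.add(1)
idx.insert(1, "a")   # page 1 is cached -> direct insert
idx.insert(2, "b")   # page 2 not cached -> buffered
idx.insert(2, "c")   # buffered again; merges with "b" in one pass
print(len(idx.insert_buffer))  # 2
print(idx.merge())             # 2
```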
13.2.10.4. Adaptive Hash Indexes
If a table fits almost entirely in main memory, the fastest way to perform queries on it is to use hash indexes. InnoDB has a mechanism that monitors index searches made to the indexes defined for a table. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
The hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on the pattern of searches that InnoDB observes for the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB builds hash indexes on demand for those pages of the index that are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB tailors itself to ample main memory, coming closer to the architecture of main-memory databases.
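The monitor-then-promote pattern can be sketched as follows. The promotion threshold, the structures, and the class name are all hypothetical; InnoDB's actual heuristics are internal and considerably more involved:

```python
# Toy adaptive hash index: count lookups through an ordered ("B-tree")
# search path and, once a key is hot, serve it from a hash table instead.
import bisect

class AdaptiveIndex:
    HOT_THRESHOLD = 3  # hypothetical: promote after this many lookups

    def __init__(self, sorted_items):
        self.keys = [k for k, _ in sorted_items]   # stand-in for the B-tree
        self.vals = [v for _, v in sorted_items]
        self.hits = {}      # per-key lookup counters
        self.hash_idx = {}  # adaptive hash index, built on demand

    def search(self, key):
        if key in self.hash_idx:                # O(1) hash path for hot keys
            return self.hash_idx[key]
        i = bisect.bisect_left(self.keys, key)  # O(log n) "tree descent"
        if i == len(self.keys) or self.keys[i] != key:
            return None
        self.hits[key] = self.hits.get(key, 0) + 1
        if self.hits[key] >= self.HOT_THRESHOLD:
            self.hash_idx[key] = self.vals[i]   # promote the hot key
        return self.vals[i]

idx = AdaptiveIndex([(1, "a"), (2, "b"), (3, "c")])
for _ in range(3):
    idx.search(2)
print(2 in idx.hash_idx)  # True: key 2 was promoted to the hash index
```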
13.2.10.5. Physical Row Structure
The physical row structure of an InnoDB table depends on the MySQL version and the optional ROW_FORMAT option used when the table was created. For InnoDB tables in MySQL 5.0.3 and earlier, only the REDUNDANT row format was available. For MySQL 5.0.3 and later, the default is to use the COMPACT row format, but you can use the REDUNDANT format to retain compatibility with older versions of InnoDB tables. To check the row format of an InnoDB table, use SHOW TABLE STATUS.
The compact row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed, compact format is likely to be faster. If the workload is a rare case that is limited by CPU speed, compact format might be slower.
Rows in InnoDB tables that use the REDUNDANT row format have the following characteristics:
- Each index record contains a six-byte header. The header is used to link together consecutive records, and also in row-level locking.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte transaction ID field and a seven-byte roll pointer field.
- If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
- Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
- A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
- Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. Before MySQL 5.0.3, InnoDB truncates trailing spaces from VARCHAR columns.
- An SQL NULL value reserves one or two bytes in the record directory. Besides that, an SQL NULL value reserves zero bytes in the data part of the record if stored in a variable-length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. Reserving the fixed space for NULL values enables an update of the column from NULL to a non-NULL value to be done in place without causing fragmentation of the index page.
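The per-record overheads listed above can be added up in a back-of-the-envelope calculator. This is a sketch only: it counts just the overheads stated in the text for a clustered index record (header, system columns, optional row ID, field pointers for the user-defined columns) and ignores the field contents and page-level bookkeeping:

```python
# Rough REDUNDANT-format clustered index record overhead, in bytes.
def redundant_record_overhead(n_fields, data_len, has_primary_key=True):
    header = 6                        # six-byte record header
    trx_id, roll_ptr = 6, 7           # system columns in the clustered index
    row_id = 0 if has_primary_key else 6
    # One pointer per field: 1 byte if total field data < 128 bytes, else 2.
    ptr_size = 1 if data_len < 128 else 2
    directory = n_fields * ptr_size
    return header + trx_id + roll_ptr + row_id + directory

# e.g. 3 user columns totalling 200 bytes, table with a primary key:
print(redundant_record_overhead(3, 200))  # 6 + 6 + 7 + 0 + 3*2 = 25
```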
Rows in InnoDB tables that use the COMPACT row format have the following characteristics:
- Each index record contains a five-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
- The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 15 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
- For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes are needed only if part of the column is stored externally in overflow pages, or if the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the two-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768 + 20. The 20-byte pointer stores the true length of the column.
- The record header is followed by the data contents of the non-NULL columns.
- Records in the clustered index contain fields for all user-defined columns. In addition, there is a six-byte transaction ID field and a seven-byte roll pointer field.
- If no primary key was defined for a table, each clustered index record also contains a six-byte row ID field.
- Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index has a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
- Internally, InnoDB stores fixed-length, fixed-width character columns such as CHAR(10) in a fixed-length format. Before MySQL 5.0.3, InnoDB truncates trailing spaces from VARCHAR columns.
- Internally, InnoDB attempts to store UTF-8 CHAR(N) columns in N bytes by trimming trailing spaces. (With the REDUNDANT row format, such columns occupy 3 × N bytes.) Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page.
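The variable-length part of the COMPACT header can be sized with a small helper. This is a simplified sketch: it applies the CEILING(N/8) bitmap rule and charges two length bytes to any column whose maximum length exceeds 255, which glosses over the actual-length condition and the externally stored case described above:

```python
# Approximate size of a COMPACT record header: fixed 5 bytes, plus a
# NULL bitmap of CEILING(N/8) bytes for N nullable columns, plus 1 or 2
# length bytes per variable-length column.
import math

def compact_header_size(nullable_cols, varlen_max_lens):
    fixed = 5  # fixed five-byte header
    null_bitmap = math.ceil(nullable_cols / 8) if nullable_cols else 0
    # Simplification: 2 length bytes whenever the max length exceeds 255.
    length_bytes = sum(1 if m <= 255 else 2 for m in varlen_max_lens)
    return fixed + null_bitmap + length_bytes

# 10 nullable columns -> 2-byte bitmap; VARCHAR(100) and VARCHAR(1000)
# contribute 1 and 2 length bytes respectively:
print(compact_header_size(10, [100, 1000]))  # 5 + 2 + 1 + 2 = 10
```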
User Comments
If you type SHOW INNODB STATUS\G, it will show, under the section "INSERT BUFFER AND ADAPTIVE HASH INDEX", a line like this:
54737 inserts, 12769 merged recs, 3612 merges
From what I have understood, this basically shows that about 54,000 records have been inserted, but only about 12,000 have been merged into indexes. The trick I have just found is that even after you stop the last query to the database, the database still keeps on doing heavy I/O, and what is happening then is that the remaining records are merged. The procedure continues until "inserts" == "merged recs". Only after that does the I/O really stop.
In my case the merging procedure takes about 3 hrs after the last query. Even if you stop the database with a shutdown command during merging, when you turn it on again it will continue to merge the rows anyway. Only then it will be slower, or so it seems from my experience. Dunno why. Also, if you shut down the database and then start it and it continues to merge records, the "inserts" counter will be zeroed, so you will not have the slightest idea how long to wait until it finishes ;-)
Tip: DO NOT shut down the database until you see that everything is merged. Also, keep an eye on the difference between those values. If it grows constantly during normal operation, you are really missing some resource on your computer (probably I/O). Merging is also usually the reason for big I/O even when the traffic drops and you'd expect the database to perform faster, but it does not ;-) In the worst possible scenario (it happened to me), the server was working at maximum I/O rate but was completely unusable; all I/O went to merging. This "deadlock" could only be resolved manually by stopping all queries for 6 hrs...