对于innodb存储大的blob字段,将会对索引、查询性能造成极大的影响(从mysql5.1版本有所改进),以下为关于此的相关原因分析与总结:

1、http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ (转载)

I’m running in this misconception second time in a week or so, so it is time to blog about it.

How blobs are stored in Innodb ? This depends on 3 factors. Blob size; Full row size and Innodb row format.

But before we look into how BLOBs are really stored lets see what misconception is about. A lot of people seems to think for standard (“Antelope”) format first 768 bytes are stored in the row itself while rest is stored in external pages, which would make such blobs really bad. I even seen a solution to store several smaller blobs or varchar fields which are when concatenated to get the real data. This is not exactly what happens :)

With COMPACT and REDUNDANT row formats (used in before Innodb plugin and named “Antelope” in Innodb Plugin and XtraDB) Innodb would try to fit the whole row onto Innodb page. At least 2 rows have to fit to each page plus some page data, which makes the limit about 8000 bytes. If row fits completely Innodb will store it on the page and not use external blob storage pages. For example 7KB blob can be stored on the page. However if row does not fit on the page, for example containing two 7KB blobs Innodb will have to pick some of them and store them in external blob pages. It however will keep at least 768 bytes from each of the BLOBs on the row page itself. With two of 7KB blobs we will have one blob stored on the page completely while another will have 768 bytes stored on the row page and the remainder at external page.

Such decision to store first 768 bytes of the BLOB may look strange, especially as MySQL internally has no optimizations to read portions of the blob – it is either read completely or not at all, so the 768 bytes on the row page is a little use – if BLOB is accessed external page will always have to be read. This decision seems to be rooted in desire to keep code simple while implementing initial BLOB support for Innodb – BLOB can have prefix index and it was easier to implement index BLOBs if their prefix is always stored on the row page.

This decision also causes strange data storage “bugs” – you can store 200K BLOB easily, however you can’t store 20 of 10K blobs. Why ? Because each of them will try to store 768 bytes on the row page itself and it will not fit.

Another thing to beware with Innodb BLOB storage is the fact external blob pages are not shared among the blobs. Each blob, even if it has 1 byte which does not fit on the page will have its own 16K allocated. This can be pretty inefficient so I’d recommend avoiding multiple large blobs per row when possible. Much better decision in many cases could be combine data in the single large Blob (and potentially compress it)

If all columns do not fit to the page completely Innodb will automatically chose some of them to be on the page and some stored externally. This is not clearly documented neither can be hinted or seen. Furthermore depending on column sizes it may vary for different rows. I wish Innodb would have some way to tune it allowing me to force actively read columns for inline store while push some others to external storage. May be one day we’ll come to implementing this in XtraDB :)

So BLOB storage was not very efficient in REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) format and the fix comes with Innodb Plugin in “Barracuda” format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them. BLOBs can have prefix index but this no more requires column prefix to be stored on the page – you can build prefix indexes on blobs which are often stored outside the page.

COMPRESSED row format is similar to DYNAMIC when it comes to handling blobs and will use the same strategy storing BLOBs completely off page. It however will always compress blobs which do not fit to the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.

If you’re interested to learn more about Innodb row format check out this page in Innodb docs:

It is worth to note I use BLOB here in a very general term. From storage prospective BLOB, TEXT as well as long VARCHAR are handled same way by Innodb. This is why Innodb manual calls it “long columns” rather than BLOBs.


2、mysql 5.1 官网解释:

All data in InnoDB is stored in database pages comprising a B-tree index (the so-called clustered index or primary key index). The essential idea is that the nodes of the B-tree contain, for each primary key value (whether user-specified or generated or chosen by the system), the values of the remaining columns of the row as well as the key. In some other database systems, a clustered index is called an index-organized table. Secondary indexes in InnoDB are also B-trees, containing pairs of values of the index key and the value of the primary key, which acts as a pointer to the row in the clustered index.

There is an exception to this rule. Variable-length columns (such as BLOB and VARCHAR) that are too long to fit on a B-tree page are stored on separately allocated disk (overflow) pages. We call these off-page columns. The values of such columns are stored on singly-linked lists of overflow pages, and each such column has its own list of one or more overflow pages. In some cases, all or a prefix of the long column values is stored in the B-tree, to avoid wasting storage and eliminating the need to read a separate page.

The new Barracuda file format provides a new option (KEY_BLOCK_SIZE) to control how much column data is stored in the clustered index, and how much is placed on overflow pages.


. COMPACT and REDUNDANT Row Formats

Previous versions of InnoDB used an unnamed file format (now called Antelope) for database files. With that format, tables were defined with ROW_FORMAT=COMPACT (or ROW_FORMAT=REDUNDANT) and InnoDB stored up to the first 768 bytes of variable-length columns (such as BLOB and VARCHAR) in the index record within the B-tree node, with the remainder stored on the overflow page(s).

To preserve compatibility with those prior versions, tables created with the InnoDB Plugin use the prefix format, unless one of ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED is specified (or implied) on the CREATE TABLEcommand.

With the Antelope file format, if the value of a column is not longer than 768 bytes, no overflow page is needed, and some savings in I/O may result, since the value is in the B-tree node. This works well for relatively shortBLOBs, but may cause B-tree nodes to fill with data rather than key values, thereby reducing their efficiency. Tables with many BLOB columns could cause B-tree nodes to become too full of data, and contain too few rows, making the entire index less efficient than if the rows were shorter or if the column values were stored off-page.

DYNAMIC Row Format

When innodb_file_format is set to Barracuda and a table is created with ROW_FORMAT=DYNAMIC orROW_FORMAT=COMPRESSED, long column values are stored fully off-page, and the clustered index record contains only a 20-byte pointer to the overflow page.

Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long, InnoDB chooses the longest columns for off-page storage until the clustered index record fits on the B-tree page.

The DYNAMIC row format maintains the efficiency of storing the entire row in the index node if it fits (as do theCOMPACT and REDUNDANT formats), but this new format avoids the problem of filling B-tree nodes with a large number of data bytes of long columns. The DYNAMIC format is predicated on the idea that if a portion of a long data value is stored off-page, it is usually most efficient to store all of the value off-page. With DYNAMIC format, shorter columns are likely to remain in the B-tree node, minimizing the number of overflow pages needed for any given row.

Specifying a Table’s Row Format

The row format used for a table is specified with the ROW_FORMAT clause of the CREATE TABLE and ALTER TABLEcommands. Note that COMPRESSED format implies DYNAMIC format. See Section 3.2, “Enabling Compression for a Table” for more details on the relationship between this clause and other clauses of these commands.

3、http://blog.csdn.net/gtuu0123/article/details/5354822 (转载)

一、innodb行格式
(1)REDUNDANT和COMPACT格式,被命名为“Antelope”
REDUNDANT:MySQL 4.1 and below
COMPACT:MySQL 5.0 and above(默认的)

相关特征:
1)每一个页面至少存储2行,因此如果一行要完全存储在此页面中,那个此行数据限制为8000bytes
2)如果一个带blob列的行的大小小于上述限制,那么此行的所有数据将存储在一个页面中;否则,会将blob列的前768bytes存储在此页面中,其他的数据存储在额外的页面中。假设一行有两个7k的blob列数据,那么会将第一个blob列数据存储在此页面中,第二个blob列的头768bytes存储在此页面中,而第二个blob列中的其他存储在其他页面中。
3)存储blob列的前768bytes字节的原因是:可以很容易的实现blob列前缀索引
4)这个768bytes的决定造成了一个bugs:那是你可以存储200K Blob数据,但是你不能存储一个20字节的数据
5)外部用于存储blob数据的页面不是共享的。假设一个blob列多出了一个字节需要存储,那么将分配一个16K大小的页面,并且这个页面不能被其他的blob列数据使用。所以应当避免同一行中使用多个blob列,建议将多个blob列合并为一个。
6)如果所有的blob列都不适合于页,那么mysql会用外部页面选择其中的一个blob列进行存储,这取决于不同行的列的大小情况。因此,可以不同的行选择不同的blob列来进行外部存储。

缺点:
因为在innodb中一个b-tree结点所存储的是key+row data,所以如果一个页面中能够存储更多的数据,即存储更多的行,那么在搜索时会达到更多的效率。假如有blob列数据,利用上面的存储格式会造成效率的降低。因为在一个页面中如果存储了blob列的数据,那么会造成存储的行数据的减少,因此搜索时的效率会下降。如果要是使blob列的数据用分离的页面存储,那么存储的行数据会更多,搜索效率会更高。

(2)ROW_FORMAT=DYNAMIC(被命名为Barracuda)
相关特征:
1)要么存储所有行数据(包括blob列数据)在一个页面,要么只存储20bytes的指针在页面中,而利用外部页面存储blob列数据。
2)COMPRESSED格式比DYNAMIC更小,其他同DYNAMIC格式

BLOB、TEXT、VARCHAR存储的格式是相同的。因此,上述也适用于TEXT、VARCHAR。

通过以上描述,可以得出:
(1)如果预期blob的数据较少,并且整行的数据可以小于8000bytes,而且查询一般是以单行查询为主,那么用“REDUNDANT和COMPACT格式”比较好。
(2)如果预期blob的数据较多,并且查询经常返回一个范围的结果集,那么用DYNAMIC较好。
(3)使用COMPRESSED会消耗更多的CPU时间。