如何合理的使用blob字段

对于一些公司,通常通过mysql blob来存储文件。对于Mysql 55以后,文件格式采用了Barracuda的新格式,这种格式有个明显的功能就是可以压缩每一行,根据key_block_size来选择需要压缩的大小,对于Blob的大文件而言,无疑为mysql使用上有多了一个选择!
来看一个压缩的案例:

mysql> create table comptest(b mediumblob);
Query OK, 0 rows affected (0.05 sec)
mysql>insert into comptest values(repeat('a',1000000));
Query OK, 1 row affected (0.02 sec)
mysql> show table status like "comptest" \G
*************************** 1. row ***************************
           Name: comptest
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1
 Avg_row_length: 1589248
    Data_length:  1589248
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2014-01-15 22:24:31
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)
mysql> alter table comptest row_format=compressed;
Query OK, 1 row affected (0.03 sec)
Records: 1  Duplicates: 0  Warnings: 0
mysql> show table status like "comptest" \G
*************************** 1. row ***************************
           Name: comptest
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 1
 Avg_row_length: 16384
    Data_length:  16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2014-01-15 22:26:44
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED
        Comment: 
1 row in set (0.00 sec)

可以看到在使用mysql5.5的barracuda文件格式的compress之后,发现既然达到100倍左右的压缩率。

在Antelope文件格式下的,存在compact和redundant行格式,Innodb会尝试将全部的行内容存入Innodb page.而且,每个页至少存两行数据,这就限制了每个行不得超过8000byte。当然,新的文件格式,也就是Barracuda格式的
row格式,允许使用压缩的方式来存储。在这种存储格式下,Innodb在row page上存储所有的blob内容,如果blob不能完全存储在page上,innodb将只存储一个20 byte的指针指向真正存储blog内容的地址。这样,你就可
以在一个row page上存储更多的行。在Barracuda存储格式下,blob可以有前缀索引,不在需要blob前缀部分存在row page上(如5.5以前的Antelope格式,对于超过8000byte的blog,只存储前768byte,接着一个20byte的指
针指向剩下的blog的地址,前缀索引部分必须是前768byte字符)。所以,你可以在blob字段上建立前缀索引,而这些blog可以存放在row page以外分配的空间。

So BLOB storage was not very efficient in REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) format and the fix comes with Innodb Plugin in “Barracuda” format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them. BLOBs can have prefix index but this no more requires column prefix to be stored on the page – you can build prefix indexes on blobs which are often stored outside the page.

为什么不用文件系统存储,而mysql只用存储Link呢?
1.在文件过大的情况下,需要监控innode的使用情况,并根据需要调整innode。
2.在文件夹过多的情况下,需要根据文件的类别将文件放在多个子文件夹下,这样你就要确保你的应用程序合理的管理你的文件夹,比如创建,删除空文件夹等。
3.在文件夹中文件数量过多的情况下,ls操作也会带来一定的影响。
所以有时候使用db来存储文件,使用Innodb来存储文件的有效方式如下:
A) 建立一个table,只有两个字段,一个是自增主键id,一个是blob的值。
B)所有关于这个blob值的元数据信息,存放在另一个单独的表里面,通过id主键来和第一个表关联。
C)当你有复杂查询的时候,只需和元数据信息表进行关联,再通过元数据信息,就能得到blob表中单行信息。
当然,你在设计上述表的时候,需要注意一下几点:
1).对于在Innodb 中通过blob存储的单个大文件,根据主键来查询该blob。如果文件过多过大,导致磁盘是瓶颈,通过行压缩将代价转换到CPU上从而缓解磁盘的压力。
2).一个设计良好的元数据信息表,需要存储blob的所有相关信息。包括是否使用前50个字段作为前缀索引等。
3).对于一个表中有多个blob字段的表,可以考虑通过将这些blob存放到一个列中,通过应用程序来对这个blob进行拆分,减少磁盘碎片。

I agree with you in general, but there are applications where storing tons of files in a filesystem is bad.
I ran into problems having many thousands (millions?) of files, where I ran out of inodes in the filesystem I was using. Besides disk size, I did not want to have to worry about keeping track of inodes nor extra options to mkfs to increase inode counts whenever deploying a new server or increasing disk or logical volume size.
If you split things into multiple subdirectories then you also have to ensure your application(s) maintain those directories — create new, delete old when empty, etc.
Also doing an ‘ls’ on a huge directory will tie things up on your server.
The more efficient thing to do is:
A) create 1 table that has ONLY an id sequence column set as primary index, and 2nd column for the blob. This table could conceivably be in a different database if you so choose.
B) All other metadata about this blob must be held in a separate table, and the id column used to associate rows (1-to-1 relationship).
C) Do your hefty queries (with joins, complex where clauses, etc) ONLY against the metadata table, and when you actually need the data from the blob query just that single row from your blob table.
What you achieve:
1) A single large file in the filesystem for that 1 InnoDB table that holds the blob. Queries against this table will use the primary index, and will only pull from the actual table what you need.
If disk space usage is important, use row compression and pay the CPU price on your DB server, or have your application compress/decompress upon INSERT/SELECT to pay that price on your app. server.
2) A well-designed meta-data table will contain everything you need to know about that blob, and you could design things to even contain a field that is, say, the first 50 bytes of your blob — if you need that for indexing or searching.
If you think you need multiple blob columns, then as Peter suggested, consider combining them into 1 blob and on the application side splitting up the blobs. That might not so easy either, so maybe use multiple “blob-only” tables that have just a few blob columns each — less than 10 he said was problematic.

参考 :http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值