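HBase has a data-management behavior called compaction which, as the name suggests, merges data files. This article introduces HBase compaction from several angles: its purpose, how to control it, and how it is carried out.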
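1. What is compaction?
Compaction merges multiple HFiles into a single HFile.
There are two kinds of compaction:
Minor Compaction (merges a subset of the files)
Major Compaction (merges all of the files)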
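The difference between the two kinds can be sketched in a few lines of Python. This is only an illustration, not HBase code: an HFile is modeled as a sorted list of (row key, value) pairs, and the helper names (compact, minor_compaction, major_compaction) are hypothetical. Picking "the first count files" is a simplifying assumption; the real file-selection policy is more involved.

```python
import heapq

def compact(hfiles):
    """Merge several sorted HFiles into one sorted HFile."""
    return list(heapq.merge(*hfiles))

def minor_compaction(store_files, count):
    """Merge only the first `count` files; the remaining files are untouched."""
    merged = compact(store_files[:count])
    return [merged] + store_files[count:]

def major_compaction(store_files):
    """Merge every file in the store into a single file."""
    return [compact(store_files)]

# Three small HFiles, each sorted by row key (toy data).
store = [
    [("row1", "a"), ("row4", "d")],
    [("row2", "b"), ("row5", "e")],
    [("row3", "c")],
]

after_minor = minor_compaction(store, 2)   # 3 files become 2
after_major = major_compaction(store)      # 3 files become 1
```

Because every input file is already sorted, the merge is a streaming k-way merge and never needs to hold or re-sort the full data set.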
2. Why compact?
Reduce the number of HFiles
Improve performance, since a read has fewer files to consult
Purge expired and deleted data
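To make the third goal concrete, here is a minimal sketch of how a full merge can drop deleted and expired cells. It is a simplified model, not HBase's implementation: cells are (row, timestamp, value) tuples, a value of None stands for a delete marker, a single TTL applies to all cells, and the function name major_compact is hypothetical.

```python
TOMBSTONE = None  # assumed representation of a delete marker

def major_compact(hfiles, now, ttl):
    """Merge all files, dropping deleted and expired cells."""
    cells = [cell for hfile in hfiles for cell in hfile]
    # Row ascending, newest first; on a timestamp tie the delete marker sorts first.
    cells.sort(key=lambda c: (c[0], -c[1], c[2] is not TOMBSTONE))
    result = []
    newest_delete = {}  # row -> timestamp of its newest delete marker
    for row, ts, value in cells:
        if value is TOMBSTONE:
            newest_delete.setdefault(row, ts)
            continue                      # the marker itself is not rewritten
        if ts <= newest_delete.get(row, -1):
            continue                      # masked by a delete marker
        if now - ts > ttl:
            continue                      # older than the TTL
        result.append((row, ts, value))
    return result

hfiles = [
    [("r1", 10, "old"), ("r2", 10, "x")],
    [("r1", 20, TOMBSTONE)],                      # deletes r1 up to ts=20
    [("r1", 30, "new"), ("r3", 95, "fresh"), ("r3", 1, "ancient")],
]
compacted = major_compact(hfiles, now=100, ttl=95)
```

In this toy run the three input files collapse into one, the cell ("r1", 10, "old") disappears because it is covered by the delete marker, and ("r3", 1, "ancient") disappears because it has outlived the TTL.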
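3. Configuration
Compaction behavior can be controlled by editing the HBase configuration file (hbase-site.xml).
Key | Default | Meaning |
hbase.regionserver.thread.splitcompactcheckfrequency | 20s | Interval between compaction checks (this parameter no longer exists as of 0.94.0) |
hbase.hstore.compactionThreshold | 3 | Minimum number of files for a minor compaction |
hbase.hstore.blockingStoreFiles | 7 | Number of StoreFiles in a Store above which flushes are blocked |
hbase.hstore.blockingWaitTime | 90s | Maximum time a blocked flush waits |
hbase.hstore.compaction.max | 10 | Maximum number of files in a minor compaction |
hbase.hregion.majorcompaction | 1 day | Interval between major compactions |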
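How these thresholds interact can be sketched as follows. The constants mirror the defaults in the table above; the function names are hypothetical, and the exact comparisons (at least the threshold for compaction, strictly more than the limit for blocking) are assumptions made for illustration rather than a statement of what every HBase version does.

```python
COMPACTION_THRESHOLD = 3   # hbase.hstore.compactionThreshold
COMPACTION_MAX = 10        # hbase.hstore.compaction.max
BLOCKING_STORE_FILES = 7   # hbase.hstore.blockingStoreFiles

def files_for_minor_compaction(store_file_count):
    """Return how many files one minor compaction would merge (0 = none)."""
    if store_file_count < COMPACTION_THRESHOLD:
        return 0                                   # too few files to bother
    return min(store_file_count, COMPACTION_MAX)   # capped per compaction

def flush_is_blocked(store_file_count):
    """Assumed rule: flushes are held back once a Store has too many files."""
    return store_file_count > BLOCKING_STORE_FILES

for count in (2, 3, 8, 12):
    print(count, files_for_minor_compaction(count), flush_is_blocked(count))
```

The design intent behind the blocking limit is back-pressure: if flushes create files faster than compaction can merge them, holding flushes (for at most hbase.hstore.blockingWaitTime) gives compaction time to catch up.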
4. Workflow
Compaction is an asynchronous process. It can be requested by a client, or the server can start it after its own checks.
1) Client-initiated
Client side:
HBaseAdmin::compact or majorCompact
==>HMaster modifyTable
==>RegionManager::startAction
==> put into map regionsToCompact and regionsToMajorCompact
==>Send to HRegionServer
Server side:
HRegionServer::run forwards the request to CompactSplitThread
==>CompactSplitThread handles the request from its queue
==>HRegion::compactStores
==>Do compaction preparations, create the compaction folder
==>HStore::compaction
==>Create a HFile.Writer for writing
==>Create a StoreScanner for major compaction
==>Create a MinorCompactionStoreScanner for minor compaction
==>Scan the scanner and write to the hfile
==>Complete the compaction, delete the old files, and move the new file to the store folder
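The asynchronous hand-off in the flow above (a request is queued and a dedicated thread picks it up later) can be sketched with a plain producer/consumer queue. This is a toy model, not HBase code: request_compaction and compaction_worker are hypothetical names, and the "work" is reduced to recording what was asked for.

```python
import queue
import threading

requests = queue.Queue()   # pending compaction requests
completed = []             # what the worker has processed, in order

def compaction_worker():
    """Drain the queue; None is used as a shutdown signal."""
    while True:
        request = requests.get()
        if request is None:
            break
        region, major = request
        # A real worker would scan the store files and write a new HFile here.
        completed.append((region, "major" if major else "minor"))
        requests.task_done()

def request_compaction(region, major=False):
    """Enqueue a request and return immediately (the caller does not wait)."""
    requests.put((region, major))

worker = threading.Thread(target=compaction_worker, daemon=True)
worker.start()

request_compaction("region-a")
request_compaction("region-b", major=True)

requests.join()       # wait only so the example can inspect the result
requests.put(None)    # stop the worker
worker.join()
```

The caller returns as soon as the request is queued, which is why a client that asks for a compaction gets control back before any files have been rewritten.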
2) Server-initiated
Major compaction:
Major compaction is checked periodically by the region server
==>HRegionServer::MajorCompactionChecker
==>Send the request to CompactSplitThread
Minor compaction:
Minor compaction is checked before a MemStore is flushed to HDFS
==>MemStoreFlusher::flushRegion
==>Send the request to CompactSplitThread
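The two server-side triggers can be sketched as two small checks. This is an illustrative model under stated assumptions, not HBase's logic: the period and threshold reuse the defaults from the configuration table, the function names are hypothetical, and the flush check is assumed to compare the store's file count (including the file being flushed) against the threshold.

```python
MAJOR_COMPACTION_PERIOD = 24 * 60 * 60   # hbase.hregion.majorcompaction, in seconds
COMPACTION_THRESHOLD = 3                 # hbase.hstore.compactionThreshold

def major_compaction_due(last_major_time, now):
    """Periodic check: has a full period passed since the last major compaction?"""
    return now - last_major_time >= MAJOR_COMPACTION_PERIOD

def flush_region(store_file_count):
    """Flush path: account for the new store file, then decide whether to
    queue a minor compaction request."""
    new_count = store_file_count + 1
    return new_count, new_count >= COMPACTION_THRESHOLD

# Periodic checker: one second short of a day, then exactly a day.
print(major_compaction_due(0, 86399), major_compaction_due(0, 86400))

# Two consecutive flushes into a store that starts with one file.
count, wants_minor = flush_region(1)
count, wants_minor = flush_region(count)
```

In both cases the check itself only decides whether to enqueue a request; the merge still runs later on the compaction thread.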
Original link: http://www.spnguru.com/2010/08/%E8%AF%A6%E8%A7%A3hbase-compaction/