1) memstore flush to disk
2) column family StoreFile compaction
3) region split
First, some concepts. An HBase table (HTable) is split into n regions (how many, and each region's key range, can be specified when the table is created, and regions are also split further at runtime); these regions are spread evenly across the cluster's regionservers. Under one region (HRegion) sit a number of column families; each column family is a Store and holds one MemStore. HBase stores data by column (family), so on HDFS the column family is the finest-grained directory a region maps to: the directory is named after the column family and contains some number of HFiles (called StoreFiles).
Glossary: HTable, HRegion, Store, StoreFile, MemStore
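For concreteness, here is a minimal sketch of creating such a table with pre-split regions, using the old-style Java client API (HBaseAdmin) that matches HBase versions of this era; the table and column family names are borrowed from the log excerpt later in this post, and the split keys are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreatePresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // One column family = one Store per region (the post's table uses LZO
    // compression on this family; that setting is omitted here).
    HTableDescriptor desc = new HTableDescriptor("acookie_log_201110");
    desc.addFamily(new HColumnDescriptor("log_31"));

    // Pre-split the table into 4 regions by key range at creation time;
    // HBase can still split these regions further at runtime.
    byte[][] splitKeys = { {0x04}, {0x08}, {0x0c} };
    admin.createTable(desc, splitKeys);
  }
}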
This post covers the MemStore flush process.
When the total MemStore size of all Stores under an HRegion exceeds hbase.hregion.memstore.flush.size, a flush is triggered (though not necessarily right away); see http://standalone.iteye.com/blog/889944 for more background.
One thing to note about the flush logic: a flush is defined per HRegion, so when it happens, the MemStores of all of that region's Stores are flushed, which may produce multiple files, one per Store.
(That said, it depends on how the table is structured. I have a table designed with one column per day, and 99% of each day's puts go into a single column, so the other Stores have a MemStore size of 0.)
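As a quick aside, the two memstore-related thresholds involved here can be read straight from the Configuration. A minimal sketch (the 64m / 2x fallback values below are just the usual defaults of this era; what is actually in effect depends on your hbase-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushThresholds {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // flush threshold: a region asks for a flush once its total MemStore size passes this
    long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 64 * 1024 * 1024L);
    // blocking threshold: updates are blocked once flushSize * multiplier is passed
    // (this shows up again in checkResources() in the code walkthrough below)
    long multiplier = conf.getLong("hbase.hregion.memstore.block.multiplier", 2);
    System.out.println("flush threshold    = " + flushSize + " bytes");
    System.out.println("blocking threshold = " + flushSize * multiplier + " bytes");
  }
}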
Let's see what the regionserver log records during a flush:
2011-11-01 00:00:09,737 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0.
2011-11-01 00:00:09,737 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0., current region memstore size 64.0m
2011-11-01 00:00:10,606 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-11-01 00:00:12,432 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/.tmp/3648164384178649648 to hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/log_31/6967914182982470371
2011-11-01 00:00:12,448 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://peckermaster:9000/hbase-pecker/acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/log_31/6967914182982470371, entries=65501, sequenceid=4492353, memsize=64.4m, filesize=14.9m
2011-11-01 00:00:12,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~64.4m for region acookie_log_201110,\x08,1317949412354.9ca121717e4e8545d3d6b5806b6dccb0. in 2714ms, sequenceid=4492353, compaction requested=true
1) The MemStore first snapshots its data into a buffer.
2) The snapshot is then flushed into the region's .tmp directory, under a randomly generated file name.
3) The temporary file from 2) is moved out of .tmp into log_31 and renamed with a new random number.
4) Finally, as the log shows, 64.4m of data, after LZO compression (which I configured), ends up flushed into the log_31 Store as file 6967914182982470371, 14.9m in size.
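Steps 2) and 3) are just the classic "write to a temp location, then rename into place" pattern on HDFS. A rough hand-written sketch of that pattern (not HBase code: writing the actual HFile is replaced by creating an empty file, and the directory layout mirrors the log excerpt above):

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpThenRename {
  static Path flushOneStore(FileSystem fs, Path regionDir, String family) throws Exception {
    Random rand = new Random();
    // 2) write the snapshot into <region>/.tmp under a random file name
    Path tmp = new Path(new Path(regionDir, ".tmp"), Long.toString(Math.abs(rand.nextLong())));
    fs.mkdirs(tmp.getParent());
    fs.create(tmp).close();      // stand-in for writing the MemStore snapshot as an HFile
    // 3) move it into the Store directory under a new random name
    Path dst = new Path(new Path(regionDir, family), Long.toString(Math.abs(rand.nextLong())));
    fs.mkdirs(dst.getParent());
    fs.rename(tmp, dst);         // this is the "Renaming flushed file ..." log line
    return dst;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path regionDir = new Path("/tmp/demo-region");   // made-up path for a local dry run
    System.out.println("flushed to " + flushOneStore(fs, regionDir, "log_31"));
  }
}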
To sum up: during normal HBase operation a region goes through flush after flush, and the corresponding Stores keep accumulating small files like 6967914182982470371.
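If you want to watch those small files pile up without grepping the regionserver logs, you can simply list the column family directory on HDFS. A small sketch, reusing the paths from the log excerpt above (adjust the hdfs:// root to your own cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CountStoreFiles {
  public static void main(String[] args) throws Exception {
    Path storeDir = new Path("hdfs://peckermaster:9000/hbase-pecker/"
        + "acookie_log_201110/9ca121717e4e8545d3d6b5806b6dccb0/log_31");
    FileSystem fs = storeDir.getFileSystem(new Configuration());
    FileStatus[] files = fs.listStatus(storeDir);
    System.out.println(storeDir + " contains " + files.length + " StoreFile(s):");
    for (FileStatus f : files) {
      System.out.println("  " + f.getPath().getName() + "  " + f.getLen() + " bytes");
    }
  }
}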
Next: as these ~14.9m files keep piling up, what more will HBase do to the Stores under this region? Stay tuned for the next post, "HBase Region操作实战分析之StoreFile Compaction".
ps: a supplementary walkthrough of the flush code path (a condensed sketch of the whole path is attached at the end of this post).
It starts from HRegion's public void put(Put put, Integer lockid, boolean writeToWAL).
1) In put(), private void checkResources() is called first; it checks whether the memstore size has exceeded the blocking size (hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier). If it has, updates are blocked and a flush happens first.
2) The put's familyMap is extracted and private void put(final Map<byte [], List<KeyValue>> familyMap, boolean writeToWAL) is called. This method applies the familyMap contents to the MemStore and then checks whether the MemStore size exceeds hbase.hregion.memstore.flush.size; if it does, HRegion::requestFlush is triggered.
3) HRegion::requestFlush calls MemStoreFlusher::requestFlush, which adds the region's flush request to MemStoreFlusher's flush queue.
4) Since MemStoreFlusher is a background thread, its run() method keeps popping pending flush requests off that queue.
5) After a flush request entry (fqe) is taken from the queue, MemStoreFlusher::flushRegion is called, which internally calls HRegion::flushcache, which in turn calls HRegion::internalFlushcache; that last method is the key point.
Simplified, internalFlushcache does four things:
1) Collect a StoreFlusher for each of the region's Stores:
for (Store s : stores.values()) {
  storeFlushers.add(s.getStoreFlusher(completeSequenceId));
}
2) Prepare the flush: walk all of the region's Stores and take a snapshot of each MemStore:
// prepare flush (take a snapshot)
for (StoreFlusher flusher : storeFlushers) {
  flusher.prepare();
}
3) Flush each snapshot to disk. The StoreFlusher here is a StoreFlusherImpl, whose flushCache() calls internalFlushCache in Store.java to carry out each Store's actual flush:
for (StoreFlusher flusher : storeFlushers) {
  flusher.flushCache();
}
4) Commit each flushed file and check whether a compaction is now needed:
for (StoreFlusher flusher : storeFlushers) {
  boolean needsCompaction = flusher.commit();
  if (needsCompaction) {
    compactionRequested = true;
  }
}
flusher.commit() runs the check "return this.storefiles.size() >= this.compactionThreshold"; if any Store's StoreFile count has reached the configured threshold, that Store needs a compaction and internalFlushcache returns true.
The returned compactionRequested propagates back up to MemStoreFlusher::flushRegion, which tells the flush thread whether a compaction should follow the flush; if so, this.server.compactSplitThread.requestCompaction is called to do the compaction.
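To tie the whole walkthrough together, here is the promised condensed sketch: a deliberately simplified, self-contained model of the path described above, put() -> checkResources()/requestFlush() -> a background flusher thread -> internalFlushcache() (snapshot, flushCache, commit) -> an optional compaction request. The class and method names only mirror the walkthrough; this is not HBase source code, locking and WAL handling are ignored, and the sizes/thresholds are the defaults and config names mentioned in this post.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FlushPathModel {
  static final long FLUSH_SIZE = 64L << 20;      // hbase.hregion.memstore.flush.size
  static final long BLOCK_MULTIPLIER = 2;        // hbase.hregion.memstore.block.multiplier
  static final int COMPACTION_THRESHOLD = 3;     // StoreFiles per Store before compaction

  static class Store {
    volatile long memstoreSize;
    int storeFileCount;
    void snapshot() { /* freeze the current MemStore into a snapshot buffer */ }
    void flushCache() { storeFileCount++; }      // write the snapshot out as a new StoreFile
    boolean commit() {                           // returns "does this Store now need a compaction?"
      memstoreSize = 0;
      return storeFileCount >= COMPACTION_THRESHOLD;
    }
  }

  static class Region {
    final List<Store> stores = new ArrayList<Store>();
    final BlockingQueue<Region> flushQueue;      // shared with the flusher thread
    Region(BlockingQueue<Region> q) { this.flushQueue = q; }

    long memstoreSize() {
      long total = 0;
      for (Store s : stores) total += s.memstoreSize;
      return total;
    }

    // put(): apply the edit, then maybe request a flush
    void put(int family, long bytes) throws InterruptedException {
      // checkResources(): block updates while above the blocking size
      while (memstoreSize() > FLUSH_SIZE * BLOCK_MULTIPLIER) Thread.sleep(10);
      stores.get(family).memstoreSize += bytes;
      // requestFlush() (real HBase avoids queuing duplicate requests; this toy model does not bother)
      if (memstoreSize() > FLUSH_SIZE) flushQueue.offer(this);
    }

    // internalFlushcache(): prepare, flush and commit every Store of the region
    boolean internalFlushcache() {
      boolean compactionRequested = false;
      for (Store s : stores) s.snapshot();       // 1) + 2) take snapshots
      for (Store s : stores) s.flushCache();     // 3) flush each snapshot to disk
      for (Store s : stores) {                   // 4) commit + compaction check
        if (s.commit()) compactionRequested = true;
      }
      return compactionRequested;
    }
  }

  public static void main(String[] args) throws Exception {
    final BlockingQueue<Region> queue = new LinkedBlockingQueue<Region>();
    Thread flusher = new Thread() {              // stands in for the MemStoreFlusher thread
      public void run() {
        try {
          while (true) {
            Region r = queue.take();             // pop the next flush request
            if (r.internalFlushcache()) {        // flushRegion() -> flushcache() -> internalFlushcache()
              System.out.println("would call compactSplitThread.requestCompaction");
            }
          }
        } catch (InterruptedException ignored) { }
      }
    };
    flusher.setDaemon(true);
    flusher.start();

    Region region = new Region(queue);
    region.stores.add(new Store());
    for (int i = 0; i < 200; i++) region.put(0, 1L << 20);   // ~200m of 1m edits
    Thread.sleep(500);                                        // let the flusher drain the queue
  }
}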