Hbase的Region Compact算法实现分析

Hbase的Region Compact算法属于一种多路归并的外排算法。这种算法的特点是,待排序文件本身是有序的,同时打开这些文件,顺序遍历并对比它们的首条数据,最后合并输出为一个文件,多个文件遍历时的首条数据用内存堆进行内排。

Hbase在实现该算法的过程中重要的是下面这五个类。
1.org.apache.hadoop.hbase.regionserver.Store
2.org.apache.hadoop.hbase.regionserver.StoreScanner
3.org.apache.hadoop.hbase.regionserver.StoreFileScanner
4.org.apache.hadoop.hbase.regionserver.KeyValueHeap
5.org.apache.hadoop.hbase.regionserver.ScanQueryMatcher

这五个类的关系是
1.Store类调用StoreScanner的next方法,并循环输出kv到合并文件;
2.StoreScanner的作用是负责创建并持有多个输入文件的StoreFileScanner,内部遍历这些StoreFileScanner并通过KeyValueHeap来排序这些输入文件的首条记录;
3.StoreFileScanner的作用是遍历单个输入文件,管理并提供单个输入文件的首条记录;
4.KeyValueHeap的作用就是通过堆来排序每个输入文件的首条记录。
5.ScanQueryMatcher的作用是当输入文件的首条记录来的时候,根据一定的策略判断这条记录到底是该输出还是该跳过。

源代码如下:

1.Store.compact(final List<StoreFile> filesToCompact,
final boolean majorCompaction, final long maxId)
throws IOException {

这个方法重点是对StoreScanner的方法的调用,因此比较简单,关键的是下面有注释的几句。

// calculate maximum key count after compaction (for blooms)
int maxKeyCount = 0;
for (StoreFile file : filesToCompact) {
StoreFile.Reader r = file.getReader();
if (r != null) {
// NOTE: getFilterEntries could cause under-sized blooms if the user
// switches bloom type (e.g. from ROW to ROWCOL)
long keyCount = (r.getBloomFilterType() == family.getBloomFilterType())
? r.getFilterEntries() : r.getEntries();
maxKeyCount += keyCount;
if (LOG.isDebugEnabled()) {
LOG.debug("Compacting " + file +
", keycount=" + keyCount +
", bloomtype=" + r.getBloomFilterType().toString() +
", size=" + StringUtils.humanReadableInt(r.length()) );
}
}
}

// For each file, obtain a scanner:
List<StoreFileScanner> scanners = StoreFileScanner
.getScannersForStoreFiles(filesToCompact, false, false);//将filesToCompact里的每个输入文件,都为它们创建一个StoreFileScanner对象,用于之后遍历和管理每个输入文件的首条记录。

// Make the instantiation lazy in case compaction produces no product; i.e.
// where all source cells are expired or deleted.
StoreFile.Writer writer = null;
try {
InternalScanner scanner = null;
try {
Scan scan = new Scan();
scan.setMaxVersions(family.getMaxVersions());
/* include deletes, unless we are doing a major compaction */
scanner = new StoreScanner(this, scan, scanners, !majorCompaction);//用刚才创建好的scanners为成员变量再创建StoreScanner,StoreScanner才是真正排序这些文件的对象。
int bytesWritten = 0;
// since scanner.next() can return 'false' but still be delivering data,
// we have to use a do/while loop.
ArrayList<KeyValue> kvs = new ArrayList<KeyValue>();
while (scanner.next(kvs)) {//StoreScanner将文件的记录在内部排序好后,通过next方法循环输出
if (writer == null && !kvs.isEmpty()) {
writer = createWriterInTmp(maxKeyCount,
this.compactionCompression);
}
if (writer != null) {
// output to writer:
for (KeyValue kv : kvs) {
writer.append(kv);//直接将这些记录循环输出到合并文件就行了。

// check periodically to see if a system stop is requested
if (Store.closeCheckInterval > 0) {
bytesWritten += kv.getLength();
if (bytesWritten > Store.closeCheckInterval) {
bytesWritten = 0;
if (!this.region.areWritesEnabled()) {
writer.close();
fs.delete(writer.getPath(), false);
throw new InterruptedIOException(
"Aborting compaction of store " + this +
" in region " + this.region +
" because user requested stop.");
}
}
}
}
}
kvs.clear();
}
} finally {
if (scanner != null) {
scanner.close();
}
}
} finally {
if (writer != null) {
writer.appendMetadata(maxId, majorCompaction);
writer.close();
}
}
return writer;
}


2。StoreScanner.StoreScanner(Store store, Scan scan, List<? extends KeyValueScanner> scanners,
boolean retainDeletesInOutput)
throws IOException {

这个是StoreScanner的构造函数,主要做了三方面工作,见下面注释

this.store = store;
this.cacheBlocks = false;
this.isGet = false;
matcher = new ScanQueryMatcher(scan, store.getFamily().getName(),
null, store.ttl, store.comparator.getRawComparator(),
store.versionsToReturn(scan.getMaxVersions()), retainDeletesInOutput);//创建ScanQueryMatcher对象,用于以后判断记录是该过滤还是输出

// Seek all scanners to the initial key
for(KeyValueScanner scanner : scanners) {
scanner.seek(matcher.getStartKey());//每个输入文件都初始化好首条记录
}

// Combine all seeked scanners with a heap
heap = new KeyValueHeap(scanners, store.comparator);//创建堆对象,用于之后每个输入文件首条记录的排序

}


3.StoreScanner.next(List<KeyValue> outResult, int limit) throws IOException {
//DebugPrint.println("SS.next");

这个方法封装了处理heap取出的记录值的逻辑,根据matcher对该值的判断来决定这个值是输出还是跳过,主要是下面三个方法,见注释

checkReseek();

// if the heap was left null, then the scanners had previously run out anyways, close and
// return.
if (this.heap == null) {
close();
return false;
}

KeyValue peeked = this.heap.peek();
if (peeked == null) {
close();
return false;
}

// only call setRow if the row changes; avoids confusing the query matcher
// if scanning intra-row
if ((matcher.row == null) || !peeked.matchingRow(matcher.row)) {
matcher.setRow(peeked.getRow());
}

KeyValue kv;
List<KeyValue> results = new ArrayList<KeyValue>();
LOOP: while((kv = this.heap.peek()) != null) {//拿出heap里排好序的值
// kv is no longer immutable due to KeyOnlyFilter! use copy for safety
KeyValue copyKv = new KeyValue(kv.getBuffer(), kv.getOffset(), kv.getLength());
ScanQueryMatcher.MatchCode qcode = matcher.match(copyKv);//判断该值该如何处理,输出或跳过
//DebugPrint.println("SS peek kv = " + kv + " with qcode = " + qcode);
switch(qcode) {//根据判断具体的去处理该值
case INCLUDE:
results.add(copyKv);
this.heap.next();
if (limit > 0 && (results.size() == limit)) {
break LOOP;
}
continue;

case DONE:
// copy jazz
outResult.addAll(results);
return true;

case DONE_SCAN:
close();

// copy jazz
outResult.addAll(results);

return false;

case SEEK_NEXT_ROW:
// This is just a relatively simple end of scan fix, to short-cut end us if there is a
// endKey in the scan.
if (!matcher.moreRowsMayExistAfter(kv)) {
outResult.addAll(results);
return false;
}

reseek(matcher.getKeyForNextRow(kv));
break;

case SEEK_NEXT_COL:
reseek(matcher.getKeyForNextColumn(kv));
break;

case SKIP:
this.heap.next();
break;

case SEEK_NEXT_USING_HINT:
KeyValue nextKV = matcher.getNextKeyHint(kv);
if (nextKV != null) {
reseek(nextKV);
} else {
heap.next();
}
break;

default:
throw new RuntimeException("UNEXPECTED");
}
}

if (!results.isEmpty()) {
// copy jazz
outResult.addAll(results);
return true;
}

// No more keys
close();
return false;
}


4.KeyValueHeap.KeyValueHeap(List<? extends KeyValueScanner> scanners,
KVComparator comparator) {

排序堆的构造方法,原来就是一个PriorityQueue,因为要比较的对象是kv记录,所以将kv的comparator作为参数来生成PriorityQueue的comparator

this.comparator = new KVScannerComparator(comparator);
if (!scanners.isEmpty()) {
this.heap = new PriorityQueue<KeyValueScanner>(scanners.size(),
this.comparator);//堆内部就是一个PriorityQueue
for (KeyValueScanner scanner : scanners) {
if (scanner.peek() != null) {
this.heap.add(scanner);//把要排的文件scanner装入堆,文件自动按首条记录通过comparator排好序
} else {
scanner.close();
}
}
this.current = heap.poll();//堆排序后的首个文件scanner
}
}


5.KeyValueHeap.next() throws IOException {

堆里面最重要的方法其实就是next,不过看这个方法的主要功能不是为了算出nextKeyValue,而主要是为了算出nextScanner,然后需在外部再次调用peek方法来取得nextKeyValue,不是很简练。


if(this.current == null) {
return null;
}
KeyValue kvReturn = this.current.next();
KeyValue kvNext = this.current.peek();
if (kvNext == null) {//如果currentScanner已经读完
this.current.close();
this.current = this.heap.poll();//直接从余下scanner中取出新的nextScanner

} else {//如果currentScanner还有值,把余下的scanner中首值最小的scanner取出来比一比,谁小谁就是新的nextScanner

KeyValueScanner topScanner = this.heap.peek();
if (topScanner == null ||
this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
this.heap.add(this.current);
this.current = this.heap.poll();
}
}
return kvReturn;
}


总结:Hbase的Region Compact算法是在一个RegionServer上执行的,在这个执行的过程中,可能有两个瓶颈,
第一,如果待排序的文件数目过多,有导致内存堆溢出的风险,虽然在内存很大的今天,这种情形应该较难出现,但是如果一个regionserver管理了很多region,而这些region又恰好同时都在执行compact的话这就很难说了;
第二,文件的合并工作是在一台RegionServer上执行的,为何不考虑用mapred的形式通过整个集群来分担这种文件合并的工作,最后regionserver只需把合并好的文件装入region就行。
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值