Hbase的Region Compact算法实现分析

最新推荐文章于 2022-12-09 15:17:47 发布

uestzengting

最新推荐文章于 2022-12-09 15:17:47 发布

阅读量165

点赞数

分类专栏： hbase 文章标签： hbase 算法 hadoop compact 源代码

hbase 专栏收录该内容

11 篇文章 0 订阅

订阅专栏

Hbase的Region Compact算法属于一种多路归并的外排算法。这种算法的特点是，待排序文件本身是有序的，同时打开这些文件，顺序遍历并对比它们的首条数据，最后合并输出为一个文件，多个文件遍历时的首条数据用内存堆进行内排。

Hbase在实现该算法的过程中重要的是下面这五个类。
1.org.apache.hadoop.hbase.regionserver.Store
2.org.apache.hadoop.hbase.regionserver.StoreScanner
3.org.apache.hadoop.hbase.regionserver.StoreFileScanner
4.org.apache.hadoop.hbase.regionserver.KeyValueHeap
5.org.apache.hadoop.hbase.regionserver.ScanQueryMatcher

这五个类的关系是
1.Store类调用StoreScanner的next方法，并循环输出kv到合并文件；
2.StoreScanner的作用是负责创建并持有多个输入文件的StoreFileScanner，内部遍历这些StoreFileScanner并通过KeyValueHeap来排序这些输入文件的首条记录；
3.StoreFileScanner的作用是遍历单个输入文件，管理并提供单个输入文件的首条记录；
4.KeyValueHeap的作用就是通过堆来排序每个输入文件的首条记录。
5.ScanQueryMatcher的作用是当输入文件的首条记录来的时候，根据一定的策略判断这条记录到底是该输出还是该跳过。

源代码如下：

1.Store.compact(final List<StoreFile> filesToCompact,
final boolean majorCompaction, final long maxId)
throws IOException {

这个方法重点是对StoreScanner的方法的调用，因此比较简单，关键的是下面有注释的几句。

// calculate maximum key count after compaction (for blooms)
int maxKeyCount = 0;
for (StoreFile file : filesToCompact) {
StoreFile.Reader r = file.getReader();
if (r != null) {
// NOTE: getFilterEntries could cause under-sized blooms if the user
// switches bloom type (e.g. from ROW to ROWCOL)
long keyCount = (r.getBloomFilterType() == family.getBloomFilterType())
? r.getFilterEntries() : r.getEntries();
maxKeyCount += keyCount;
if (LOG.isDebugEnabled()) {
LOG.debug("Compacting " + file +
", keycount=" + keyCount +
", bloomtype=" + r.getBloomFilterType().toString() +
", size=" + StringUtils.humanReadableInt(r.length()) );
}
}
}

// For each file, obtain a scanner:
List<StoreFileScanner> scanners = StoreFileScanner
.getScannersForStoreFiles(filesToCompact, false, false);//将filesToCompact里的每个输入文件，都为它们创建一个StoreFileScanner对象，用于之后遍历和管理每个输入文件的首条记录。

// Make the instantiation lazy in case compaction produces no product; i.e.
// where all source cells are expired or deleted.
StoreFile.Writer writer = null;
try {
InternalScanner scanner = null;
try {
Scan scan = new Scan();
scan.setMaxVersions(family.getMaxVersions());
/* include deletes, unless we are doing a major compaction */
scanner = new StoreScanner(this, scan, scanners, !majorCompaction);//用刚才创建好的scanners为成员变量再创建StoreScanner，StoreScanner才是真正排序这些文件的对象。
int bytesWritten = 0;
// since scanner.next() can return 'false' but still be delivering data,
// we have to use a do/while loop.
ArrayList<KeyValue> kvs = new ArrayList<KeyValue>();
while (scanner.next(kvs)) {//StoreScanner将文件的记录在内部排序好后，通过next方法循环输出
if (writer == null && !kvs.isEmpty()) {
writer = createWriterInTmp(maxKeyCount,
this.compactionCompression);
}
if (writer != null) {
// output to writer:
for (KeyValue kv : kvs) {
writer.append(kv);//直接将这些记录循环输出到合并文件就行了。

// check periodically to see if a system stop is requested
if (Store.closeCheckInterval > 0) {
bytesWritten += kv.getLength();
if (bytesWritten > Store.closeCheckInterval) {
bytesWritten = 0;
if (!this.region.areWritesEnabled()) {
writer.close();
fs.delete(writer.getPath(), false);
throw new InterruptedIOException(
"Aborting compaction of store " + this +
" in region " + this.region +
" because user requested stop.");
}
}
}
}
}
kvs.clear();
}
} finally {
if (scanner != null) {
scanner.close();
}
}
} finally {
if (writer != null) {
writer.appendMetadata(maxId, majorCompaction);
writer.close();
}
}
return writer;
}

2。StoreScanner.StoreScanner(Store store, Scan scan, List<? extends KeyValueScanner> scanners,
boolean retainDeletesInOutput)
throws IOException {

这个是StoreScanner的构造函数，主要做了三方面工作，见下面注释

this.store = store;
this.cacheBlocks = false;
this.isGet = false;
matcher = new ScanQueryMatcher(scan, store.getFamily().getName(),
null, store.ttl, store.comparator.getRawComparator(),
store.versionsToReturn(scan.getMaxVersions()), retainDeletesInOutput);//创建ScanQueryMatcher对象，用于以后判断记录是该过滤还是输出

// Seek all scanners to the initial key
for(KeyValueScanner scanner : scanners) {
scanner.seek(matcher.getStartKey());//每个输入文件都初始化好首条记录
}

// Combine all seeked scanners with a heap
heap = new KeyValueHeap(scanners, store.comparator);//创建堆对象，用于之后每个输入文件首条记录的排序

}

3.StoreScanner.next(List<KeyValue> outResult, int limit) throws IOException {
//DebugPrint.println("SS.next");

这个方法封装了处理heap取出的记录值的逻辑，根据matcher对该值的判断来决定这个值是输出还是跳过，主要是下面三个方法，见注释

checkReseek();

// if the heap was left null, then the scanners had previously run out anyways, close and
// return.
if (this.heap == null) {
close();
return false;
}

KeyValue peeked = this.heap.peek();
if (peeked == null) {
close();
return false;
}

// only call setRow if the row changes; avoids confusing the query matcher
// if scanning intra-row
if ((matcher.row == null) || !peeked.matchingRow(matcher.row)) {
matcher.setRow(peeked.getRow());
}

KeyValue kv;
List<KeyValue> results = new ArrayList<KeyValue>();
LOOP: while((kv = this.heap.peek()) != null) {//拿出heap里排好序的值
// kv is no longer immutable due to KeyOnlyFilter! use copy for safety
KeyValue copyKv = new KeyValue(kv.getBuffer(), kv.getOffset(), kv.getLength());
ScanQueryMatcher.MatchCode qcode = matcher.match(copyKv);//判断该值该如何处理，输出或跳过
//DebugPrint.println("SS peek kv = " + kv + " with qcode = " + qcode);
switch(qcode) {//根据判断具体的去处理该值
case INCLUDE:
results.add(copyKv);
this.heap.next();
if (limit > 0 && (results.size() == limit)) {
break LOOP;
}
continue;

case DONE:
// copy jazz
outResult.addAll(results);
return true;

case DONE_SCAN:
close();

// copy jazz
outResult.addAll(results);

return false;

case SEEK_NEXT_ROW:
// This is just a relatively simple end of scan fix, to short-cut end us if there is a
// endKey in the scan.
if (!matcher.moreRowsMayExistAfter(kv)) {
outResult.addAll(results);
return false;
}

reseek(matcher.getKeyForNextRow(kv));
break;

case SEEK_NEXT_COL:
reseek(matcher.getKeyForNextColumn(kv));
break;

case SKIP:
this.heap.next();
break;

case SEEK_NEXT_USING_HINT:
KeyValue nextKV = matcher.getNextKeyHint(kv);
if (nextKV != null) {
reseek(nextKV);
} else {
heap.next();
}
break;

default:
throw new RuntimeException("UNEXPECTED");
}
}

if (!results.isEmpty()) {
// copy jazz
outResult.addAll(results);
return true;
}

// No more keys
close();
return false;
}

4.KeyValueHeap.KeyValueHeap(List<? extends KeyValueScanner> scanners,
KVComparator comparator) {

排序堆的构造方法，原来就是一个PriorityQueue，因为要比较的对象是kv记录，所以将kv的comparator作为参数来生成PriorityQueue的comparator

this.comparator = new KVScannerComparator(comparator);
if (!scanners.isEmpty()) {
this.heap = new PriorityQueue<KeyValueScanner>(scanners.size(),
this.comparator);//堆内部就是一个PriorityQueue
for (KeyValueScanner scanner : scanners) {
if (scanner.peek() != null) {
this.heap.add(scanner);//把要排的文件scanner装入堆，文件自动按首条记录通过comparator排好序
} else {
scanner.close();
}
}
this.current = heap.poll();//堆排序后的首个文件scanner
}
}

5.KeyValueHeap.next() throws IOException {

堆里面最重要的方法其实就是next，不过看这个方法的主要功能不是为了算出nextKeyValue，而主要是为了算出nextScanner，然后需在外部再次调用peek方法来取得nextKeyValue，不是很简练。

if(this.current == null) {
return null;
}
KeyValue kvReturn = this.current.next();
KeyValue kvNext = this.current.peek();
if (kvNext == null) {//如果currentScanner已经读完
this.current.close();
this.current = this.heap.poll();//直接从余下scanner中取出新的nextScanner

} else {//如果currentScanner还有值，把余下的scanner中首值最小的scanner取出来比一比，谁小谁就是新的nextScanner

KeyValueScanner topScanner = this.heap.peek();
if (topScanner == null ||
this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
this.heap.add(this.current);
this.current = this.heap.poll();
}
}
return kvReturn;
}

总结：Hbase的Region Compact算法是在一个RegionServer上执行的，在这个执行的过程中，可能有两个瓶颈，
第一，如果待排序的文件数目过多，有导致内存堆溢出的风险，虽然在内存很大的今天，这种情形应该较难出现，但是如果一个regionserver管理了很多region，而这些region又恰好同时都在执行compact的话这就很难说了；
第二，文件的合并工作是在一台RegionServer上执行的，为何不考虑用mapred的形式通过整个集群来分担这种文件合并的工作，最后regionserver只需把合并好的文件装入region就行。

uestzengting

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Hbase的Region Compact算法实现分析

Hbase的Region Compact算法属于一种多路归并的外排算法。这种算法的特点是，待排序文件本身是有序的，同时打开这些文件，顺序遍历并对比它们的首条数据，最后合并输出为一个文件，多个文件遍历时的首条数据用内存堆进行内排。Hbase在实现该算法的过程中重要的是下面这五个类。1.org.apache.hadoop.hbase.regionserver.Store2.org.ap...
复制链接

扫一扫