HBase Code Study: The Flush Process

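Introduction:

To reduce write latency, HBase splits every write request into two stages. In the first stage the server accepts the request and writes the data into memory; in the second stage a background task flushes the data to disk in batches. This combines fast random writes in memory with fast sequential writes on disk, which keeps write latency low.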

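Basic principle:

In HBase, a RegionServer continuously receives write requests and writes the data into a MemStore; every Store of every region has its own MemStore. When a MemStore fills up, or when the system state meets certain other conditions, a flush request is issued. Flush requests are handled asynchronously by long-running background threads, in two main steps: the first step creates a snapshot of the MemStore, and the second step writes the snapshot to the corresponding file path to form a new HFile.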
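The two steps above can be illustrated with a toy model. This is only a sketch: `ToyMemStore` and its methods are hypothetical names, the "HFile" is just a list of lines, and none of this is HBase's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy model of a flush: snapshot the in-memory map, then write it out in key order. */
class ToyMemStore {
    // Active buffer that receives writes; a sorted map keeps the keys ordered.
    private volatile ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    // Frozen buffer waiting to be written out; null when no flush is in progress.
    private ConcurrentSkipListMap<String, String> snapshot;

    void put(String key, String value) {
        active.put(key, value);
    }

    /** Step 1: keep the current buffer as the snapshot and start a fresh active buffer. */
    synchronized void snapshot() {
        if (snapshot == null) {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
        }
    }

    /** Step 2: write the snapshot sequentially into a new "file" (here a list of lines). */
    synchronized List<String> flushSnapshot() {
        List<String> file = new ArrayList<>();
        if (snapshot != null) {
            for (Map.Entry<String, String> e : snapshot.entrySet()) {
                file.add(e.getKey() + "=" + e.getValue());
            }
            snapshot = null;
        }
        return file;
    }

    int activeSize() {
        return active.size();
    }
}
```

Writes that arrive after `snapshot()` land in the new active buffer, so they are not part of the file produced by `flushSnapshot()`.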

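Classes involved in a flush:

1.MemStoreFlusher:

a.flushQueue: holds the flush requests that have been submitted;

b.regionsInQueue: records which regions already have a flush request pending, so the same region is not queued twice;

c.FlushHandler: a group of flush threads that process the flush requests.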

2.HRegionServer:

Creates the MemStoreFlusher when the server initializes. See the initializeThreads method.

// Cache flushing thread.
this.cacheFlusher = new MemStoreFlusher(conf, this);

3.HRegion:

This class manages all operations on a region, so it is also the main initiator of flush requests. Each HRegion holds the flush policy (FlushPolicy) for its region.

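4.LogRoller:

This class keeps rolling the WAL. An old log file can only be discarded once every MemStore holding edits recorded in it has been flushed, so a log roll may force the MemStores concerned to flush.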

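5.FlushPolicy:

The flush policy decides which Stores are flushed each time.

Before 1.0 the default was to flush every MemStore of a region to disk; after 1.0 the default is to flush only the MemStores whose size exceeds a threshold. See FlushLargeStoresPolicy and FlushAllStoresPolicy.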
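The pairing of flushQueue and regionsInQueue in MemStoreFlusher can be sketched as a queue plus a membership set. This is a simplified illustration, not the HBase code: `ToyFlushQueue` is a hypothetical class, regions are represented by plain strings, and a `Set` stands in for whatever structure HBase uses.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy flush queue that refuses to queue the same region twice. */
class ToyFlushQueue {
    private final BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
    private final Set<String> regionsInQueue = new HashSet<>();

    /** Returns true if a new request was queued, false if the region was already pending. */
    boolean requestFlush(String regionName) {
        synchronized (regionsInQueue) {
            if (!regionsInQueue.add(regionName)) {
                return false; // already pending, nothing to do
            }
            flushQueue.add(regionName);
            return true;
        }
    }

    /** Called by a handler thread: take the next request, or null if none is pending. */
    String pollNext() {
        synchronized (regionsInQueue) {
            String region = flushQueue.poll();
            if (region != null) {
                regionsInQueue.remove(region);
            }
            return region;
        }
    }

    int pending() {
        return flushQueue.size();
    }
}
```

A handler thread would call `pollNext()` in a loop and flush whatever region it gets back; once a region has been taken off the queue it can be requested again.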

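Flush trigger points (using HBase 1.2.4 as the example):

1.Every put-type operation (Put, Delete, Increment, Append, BatchMutate) calls checkResources before it starts. The MemStore size is checked there, and a flush is requested if condition 2 is met.

2.After a put-type operation the size is checked again, and a flush is requested if condition 1 is met. The methods involved are:

1.batchMutate

2.processRowsWithLocks

3.append

4.doIncrement

3.When LogRoller initiates a log roll, it forces the in-memory data to be flushed.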

Conditions for a flush (using HBase 1.2.4 as the example):

Condition 1. common flush

private boolean isFlushSize(final long size) {
  return size > this.memstoreFlushSize;  // size is the sum of all MemStore sizes in the current region
}

The default memstoreFlushSize is 134217728 B = 128 MB; it can be set in the HTableDescriptor.

Condition 2. block flush

  if (this.memstoreSize.get() > this.blockingMemStoreSize) {
    requestFlush();
    throw new RegionTooBusyException();
  }

blockingMemStoreSize defaults to 2 * memstoreFlushSize; the multiplier is configurable.
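Conditions 1 and 2 can be put side by side in a small sketch. `ToyRegionChecks` and its `Decision` enum are hypothetical, and the sketch merges the two checks into one method for illustration; in HBase the blocking check runs before the write and the normal check after it. The multiplier of 2 is the value stated above.

```java
/** Toy version of the two size checks on a region's total MemStore size. */
class ToyRegionChecks {
    static final long DEFAULT_FLUSH_SIZE = 128L * 1024 * 1024; // 134217728 bytes

    enum Decision { NONE, FLUSH, FLUSH_AND_REJECT }

    final long memstoreFlushSize;
    final long blockingMemStoreSize;

    ToyRegionChecks(long memstoreFlushSize, int blockMultiplier) {
        this.memstoreFlushSize = memstoreFlushSize;
        this.blockingMemStoreSize = blockMultiplier * memstoreFlushSize;
    }

    Decision check(long memstoreSize) {
        if (memstoreSize > blockingMemStoreSize) {
            // Condition 2: request a flush and reject the write (RegionTooBusyException in HBase).
            return Decision.FLUSH_AND_REJECT;
        }
        if (memstoreSize > memstoreFlushSize) {
            // Condition 1: request a flush, the write itself is accepted.
            return Decision.FLUSH;
        }
        return Decision.NONE;
    }
}
```

Both comparisons are strict, so a region sitting at exactly 128 MB does not trigger a flush yet, and one at exactly 256 MB is flushed but not blocked.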

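Condition 3. Selecting which MemStores to flush

Before 1.0, every flush request flushed all MemStores of the region by default; starting with 1.0, the default policy tries to flush only the MemStores whose size exceeds a threshold.

The threshold is the larger of two parameters. The first can be set through hbase.hregion.percolumnfamilyflush.size.lower.bound and defaults to 16777216 (16 MB); the second is hbase.hregion.memstore.flush.size / column_family_number.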
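The threshold rule and the selection it drives can be sketched as follows. `ToyFlushPolicy` is a hypothetical class and stores are represented by a name-to-size map. The fallback of flushing every store when none exceeds the threshold is an assumption of this sketch, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy flush policy: pick the stores whose MemStore is larger than a threshold. */
class ToyFlushPolicy {
    /** The larger of the configured lower bound and flushSize / number of column families. */
    static long threshold(long lowerBound, long regionFlushSize, int familyCount) {
        return Math.max(lowerBound, regionFlushSize / familyCount);
    }

    static List<String> selectStoresToFlush(Map<String, Long> storeSizes, long threshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : storeSizes.entrySet()) {
            if (e.getValue() > threshold) {
                selected.add(e.getKey());
            }
        }
        // Assumption: if no store is large enough, fall back to flushing all of them.
        return selected.isEmpty() ? new ArrayList<>(storeSizes.keySet()) : selected;
    }
}
```

With the defaults quoted above and 4 column families, the threshold is 134217728 / 4 = 33554432 (32 MB), because that is larger than the 16 MB lower bound; with 16 families the quotient is 8 MB, so the 16 MB lower bound wins.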

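Flush flow in detail (code walkthrough):

1.When a put-type operation starts it calls checkResources; if the condition is met, a request is issued through MemStoreFlusher

2.MemStoreFlusher wraps the request and puts it into flushQueue and regionsInQueue

3.On initialization, MemStoreFlusher starts as many FlushHandler threads as hbase.hstore.flusher.count specifies; the default is 2

int handlerCount = conf.getInt("hbase.hstore.flusher.count", 2);
this.flushHandlers = new FlushHandler[handlerCount];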

4.A FlushHandler calls flushRegion

--> There is a throttle here: if the region is not a meta region and it has too many store files, the request is put back into the queue

if (!region.getRegionInfo().isMetaRegion() && isTooManyStoreFiles(region)) {

--> The normal path goes on to flush the region through HRegion (flush() and then flushcache(), see step 5)

5.HRegion.flush() --> flushcache()

5.1.Acquire region.updatesLock.writeLock --> this is mutually exclusive with all put operations (puts acquire the readLock)

5.2.Select the stores to flush according to the flush policy, then call internalFlushcache

Collection<Store> specificStoresToFlush =  forceFlushAllStores ? stores.values() : flushPolicy.selectStoresToFlush();
FlushResult fs = internalFlushcache(specificStoresToFlush, status, writeFlushRequestWalMarker);

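5.3.Prepare phase: internalPrepareFlushCache, which mainly takes a snapshot of each MemStore. Writing the snapshot to a temporary directory happens in the commit phase (5.4).

5.3.1.Record the MVCC state: begin a new MVCC write entry and advance MVCC by waiting for it to complete, which guarantees that all transactions started before the flush have finished.

writeEntry = mvcc.begin();
mvcc.completeAndWait(writeEntry);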

5.3.2.Create a StoreFlushContext for each store to be flushed in the region --> in practice this is a StoreFlusherImpl

5.3.3.StoreFlusherImpl.prepare, which essentially takes the snapshot of the MemStore

    /**
     * This is not thread safe. The caller should have a lock on the region or the store.
     * If necessary, the lock can be added with the patch provided in HBASE-10087
     */
    @Override
    public void prepare() {
      this.snapshot = memstore.snapshot();
      this.cacheFlushCount = snapshot.getCellsCount();
      this.cacheFlushSize = snapshot.getSize();
      committedFiles = new ArrayList<Path>(1);
    }

5.3.4.Release region.updatesLock.writeLock --> the lock is held only briefly
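The locking pattern of the prepare phase, where puts share a read lock and the snapshot briefly takes the write lock, can be sketched like this. `ToyLockedRegion` is a hypothetical class; only the use of a read/write lock named updatesLock comes from the steps above.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy region: many writers share the read lock, the snapshot takes the write lock. */
class ToyLockedRegion {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    // Both maps are only touched while holding updatesLock.
    private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    /** Puts take the read lock, so they can run concurrently with each other. */
    void put(String key, String value) {
        updatesLock.readLock().lock();
        try {
            active.put(key, value);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** The snapshot takes the write lock, excluding all puts while the buffers are swapped. */
    int prepareFlush() {
        updatesLock.writeLock().lock();
        try {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
            return snapshot.size();
        } finally {
            updatesLock.writeLock().unlock();
        }
    }

    int activeSize() {
        updatesLock.readLock().lock();
        try {
            return active.size();
        } finally {
            updatesLock.readLock().unlock();
        }
    }
}
```

The write lock only covers a reference swap, which is why the time it is held stays short; the slow part, writing the snapshot to disk, happens after the lock is released.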

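5.4.Commit phase: internalFlushCacheAndCommit, which writes the snapshot to a temporary directory and then commits it.

5.4.1.Start the flush. This uses a common way of making an operation atomic: write to a tmp directory first, then commit

--> StoreFlusherImpl.flushCache --> HStore.flushCache --> storeEngine.flusher.flushSnapshot

--> flushSnapshot creates a memStoreScanner and then builds a StoreScanner on top of it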

// In HRegion: flush the snapshot of every store being flushed
for (StoreFlushContext flush : storeFlushCtxs.values()) {
  flush.flushCache(status);
}

// In HStore.flushCache: write the snapshot, validate the files, retry on failure
StoreFlusher flusher = storeEngine.getStoreFlusher();
IOException lastException = null;
for (int i = 0; i < flushRetriesNumber; i++) {
  try {
    List<Path> pathNames = flusher.flushSnapshot(snapshot, logCacheFlushId, status);
    Path lastPathName = null;
    try {
      for (Path pathName : pathNames) {
        lastPathName = pathName;
        validateStoreFile(pathName);
      }
      return pathNames;
    } catch (Exception e) {
      LOG.warn("Failed validating store file " + lastPathName + ", retrying num=" + i, e);
      if (e instanceof IOException) {
        lastException = (IOException) e;
      } else {
        lastException = new IOException(e);
      }
    }
  } catch (IOException e) {
    LOG.warn("Failed flushing store file, retrying num=" + i, e);
    lastException = e;
  }
  ....
}

5.4.2.flusher.flushSnapshot --> performFlush

boolean hasMore;
do {
  hasMore = scanner.next(kvs, scannerContext);
  if (!kvs.isEmpty()) {
    for (Cell c : kvs) {
      // If we know that this KV is going to be included always, then let us
      // set its memstoreTS to 0. This will help us save space when writing to
      // disk.
      sink.append(c);
    }
    kvs.clear();
  }
} while (hasMore);

5.4.3.Run the commit: move the files from the temporary directory to the live directory and update the in-memory file list.

    // `it` is an iterator over the stores being flushed, declared before this loop in the source
    for (StoreFlushContext flush : storeFlushCtxs.values()) {
        boolean needsCompaction = flush.commit(status);
        if (needsCompaction) {
          compactionRequested = true;
        }
        byte[] storeName = it.next().getFamily().getName();
        List<Path> storeCommittedFiles = flush.getCommittedFiles();
        committedFiles.put(storeName, storeCommittedFiles);
        // Flush committed no files, indicating flush is empty or flush was canceled
        if (storeCommittedFiles == null || storeCommittedFiles.isEmpty()) {
          totalFlushableSizeOfFlushableStores -= prepareResult.storeFlushableSize.get(storeName);
        }
      }

5.4.4.Write a flush-completed marker to the WAL.

    if (wal != null) {
        // write flush marker to WAL. If fail, we should throw DroppedSnapshotException
        FlushDescriptor desc = ProtobufUtil.toFlushDescriptor(FlushAction.COMMIT_FLUSH,
          getRegionInfo(), flushOpSeqId, committedFiles);
        WALUtil.writeFlushMarker(wal, this.htableDescriptor, getRegionInfo(),
          desc, true, mvcc);
      }
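The write-to-tmp-then-commit idea from step 5.4 can be tried out on a local filesystem. This is a minimal sketch under stated assumptions: `ToyAtomicCommit`, the `.tmp` directory name and the file names are hypothetical, and `java.nio.file` stands in for the HDFS operations that HBase itself performs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Toy commit: write the file under a tmp directory, then move it into place in one step. */
class ToyAtomicCommit {
    static Path writeAndCommit(Path storeDir, String fileName, List<String> lines)
            throws IOException {
        // Phase 1: write everything to a temporary location that readers never look at.
        Path tmpDir = storeDir.resolve(".tmp");
        Files.createDirectories(tmpDir);
        Path tmpFile = tmpDir.resolve(fileName);
        Files.write(tmpFile, lines, StandardCharsets.UTF_8);

        // Phase 2: commit by moving the finished file into the live directory.
        // ATOMIC_MOVE means readers see either no file or the complete file.
        Path finalFile = storeDir.resolve(fileName);
        return Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the process dies during phase 1, only a leftover file under `.tmp` remains and the live directory is untouched, which is the property that makes this pattern attractive for a flush.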