HBase MemStore Flush Trigger Conditions


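Flush is the core MemStore operation. This post looks at it in depth: first the scenarios in which HBase triggers a flush, then the flush flow as seen in the source code, and finally a summary of the flush-related configuration parameters, which matter both for performance tuning and for troubleshooting in operations.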

1. Trigger conditions

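HBase triggers a flush in the situations below. Note that the smallest flush unit is the HRegion, not an individual MemStore. An HRegion with many MemStores therefore pays a high cost on every flush, which is why table designs should keep the number of ColumnFamilies as small as possible.
The HBase official documentation summarizes six flush triggers (items 1 to 6); item 7 is the write path through which the size checks are reached:
1) Manual flush: the user runs the shell command flush 'tablename' or flush 'regionname' to flush a table or a single region. (Implemented through the flush methods of org.apache.hadoop.hbase.client.HBaseAdmin, which directly trigger HRegion's internal flush, internalFlushcache.)
2) MemStore-level limit: when any MemStore in a Region reaches its limit (hbase.hregion.memstore.flush.size, default 128 MB), a MemStore flush is triggered. (I could not locate this check in the source.)
3) Region-level limit: when the combined size of all MemStores in a Region reaches its limit (hbase.hregion.memstore.block.multiplier * hbase.hregion.memstore.flush.size, default 4 * 128 MB = 512 MB), a MemStore flush is triggered. (Checked by checkResources() before an update operation runs.)
4) RegionServer-level limit: when the combined size of all MemStores on a RegionServer reaches its limit (hbase.regionserver.global.memstore.upperLimit * hbase_heapsize, default 40% of the JVM heap), some MemStores are flushed. Regions are flushed in descending order of MemStore size, largest first, until total MemStore usage drops below the low-water mark (hbase.regionserver.global.memstore.lowerLimit * hbase_heapsize, default 38% of the JVM heap). In HBase 1.0 and later these keys are named hbase.regionserver.global.memstore.size and hbase.regionserver.global.memstore.size.lower.limit, the latter being a fraction of the former (default 0.95, which yields the 38%). (When the flush handler finds no queued request, it runs the RegionServer-level check.)
5) WAL count limit: when the number of HLog files on a RegionServer reaches its limit (configurable through hbase.regionserver.maxlogs), the system picks the one or more Regions covered by the oldest HLog and flushes them.
6) Periodic flush: HBase flushes MemStores periodically, by default every hour, so that no MemStore stays unpersisted for long. To keep all MemStores from flushing at the same moment, each periodic flush gets a random delay (0 to 5 minutes in the 2.0.1 source shown in section 2.3; the often-quoted figure of about 20000 ms does not match that constant).
7) Data update operations such as put and delete.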
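To make the default numbers in items 3 and 4 concrete, here is a minimal arithmetic sketch. It is not HBase code: the class and method names are hypothetical, the 10 GB heap is an assumed example value, and the percentages are simply the defaults quoted above.

```java
// Illustrative arithmetic only; class and method names are hypothetical, not HBase APIs.
final class FlushThresholdSketch {
    static final long MB = 1024L * 1024L;

    // Region-level blocking limit: flush size multiplied by the block multiplier.
    static long blockingSize(long flushSizeBytes, int blockMultiplier) {
        return flushSizeBytes * blockMultiplier;
    }

    // RegionServer-level water mark expressed as an integer percentage of the heap.
    static long heapFraction(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    public static void main(String[] args) {
        long heap = 10L * 1024 * MB; // assumed 10 GB heap, for illustration
        System.out.println("region blocking size (MB): " + blockingSize(128 * MB, 4) / MB);
        System.out.println("global high-water mark (MB): " + heapFraction(heap, 40) / MB);
        System.out.println("global low-water mark (MB): " + heapFraction(heap, 38) / MB);
    }
}
```

With these inputs a single region blocks writes long before the RegionServer-wide limit is reached, which is why the two checks are treated as separate triggers.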

The sections below analyze these trigger mechanisms from the source code (version 2.0.1).

2. Trigger mechanism analysis

2.1 The put operation

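The first trigger fires during an HBase put/update/delete. The operation first calls checkResources(), which checks whether the HRegion's MemStore size exceeds the blocking threshold (hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier). If it does, requestFlush() is called to request a flush of that HRegion's MemStores, and a RegionTooBusyException is thrown to stop the operation from continuing. Delete, update, and the other data-modifying operations behave the same way: each calls checkResources() before it starts. requestFlush() itself only submits the request; the flush is eventually carried out through HRegion's flushcache method.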

Taking put as an example, HRegion#put:

  public void put(Put put) throws IOException {
    ...
    // Check whether the flush condition is met before the write proceeds
    checkResources();
    startRegionOperation(Operation.PUT);
    try {
      // All edits for the given row (across all column families) must happen atomically.
      doBatchMutate(put);
    } finally {
      closeRegionOperation(Operation.PUT);
    }
  }
  void checkResources() throws RegionTooBusyException {
    // If catalog region, do not impose resource constraints or block updates.
    if (this.getRegionInfo().isMetaRegion()) return;
    MemStoreSize mss = this.memStoreSizing.getMemStoreSize();
    if (mss.getHeapSize() + mss.getOffHeapSize() > this.blockingMemStoreSize) {
      // If this region's memstore size exceeds 128 MB * 4 (the defaults), force a flush request for this region
      blockedRequestsCount.increment();
      requestFlush();
      ...
    }
  }
  // Set during HRegion initialization; 128 MB * 4 by default
  this.blockingMemStoreSize = this.memstoreFlushSize * mult;
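The check-before-write pattern above can be modeled in a few lines. This is a toy sketch rather than HBase code: BlockingCheckSketch and TooBusyException are hypothetical names, the exception is unchecked only for brevity, and a counter stands in for the real flush request.

```java
// Toy model of "reject writes once the region is over its blocking size"; not HBase code.
final class BlockingCheckSketch {
    // Hypothetical stand-in for RegionTooBusyException (unchecked here for brevity).
    static final class TooBusyException extends RuntimeException {
        TooBusyException(String msg) { super(msg); }
    }

    private final long blockingSize;
    private long memstoreBytes = 0;
    private int flushRequests = 0;

    BlockingCheckSketch(long flushSize, int multiplier) {
        this.blockingSize = flushSize * multiplier;
    }

    // Same shape as checkResources(): over the limit means "request a flush, then refuse".
    void checkResources() {
        if (memstoreBytes > blockingSize) {
            flushRequests++; // stands in for requestFlush()
            throw new TooBusyException("memstore " + memstoreBytes + " > " + blockingSize);
        }
    }

    void put(long bytes) {
        checkResources();       // the check runs before the write is applied
        memstoreBytes += bytes;
    }

    void flushed() { memstoreBytes = 0; }
    long size() { return memstoreBytes; }
    int flushRequests() { return flushRequests; }

    public static void main(String[] args) {
        BlockingCheckSketch region = new BlockingCheckSketch(100, 4);
        region.put(300);
        region.put(200); // accepted: the check saw 300, which is under 400
        try {
            region.put(1);
        } catch (TooBusyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check runs before the write is applied, a region can overshoot the blocking size by one batch; only the following write is rejected.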

checkResources() calls HRegion's requestFlush method:

  private void requestFlush() {
    if (this.rsServices == null) {
      return;
    }
    requestFlush0(FlushLifeCycleTracker.DUMMY);
  }
  private void requestFlush0(FlushLifeCycleTracker tracker) {
    boolean shouldFlush = false;
    synchronized (writestate) {  // check the state to avoid duplicate requests
      if (!this.writestate.isFlushRequested()) {
        shouldFlush = true;
        writestate.flushRequested = true;
      }
    }
    if (shouldFlush) {
      // Make request outside of synchronize block; HBASE-818.
      // Request the flush through rsServices
      this.rsServices.getFlushRequester().requestFlush(this, false, tracker);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Flush requested on " + this.getRegionInfo().getEncodedName());
      }
    } else {
      tracker.notExecuted("Flush already requested on " + this);
    }
  }
Here rsServices is the region's in-process handle to its hosting RegionServer (a RegionServerServices instance), so no RPC is involved. getFlushRequester() returns the RegionServer's member variable cacheFlusher, of type MemStoreFlusher, which manages all flush requests on that RegionServer. Its key fields are:
  // Blocking queue; DelayQueue is an unbounded blocking queue backed by a priority queue
  private final BlockingQueue<FlushQueueEntry> flushQueue = new DelayQueue<>();
  private final Map<Region, FlushRegionEntry> regionsInQueue = new HashMap<>();
  // Atomic boolean flag
  private AtomicBoolean wakeupPending = new AtomicBoolean();
  private final long threadWakeFrequency;
  // The HRegionServer instance
  private final HRegionServer server;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // blockSignal is a plain monitor object used with wait()/notifyAll() to block and wake writers; it is not a Semaphore
  private final Object blockSignal = new Object();
  // How long an HRegion's blocked updates wait
  private long blockingWaitTime;
  private final LongAdder updatesBlockedMsHighWater = new LongAdder();
  private final FlushHandler[] flushHandlers;
  private List<FlushRequestListener> flushRequestListeners = new ArrayList<>(1);
  private FlushType flushType;

requestFlush0 then calls MemStoreFlusher#requestFlush:

  // Put the region to be flushed into the pending queue
  public void requestFlush(HRegion r, boolean forceFlushAllStores, FlushLifeCycleTracker tracker) {
    r.incrementFlushesQueuedCount();
    synchronized (regionsInQueue) {
      if (!regionsInQueue.containsKey(r)) {
        // This entry has no delay so it will be added at the top of the flush
        // queue. It'll come out near immediately.
        FlushRegionEntry fqe = new FlushRegionEntry(r, forceFlushAllStores, tracker);
        // flushQueue is an unbounded blocking queue that serves as the flush work queue; regionsInQueue records which regions currently have an entry in it.
        this.regionsInQueue.put(r, fqe);
        this.flushQueue.add(fqe);
      } else {
        tracker.notExecuted("Flush already requested on " + r);
      }
    }
  }

At this point the flush task is in the work queue, waiting for a flush thread to process it.
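The combination of a DelayQueue and a side map for de-duplication can be tried out in isolation. The sketch below uses only java.util.concurrent; FlushQueueSketch, Entry, and pollReady are hypothetical names, and regions are represented by plain strings rather than HRegion objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Sketch of a de-duplicated flush queue built on DelayQueue; all names are hypothetical.
final class FlushQueueSketch {
    static final class Entry implements Delayed {
        final String region;
        final long readyAtNanos;

        Entry(String region, long delayMs) {
            this.region = region;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            // Only Entry instances are ever queued in this sketch.
            return Long.signum(readyAtNanos - ((Entry) other).readyAtNanos);
        }
    }

    private final DelayQueue<Entry> flushQueue = new DelayQueue<>();
    private final Map<String, Entry> regionsInQueue = new HashMap<>();

    // Returns true if a new entry was queued, false if the region already had one pending.
    boolean requestFlush(String region, long delayMs) {
        synchronized (regionsInQueue) {
            if (regionsInQueue.containsKey(region)) {
                return false;
            }
            Entry entry = new Entry(region, delayMs);
            regionsInQueue.put(region, entry);
            flushQueue.add(entry);
            return true;
        }
    }

    // Non-blocking: returns a region whose delay has expired, or null if none is ready yet.
    String pollReady() {
        Entry entry = flushQueue.poll();
        if (entry == null) {
            return null;
        }
        synchronized (regionsInQueue) {
            regionsInQueue.remove(entry.region);
        }
        return entry.region;
    }

    public static void main(String[] args) {
        FlushQueueSketch q = new FlushQueueSketch();
        q.requestFlush("regionA", 60_000); // delayed, like a periodic flush
        q.requestFlush("regionB", 0);      // no delay, like a size-triggered flush
        System.out.println("first ready: " + q.pollReady());
        System.out.println("next ready: " + q.pollReady());
    }
}
```

An entry with zero delay sorts ahead of delayed entries and becomes available right away, which matches the comment in the excerpt that such an entry "will be added at the top of the flush queue".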

2.2 WAL file count reaches its limit

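The constructor of the AbstractFSWAL class (in the wal package) sets the maximum number of log files, and findRegionsToForceFlush uses that number to decide whether a flush is needed: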

  protected AbstractFSWAL(...) throws FailedLogCloseException, IOException {
    ...
    boolean maxLogsDefined = conf.get("hbase.regionserver.maxlogs") != null;
    if (maxLogsDefined) {
      LOG.warn("'hbase.regionserver.maxlogs' was deprecated.");
    }
    this.maxLogs = conf.getInt("hbase.regionserver.maxlogs",
            Math.max(32, calculateMaxLogFiles(conf, logrollsize)));
    ...
  }
  byte[][] findRegionsToForceFlush() throws IOException {
    byte[][] regions = null;
    int logCount = getNumRolledLogFiles();
    if (logCount > this.maxLogs && logCount > 0) {
      Map.Entry<Path, WalProps> firstWALEntry = this.walFile2Props.firstEntry();
      regions =
              this.sequenceIdAccounting.findLower(firstWALEntry.getValue().encodedName2HighestSequenceId);
    }
    ...
    return regions;
  }
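The idea behind findRegionsToForceFlush can be shown with a simplified model. This sketch makes a loud simplification: it returns every region recorded for the oldest WAL file, whereas the real method filters by sequence id through sequenceIdAccounting.findLower. WalForceFlushSketch and its members are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;

// Simplified model of "too many WAL files, so flush the regions pinning the oldest one".
// Assumption: sequence-id filtering is omitted; every region recorded for the oldest file is returned.
final class WalForceFlushSketch {
    // Rolled WAL files keyed by an increasing id (lower id means older file).
    private final TreeMap<Long, Set<String>> walToRegions = new TreeMap<>();
    private final int maxLogs;

    WalForceFlushSketch(int maxLogs) {
        this.maxLogs = maxLogs;
    }

    void addRolledWal(long walId, Set<String> regionsWithEdits) {
        walToRegions.put(walId, regionsWithEdits);
    }

    // Returns the regions of the oldest WAL file once the file count exceeds maxLogs.
    Set<String> findRegionsToForceFlush() {
        if (walToRegions.size() > maxLogs) {
            return walToRegions.firstEntry().getValue();
        }
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        WalForceFlushSketch wal = new WalForceFlushSketch(2);
        wal.addRolledWal(1L, Collections.singleton("regionA"));
        wal.addRolledWal(2L, Collections.singleton("regionB"));
        wal.addRolledWal(3L, Collections.singleton("regionC"));
        System.out.println("force flush: " + wal.findRegionsToForceFlush());
    }
}
```

Flushing those regions lets the oldest WAL file be archived, which is what brings the file count back under the limit.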

2.3 Periodic flush

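HRegionServer has an inner class, PeriodicMemStoreFlusher, a scheduled chore that periodically walks the online regions and requests a delayed flush for each region that shouldFlush reports as due.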

  static class PeriodicMemStoreFlusher extends ScheduledChore {
    final HRegionServer server;
    // Each periodic flush gets a random delay (0 to 5 minutes) so that memstores do not all flush at once and put pressure on the disks
    final static int RANGE_OF_DELAY = 5 * 60 * 1000; // 5 min in milliseconds
    final static int MIN_DELAY_TIME = 0; // millisec
    public PeriodicMemStoreFlusher(int cacheFlushInterval, final HRegionServer server) {
      super("MemstoreFlusherChore", server, cacheFlushInterval);
      this.server = server;
    }
    @Override
    protected void chore() {
      final StringBuilder whyFlush = new StringBuilder();
      for (HRegion r : this.server.onlineRegions.values()) {
        if (r == null) continue;
        if (r.shouldFlush(whyFlush)) {
          FlushRequester requester = server.getFlushRequester();
          if (requester != null) {
            ...
            requester.requestDelayedFlush(r, randomDelay, false);
          }
        }
      }
    }
  }
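The excerpt elides how randomDelay is computed. One plausible way to produce such jitter is sketched below; the formula is an assumption for illustration and not the elided HBase line. Only the two constants are taken from the excerpt, and PeriodicFlushDelaySketch and randomDelayMs are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of per-region flush jitter; the formula is an assumption, not the elided HBase code.
final class PeriodicFlushDelaySketch {
    static final int RANGE_OF_DELAY = 5 * 60 * 1000; // 5 min in milliseconds, as in the excerpt
    static final int MIN_DELAY_TIME = 0;             // milliseconds, as in the excerpt

    // Uniform delay in [MIN_DELAY_TIME, MIN_DELAY_TIME + RANGE_OF_DELAY).
    static long randomDelayMs() {
        return (long) MIN_DELAY_TIME + ThreadLocalRandom.current().nextInt(RANGE_OF_DELAY);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("delay ms: " + randomDelayMs());
        }
    }
}
```

Spreading the requests over several minutes means regions that all became due in the same chore run do not hit the disks at the same moment.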

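The shouldFlush call in chore() decides whether a flush is due, based on the number of edits since the last flush and on how old the oldest unflushed edit is.
HRegion#shouldFlush: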

  boolean shouldFlush(final StringBuilder whyFlush) {
    whyFlush.setLength(0);
    // This is a rough measure.
    if (this.maxFlushedSeqId > 0
          && (this.maxFlushedSeqId + this.flushPerChanges < this.mvcc.getReadPoint())) {
      whyFlush.append("more than max edits, " + this.flushPerChanges + ", since last flush");
      return true;
    }
    long modifiedFlushCheckInterval = flushCheckInterval;
    if (getRegionInfo().getTable().isSystemTable() &&
        getRegionInfo().getReplicaId() == RegionInfo.DEFAULT_REPLICA_ID) {
      modifiedFlushCheckInterval = SYSTEM_CACHE_FLUSH_INTERVAL;
    }
    if (modifiedFlushCheckInterval <= 0) { //disabled
      return false;
    }
    long now = EnvironmentEdgeManager.currentTime();
    //if we flushed in the recent past, we don't need to do again now
    if ((now - getEarliestFlushTimeForAllStores() < modifiedFlushCheckInterval)) {
      return false;
    }
    //since we didn't flush in the recent past, flush now if certain conditions
    //are met. Return true on first such memstore hit.
    for (HStore s : stores.values()) {
      if (s.timeOfOldestEdit() < now - modifiedFlushCheckInterval) {
        // we have an old enough edit in the memstore, flush
        whyFlush.append(s.toString() + " has an old edit so flush to free WALs");
        return true;
      }
    }
    return false;
  }
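The time-based part of this decision reduces to two comparisons. The sketch below models it for a single store with explicit timestamps; it leaves out the edit-count check and the system-table interval, and PeriodicFlushCheckSketch is a hypothetical name.

```java
// Single-store model of the time-based branch of shouldFlush; not HBase code.
final class PeriodicFlushCheckSketch {
    // True when no flush happened within the interval and the oldest edit is older than the interval.
    static boolean shouldFlush(long nowMs, long earliestFlushTimeMs, long oldestEditMs, long intervalMs) {
        if (intervalMs <= 0) {
            return false; // periodic flush disabled
        }
        if (nowMs - earliestFlushTimeMs < intervalMs) {
            return false; // flushed recently, nothing to do
        }
        return oldestEditMs < nowMs - intervalMs;
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L; // the default interval mentioned in the text
        long now = 10_000_000L;
        System.out.println(shouldFlush(now, 0L, 1_000_000L, hour));
        System.out.println(shouldFlush(now, 9_000_000L, 1_000_000L, hour));
    }
}
```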

server.getFlushRequester() returns the flush requester, which is the RegionServer's cacheFlusher:

  public FlushRequester getFlushRequester() {
    return this.cacheFlusher;
  }
  protected MemStoreFlusher cacheFlusher;

2.4 Manual flush

HBaseAdmin provides flush methods at the table, region, and RegionServer levels.

  public void flush(final TableName tableName) throws IOException {
    checkTableExists(tableName);
    if (isTableDisabled(tableName)) {
      LOG.info("Table is disabled: " + tableName.getNameAsString());
      return;
    }
    execProcedure("flush-table-proc", tableName.getNameAsString(), new HashMap<>());
  }
  @Override
  public void flushRegion(final byte[] regionName) throws IOException {
    Pair<RegionInfo, ServerName> regionServerPair = getRegion(regionName);
    if (regionServerPair == null) {
      throw new IllegalArgumentException("Unknown regionname: " + Bytes.toStringBinary(regionName));
    }
    if (regionServerPair.getSecond() == null) {
      throw new NoServerForRegionException(Bytes.toStringBinary(regionName));
    }
    final RegionInfo regionInfo = regionServerPair.getFirst();
    ServerName serverName = regionServerPair.getSecond();
    flush(this.connection.getAdmin(serverName), regionInfo);
  }
  private void flush(AdminService.BlockingInterface admin, final RegionInfo info)
    throws IOException {
    ProtobufUtil.call(() -> {
      // TODO: There is no timeout on this controller. Set one!
      HBaseRpcController controller = rpcControllerFactory.newController();
      FlushRegionRequest request =
        RequestConverter.buildFlushRegionRequest(info.getRegionName());
      admin.flushRegion(controller, request);
      return null;
    });
  }
  @Override
  public void flushRegionServer(ServerName serverName) throws IOException {
    for (RegionInfo region : getRegions(serverName)) {
      flush(this.connection.getAdmin(serverName), region);
    }
  }

The above covers the source code behind trigger conditions 1, 3, 5, 6, and 7.

Trigger 4 will be analyzed in the next post, together with how the queue of pending flush requests is processed.

 

Reference:

https://cloud.tencent.com/developer/article/1005744
