HLog Code Analysis and HBase Replication Latency

Latest recommended article published on 2023-04-07 11:00:00

brianf


Category: HBase. Tags: hbase, replication, hlog, HBase code


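While I was presenting HBase replication, a colleague asked how large the replication delay is. (This analysis is based on 0.94.3.)
This article walks through the code that creates HLog files and looks at how that affects replication. For replication itself, see
http://brianf.iteye.com/blog/1776936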

First, when is an hlog file created?

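When an HLog object is constructed, the constructor calls HLog's rollWriter(). Because this.writer is still null at that point, rollWriter creates the first hlog file. The replication-related code is invoked after that; see http://brianf.iteye.com/blog/1776936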

    // rollWriter sets this.hdfs_out if it can.
    rollWriter();

    // handle the reflection necessary to call getNumCurrentReplicas()
    this.getNumCurrentReplicas = getGetNumCurrentReplicas(this.hdfs_out);

    logSyncerThread = new LogSyncer(this.optionalFlushInterval);

-----------------

 public byte [][] rollWriter(boolean force)
      throws FailedLogCloseException, IOException {
    // Return if nothing to flush.
    if (!force && this.writer != null && this.numEntries.get() <= 0) {
      return null;
    }
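The guard at the top of rollWriter can be read as a small predicate. The sketch below restates it outside of HBase; the class and method names (RollGuard, skipRoll) are made up for illustration and are not part of HLog.

```java
// Illustrative restatement of the early-return check in rollWriter(boolean force).
// RollGuard and skipRoll are hypothetical names, not HBase API.
final class RollGuard {
    // Returns true when rollWriter would return early without creating a new file.
    static boolean skipRoll(boolean force, boolean hasWriter, long numEntries) {
        return !force && hasWriter && numEntries <= 0;
    }

    public static void main(String[] args) {
        // First call from the constructor: no writer yet, so the roll proceeds.
        System.out.println("first call skipped? " + skipRoll(false, false, 0));
        // Writer exists and nothing was appended: nothing to roll.
        System.out.println("idle log skipped?   " + skipRoll(false, true, 0));
        // A forced roll always proceeds.
        System.out.println("forced skipped?     " + skipRoll(true, true, 0));
    }
}
```

This is why the very first rollWriter() call always creates a file: with this.writer still null the condition cannot hold.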

-------------
In LogRoller.run:

  public void run() {
    while (!server.isStopped()) {
      long now = System.currentTimeMillis();
      boolean periodic = false;
      if (!rollLog.get()) {
        periodic = (now - this.lastrolltime) > this.rollperiod;
        if (!periodic) {
          synchronized (rollLog) {
            try {
              rollLog.wait(this.threadWakeFrequency);
            } catch (InterruptedException e) {
              // Fall through
            }
          }
          continue;
        }
        // Time for periodic roll
        if (LOG.isDebugEnabled()) {
          LOG.debug("Hlog roll period " + this.rollperiod + "ms elapsed");
        }
      } else if (LOG.isDebugEnabled()) {
        LOG.debug("HLog roll requested");
      }
      rollLock.lock(); // FindBugs UL_UNRELEASED_LOCK_EXCEPTION_PATH
      try {
        this.lastrolltime = now;
        // This is array of actual region names.
        byte [][] regionsToFlush = this.services.getWAL().rollWriter(rollLog.get());

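By default the LogRoller thread rolls the log once the roll period of 1 hour has elapsed, so one log file is produced per hour by default (the hlog size is another trigger, discussed below).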

    this.rollperiod = this.server.getConfiguration().
      getLong("hbase.regionserver.logroll.period", 3600000);
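The decision made at the top of the LogRoller loop boils down to "roll if someone asked for it, or if the period has elapsed". A minimal sketch of that rule, assuming only what the excerpt above shows; RollDecision and shouldRoll are hypothetical names used for illustration.

```java
// Simplified model of the roll decision in LogRoller.run().
// RollDecision and shouldRoll are hypothetical names, not HBase API.
final class RollDecision {
    // Same default as hbase.regionserver.logroll.period in the excerpt above.
    static final long DEFAULT_ROLL_PERIOD_MS = 3600000L;

    // requested mirrors rollLog.get(); the time comparison mirrors the "periodic" check.
    static boolean shouldRoll(boolean requested, long nowMs, long lastRollMs, long rollPeriodMs) {
        boolean periodic = (nowMs - lastRollMs) > rollPeriodMs;
        return requested || periodic;
    }

    public static void main(String[] args) {
        long lastRoll = 0L;
        System.out.println(shouldRoll(false, 1800000L, lastRoll, DEFAULT_ROLL_PERIOD_MS));
        System.out.println(shouldRoll(false, 3600001L, lastRoll, DEFAULT_ROLL_PERIOD_MS));
        System.out.println(shouldRoll(true, 1000L, lastRoll, DEFAULT_ROLL_PERIOD_MS));
    }
}
```

Note that the comparison is strict, so the periodic roll fires only after more than rollperiod milliseconds have passed since the last roll.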

rollLog is an AtomicBoolean. When it is true, the call to rollWriter creates a new log. So when does rollLog become true?

  public void logRollRequested() {
    synchronized (rollLog) {
      rollLog.set(true);
      rollLog.notifyAll();
    }
  }

It is triggered from the HLog.syncer method.

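When data is written to HBase, for example by a put, HLog's append method is called. It writes the edit into the HLog's in-memory buffer (a List) and then syncs the data to HDFS. In addition, the LogSyncer thread runs the HLog.syncer method every 1000 ms.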

  private long append(HRegionInfo info, byte [] tableName, WALEdit edits, UUID clusterId,
      final long now, HTableDescriptor htd, boolean doSync)
    throws IOException {
      if (edits.isEmpty()) return this.unflushedEntries.get();;
      if (this.closed) {
        throw new IOException("Cannot append; log is closed");
      }
      long txid = 0;
      synchronized (this.updateLock) {
        long seqNum = obtainSeqNum();
        // The 'lastSeqWritten' map holds the sequence number of the oldest
        // write for each region (i.e. the first edit added to the particular
        // memstore). . When the cache is flushed, the entry for the
        // region being flushed is removed if the sequence number of the flush
        // is greater than or equal to the value in lastSeqWritten.
        // Use encoded name.  Its shorter, guaranteed unique and a subset of
        // actual  name.
        byte [] encodedRegionName = info.getEncodedNameAsBytes();
        this.lastSeqWritten.putIfAbsent(encodedRegionName, seqNum);
        HLogKey logKey = makeKey(encodedRegionName, tableName, seqNum, now, clusterId);
        doWrite(info, logKey, edits, htd);
        this.numEntries.incrementAndGet();
        txid = this.unflushedEntries.incrementAndGet();
        if (htd.isDeferredLogFlush()) {
          lastDeferredTxid = txid;
        }
      }
      // Sync if catalog region, and if not then check if that table supports
      // deferred log flushing
      if (doSync && 
          (info.isMetaRegion() ||
          !htd.isDeferredLogFlush())) {
        // sync txn to file system
        this.sync(txid);
      }
      return txid;
    }
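The last branch of append decides whether the caller waits for a sync or leaves it to the LogSyncer thread. A minimal sketch of that condition, with hypothetical names (SyncPolicy, syncsImmediately) chosen for illustration:

```java
// Restates the sync condition at the end of HLog.append().
// SyncPolicy and syncsImmediately are hypothetical names, not HBase API.
final class SyncPolicy {
    // true  -> append() calls sync(txid) before returning
    // false -> the edit is left for the periodic LogSyncer thread
    static boolean syncsImmediately(boolean doSync, boolean metaRegion, boolean deferredLogFlush) {
        return doSync && (metaRegion || !deferredLogFlush);
    }

    public static void main(String[] args) {
        System.out.println("normal table:        " + syncsImmediately(true, false, false));
        System.out.println("deferred-flush table: " + syncsImmediately(true, false, true));
        System.out.println("catalog region:      " + syncsImmediately(true, true, true));
    }
}
```

For tables with deferred log flush enabled, an edit can therefore sit unsynced for up to one optional flush interval, which matters when estimating how fresh the data visible to replication can be.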

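HLog.syncer calls checkLowReplication to check whether the number of HDFS replicas of the hlog has dropped below the configured value. If it has, it calls requestLogRoll, which ends up in logRollRequested. By default this is done at most 5 times:

 this.lowReplicationRollLimit = conf.getInt(
        "hbase.regionserver.hlog.lowreplication.rolllimit", 5);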
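The effect of the roll limit can be modelled with a small counter. This is a simplified sketch, not the HLog code: it assumes the limit applies to consecutive low-replication rolls and leaves out how the check is re-enabled once replication recovers. All names here (LowReplicationRoller, check) are hypothetical.

```java
// Simplified model of a capped low-replication roll request.
// Assumption: the cap counts consecutive requests; re-enabling is omitted.
final class LowReplicationRoller {
    private final int minReplicas;
    private final int rollLimit;
    private int consecutiveRolls = 0;
    private boolean enabled = true;

    LowReplicationRoller(int minReplicas, int rollLimit) {
        this.minReplicas = minReplicas;
        this.rollLimit = rollLimit;
    }

    // Returns true when a log roll should be requested for this observation.
    boolean check(int currentReplicas) {
        boolean low = currentReplicas != 0 && currentReplicas < minReplicas;
        if (!low || !enabled) {
            return false;
        }
        if (consecutiveRolls < rollLimit) {
            consecutiveRolls++;
            return true;
        }
        // Limit reached: stop asking for rolls so the server does not roll forever.
        enabled = false;
        return false;
    }

    public static void main(String[] args) {
        LowReplicationRoller roller = new LowReplicationRoller(3, 5);
        int requested = 0;
        for (int i = 0; i < 10; i++) {
            if (roller.check(2)) {
                requested++;
            }
        }
        System.out.println("roll requests issued: " + requested);
    }
}
```

The cap exists because rolling cannot help when the cluster simply has fewer live datanodes than the desired replica count.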

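It then checks whether the hlog currently being written is larger than a threshold (64 MB * 0.95, assuming a 64 MB block size). If it is, a new HLog has to be created as well.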

    this.blocksize = conf.getLong("hbase.regionserver.hlog.blocksize",
        getDefaultBlockSize());
    // Roll at 95% of block size.
    float multi = conf.getFloat("hbase.regionserver.logroll.multiplier", 0.95f);
    this.logrollsize = (long)(this.blocksize * multi);
--------------------
          if (tempWriter.getLength() > this.logrollsize) {
            requestLogRoll();
          }
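To make the threshold concrete, the arithmetic can be reproduced in isolation. The sketch below uses the same long-times-float expression as the excerpt and assumes a 64 MB block size; RollSize and its methods are hypothetical names.

```java
// Reproduces the logrollsize arithmetic from the excerpt above.
// RollSize, logRollSize and needsRoll are hypothetical names, not HBase API.
final class RollSize {
    // Same expression as in HLog: (long)(blocksize * multiplier), evaluated in float.
    static long logRollSize(long blockSizeBytes, float multiplier) {
        return (long) (blockSizeBytes * multiplier);
    }

    // Mirrors the check tempWriter.getLength() > this.logrollsize.
    static boolean needsRoll(long currentLengthBytes, long logRollSizeBytes) {
        return currentLengthBytes > logRollSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed HDFS block size
        long threshold = logRollSize(blockSize, 0.95f);
        System.out.println("roll threshold in bytes: " + threshold);
        System.out.println("60 MB file needs roll: " + needsRoll(60L * 1024 * 1024, threshold));
        System.out.println("62 MB file needs roll: " + needsRoll(62L * 1024 * 1024, threshold));
    }
}
```

With these inputs the threshold is a little under 61 MB, so on a busy RegionServer logs roll by size long before the one-hour period is reached.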

For replication, the delay comes mainly from the communication with ZK and from the RPC calls to the slave RS.

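hbase.regionserver.optionallogflushinterval
Interval at which the HLog is synced to HDFS. If not enough HLog entries have accumulated by the time the interval elapses, a sync is triggered anyway. The unit is milliseconds; the default is 1 second.
Default: 1000
hbase.regionserver.logroll.period
Interval at which the commit log is rolled, regardless of whether enough edits have been written.
Default: 3600000
hbase.master.logcleaner.ttl
Maximum time an HLog may stay in the .oldlogs directory. After that it is removed by a Master thread.
Default: 600000
hbase.master.logcleaner.plugins
A comma-separated list of class names. These WAL/HLog cleaners are called in order, so put the ones that should run first at the front. You can implement your own LogCleanerDelegate, add it to the classpath, and list its fully qualified class name here. Custom cleaners are usually added before the default value.
The chain is initialized in the initCleanerChain method of CleanerChore, which also initializes the HFile cleaners.
Default: org.apache.hadoop.hbase.master.TimeToLiveLogCleaner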

hbase.regionserver.hlog.blocksize
hbase.regionserver.maxlogs

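The maximum total WAL size is determined by hbase.regionserver.maxlogs * hbase.regionserver.hlog.blocksize (2 GB by default). Once this value is reached, memstore flushes are triggered. Relying on the WAL limit to trigger memstore flushes is not ideal, because it may flush many regions at once and cause a flush storm.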

It is best to set hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs slightly larger than hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE.
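The sizing rule can be checked with a few lines of arithmetic. In the sketch below, the 8 GB heap and the lowerLimit of 0.35 are example inputs assumed for illustration, not values taken from this article; WalSizing and its methods are hypothetical names.

```java
// Back-of-the-envelope check of the WAL sizing rule.
// WalSizing and its methods are hypothetical; heap size and lowerLimit are assumed inputs.
final class WalSizing {
    // Total WAL capacity: maxlogs * blocksize.
    static long walCapacityBytes(int maxLogs, long blockSizeBytes) {
        return maxLogs * blockSizeBytes;
    }

    // Smallest maxlogs whose capacity is strictly larger than lowerLimit * heap.
    static int minMaxLogs(double lowerLimit, long heapBytes, long blockSizeBytes) {
        double memstoreBytes = lowerLimit * heapBytes;
        return (int) Math.floor(memstoreBytes / blockSizeBytes) + 1;
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20; // 64 MB
        long heap = 8L << 30;       // assumed 8 GB HBASE_HEAPSIZE
        System.out.println("capacity with 32 logs: " + walCapacityBytes(32, blockSize));
        System.out.println("suggested maxlogs:     " + minMaxLogs(0.35, heap, blockSize));
    }
}
```

With 32 logs of 64 MB the capacity is the 2 GB mentioned above; on a larger heap the suggested maxlogs grows accordingly, so that memstore pressure rather than the WAL limit is what triggers flushes.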