HBase-split

HBase-split code analysis

Related classes:

  • SplitRequest : the class that actually carries out the split
  • CompactSplitThread : controls the compaction/split threads
  • MemStoreFlusher : the memstore flush implementation
  • RSRpcServices : the regionserver RPC implementation class
  • TableLock : an interface in TableLockManager; implemented by ZKTableLockManager
  • IncreasingToUpperBoundRegionSplitPolicy : the default split policy



What triggers a split

  • HBaseAdmin : HBaseAdmin.split (a usage sketch follows after this list)
  • compaction : CompactSplitThread.CompactionRunner
  • memstore flush : FlushHandler (an inner class of MemStoreFlusher)

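As a quick sketch of the first trigger, here is how a split can be requested by hand through the client API. This is a minimal illustration, assuming the 0.98-era HBaseAdmin client; the table name and split key are made-up values:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class ManualSplit {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (HBaseAdmin admin = new HBaseAdmin(conf)) {
        // split the region of test_table that contains "row-5000",
        // using that row as the explicit split point
        admin.split("test_table", "row-5000");
      }
    }
  }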

  1. compact triggers a split:
    In CompactionRunner.run():
    public void run() {
      // ... some precondition checks
      if (this.compaction == null) {
        ...
      }
      // Finally we can compact something.
      assert this.compaction != null;

      ...
      try {
        ...
        boolean completed = region.compact(compaction, store);
        ...
        if (completed) {
          // degenerate case: blocked regions require recursive enqueues
          if (store.getCompactPriority() <= 0) {
            requestSystemCompaction(region, store, "Recursive enqueue");
          } else {
            // see if the compaction has caused us to exceed max region size;
            // if so, request a split
            requestSplit(region);
          }
        }
      } catch (IOException ex) {
        ...
        server.checkFileSystem();
      } catch (Exception ex) {
        ...
      } finally {
        LOG...
      }
      this.compaction.getRequest().afterExecute(); // a no-op method
    }

What does store.getCompactPriority() <= 0 above mean?

Let's look at getCompactPriority() in HStore.java:

@Override
  public int getCompactPriority() {
    // delegate to the StoreFileManager for the compaction priority
    int priority = this.storeEngine.getStoreFileManager().getStoreCompactionPriority();
    if (priority == PRIORITY_USER) {
      LOG.warn("Compaction priority is USER despite there being no user compaction");
    }
    return priority;
  }

So it delegates to the StoreFileManager for the compaction priority. On we go: in DefaultStoreFileManager, the default implementation of StoreFileManager, the code is as follows:

@Override
  public int getStoreCompactionPriority() {
    // Related: isTooManyStoreFiles. When a MemStore flush is requested, the
    // flusher checks whether any HStore of the HRegion holds too many files;
    // if so, the flush is delayed so compaction can run first, otherwise the
    // file count would keep growing. Conversely, the further the current file
    // count is below blockingFileCount, the sooner the MemStore flush can run,
    // with compaction following after the flush, which makes the best use of
    // resources.

    // BLOCKING_STOREFILES_KEY = "hbase.hstore.blockingStoreFiles"
    // HStore.DEFAULT_BLOCKING_STOREFILE_COUNT = 7 (why 7? simply the shipped default)
    int blockingFileCount = conf.getInt(
        HStore.BLOCKING_STOREFILES_KEY, HStore.DEFAULT_BLOCKING_STOREFILE_COUNT);

    // the priority is blockingFileCount minus the current number of storefiles
    int priority = blockingFileCount - storefiles.size();

    // if priority happens to equal PRIORITY_USER (1), return 2 so it cannot be
    // mistaken for a user compaction; otherwise return it unchanged
    return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
  }

Back to the store.getCompactPriority() <= 0 question:

If store.getCompactPriority() <= 0, then blockingFileCount (= 7) - storefiles.size() <= 0,
which means storefiles.size() >= 7.
In that case requestSystemCompaction(region, store, "Recursive enqueue") runs,
which in turn enqueues another CompactionRunner.
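A minimal standalone sketch of that arithmetic (the count of 9 storefiles is a made-up example value):

  public class PrioritySketch {
    public static void main(String[] args) {
      int blockingFileCount = 7; // hbase.hstore.blockingStoreFiles default here
      int storefileCount = 9;    // pretend the store holds 9 storefiles
      int priority = blockingFileCount - storefileCount; // = -2

      if (priority <= 0) {
        // 7 or more storefiles: compact again before considering a split
        System.out.println("requestSystemCompaction (recursive enqueue)");
      } else {
        System.out.println("requestSplit");
      }
    }
  }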


requestSplit

So, when storefiles.size() < 7,
CompactionRunner.run() calls requestSplit(),
which is the requestSplit() in CompactSplitThread:

  public synchronized boolean requestSplit(final HRegion r) {
    // 1. shouldSplitRegion(): is the region count on this RS within the configured limit?
    // 2. r.getCompactPriority() >= 1 (Store.PRIORITY_USER)
    if (shouldSplitRegion() && r.getCompactPriority() >= Store.PRIORITY_USER) {
      byte[] midKey = r.checkSplit();
      if (midKey != null) {
        requestSplit(r, midKey);
        return true;
      }
    }
    return false;
  }

What does shouldSplitRegion() actually check?

  private boolean shouldSplitRegion() {
    // this.regionSplitLimit = conf.getInt(
    //     REGION_SERVER_REGION_SPLIT_LIMIT,
    //     DEFAULT_REGION_SERVER_REGION_SPLIT_LIMIT);  // default: 1000
    if (server.getNumberOfOnlineRegions() > 0.9 * regionSplitLimit) {
      // if this regionserver hosts more than 900 regions, log a WARN
      LOG.warn("Total number of regions is approaching the upper limit " + regionSplitLimit + ". "
          + "Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt");
    }
    // return true if regionSplitLimit is greater than this RS's online region count
    return (regionSplitLimit > server.getNumberOfOnlineRegions());
  }
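A tiny sketch of the two thresholds with the default limit of 1000 (the online-region count is a made-up value):

  public class SplitLimitSketch {
    public static void main(String[] args) {
      int regionSplitLimit = 1000; // hbase.regionserver.regionSplitLimit default
      int onlineRegions = 950;     // pretend this RS hosts 950 regions

      if (onlineRegions > 0.9 * regionSplitLimit) {
        System.out.println("WARN: approaching the region limit"); // fires above 900
      }
      // splitting is still allowed until the limit itself is reached
      System.out.println("may split: " + (regionSplitLimit > onlineRegions)); // true
    }
  }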

With the region count on the RS and the compaction priority both checked,
HRegion.checkSplit() runs next:

  /**
   * Return the splitpoint. null indicates the region isn't splittable.
   * If the splitpoint isn't explicitly specified, it will go over the stores
   * to find the best splitpoint. Currently the criteria of best splitpoint
   * is based on the size of the store.
   */
  public byte[] checkSplit() {
    // the META and NAMESPACE catalog tables cannot be split

    // a region in recovering state cannot be split

    // splitPolicy defaults to IncreasingToUpperBoundRegionSplitPolicy
    if (!splitPolicy.shouldSplit()) {
      return null;
    }

    // get the concrete split point
    byte[] ret = splitPolicy.getSplitPoint();

    if (ret != null) {
      try {
        // check that the row actually falls inside this region
        checkRow(ret, "calculated split");
      } catch (IOException e) {
        LOG.error("Ignoring invalid split", e);
        return null;
      }
    }
    return ret;
  }

The default splitPolicy is IncreasingToUpperBoundRegionSplitPolicy.
Let's look at its shouldSplit() method:

  @Override
  protected boolean shouldSplit() {
    if (region.shouldForceSplit()) return true;
    boolean foundABigStore = false;
    // Get count of regions that have the same common table as this.region
    int tableRegionsCount = getCountOfCommonTableRegions();
    // Get size to check: derived from hbase.hregion.max.filesize, the region
    // count, and hbase.hregion.memstore.flush.size
    long sizeToCheck = getSizeToCheck(tableRegionsCount);

    // loop over every store of this region
    for (Store store : region.getStores().values()) {
      // if any store cannot be split, e.g. it contains reference files, return false
      if ((!store.canSplit())) {
        return false;
      }

      // Mark if any store is big enough
      long size = store.getSize();
      // if the store is larger than the size to check, mark foundABigStore
      if (size > sizeToCheck) {
        LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
          " size=" + size + ", sizeToCheck=" + sizeToCheck +
          ", regionsWithCommonTable=" + tableRegionsCount);
        foundABigStore = true;
      }
    }

    return foundABigStore;
  }

IncreasingToUpperBoundRegionSplitPolicy.getSizeToCheck():


  /**
   * @return Region max size or <code>count of regions cubed * flush size</code>,
   * whichever is smaller; guard against there being zero regions on this server.
   */
  protected long getSizeToCheck(final int tableRegionsCount) {
    // safety check for 100 to avoid numerical overflow in extreme cases
    // if the region count is 0 or > 100, return hbase.hregion.max.filesize;
    // otherwise pick the smaller of max filesize and
    // initialSize * regionCount^3, e.g. 128M * regionCt * regionCt * regionCt.
    // initialSize is the MEMSTORE_FLUSHSIZE set in the table's attributes, or
    // hbase.hregion.memstore.flush.size by default (128M)
    return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
      Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }
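To make the growth of this bound concrete, here is a small sketch using the defaults quoted above (128M flush size, 10G max filesize):

  public class SizeToCheckSketch {
    public static void main(String[] args) {
      long initialSize = 128L << 20;  // 128M memstore flush size
      long maxFileSize = 10L << 30;   // 10G hbase.hregion.max.filesize
      for (int count = 1; count <= 5; count++) {
        long sizeToCheck = Math.min(maxFileSize,
            initialSize * count * count * count);
        System.out.println(count + " region(s) -> split above "
            + (sizeToCheck >> 20) + " MB");
      }
      // 1 -> 128 MB, 2 -> 1024 MB, 3 -> 3456 MB, 4 -> 8192 MB, 5 -> 10240 MB (capped)
    }
  }

So a table's regions split quickly while the table is small, and the threshold settles at hbase.hregion.max.filesize once the table has more than a handful of regions on the server.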

The getDesiredMaxFileSize() method:
returns the desiredMaxFileSize field of the ConstantSizeRegionSplitPolicy class
that IncreasingToUpperBoundRegionSplitPolicy extends (hbase.hregion.max.filesize).

How desiredMaxFileSize gets assigned:

  @Override
  protected void configureForRegion(HRegion region) {
    super.configureForRegion(region);
    Configuration conf = getConf();
    HTableDescriptor desc = region.getTableDesc();
    if (desc != null) {
      // if the table sets the MAX_FILESIZE attribute, use its value;
      // getMaxFileSize() returns -1 when the attribute is absent
      this.desiredMaxFileSize = desc.getMaxFileSize();
    }
    // if desc.getMaxFileSize() returned a value <= 0, fall back to the
    // hbase.hregion.max.filesize property, default 10 * 1024 * 1024 * 1024L = 10G
    if (this.desiredMaxFileSize <= 0) {
      this.desiredMaxFileSize = conf.getLong(HConstants.HREGION_MAX_FILESIZE,
        HConstants.DEFAULT_MAX_FILE_SIZE);
    }
  }
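As a usage aside, the table attribute that desc.getMaxFileSize() reads can be set when creating the table. A hedged sketch with the 0.98/1.x client API (table and family names are made up):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CreateTableWithMaxFileSize {
    public static void main(String[] args) throws Exception {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("test_table"));
      desc.addFamily(new HColumnDescriptor("cf"));
      // per-table MAX_FILESIZE: this table splits around 20G instead of the
      // cluster-wide hbase.hregion.max.filesize
      desc.setMaxFileSize(20L * 1024 * 1024 * 1024);
      try (HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create())) {
        admin.createTable(desc);
      }
    }
  }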

So far, shouldSplit() of IncreasingToUpperBoundRegionSplitPolicy has checked the
region count against the max filesize, and whether any store of the current
region contains reference files, etc.

Next, let's see what HRegion.checkSplit() executes afterwards:
splitPolicy.getSplitPoint()

IncreasingToUpperBoundRegionSplitPolicy.getSplitPoint():
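The original post leaves this block empty. IncreasingToUpperBoundRegionSplitPolicy does not override getSplitPoint(); it inherits it from the RegionSplitPolicy base class, which picks the split point of the largest splittable store. Paraphrased from the 0.98/1.x source (treat it as a sketch, not a verbatim quote):

  protected byte[] getSplitPoint() {
    // an explicitly requested split point (e.g. from a user-issued split) wins
    byte[] explicitSplitPoint = this.region.getExplicitSplitPoint();
    if (explicitSplitPoint != null) {
      return explicitSplitPoint;
    }

    // otherwise take the split point of the biggest store
    byte[] splitPointFromLargestStore = null;
    long largestStoreSize = 0;
    for (Store s : region.getStores().values()) {
      byte[] splitPoint = s.getSplitPoint(); // roughly the midkey of the store's largest file
      long storeSize = s.getSize();
      if (splitPoint != null && largestStoreSize < storeSize) {
        splitPointFromLargestStore = splitPoint;
        largestStoreSize = storeSize;
      }
    }
    return splitPointFromLargestStore;
  }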

After all the checks above and with the mid key in hand, we finally get to run
requestSplit. CompactSplitThread.requestSplit():

  public synchronized void requestSplit(final HRegion r, byte[] midKey) {
    ...
    // this.splits is a thread pool; the default number of split threads is 1
    this.splits.execute(new SplitRequest(r, midKey, this.server));
    ...
  }

Now let's look at SplitRequest's run() method:

  @Override
  public void run() {
    // check whether the server is stopping
    // split metric + 1
    long startTime = EnvironmentEdgeManager.currentTime();
    SplitTransaction st = new SplitTransaction(parent, midKey);
    try {
      // acquire the table's read lock (a TableLock)
      try {
        tableLock.acquire();
      } catch (IOException ex) {
        tableLock = null;
        throw ex;
      }

      // If prepare does not return true, for some reason -- logged inside in
      // the prepare call -- we are not ready to split just now. Just return.
      // st.prepare() is a fairly important step as well
      if (!st.prepare()) return;
      try {
        st.execute(this.server, this.server);
        success = true;
      } catch (Exception e) {
        ...
        try {
          // the split failed, so roll back
          if (st.rollback(this.server, this.server)) {
            ...
          } else {
            ...
          }
        } catch (RuntimeException ee) {
          ...
          this.server.abort(msg + " -- Cause: " + ee.getMessage());
        }
        return;
      }
    } catch (IOException ex) {
      ...
      server.checkFileSystem();
    } finally {
      // Coprocessor postCompleteSplit
      if (parent.shouldForceSplit()) {
        parent.clearSplit();
      }
      // release the TableLock
      releaseTableLock();
      // log that the split succeeded
    } // end finally
  }

First, a look at prepare():
it creates the two daughter HRegionInfos, A and B.

  /**
   * Does checks on split inputs.
   * @return <code>true</code> if the region is splittable else
   * <code>false</code> if it is not (e.g. its already closed, etc.).
   */
  public boolean prepare() {
    // if the parent region cannot be split, return false right away
    // the mid key (splitrow) must not be null
    HRegionInfo hri = this.parent.getRegionInfo();
    parent.prepareToSplit();
    // Check splitrow.
    byte [] startKey = hri.getStartKey();
    byte [] endKey = hri.getEndKey();
    if (Bytes.equals(startKey, splitrow) ||
        !this.parent.getRegionInfo().containsRow(splitrow)) {
      LOG.info("Split row is not inside region key range or is equal to " +
          "startkey: " + Bytes.toStringBinary(this.splitrow));
      return false;
    }

    // build the daughters' regionId; if it comes out smaller than the parent's
    // regionId, bump it by one (to keep the ordering in the meta table)
    long rid = getDaughterRegionIdTimestamp(hri);

    // create the two daughter regions A and B
    this.hri_a = new HRegionInfo(hri.getTable(), startKey, this.splitrow, false, rid);
    this.hri_b = new HRegionInfo(hri.getTable(), this.splitrow, endKey, false, rid);
    this.journal.add(new JournalEntry(JournalEntryType.PREPARED));
    return true;
  }
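For the regionId rule mentioned above, getDaughterRegionIdTimestamp() looks roughly like this (paraphrased from the 0.98/1.x SplitTransaction, not a verbatim quote):

  private static long getDaughterRegionIdTimestamp(final HRegionInfo hri) {
    long rid = EnvironmentEdgeManager.currentTime();
    // A regionId is a timestamp, and the daughters' id must not be smaller
    // than the parent's, or they would sort into the wrong place in
    // hbase:meta (see HBASE-710).
    if (rid < hri.getRegionId()) {
      rid = hri.getRegionId() + 1;
    }
    return rid;
  }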

Now for execute():

  /**
   * Run the transaction.
   * @param server Hosting server instance.  Can be null when testing
   * @param services Used to online/offline regions.
   * @throws IOException If thrown, transaction failed.
   *          Call {@link #rollback(Server, RegionServerServices)}
   * @return Regions created
   * @throws IOException
   * @see #rollback(Server, RegionServerServices)
   */
  public PairOfSameType<HRegion> execute(final Server server,
      final RegionServerServices services)
  throws IOException {
    useZKForAssignment = server == null ? true :
      ConfigUtil.useZKForAssignment(server.getConfiguration());
    if (useCoordinatedStateManager(server)) { // is a coordinated state manager in use?
      std =
          ((BaseCoordinatedStateManager) server.getCoordinatedStateManager())
              .getSplitTransactionCoordination().getDefaultDetails();
    }
    PairOfSameType<HRegion> regions = createDaughters(server, services);
    if (this.parent.getCoprocessorHost() != null) {
      this.parent.getCoprocessorHost().preSplitAfterPONR();
    }
    return stepsAfterPONR(server, services, regions);
  }
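execute() hands the two daughters back as a PairOfSameType; a trivial consumption sketch (variable names made up):

  PairOfSameType<HRegion> daughters = st.execute(server, services);
  HRegion daughterA = daughters.getFirst();  // [startKey, splitrow)
  HRegion daughterB = daughters.getSecond(); // [splitrow, endKey)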

createDaughters() is responsible for taking the parent region offline and bringing the daughter regions online.


  /**
   * Prepare the regions and region files.
   * @param services used to online/offline regions
   * @return the regions that were created
   */
  /* package */PairOfSameType<HRegion> createDaughters(final Server server,
      final RegionServerServices services) throws IOException {
    LOG.info("Starting split of region " + this.parent);
    if ((server != null && server.isStopped()) ||
        (services != null && services.isStopping())) {
      throw new IOException("Server is stopped or stopping");
    }
    assert !this.parent.lock.writeLock().isHeldByCurrentThread():
      "Unsafe to hold write lock while performing RPCs";

    journal.add(new JournalEntry(JournalEntryType.BEFORE_PRE_SPLIT_HOOK));

    // Coprocessor callback
    if (this.parent.getCoprocessorHost() != null) {
      // TODO: Remove one of these
      this.parent.getCoprocessorHost().preSplit();
      this.parent.getCoprocessorHost().preSplit(this.splitrow);
    }

    journal.add(new JournalEntry(JournalEntryType.AFTER_PRE_SPLIT_HOOK));

    // If true, no cluster to write meta edits to or to update znodes in.
    boolean testing = server == null? true:
        server.getConfiguration().getBoolean("hbase.testing.nocluster", false);
    this.fileSplitTimeout = testing ? this.fileSplitTimeout :
        server.getConfiguration().getLong("hbase.regionserver.fileSplitTimeout",
          this.fileSplitTimeout);

    PairOfSameType<HRegion> daughterRegions = stepsBeforePONR(server, services, testing);

    List<Mutation> metaEntries = new ArrayList<Mutation>();
    if (this.parent.getCoprocessorHost() != null) {
      if (this.parent.getCoprocessorHost().
          preSplitBeforePONR(this.splitrow, metaEntries)) {
        throw new IOException("Coprocessor bypassing region "
            + this.parent.getRegionNameAsString() + " split.");
      }
      try {
        for (Mutation p : metaEntries) {
          HRegionInfo.parseRegionName(p.getRow());
        }
      } catch (IOException e) {
        LOG.error("Row key of mutation from coprossor is not parsable as region name."
            + "Mutations from coprocessor should only for hbase:meta table.");
        throw e;
      }
    }

    // This is the point of no return.  Adding subsequent edits to .META. as we
    // do below when we do the daughter opens adding each to .META. can fail in
    // various interesting ways the most interesting of which is a timeout
    // BUT the edits all go through (See HBASE-3872).  IF we reach the PONR
    // then subsequent failures need to crash out this regionserver; the
    // server shutdown processing should be able to fix-up the incomplete split.
    // The offlined parent will have the daughters as extra columns.  If
    // we leave the daughter regions in place and do not remove them when we
    // crash out, then they will have their references to the parent in place
    // still and the server shutdown fixup of .META. will point to these
    // regions.
    // We should add PONR JournalEntry before offlineParentInMeta,so even if
    // OfflineParentInMeta timeout,this will cause regionserver exit,and then
    // master ServerShutdownHandler will fix daughter & avoid data loss. (See
    // HBase-4562).
    this.journal.add(new JournalEntry(JournalEntryType.PONR));

    // Edit parent in meta.  Offlines parent region and adds splita and splitb
    // as an atomic update. See HBASE-7721. This update to META determines
    // whether the region is considered split in case of failures.
    // If it is successful, master will roll-forward, if not, master will rollback
    // and assign the parent region.

     // not running in test mode
    if (!testing && useZKForAssignment) {
      if (metaEntries == null || metaEntries.isEmpty()) {
        MetaTableAccessor.splitRegion(server.getConnection(),
          parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(),
          daughterRegions.getSecond().getRegionInfo(), server.getServerName(),
          parent.getTableDesc().getRegionReplication());
      } else {
      // meta changes: offline the parent and insert the new daughter region info
        offlineParentInMetaAndputMetaEntries(server.getConnection(),
          parent.getRegionInfo(), daughterRegions.getFirst().getRegionInfo(), daughterRegions
              .getSecond().getRegionInfo(), server.getServerName(), metaEntries,
              parent.getTableDesc().getRegionReplication());
      }
    } else if (services != null && !useZKForAssignment) {
      if (!services.reportRegionStateTransition(TransitionCode.SPLIT_PONR,
          parent.getRegionInfo(), hri_a, hri_b)) {
        // Passed PONR, let SSH clean it up
        throw new IOException("Failed to notify master that split passed PONR: "
          + parent.getRegionInfo().getRegionNameAsString());
      }
    }
    return daughterRegions;
  }

At the end of execute(),
stepsAfterPONR(server, services, regions) opens the new regions and updates the
ZooKeeper state under:

/hbase/region-in-transition
