HBase Region Split Policies


Region split policy

This article covers the following HBase region split policies:

  • IncreasingToUpperBoundRegionSplitPolicy
  • ConstantSizeRegionSplitPolicy
  • DisabledRegionSplitPolicy
  • KeyPrefixRegionSplitPolicy
  • DelimitedKeyPrefixRegionSplitPolicy


IncreasingToUpperBoundRegionSplitPolicy

Split size is the number of regions that are on this server that all are
of the same table, cubed, times 2x the region flush size OR the maximum
region split size, whichever is smaller. For example, if the flush size
is 128M, then after two flushes (256MB) we will split which will make two regions
that will split when their size is 2^3 * 128M * 2 = 2048M. If one of these
regions splits, then there are three regions and now the split size is
3^3 * 128M * 2 = 6912M, and so on until we reach the configured
maximum filesize and then from there on out, we’ll use that.

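In other words, the split threshold is regionCount^3 * 128MB * 2, where regionCount is the number of regions of the same table on this region server and 128MB is the memstore flush size. A region is split once it (more precisely, one of its stores) grows beyond that threshold. The threshold is capped at the maximum file size, as getSizeToCheck() shows: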


  /**
   * @return Region max size or <code>count of regions cubed * 2 * flushsize</code>, whichever is
   * smaller; guard against there being zero regions on this server.
   */
  protected long getSizeToCheck(final int tableRegionsCount) {
    // safety check for 100 to avoid numerical overflow in extreme cases
    return tableRegionsCount == 0 || tableRegionsCount > 100
        ? getDesiredMaxFileSize()
        : Math.min(getDesiredMaxFileSize(),
            this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }

// getDesiredMaxFileSize() returns the value of hbase.hregion.max.filesize, 10 GB by default
// this.initialSize is 2 * hbase.hregion.memstore.flush.size, i.e. 256 MB with the default 128 MB flush size
// so the result is Math.min(10 GB, 256 MB * regionCount^3)

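1st split (regionCount = 1): 1^3 * 256 MB = 256 MB
2nd split (regionCount = 2): 2^3 * 256 MB = 2048 MB
3rd split (regionCount = 3): 3^3 * 256 MB = 6912 MB
4th split (regionCount = 4): 4^3 * 256 MB = 16384 MB > 10 GB, so the smaller value, 10 GB, is used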
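
To make the arithmetic easy to replay, here is a minimal Java sketch of the getSizeToCheck() formula. The class IncreasingSplitThreshold and its method are hypothetical (not part of HBase), sizes are in MB, and the defaults of 128 MB flush size and 10 GB max file size are assumptions baked in as constants.

```java
/**
 * Hypothetical helper (not an HBase class) that reproduces the arithmetic of
 * getSizeToCheck() in IncreasingToUpperBoundRegionSplitPolicy. All sizes are in MB.
 */
class IncreasingSplitThreshold {

    // Assumed defaults: hbase.hregion.memstore.flush.size = 128 MB,
    // hbase.hregion.max.filesize = 10 GB.
    static final long FLUSH_SIZE_MB = 128;
    static final long MAX_FILE_SIZE_MB = 10L * 1024;

    /** min(maxFileSize, 2 * flushSize * n^3), falling back to maxFileSize for n == 0 or n > 100. */
    static long sizeToCheckMb(int tableRegionsCount) {
        long initialSize = 2 * FLUSH_SIZE_MB; // 256 MB
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return MAX_FILE_SIZE_MB;
        }
        long n = tableRegionsCount;
        return Math.min(MAX_FILE_SIZE_MB, initialSize * n * n * n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("regionCount=" + n + " -> split above " + sizeToCheckMb(n) + " MB");
        }
    }
}
```

Running main prints thresholds of 256, 2048, 6912, 10240 and 10240 MB for one to five regions, matching the worked numbers above: from the fourth region on, the 10 GB cap wins.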

ConstantSizeRegionSplitPolicy

This is the default split policy. From 0.94.0 on the default split policy has changed to IncreasingToUpperBoundRegionSplitPolicy.

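Before 0.94.0 this was the default region split policy; from 0.94.0 on, the default is IncreasingToUpperBoundRegionSplitPolicy. With this policy a region is split once it reaches the size configured by hbase.hregion.max.filesize (10 GB by default). More precisely, the check is made per store, so the region splits as soon as any one of its stores (column families) exceeds that size.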
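
The per-store check can be sketched as follows. This is a simplified, hypothetical model (not the HBase class itself): sizes are in MB, the 10 GB limit is the assumed default, and the force-split and "store cannot split" handling of the real policy are left out.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the check made by ConstantSizeRegionSplitPolicy.
 * Sizes are in MB; force-split and "store cannot split" handling are omitted.
 */
class ConstantSizeCheck {

    // Assumed default of hbase.hregion.max.filesize: 10 GB.
    static final long MAX_FILE_SIZE_MB = 10L * 1024;

    /** True if any single store (one per column family) is larger than the fixed limit. */
    static boolean shouldSplit(List<Long> storeSizesMb) {
        for (long size : storeSizesMb) {
            if (size > MAX_FILE_SIZE_MB) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two stores of 6000 MB: 12000 MB in total, but neither exceeds the 10240 MB limit.
        System.out.println(shouldSplit(List.of(6_000L, 6_000L)));
        // One store above the limit is enough to trigger a split.
        System.out.println(shouldSplit(List.of(11_000L, 100L)));
    }
}
```

The first call prints false and the second prints true: the limit applies to the largest store, not to the sum over the region.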
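
DisabledRegionSplitPolicy

This should be used with care, since it will disable automatic sharding.

This policy turns off automatic region splitting, so HBase will not split regions on its own as they grow.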

KeyPrefixRegionSplitPolicy

A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key

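Rows are grouped by a fixed-length prefix of the row key: you specify how many leading bytes of the row key form the prefix. For example, if all row keys are 16 bytes long and the first 5 bytes are the prefix, rows whose first 5 bytes are identical end up in the same region when the region is split.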
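
This policy changes where a region is split rather than when: in HBase's source it extends IncreasingToUpperBoundRegionSplitPolicy and only adjusts the split point. A minimal sketch, assuming the candidate split key has already been chosen (HBase derives it from the region's largest store); the class KeyPrefixSplit and its method are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of how KeyPrefixRegionSplitPolicy adjusts a split point:
 * the candidate split key is cut down to its first prefixLength bytes.
 */
class KeyPrefixSplit {

    static byte[] adjustSplitPoint(byte[] splitPoint, int prefixLength) {
        if (prefixLength <= 0 || splitPoint == null || splitPoint.length == 0) {
            return splitPoint; // nothing to group by: keep the candidate as-is
        }
        // The daughter boundary becomes the bare prefix, so every row that starts with
        // this prefix sorts at or after it and lands in the same daughter region.
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    public static void main(String[] args) {
        // Assumed 16-byte row keys whose first 5 bytes are the grouping prefix.
        byte[] candidate = "00042evt00000007".getBytes(StandardCharsets.UTF_8);
        byte[] adjusted = adjustSplitPoint(candidate, 5);
        System.out.println(new String(adjusted, StandardCharsets.UTF_8)); // 00042
    }
}
```

Because the boundary is exactly "00042", no row starting with that prefix can sort before it, so all of them stay together in the upper daughter region.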

DelimitedKeyPrefixRegionSplitPolicy

A custom RegionSplitPolicy implementing a SplitPolicy that groups rows by a prefix of the row-key with a delimiter. Only the first delimiter for the row key will define the prefix of the row key that is used for grouping. This ensures that a region is not split “inside” a prefix of a row key.
I.e. rows can be co-located in a region by their prefix.
As an example, if you have row keys delimited with _, like userid_eventtype_eventid, and use prefix delimiter _, this split policy ensures that all rows starting with the same userid belong to the same region.

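This keeps rows with the same prefix in the same region. For example, if row keys have the form userid_eventtype_eventid and the delimiter is _, splits are made so that all rows with the same userid stay in the same region.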
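
The split-point adjustment can be sketched the same way as for the fixed-length prefix, except that the cut happens at the first delimiter. A minimal sketch, assuming the candidate split key is already known; DelimitedPrefixSplit, indexOf and adjustSplitPoint are hypothetical names, and when the delimiter is missing the sketch simply keeps the original split point.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of how DelimitedKeyPrefixRegionSplitPolicy adjusts a split point:
 * the candidate split key is cut just before the first occurrence of the delimiter.
 */
class DelimitedPrefixSplit {

    /** Index of the first occurrence of delimiter in key, or -1 if it does not occur. */
    static int indexOf(byte[] key, byte[] delimiter) {
        outer:
        for (int i = 0; i <= key.length - delimiter.length; i++) {
            for (int j = 0; j < delimiter.length; j++) {
                if (key[i + j] != delimiter[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    static byte[] adjustSplitPoint(byte[] splitPoint, byte[] delimiter) {
        if (splitPoint == null || delimiter == null || delimiter.length == 0) {
            return splitPoint;
        }
        int index = indexOf(splitPoint, delimiter);
        if (index < 0) {
            return splitPoint; // delimiter absent: fall back to the original split point
        }
        return Arrays.copyOf(splitPoint, index); // keep only the part before the first delimiter
    }

    public static void main(String[] args) {
        byte[] delimiter = "_".getBytes(StandardCharsets.UTF_8);
        byte[] candidate = "user42_click_000917".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(adjustSplitPoint(candidate, delimiter), StandardCharsets.UTF_8)); // user42
    }
}
```

With the boundary at "user42", every key of the form user42_eventtype_eventid sorts at or after it, so that user's rows are never divided between the two daughter regions.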
