Hbase预分区参考

背景

   我们都知道hbase的数据是分布在多台RegionServer角色的机器上的,每个RegionServer都有一到多个Region管理不同rowkey范围的数据,所以建表前通过合理的Region的分区及数量,可以避免热点读写问题和充分利用各RegionServer的资源,vmaster-hbase提供了预分区的功能             

手动分区

   用户根据数据特点和资源组机器数量提供分割点

1.1分割点是字符串

   

1.2分割点是整数

     hbase存储的都是二进制的byte,所有Int类型的分割点都要转换为十六进制传入,比如我们有如下分割点:1,10,15,每个分割点都是一个Int类型,可以利用Bytes.toHex(Bytes.toBytes(splitPoint))得出分割点的十六进制表示:  分割点十六进制表示

1.2.1分割点十六进制表示

Int十六进制表示
1\x00\x00\x00\x01
10\x00\x00\x00\x0a
15\x00\x00\x00\x0f  

1.2.2分割点测试

 

自动分区

2.1 HexStringSplit

分区数根据机器数选择,推荐每台机器20~30个region

rowkey是整数时,建议采用此分区算法,HexStringSplit将整个无符号整数范围00000000~FFFFFFFF根据region数据平均划分,转化为十六进制字符,长度不够8自动左填充'0',调用Bytes.toBytes(bigIntegerString)转到字节数组,核心代码如下:

 2.1.1Rowkey范围切分

    public byte[][] split(int n) {
      Preconditions.checkArgument(lastRowInt.compareTo(firstRowInt) > 0,
          "last row (%s) is configured less than first row (%s)", lastRow,
          firstRow);
      // +1 to range because the last row is inclusive
      BigInteger range = lastRowInt.subtract(firstRowInt).add(BigInteger.ONE);
      Preconditions.checkState(range.compareTo(BigInteger.valueOf(n)) >= 0,
          "split granularity (%s) is greater than the range (%s)", n, range);

      BigInteger[] splits = new BigInteger[n - 1];
      BigInteger sizeOfEachSplit = range.divide(BigInteger.valueOf(n));
      for (int i = 1; i < n; i++) {
        // NOTE: this means the last region gets all the slop.
        // This is not a big deal if we're assuming n << MAXHEX
        splits[i - 1] = firstRowInt.add(sizeOfEachSplit.multiply(BigInteger
            .valueOf(i)));
      }
      return convertToBytes(splits);
    }

2.1.2分割点转为字节数组

    /**
     * Returns the bytes corresponding to the BigInteger
     *
     * @param bigInteger number to convert
     * @param pad padding length
     * @return byte corresponding to input BigInteger
     */
    public static byte[] convertToByte(BigInteger bigInteger, int pad) {
      String bigIntegerString = bigInteger.toString(16);
      bigIntegerString = StringUtils.leftPad(bigIntegerString, pad, '0');
      return Bytes.toBytes(bigIntegerString);
    }

2.2 UniformSplit

     分区数根据机器数选择,推荐每台机器20~30个region

     当rowkey是原始字节数组byte[],raw byte的范围是\x00~\xff,rowKey接近统一随机的byte值比如hashes,采用此分区算法,UniformSplit采用BigInteger的toByteArray()转化分割点

2.2.1分割点算法

  /**
   * Iterate over keys within the passed range.
   */
  public static Iterable<byte[]> iterateOnSplits(
      final byte[] a, final byte[]b, boolean inclusive, final int num)
  {
    byte [] aPadded;
    byte [] bPadded;
    if (a.length < b.length) {
      aPadded = padTail(a, b.length - a.length);
      bPadded = b;
    } else if (b.length < a.length) {
      aPadded = a;
      bPadded = padTail(b, a.length - b.length);
    } else {
      aPadded = a;
      bPadded = b;
    }
    if (compareTo(aPadded,bPadded) >= 0) {
      throw new IllegalArgumentException("b <= a");
    }
    if (num <= 0) {
      throw new IllegalArgumentException("num cannot be <= 0");
    }
    byte [] prependHeader = {1, 0};
    final BigInteger startBI = new BigInteger(add(prependHeader, aPadded));
    final BigInteger stopBI = new BigInteger(add(prependHeader, bPadded));
    BigInteger diffBI = stopBI.subtract(startBI);
    if (inclusive) {
      diffBI = diffBI.add(BigInteger.ONE);
    }
    final BigInteger splitsBI = BigInteger.valueOf(num + 1);
    //when diffBI < splitBI, use an additional byte to increase diffBI
    if(diffBI.compareTo(splitsBI) < 0) {
      byte[] aPaddedAdditional = new byte[aPadded.length+1];
      byte[] bPaddedAdditional = new byte[bPadded.length+1];
      for (int i = 0; i < aPadded.length; i++){
        aPaddedAdditional[i] = aPadded[i];
      }
      for (int j = 0; j < bPadded.length; j++){
        bPaddedAdditional[j] = bPadded[j];
      }
      aPaddedAdditional[aPadded.length] = 0;
      bPaddedAdditional[bPadded.length] = 0;
      return iterateOnSplits(aPaddedAdditional, bPaddedAdditional, inclusive,  num);
    }
    final BigInteger intervalBI;
    try {
      intervalBI = diffBI.divide(splitsBI);
    } catch(Exception e) {
      LOG.error("Exception caught during division", e);
      return null;
    }

    final Iterator<byte[]> iterator = new Iterator<byte[]>() {
      private int i = -1;

      @Override
      public boolean hasNext() {
        return i < num+1;
      }

      @Override
      public byte[] next() {
        i++;
        if (i == 0) return a;
        if (i == num + 1) return b;

        BigInteger curBI = startBI.add(intervalBI.multiply(BigInteger.valueOf(i)));
        byte [] padded = curBI.toByteArray();
        if (padded[1] == 0)
          padded = tail(padded, padded.length - 2);
        else
          padded = tail(padded, padded.length - 1);
        return padded;
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }

    };

    return new Iterable<byte[]>() {
      @Override
      public Iterator<byte[]> iterator() {
        return iterator;
      }
    };
  }

    

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值