HBase Pre-splitting

_牧童

Article link: https://blog.csdn.net/asdfasdfsadffds/article/details/24498367

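By default, a table is created with a single region. During a bulk import, every client therefore writes to that one region until it grows large enough to split and the resulting regions are distributed across the machines in the cluster. A useful way to speed up bulk imports is to pre-split the table, that is, to create it with multiple regions up front. Too many regions, however, will hurt performance. Reference: TODO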

HBase provides two predefined split algorithms for pre-splitting: HexStringSplit and UniformSplit.

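The first, HexStringSplit, is mainly for row keys that have a hexadecimal prefix. Take particular care here: if the row keys do not fall within the hexadecimal range, a large part of the key space in the middle is left as a gap. For the detailed reasons, see http://hbase.apache.org/book.html#rowkey.regionsplits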
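To make the warning about non-hexadecimal row keys concrete, here is a minimal, self-contained sketch that needs no HBase dependency. The class `HexSplitGapDemo`, its method names, and the 8-digit lowercase key format are assumptions made for illustration; this is not the code HBase itself runs. It builds evenly spaced hex split points and reports which region a given row key would sort into.

```java
import java.math.BigInteger;

// Hypothetical demo class, not part of HBase.
final class HexSplitGapDemo {

    // Evenly spaced 8-digit hex split points over [00000000, ffffffff].
    // Assumption: lowercase, zero-padded keys; real deployments may differ.
    static String[] hexSplits(int numRegions) {
        BigInteger range = new BigInteger("ffffffff", 16);
        BigInteger step = range.divide(BigInteger.valueOf(numRegions));
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = String.format("%08x", step.multiply(BigInteger.valueOf(i)));
        }
        return splits;
    }

    // Region index for a row key: the number of split points that sort at or
    // before the key. For ASCII-only strings, String.compareTo gives the same
    // order as comparing the raw bytes.
    static int regionFor(String rowKey, String[] splits) {
        int region = 0;
        for (String split : splits) {
            if (split.compareTo(rowKey) <= 0) {
                region++;
            }
        }
        return region;
    }

    public static void main(String[] args) {
        String[] splits = hexSplits(4);
        for (String key : new String[] {"1a2b3c4d", "9abc0000", "user42", "order-1"}) {
            System.out.println(key + " -> region " + regionFor(key, splits));
        }
    }
}
```

With four regions, keys such as "user42" or "order-1" all sort after the last split point, because their first letter is greater than any hex digit in ASCII. They pile up in the final region while the other regions stay empty, which is the uneven distribution the paragraph above warns about.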

Below is an example of hexadecimal splitting:

public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
throws IOException {
  try {
    admin.createTable( table, splits );
    return true;
  } catch (TableExistsException e) {
    logger.info("table " + table.getNameAsString() + " already exists");
    // the table already exists...
    return false;
  }
}

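// Computes numRegions-1 evenly spaced split keys between startKey and endKey,
// both given as hexadecimal strings.
public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions-1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  // Width of one region in key space (integer division, remainder dropped).
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  // The first split key sits one increment above the start key.
  lowestKey = lowestKey.add(regionIncrement);
  for(int i=0; i < numRegions-1;i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    // Encode the key as a 16-digit, zero-padded, lowercase hex string.
    byte[] b = String.format("%016x", key).getBytes();
    splits[i] = b;
  }
  return splits;
}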

The code above is only an illustration. In practice we can simply call HBaseAdmin directly.

From the command line we can run:

hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1

-c creates the table, 10 is the number of regions, and -f specifies the column family. This creates test_table pre-split into 10 regions.

The second, UniformSplit, covers the range 00 to FF, with zeros padded on the right.
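As a rough illustration of what "00 to FF, padded with zeros on the right" can look like, the sketch below divides a fixed-width byte key space into equal parts. The class `UniformSplitSketch`, the 8-byte key width, and the exact arithmetic are assumptions for illustration only; the real UniformSplit may compute slightly different boundaries.

```java
import java.math.BigInteger;

// Hypothetical sketch, not the HBase implementation.
final class UniformSplitSketch {

    // Assumed key width in bytes.
    static final int KEY_WIDTH = 8;

    // Returns numRegions-1 split keys that divide the space of all
    // KEY_WIDTH-byte keys (00..00 through FF..FF) into equal parts.
    static byte[][] uniformSplits(int numRegions) {
        BigInteger space = BigInteger.ONE.shiftLeft(8 * KEY_WIDTH);
        BigInteger n = BigInteger.valueOf(numRegions);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = space.multiply(BigInteger.valueOf(i)).divide(n);
            splits[i - 1] = toFixedWidth(point);
        }
        return splits;
    }

    // Converts a non-negative value below 2^(8*KEY_WIDTH) into exactly
    // KEY_WIDTH big-endian bytes, dropping BigInteger's optional sign byte.
    static byte[] toFixedWidth(BigInteger value) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[KEY_WIDTH];
        int n = Math.min(raw.length, KEY_WIDTH);
        System.arraycopy(raw, raw.length - n, out, KEY_WIDTH - n, n);
        return out;
    }

    // Renders bytes as lowercase hex for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] split : uniformSplits(4)) {
            System.out.println(toHex(split));
        }
    }
}
```

Unlike the hex-string approach, the split keys here are raw bytes rather than ASCII text, so this style suits row keys that are binary values such as hashes.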

You can also implement your own SplitAlgorithm.
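The core of a custom algorithm is the logic that turns a region count into split keys. The sketch below shows only that logic, for a hypothetical key scheme in which every row key starts with a two-digit decimal bucket from "00" to "99". The class `DecimalPrefixSplit` is an illustrative assumption and deliberately does not implement the actual SplitAlgorithm interface, whose method set depends on the HBase version.

```java
// Hypothetical sketch of custom split logic; it does not implement
// HBase's SplitAlgorithm interface.
final class DecimalPrefixSplit {

    // Assumed scheme: row keys begin with a zero-padded bucket "00".."99".
    static final int BUCKETS = 100;

    // Returns numRegions-1 two-digit split keys spread evenly over the buckets.
    static String[] split(int numRegions) {
        if (numRegions < 1 || numRegions > BUCKETS) {
            throw new IllegalArgumentException("numRegions must be in 1.." + BUCKETS);
        }
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            int boundary = BUCKETS * i / numRegions;
            splits[i - 1] = String.format("%02d", boundary);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", split(4)));
    }
}
```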

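The following describes how to create splits from the command line:

When creating a table in the HBase Shell, besides the common options we can also create pre-split regions at the same time, which helps prevent hotspotting when data is first inserted.

Typing create on its own prints the following help: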

Examples:
 
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
  hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

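The examples show either ordinary options or split-related options, but none of them combines ordinary options (such as VERSIONS or COMPRESSION) with pre-splitting.

What if we need both? Is the following correct?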

create 't', {NAME => 'f', VERSIONS => 1, COMPRESSION => 'SNAPPY', SPLITS => ['10','20','30']}

Running it shows that this does not work. The correct form is:

create 't', {NAME => 'f', VERSIONS => 1, COMPRESSION => 'SNAPPY'},
    {SPLITS => ['10','20','30']}

This is because splits apply to the whole table, not to a single column family.
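The SPLITS_FILE option in the help text above takes a text file with one split key per line. When there are many split points, generating that file is easier than typing a long SPLITS array. The following is a minimal sketch under that assumption; the class `SplitsFileSketch`, its method names, and the decimal key scheme are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for producing a splits file for the HBase Shell.
final class SplitsFileSketch {

    // Builds decimal split keys first, first+step, ... up to and including last.
    static List<String> decimalSplits(int first, int last, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<String> splits = new ArrayList<>();
        for (int key = first; key <= last; key += step) {
            splits.add(Integer.toString(key));
        }
        return splits;
    }

    // Writes one split key per line to the given path.
    static void writeSplitsFile(Path path, List<String> splits) {
        try {
            Files.write(path, splits, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        writeSplitsFile(Paths.get("splits.txt"), decimalSplits(10, 30, 10));
    }
}
```

The resulting splits.txt can then be passed in its own table-level block, separate from the column family options, in the same way as the SPLITS array in the corrected command above.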

Further reading:

http://zh.hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
