Creating an HBase table with pre-split regions: the shell and the Java API

Latest recommended article published 2022-05-17 23:35:14

lijie_cq


Category: hbase. Tags: hbase pre-splitting, hbase table creation, hbase partitioning, region count, hbase-api

Article link: https://blog.csdn.net/qq_20641565/article/details/56482407


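In HBase, a table created without pre-splitting starts with a single region, so all of the table's data is handled by that one region. Once the region grows too large, HBase splits it, and with large data volumes these splits are expensive. It is therefore better to estimate the data volume from the business requirements and specify the number of regions and their key ranges when the table is created. Two ways of creating a pre-split table are described below.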

1. Creating a pre-split table from the shell:

hbase(main):002:0> create 'split01','cf1',SPLITS=>['1000000','2000000','3000000']
0 row(s) in 0.7880 seconds

=> Hbase::Table - split01

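The command above creates the table with 4 regions up front. Each region has a startKey and an endKey, except that the first region has no startKey and the last has no endKey:
Region 1: " to 1000000"
Region 2: "1000000 to 2000000"
Region 3: "2000000 to 3000000"
Region 4: "3000000 to "
The original post showed the result in a figure:

[Figure: regions of table split01]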
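The relationship between split keys and region ranges can be sketched in a few lines of plain Java. This is only an illustration of the rule described above (N split keys give N + 1 regions); the class name `RegionRanges` and the method `describe` are made up for this example and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the key ranges implied by a set of split keys.
// Plain Java; it does not connect to HBase.
class RegionRanges {

    // N split keys produce N + 1 ranges. The first range has an empty
    // start key and the last range has an empty end key.
    static List<String> describe(String... splitKeys) {
        List<String> ranges = new ArrayList<>();
        String start = "";
        for (String key : splitKeys) {
            ranges.add("'" + start + "' to '" + key + "'");
            start = key;
        }
        ranges.add("'" + start + "' to ''");
        return ranges;
    }

    public static void main(String[] args) {
        // Same split keys as the shell command for table split01.
        for (String range : describe("1000000", "2000000", "3000000")) {
            System.out.println(range);
        }
    }
}
```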

The table can also be created with the following command:

hbase(main):003:0> create 'split02','cf1',SPLITS_FILE=>'/usr/java/split.txt'
0 row(s) in 1.2420 seconds

=> Hbase::Table - split02

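This form takes a file, '/usr/java/split.txt', that lists the split keys one per line. The file used here creates 6 regions; the original post showed its contents in a figure:

[Figure: contents of /usr/java/split.txt]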
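As a rough sketch of how such a file could be produced, the plain-Java snippet below writes one split key per line and reports how many regions that would give. The split keys are hypothetical (the keys in the original file are not recoverable from the post), and the class `SplitFileWriter` is invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Illustrative only: writes a split file with one split key per line.
class SplitFileWriter {

    // Writes the keys to the file and returns the resulting region count,
    // which is the number of split keys plus one.
    static int writeSplitFile(Path file, List<String> splitKeys) throws IOException {
        Files.write(file, splitKeys, StandardCharsets.UTF_8);
        return splitKeys.size() + 1;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical keys, chosen only so that the result is 6 regions.
        List<String> keys = Arrays.asList(
                "1000000", "2000000", "3000000", "4000000", "5000000");
        Path file = Files.createTempFile("split", ".txt");
        int regions = writeSplitFile(file, keys);
        System.out.println(file + " -> " + regions + " regions");
    }
}
```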

2. Creating a pre-split table with the Java API:

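The Admin interface in the HBase client package (org.apache.hadoop.hbase.client.Admin) provides four methods for creating a table. The first three are synchronous and the fourth is asynchronous: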

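I. Create the table directly from a descriptor

This creates the table directly from the table descriptor, without specifying any split points.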

  /**
   * Creates a new table. Synchronous operation.
   *
   * @param desc table descriptor for table
   * @throws IllegalArgumentException if the table name is reserved
   * @throws MasterNotRunningException if master is not running
   * @throws org.apache.hadoop.hbase.TableExistsException if table already exists (If concurrent
   * threads, the table may have been created between test-for-existence and attempt-at-creation).
   * @throws IOException if a remote or network exception occurs
   */
  void createTable(HTableDescriptor desc) throws IOException;

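II. Create the table from a descriptor, a startKey, an endKey, and a region count, with ranges assigned automatically

Given the table descriptor, a startKey, an endKey, and a region count, HBase creates that many regions and assigns a key range to each one. The ranges are contiguous and evenly sized. If the business keys are unevenly distributed, with a lot of data in some ranges and little in others, the data will be skewed across regions. In that case you have to specify the ranges yourself, using the third or fourth method.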

  /**
   * Creates a new table with the specified number of regions.  The start key specified will become
   * the end key of the first region of the table, and the end key specified will become the start
   * key of the last region of the table (the first region has a null start key and the last region
   * has a null end key). BigInteger math will be used to divide the key range specified into enough
   * segments to make the required number of total regions. Synchronous operation.
   *
   * @param desc table descriptor for table
   * @param startKey beginning of key range
   * @param endKey end of key range
   * @param numRegions the total number of regions to create
   * @throws IllegalArgumentException if the table name is reserved
   * @throws MasterNotRunningException if master is not running
   * @throws org.apache.hadoop.hbase.TableExistsException if table already exists (If concurrent
   * threads, the table may have been created between test-for-existence and attempt-at-creation).
   * @throws IOException
   */
  void createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions)
      throws IOException;
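The even division described in the Javadoc can be approximated with the sketch below. It is a simplification, not HBase's actual code: it treats keys as numbers (`BigInteger`) instead of byte arrays, and the class `EvenSplits` and method `splitPoints` are hypothetical names. It follows the Javadoc in using startKey as the end of the first region and endKey as the start of the last region.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: evenly spaced split points between startKey and endKey.
// Keys are modelled as numbers here; HBase itself works on byte arrays.
class EvenSplits {

    // Returns numRegions - 1 split points. The first is startKey, the last
    // is endKey, and the rest are spread evenly between them.
    static List<BigInteger> splitPoints(BigInteger startKey, BigInteger endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("need at least 3 regions");
        }
        if (startKey.compareTo(endKey) >= 0) {
            throw new IllegalArgumentException("startKey must be smaller than endKey");
        }
        int intervals = numRegions - 2; // gaps between the first and last split point
        BigInteger span = endKey.subtract(startKey);
        List<BigInteger> points = new ArrayList<>();
        for (int i = 0; i <= intervals; i++) {
            // startKey + span * i / intervals
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(intervals));
            points.add(startKey.add(offset));
        }
        return points;
    }

    public static void main(String[] args) {
        List<BigInteger> points = splitPoints(
                BigInteger.valueOf(1000000), BigInteger.valueOf(4000000), 5);
        System.out.println(points); // 4 split points, hence 5 regions
    }
}
```

Because the spacing is uniform, this approach only suits keys that are themselves roughly uniformly distributed over the range.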

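III. Create the table from a descriptor and custom split keys (synchronous)

This creates the table from the table descriptor and a custom set of split keys, so you decide the key range that each region covers. For example: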

byte[][] splitKeys = new byte[][] { Bytes.toBytes("100000"),
                Bytes.toBytes("200000"), Bytes.toBytes("400000"),
                Bytes.toBytes("500000") };

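Passing the splitKeys above to this method creates 5 regions and assigns each one its key range. As before, the first region has no startKey and the last has no endKey:
Region 1: " to 100000"
Region 2: "100000 to 200000"
Region 3: "200000 to 400000" (this key span is twice as wide as the others; spans can be chosen to suit the business requirements)
Region 4: "400000 to 500000"
Region 5: "500000 to "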
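To see which region a given row key would fall into, the sketch below compares keys byte by byte, which is how HBase orders row keys. It is plain Java written for illustration; `SplitKeyLookup` and its methods are hypothetical and do not call HBase. A region's start key is treated as inclusive and its end key as exclusive.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: finds the region index for a row key, given split keys
// in ascending order. Plain Java; it does not connect to HBase.
class SplitKeyLookup {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the 0-based index of the region whose range contains rowKey.
    static int regionIndex(String rowKey, String... splitKeys) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        int index = 0;
        for (String split : splitKeys) {
            if (compare(row, split.getBytes(StandardCharsets.UTF_8)) < 0) {
                break; // rowKey sorts before this split key
            }
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splits = {"100000", "200000", "400000", "500000"};
        System.out.println(regionIndex("150000", splits));
        // Keys are compared as bytes, not as numbers, so "25" sorts
        // after "200000" and before "400000".
        System.out.println(regionIndex("25", splits));
    }
}
```

The second call is a reminder that string keys of different lengths do not sort numerically, which is one reason row keys are often zero-padded to a fixed width.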

  /**
   * Creates a new table with an initial set of empty regions defined by the specified split keys.
   * The total number of regions created will be the number of split keys plus one. Synchronous
   * operation. Note : Avoid passing empty split key.
   *
   * @param desc table descriptor for table
   * @param splitKeys array of split keys for the initial regions of the table
   * @throws IllegalArgumentException if the table name is reserved, if the split keys are repeated
   * and if the split key has empty byte array.
   * @throws MasterNotRunningException if master is not running
   * @throws org.apache.hadoop.hbase.TableExistsException if table already exists (If concurrent
   * threads, the table may have been created between test-for-existence and attempt-at-creation).
   * @throws IOException
   */
  void createTable(final HTableDescriptor desc, byte[][] splitKeys) throws IOException;
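The Javadoc above warns that empty or repeated split keys cause an IllegalArgumentException. One option is to check the keys on the client side before calling createTable. The sketch below is an assumption about how such a check might look; `SplitKeyCheck` is a hypothetical helper, not part of the HBase API.

```java
import java.util.Arrays;

// Illustrative only: rejects empty and duplicate split keys before they
// are passed to createTable. Plain Java; it does not connect to HBase.
class SplitKeyCheck {

    // Throws IllegalArgumentException if any key is null, empty, or repeated.
    static void validate(byte[][] splitKeys) {
        for (int i = 0; i < splitKeys.length; i++) {
            if (splitKeys[i] == null || splitKeys[i].length == 0) {
                throw new IllegalArgumentException("empty split key at index " + i);
            }
            for (int j = 0; j < i; j++) {
                if (Arrays.equals(splitKeys[i], splitKeys[j])) {
                    throw new IllegalArgumentException(
                            "split key at index " + i + " repeats index " + j);
                }
            }
        }
    }

    public static void main(String[] args) {
        byte[][] keys = {
            "100000".getBytes(), "200000".getBytes(),
            "400000".getBytes(), "500000".getBytes()
        };
        validate(keys); // passes silently for the keys used in the post
        System.out.println("split keys look valid: " + keys.length);
    }
}
```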

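IV. Create the table from a descriptor and custom split keys (asynchronous)

This is the same as the third method above, except that it runs asynchronously.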

  /**
   * Creates a new table but does not block and wait for it to come online. Asynchronous operation.
   * To check if the table exists, use {@link #isTableAvailable} -- it is not safe to create an
   * HTable instance to this table before it is available. Note : Avoid passing empty split key.
   *
   * @param desc table descriptor for table
   * @throws IllegalArgumentException Bad table name, if the split keys are repeated and if the
   * split key has empty byte array.
   * @throws MasterNotRunningException if master is not running
   * @throws org.apache.hadoop.hbase.TableExistsException if table already exists (If concurrent
   * threads, the table may have been created between test-for-existence and attempt-at-creation).
   * @throws IOException
   */
  void createTableAsync(final HTableDescriptor desc, final byte[][] splitKeys) throws IOException;
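Since the asynchronous call returns before the table is online, the caller usually has to poll for availability, as the Javadoc suggests with isTableAvailable. The following is a generic polling sketch under that assumption. `AvailabilityPoller` is a hypothetical helper, and the availability check is passed in as a `BooleanSupplier`; in real code it would wrap a call such as `admin.isTableAvailable(tableName)` and handle its IOException.

```java
import java.util.function.BooleanSupplier;

// Illustrative only: polls a check until it succeeds or a timeout expires.
// Plain Java; the HBase call is represented by the BooleanSupplier.
class AvailabilityPoller {

    // Returns true as soon as the check passes, false on timeout or interrupt.
    static boolean waitUntil(BooleanSupplier available, long timeoutMillis, long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (available.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated check that succeeds on the third call.
        int[] calls = {0};
        boolean online = waitUntil(() -> ++calls[0] >= 3, 2000, 10);
        System.out.println("online=" + online + " after " + calls[0] + " checks");
    }
}
```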