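Before explaining how to pre-split HBase regions, we first need to introduce a concept: region hotspots. What is a region hotspot? In a large table with many regions, the regions are usually not distributed evenly: some RegionServers host only a few regions, while others host many. So even if data is written to the table in random order, the RegionServer hosting more regions carries a heavier load than the others. This is a region hotspot.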
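There are several ways to deal with region hotspots. One of them is to pre-split the table into regions when it is created. By default, a newly created HBase table has only one region, but HBase lets users specify the regions when creating a table, so that the table is pre-split from the start.
The following shows how to pre-split a table at creation time using the HBase Shell: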
hbase(main):001:0> create 'test_split_tbl', 'colfam1', {SPLITS => ['ROW-100','ROW-200','ROW-300','ROW-400']}
0 row(s) in 0.4820 seconds
=> Hbase::Table - test_split_tbl
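The four split keys in the create command above divide the row-key space into five regions. Each region runs from its start key (inclusive) up to the next split key (exclusive), and the first and last regions are open-ended. The bash sketch below illustrates that mapping. The file name splits.txt and the function print_region_ranges are hypothetical. The script only prints ranges and does not contact HBase. The HBase shell create command can also read split keys from such a file through its SPLITS_FILE option, which appears in a comment rather than being executed.

```shell
#!/usr/bin/env bash
# Sketch: print the key range of every region produced by a list of split keys.
# splits.txt and print_region_ranges are hypothetical names used for illustration.
set -euo pipefail

print_region_ranges() {
  local start=""   # the first region starts at the lowest possible row key
  local key
  while IFS= read -r key; do
    if [ -z "$key" ]; then
      continue     # ignore blank lines in the split file
    fi
    # start key is inclusive, the next split key is exclusive
    echo "[${start:-<min>}, ${key})"
    start="$key"
  done
  echo "[${start:-<min>}, <max>)"   # the last region is open-ended
}

# Same split keys as in the create command above, one per line.
printf '%s\n' ROW-100 ROW-200 ROW-300 ROW-400 > splits.txt
print_region_ranges < splits.txt

# The same file could be handed to the HBase shell instead of an inline SPLITS list:
#   create 'test_split_tbl', 'colfam1', SPLITS_FILE => 'splits.txt'
```

Running it prints five ranges, from [<min>, ROW-100) to [ROW-400, <max>). These match the five regions you would expect to find for test_split_tbl in the Web UI.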
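Open the HBase Master Web UI to check the table's regions: http://192.168.0.47:60010/master-status
Open the HBase RegionServer Web UI to check the regions it hosts: http://192.168.0.16:60030/rs-status
Besides specifying SPLITS in the HBase Shell create statement, you can also pre-split a table with the RegionSplitter utility: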
[root@cent-1 bin]# ./hbase org.apache.hadoop.hbase.util.RegionSplitter
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
usage: RegionSplitter <TABLE> <SPLITALGORITHM>
SPLITALGORITHM is a java class name of a class
implementing SplitAlgorithm, or one of the special
strings HexStringSplit or UniformSplit, which are
built-in split algorithms. HexStringSplit treats
keys as hexadecimal ASCII, and UniformSplit treats
keys as arbitrary bytes.
 -c <region count>              Create a new table with a pre-split number of
                                regions
 -D <property=value>            Override HBase Configuration Settings
 -f <family:family:...>         Column Families to create with new table.
                                Required with -c
    --firstrow <arg>            First Row in Table for Split Algorithm
 -h                             Print this usage help
    --lastrow <arg>             Last Row in Table for Split Algorithm
 -o <count>                     Max outstanding splits that have unfinished
                                major compactions
 -r                             Perform a rolling split of an existing region
    --risky                     Skip verification steps to complete
                                quickly.STRONGLY DISCOURAGED for production
                                systems.
[root@cent-1 bin]# ./hbase org.apache.hadoop.hbase.util.RegionSplitter test HexStringSplit -c 10 -f colfam1:colfam2:colfam3
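With -c 10, RegionSplitter should create the table with ten regions, and a split algorithm such as HexStringSplit chooses the nine boundaries between them. The bash sketch below approximates the HexStringSplit arithmetic. It assumes the default key space 00000000 to FFFFFFFF (8 hex characters), equal-sized slices, and that the last region absorbs any remainder. It is not HBase code, and the function name hex_splits is hypothetical.

```shell
#!/usr/bin/env bash
# Approximation of the boundary arithmetic behind HexStringSplit (not HBase's own code).
# Assumes the default key space 00000000..FFFFFFFF; hex_splits is a hypothetical helper.
set -euo pipefail

hex_splits() {
  local n="$1"                          # desired number of regions
  local first=$((16#00000000))
  local last=$((16#FFFFFFFF))
  local range=$(( last - first + 1 ))   # +1 because the last row is inclusive
  local size=$(( range / n ))           # integer division; the last region takes the remainder
  local i
  for (( i = 1; i < n; i++ )); do
    # each boundary is the start key of region i+1, as 8 lowercase hex characters
    printf '%08x\n' $(( first + size * i ))
  done
}

hex_splits 10
```

For n = 10 this prints nine boundaries, from 19999999 to e6666661. Because HexStringSplit treats keys as hexadecimal ASCII, these boundaries only spread the load evenly when the row keys themselves are hex strings, for example hashed key prefixes.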