HBase Merging Regions

I admit I did not know until recently that HBase can merge regions, or in which situations doing so makes sense. The excerpt below offers some answers:

Having too many regions is sometimes a bad thing, and that is when merging regions becomes the natural choice.

While it is much more common for regions to split automatically over time as you are adding data to the corresponding table, there might be situations where you need to merge regions, for example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server.

HBase ships with a tool that allows you to merge two adjacent regions as long as the cluster is not online. You can use the command line tool to get the usage details:

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

Here is an example that creates a table with more than one region and then merges two of those regions:

$ ./bin/hbase shell

hbase(main):001:0> create 'testtable', 'colfam1', \
 {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
0 row(s) in 0.2640 seconds

hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do \
 put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
0 row(s) in 1.0450 seconds

hbase(main):003:0> flush 'testtable'
0 row(s) in 0.2000 seconds

hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
ROW                                  COLUMN+CELL
 testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
 testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
 fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
 testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
 testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
 e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
 testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
 testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
6 row(s) in 0.0440 seconds

hbase(main):005:0> exit

$ ./bin/stop-hbase.sh

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge testtable \
 testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
 testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.


The example creates a table with five split points, resulting in six regions. It then inserts some rows and flushes the data to ensure that there are store files for the subsequent merge. The scan is used to get the names of the regions, but you can also use the web UI of the master: click on the table name in the User Tables section to get the same list of regions.
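To see why five split points give six regions, and where the inserted rows end up, the boundaries can be derived by hand. The following is a minimal Ruby sketch, not HBase code: the helpers `regions_for_splits` and `region_index_for` are hypothetical names introduced here, and the sketch assumes that each region covers the half-open range from its start key up to, but excluding, its end key, with an empty string standing for the open start or end of the table.

```ruby
# Sketch only: derive region boundaries from split points.
# An empty string marks the open start or end of the table.
def regions_for_splits(splits)
  keys = [''] + splits.sort + ['']
  keys.each_cons(2).map { |start_key, end_key| { start: start_key, end: end_key } }
end

# Return the index of the region whose [start, end) range contains the row key.
def region_index_for(row, regions)
  regions.index do |r|
    row >= r[:start] && (r[:end].empty? || row < r[:end])
  end
end

splits  = %w[row-10 row-20 row-30 row-40 row-50]
regions = regions_for_splits(splits)

# Same row keys as the shell loop in the example: row-00 .. row-99.
rows = ('0'..'9').flat_map { |i| ('0'..'9').map { |j| "row-#{i}#{j}" } }

# Count how many rows fall into each region.
counts = rows.group_by { |row| region_index_for(row, regions) }
             .transform_values(&:size)

regions.each_with_index do |r, idx|
  puts format('%-8s .. %-8s %d rows', r[:start].inspect, r[:end].inspect, counts.fetch(idx, 0))
end
```

Under these assumptions the first five regions each receive ten rows, while the last region, which is open-ended, holds the remaining 50 of the 100 rows.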

Note

Note how the shell wraps the values in each column. The region name is split over two lines, which you need to copy and paste separately. The web UI is easier to use in that respect, as it shows each name in one column and on a single line.
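Gluing the two copied fragments back together is easy to get wrong, so a small helper can do it. This is a hedged Ruby sketch: `join_region_name` and `looks_complete?` are hypothetical helpers, not part of HBase, and the pattern check assumes the name layout seen in the scan output above (table, start key, timestamp, a 32-character hex string, and a trailing dot). Row keys that contain commas would need a looser check.

```ruby
# Sketch only: rebuild a region name that the shell wrapped across two lines.
def join_region_name(*fragments)
  fragments.map(&:strip).join
end

# Rough sanity check against the layout seen in the scan output:
# <table>,<start key>,<timestamp>.<32 hex characters>.
def looks_complete?(name)
  name.match?(/\A[^,]+,[^,]*,\d+\.[0-9a-f]{32}\.\z/)
end

# Fragments copied from the scan output for the region starting at 'row-20'.
name = join_region_name('testtable,row-20,1309614509041.e7c1 ',
                        ' 6267eb30e147e5d988c63d40f982.')

puts name
puts looks_complete?(name)
```

The joined string is what the merge tool expects as a region argument, including the trailing dot.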

The content of the column values is abbreviated to the start and end keys. You can see how the create command using the split keys has created the regions. The example goes on to exit the shell and stop the HBase cluster. Note that HDFS still needs to run for the merge to work, because the tool needs to read the store files of each region and merge them into a new combined one.
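Because the tool merges adjacent regions, the start and end keys from the scan can be used to confirm that a chosen pair qualifies before the cluster is stopped. The Ruby sketch below assumes that two regions are adjacent when the end key of one equals the start key of the other. The `Region` struct and the `adjacent?` helper are hypothetical and not part of HBase.

```ruby
# Sketch only: check adjacency from the start and end keys shown in the scan.
Region = Struct.new(:name, :start_key, :end_key)

# Two regions are treated as adjacent when one ends exactly where the other starts.
# An empty key marks the open start or end of the table, so it never counts as a match.
def adjacent?(a, b)
  (!a.end_key.empty? && a.end_key == b.start_key) ||
    (!b.end_key.empty? && b.end_key == a.start_key)
end

r20 = Region.new('testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.',
                 'row-20', 'row-30')
r30 = Region.new('testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.',
                 'row-30', 'row-40')
r50 = Region.new('testtable,row-50,1309614509041.74a57dc7e3e9602d9229b15d4c0357d1.',
                 'row-50', '')

puts adjacent?(r20, r30) # the pair merged in the example
puts adjacent?(r20, r50) # separated by other regions
```

The first pair is the one passed to the merge tool in the example. The second pair has other regions between its members, so it does not pass the check.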



