HBase Merging Regions

macyang

Article link: https://blog.csdn.net/macyang/article/details/6624482

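I admit I did not previously know that HBase could merge regions, or in which situations doing so makes sense. The excerpt below offers some answers.

Having too many regions is sometimes not a good thing, so merging regions is bound to become necessary.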

While it is much more common for regions to split automatically over time as you are adding data to the corresponding table, there might be situations where you need to merge regions, for example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server.

HBase ships with a tool that allows you to merge two adjacent regions as long as the cluster is not online. You can use the command line tool to get the usage details:

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

Here is an example of a table with more than one region, two of which are subsequently merged:

$ ./bin/hbase shell

hbase(main):001:0> create 'testtable', 'colfam1', \
 {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
0 row(s) in 0.2640 seconds

hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do \
 put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
0 row(s) in 1.0450 seconds

hbase(main):003:0> flush 'testtable'
0 row(s) in 0.2000 seconds

hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
ROW                                  COLUMN+CELL
 testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
 testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
 fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
 testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
 testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
 e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
 testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
 testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
6 row(s) in 0.0440 seconds

hbase(main):005:0> exit

$ ./bin/stop-hbase.sh

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge testtable \
 testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
 testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.

The example creates a table with five split points, resulting in six regions. It then inserts some rows and flushes the data to ensure that there are store files for the subsequent merge. The scan is used to get the names of the regions, but you can also use the web UI of the master: click on the table name in the User Tables section to get the same list of regions.
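To see why five split points yield six regions, and how the 100 inserted rows spread across them, the following plain-Ruby sketch (runnable outside the HBase shell) assigns each row key to a region by comparing it against the split points. The helper `region_index` is a hypothetical name used only for illustration, and plain string comparison is assumed here as a stand-in for HBase's byte-wise row key ordering, which gives the same result for these ASCII keys.

```ruby
# Split points taken from the create command in the example.
SPLITS = %w[row-10 row-20 row-30 row-40 row-50].freeze

# Hypothetical helper: index of the region whose [start, end) range holds the key.
# Region 0 covers keys below the first split point; the last region covers keys
# at or above the last split point.
def region_index(key, splits = SPLITS)
  splits.count { |split| key >= split }
end

# Same key pattern as the nested loop in the shell session: row-00 .. row-99.
keys = ('0'..'9').flat_map { |i| ('0'..'9').map { |j| "row-#{i}#{j}" } }

# Count how many of the generated keys fall into each region.
rows_per_region = Array.new(SPLITS.size + 1, 0)
keys.each { |key| rows_per_region[region_index(key)] += 1 }

rows_per_region.each_with_index do |count, idx|
  start_key = idx.zero? ? '' : SPLITS[idx - 1]
  end_key   = SPLITS[idx] || ''
  puts "region #{idx}: STARTKEY => '#{start_key}', ENDKEY => '#{end_key}', rows = #{count}"
end
```

Under these assumptions, the first five regions each receive ten rows, while the last region, which starts at `row-50` and has an open end key, receives the remaining fifty.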

Note

Note how the shell wraps the values in each column. The region name is split over two lines, which you need to copy and paste separately. The web UI is easier to use in that respect, as it shows each name in one column and on a single line.
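Because a wrapped region name has to be pieced together by hand, a small plain-Ruby sketch like the following can rejoin the fragments from pasted scan output. It assumes the layout shown in the listing above: the first whitespace-delimited token on each data line is a fragment of the region name, and a complete name ends with a dot. `join_region_names` is a hypothetical helper, not part of HBase.

```ruby
# Hypothetical helper: rebuild full region names from wrapped `scan '.META.'` output.
# Assumes each data line starts with a fragment of the row key and that a complete
# region name ends with '.', as in the listing above.
def join_region_names(scan_output)
  names = []
  current = String.new
  scan_output.each_line do |line|
    fragment = line.split.first
    # Skip blank lines, the column header, and the "N row(s) in ..." trailer.
    next if fragment.nil? || fragment == 'ROW' || line.include?('row(s)')
    current << fragment
    if current.end_with?('.')
      names << current
      current = String.new
    end
  end
  names
end

# Two rows copied from the scan output in the example.
pasted = <<~OUTPUT
  ROW                                  COLUMN+CELL
   testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
   6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
   testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
   e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
  6 row(s) in 0.0440 seconds
OUTPUT

puts join_region_names(pasted)
```

The two names it prints are the same ones passed to the `Merge` tool in the example.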

The content of the column values is abbreviated to the start and end keys. You can see how the create command using the split keys has created the regions. The example goes on to exit the shell, and stop the HBase cluster. Note that HDFS still needs to run for the merge to work as it needs to read the store files of each region and merge them into a new combined one.
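The adjacency requirement can be pictured in terms of key ranges: two regions are adjacent when the end key of one equals the start key of the other, and the merged region then spans from the first start key to the second end key. The following plain-Ruby sketch models only that bookkeeping. `Region`, `adjacent?`, and `merged_range` are hypothetical names for illustration and are not meant to reflect how the `Merge` tool is implemented internally.

```ruby
# Hypothetical model of a region as a [start_key, end_key) range.
Region = Struct.new(:start_key, :end_key)

# True when region b starts exactly where region a ends.
# An empty end key means "end of table", so nothing can follow it.
def adjacent?(a, b)
  !a.end_key.empty? && a.end_key == b.start_key
end

# Key range covered after merging two adjacent regions, given in either order.
def merged_range(a, b)
  first, second = adjacent?(a, b) ? [a, b] : [b, a]
  raise ArgumentError, 'regions are not adjacent' unless adjacent?(first, second)
  Region.new(first.start_key, second.end_key)
end

# The two regions merged in the example.
r20 = Region.new('row-20', 'row-30')
r30 = Region.new('row-30', 'row-40')

merged = merged_range(r20, r30)
puts "STARTKEY => '#{merged.start_key}', ENDKEY => '#{merged.end_key}'"
```

In this model, the two regions from the example combine into a single range from `row-20` to `row-40`, while passing regions that do not share a boundary, such as the first and the last region of the table, raises an error.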