Hadoop HBase: Learning How to Pre-split Regions When Creating a Table

Latest recommended article published 2023-09-05 16:49:46

艾伦蓝

Categories: Hadoop, HBase. Tags: big data, shell

Article link: https://blog.csdn.net/lan12334321234/article/details/84885425


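If you know how the row keys of an HBase table are distributed, you can pre-split the table into regions when you create it. The benefit is that [color=blue][b]it avoids write hotspots during large data loads and improves insert throughput.[/b][/color]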

[color=red][size=large][b]1. Plan the HBase pre-split[/b][/size][/color]
-------------------------
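First, work out how the row keys are distributed, then decide how many regions to split the table into, [b]what the startkey and endkey of each region are, and write the planned split keys to a file.[/b] For example, if every key begins with a numeric string from 0001 to 0010, the table can be split into 10 regions, which takes nine split keys. The split-key file then looks like this:

0001|
0002|
0003|
0004|
0005|
0006|
0007|
0008|
0009|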
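For larger key ranges the split file is easier to generate than to type. The following Python sketch is one way to do it and is not part of the original post: the helper names `build_split_keys` and `write_split_file` are hypothetical, and it assumes fixed-width numeric prefixes followed by "|" as in the example above.

```python
from __future__ import annotations

from pathlib import Path


def build_split_keys(first: int, last: int, width: int = 4, suffix: str = "|") -> list[str]:
    """Return one split key per prefix from first to last - 1, e.g. '0001|'."""
    # N prefixes give N regions, which need only N - 1 split points,
    # so the last prefix gets no line of its own.
    return [f"{n:0{width}d}{suffix}" for n in range(first, last)]


def write_split_file(path: str, keys: list[str]) -> None:
    """Write one split key per line (assumed layout for SPLITS_FILE)."""
    Path(path).write_text("\n".join(keys) + "\n", encoding="ascii")


# Prefixes 0001..0010 -> nine split keys, 0001| through 0009|.
split_keys = build_split_keys(1, 10)
print("\n".join(split_keys))
```

Calling `write_split_file("region_split_info.txt", split_keys)` would then write those nine lines to the file name used in step 2.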

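Why is each key followed by "|"? [color=red][b]Because in ASCII "|" has the value 124, which is greater than every digit and letter; "~" (ASCII 126) works as well.[/b][/color] Any row key that starts with 0001 therefore sorts before "0001|" and lands in the first region. The first line of the split file is the stopkey of the first region, and so on for each following line; the last line is both the stopkey of the second-to-last region and the startkey of the last region. In other words, the split file lists the boundary points of the key range, as shown in the figure below: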

[img]http://dl2.iteye.com/upload/attachment/0124/8649/459daa9b-3f5f-34e8-b2bf-9b02604b7baa.jpg[/img]
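The boundary rule can be checked with a small Python sketch. It is an illustration only: `region_index` is a hypothetical helper, the sample row keys are made up, and it assumes regions cover a half-open range (startkey inclusive, stopkey exclusive) with keys compared byte by byte.

```python
from __future__ import annotations

from bisect import bisect_right

# Split points as described in the text: nine keys give ten regions.
SPLIT_KEYS = [f"{n:04d}|".encode("ascii") for n in range(1, 10)]


def region_index(row_key: bytes, split_keys: list[bytes] = SPLIT_KEYS) -> int:
    """Return the 0-based index of the region that would hold row_key.

    Region i covers [split_keys[i-1], split_keys[i]): the start key is
    inclusive and the stop key is exclusive, so a key equal to a split
    point belongs to the region that starts there.
    """
    return bisect_right(split_keys, row_key)


# '|' (124) sorts after every ASCII digit and letter.
assert all(ord(c) < ord("|") for c in "0123456789azAZ")

print(region_index(b"0001_order_42"))  # prefix 0001 -> region 0
print(region_index(b"0002zzzz"))       # prefix 0002 -> region 1
print(region_index(b"0010abc"))        # prefix 0010 -> region 9 (the last one)
```

Python compares `bytes` lexicographically by unsigned byte value, which is why a plain `bisect_right` over the sorted split keys is enough for this sketch.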

[color=red][b]2. Create the pre-split table in the HBase shell, specifying the split file[/b][/color]
-------------------------------------
Typing create on its own in the hbase shell prints the following usage help:



Create a table with namespace=ns1 and table qualifier=t1  
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}  

Create a table with namespace=default and table qualifier=t1  
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}  
  hbase> # The above in shorthand would be the following:  
  hbase> create 't1', 'f1', 'f2', 'f3'  
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}  
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}  

Table configuration options can be put at the end.  
Examples:  

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']  
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']  
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'  
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }  
  hbase> # Optionally pre-split the table into NUMREGIONS, using  
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)  
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}  
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}  
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'if1', LOCAL_INDEX=>'COMBINE_INDEX|INDEXED=f1:q1:8|rowKey:rowKey:10,UPDATE=true'}

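A split file can be supplied through SPLITS_FILE; if there are only a few split points, you can list them inline with SPLITS instead. The shell reads the file from the local filesystem, and a relative path is resolved against the directory the shell was started from. The following command creates a pre-split table using the split file produced in step 1: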


create 'split_table_test', 'cf', {SPLITS_FILE => 'region_split_info.txt'}
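For a short list of split keys, the inline SPLITS form from the help output can be assembled from the same keys. The Python sketch below only builds the command text to paste into the hbase shell; `splits_command` is a hypothetical helper, and it assumes the keys contain no single quotes or backslashes.

```python
from __future__ import annotations


def splits_command(table: str, family: str, split_keys: list[str]) -> str:
    """Build an HBase shell `create` statement with inline SPLITS."""
    for key in split_keys:
        # Keep the sketch simple: refuse keys that would need escaping
        # inside a single-quoted shell string.
        if "'" in key or "\\" in key:
            raise ValueError(f"unsupported character in split key: {key!r}")
    quoted = ", ".join(f"'{key}'" for key in split_keys)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


print(splits_command("split_table_test", "cf", ["0001|", "0002|", "0003|"]))
# create 'split_table_test', 'cf', SPLITS => ['0001|', '0002|', '0003|']
```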

[size=medium][color=red][b]SNAPPY compression[/b][/color][/size]
--------------------------------


create 'split_table_test',{NAME =>'cf', COMPRESSION => 'SNAPPY'}, {SPLITS_FILE => '/tmp/region_split_info.txt'}

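Note that the split options must go in their own braces, separate from the column family's braces, because pre-splitting applies to the whole table rather than to a single column family. As the help output above shows, table-level options may also be written after the column families without braces.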

Reposted from: [url]http://blog.csdn.net/chaolovejia/article/details/46375849#[/url]
