Migrating data between different HBase clusters

predict_wise

Article link: https://blog.csdn.net/predict_wise/article/details/53766677

CopyTable

(1) First, a look at how the CopyTable command is used

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

Options:
 rs.class     hbase.regionserver.class of the peer cluster
     specify if different from current cluster
 rs.impl      hbase.regionserver.impl of the peer cluster
 startrow     the start row
 stoprow      the stop row
 starttime    beginning of the time range (unixtime in millis)
     without endtime means from starttime to forever
 endtime      end of the time range.  Ignored if no starttime specified.
 versions     number of cell versions to copy
 new.name     new table's name
 peer.adr     Address of the peer cluster given in the format
     hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
 families     comma-separated list of families to copy
     To copy from cf1 to cf2, give sourceCfName:destCfName. 
     To keep the same name, just give "cfName"
 all.cells    also copy delete markers and deleted cells

Args:
 tablename    Name of the table to copy

Examples:
 To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable 
 --starttime=1265875194289 
 --endtime=1265878794289 
 --peer.adr=server1,server2,server3:2181:/hbase 
 --families=myOldCf:myNewCf,cf2,cf3 
 TestTable 

For performance consider the following general options:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
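
The `starttime` and `endtime` options take Unix time in milliseconds, while `date +%s` reports seconds. The following is a minimal sketch of building the two options for a window of N hours. The function names `to_millis` and `time_window_opts` are hypothetical helpers for illustration, not part of HBase.

```shell
#!/usr/bin/env bash
# Convert a Unix time in seconds to the millisecond value CopyTable expects.
to_millis() {
  echo $(( $1 * 1000 ))
}

# Print --starttime/--endtime options covering the N hours before a given
# end time (seconds since the epoch).
time_window_opts() {
  local end_sec=$1 hours=$2
  local start_sec=$(( end_sec - hours * 3600 ))
  echo "--starttime=$(to_millis "$start_sec") --endtime=$(to_millis "$end_sec")"
}

# Example: the one-hour window ending now.
# time_window_opts "$(date +%s)" 1
```

The printed options can be placed before the table name on the CopyTable command line, as in the usage example above.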

(2) Data migration in three steps

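1. Collect the table's splits from cluster A (the source cluster).
In the HBase Web UI (hbasemaster:16010), look up the End Key of every region of the table to be migrated. These End Keys are the split points.
The End Keys are collected so that the table on the destination cluster can be pre-split. That way the migrated data does not all land in a single region, and the load stays balanced.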

2. On cluster B (the destination cluster), create the table using the splits collected in step 1:

create 'tablename','cf-name',{SPLITS => ['endkey-1','endkey-2',....,'endkey-n']}
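
With many regions, typing the `SPLITS` list by hand is error-prone. Below is a rough sketch that assembles the `create` statement from a list of End Keys. `build_create_stmt` is a hypothetical helper, and it assumes the keys are printable strings without single quotes.

```shell
#!/usr/bin/env bash
# Build an HBase shell `create` statement with pre-split points.
# Arguments: table name, column family, then one End Key per argument.
# Note: keys containing a single quote are not escaped in this sketch.
build_create_stmt() {
  local table=$1 cf=$2
  shift 2
  local splits="" key
  for key in "$@"; do
    [ -z "$key" ] && continue   # the last region has an empty End Key; skip it
    splits+="${splits:+,}'${key}'"
  done
  echo "create '${table}','${cf}',{SPLITS => [${splits}]}"
}

# Example: build_create_stmt tablename cf-name endkey-1 endkey-2
```

The printed statement can then be pasted into the HBase shell on cluster B.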

3. Use the CopyTable command to migrate the table data from ClusterA to ClusterB.
Run on ClusterA:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  -Dhbase.client.scanner.caching=1000 \
  -Dmapred.map.tasks.speculative.execution=false \
  --peer.adr=worker1,worker2,...,workern:2181:/hbase \
  'tablename'
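
When several tables have to be copied, it can help to generate the command rather than retype it. The sketch below only prints the command so it can be reviewed before running. `peer_adr` and `copytable_cmd` are hypothetical helper names, and the option values are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Join ZooKeeper hosts, client port and znode parent into a --peer.adr value.
# Arguments: port, znode parent, then one ZooKeeper host per argument.
peer_adr() {
  local port=$1 znode=$2
  shift 2
  local IFS=,
  echo "$*:${port}:${znode}"
}

# Print (not run) the CopyTable command for a table and a peer address.
copytable_cmd() {
  local table=$1 peer=$2
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
       "-Dhbase.client.scanner.caching=1000" \
       "-Dmapred.map.tasks.speculative.execution=false" \
       "--peer.adr=${peer}" \
       "'${table}'"
}

# Example:
# copytable_cmd tablename "$(peer_adr 2181 /hbase worker1 worker2)"
```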

Notes

Add the IP addresses of worker1, worker2, …, workern to /etc/hosts on ClusterA. Otherwise the MapReduce job will hang at "map 0% reduce 0%".

sudo vim /etc/hosts

192.168.1.100 worker1
192.168.1.101 worker2
....
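
Before launching the job, it may be worth checking that every worker name really has an entry. This is a small sketch that reads a hosts-format file and prints the names it cannot find. `missing_hosts` is a hypothetical helper, and it only inspects the file, not DNS.

```shell
#!/usr/bin/env bash
# Print every given host name that has no entry in a hosts-format file.
# Arguments: path to the hosts file, then one host name per argument.
missing_hosts() {
  local hosts_file=$1 name
  shift
  for name in "$@"; do
    # Strip comments, then look for the name among the fields after the IP.
    awk -v n="$name" '
      { sub(/#.*/, "") }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }
    ' "$hosts_file" || echo "$name"
  done
}

# Example: missing_hosts /etc/hosts worker1 worker2
```

If the function prints nothing, all the listed names are present in the file.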
