CopyTable
(1) First, a look at how the CopyTable command is used
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
Options:
 rs.class     hbase.regionserver.class of the peer cluster
              specify if different from current cluster
 rs.impl      hbase.regionserver.impl of the peer cluster
 startrow     the start row
 stoprow      the stop row
 starttime    beginning of the time range (unixtime in millis)
              without endtime means from starttime to forever
 endtime      end of the time range. Ignored if no starttime specified.
 versions     number of cell versions to copy
 new.name     new table's name
 peer.adr     Address of the peer cluster given in the format
              hbase.zookeeer.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
 families     comma-separated list of families to copy
              To copy from cf1 to cf2, give sourceCfName:destCfName.
              To keep the same name, just give "cfName"
 all.cells    also copy delete markers and deleted cells

Args:
 tablename    Name of the table to copy

Examples:
 To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
     --starttime=1265875194289 \
     --endtime=1265878794289 \
     --peer.adr=server1,server2,server3:2181:/hbase \
     --families=myOldCf:myNewCf,cf2,cf3 \
     TestTable

For performance consider the following general options:
 -Dhbase.client.scanner.caching=100
 -Dmapred.map.tasks.speculative.execution=false
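
The --starttime and --endtime values are Unix timestamps in milliseconds, so the one-hour window in the example above spans exactly 3,600,000 ms. The following is a minimal shell sketch of that arithmetic. The helper name window_end_millis is hypothetical and is not part of HBase.

```shell
#!/bin/sh
# Hypothetical helper (not an HBase command): given a start time in
# milliseconds and a window length in hours, print the end time in milliseconds.
window_end_millis() {
  start_millis=$1
  hours=$2
  echo $(( start_millis + hours * 3600 * 1000 ))
}

# Reproduce the window used in the usage example.
start=1265875194289
end=$(window_end_millis "$start" 1)
echo "--starttime=$start --endtime=$end"
```

For a window starting now, the start value could be derived from `date +%s` multiplied by 1000, since `date +%s` reports seconds.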
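(2) Data migration in three steps
Step 1. Collect the table's splits from cluster A (the source cluster)
In the HBase Web UI (hbasemaster:16010), look up the End Keys of the table to be migrated. These End Keys are the splits.
The End Keys are collected so that the table on the destination cluster can be pre-split. After the migration the data is then spread over several regions instead of being concentrated in a single region, which balances the load.
Step 2. On cluster B (the destination cluster), create the table using the splits collected in step 1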
create 'tablename','cf-name',{SPLITS => ['endkey-1','endkey-2',....,'endkey-n']}
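
When a table has many regions, typing the SPLITS list by hand is error-prone. The sketch below assembles the create statement from a list of End Keys. The function name build_create_stmt is hypothetical, and the sketch assumes the keys contain no single quotes. Empty keys are skipped, because the last region of a table has an empty End Key and that is not a split point.

```shell
#!/bin/sh
# Hypothetical helper: print an HBase shell "create" statement with SPLITS.
# Usage: build_create_stmt TABLE FAMILY ENDKEY...
# Assumes the keys contain no single quotes (no escaping is done).
build_create_stmt() {
  table=$1
  family=$2
  shift 2
  splits=""
  for key in "$@"; do
    # The last region's End Key is empty; it is not a split point.
    if [ -z "$key" ]; then
      continue
    fi
    if [ -z "$splits" ]; then
      splits="'$key'"
    else
      splits="$splits,'$key'"
    fi
  done
  printf "create '%s','%s',{SPLITS => [%s]}\n" "$table" "$family" "$splits"
}

# Placeholder names taken from the statement above.
build_create_stmt tablename cf-name endkey-1 endkey-2 ""
```

The printed statement can then be pasted into the HBase shell on cluster B.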
Step 3. Use the CopyTable command to migrate the table data from ClusterA to ClusterB
Run on ClusterA:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=1000 -Dmapred.map.tasks.speculative.execution=false --peer.adr=worker1,worker2,...,workern:2181:/hbase 'tablename'
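
The --peer.adr value has three colon-separated parts: the comma-separated ZooKeeper quorum hosts, the ZooKeeper client port, and the znode parent. As a rough illustration of that format, the sketch below joins the parts. The function name peer_adr and the three worker host names are placeholders, not part of HBase.

```shell
#!/bin/sh
# Hypothetical helper: build a --peer.adr value in the form
# quorum-hosts:client-port:znode-parent.
# Usage: peer_adr PORT ZNODE HOST...
peer_adr() {
  port=$1
  znode=$2
  shift 2
  quorum=""
  for host in "$@"; do
    # Append a comma only when the quorum list is already non-empty.
    quorum="${quorum:+$quorum,}$host"
  done
  printf '%s:%s:%s\n' "$quorum" "$port" "$znode"
}

# Placeholder hosts; substitute the ZooKeeper quorum of the destination cluster.
echo "--peer.adr=$(peer_adr 2181 /hbase worker1 worker2 worker3)"
```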
Notes
Add the IP addresses of worker1, worker2, ..., workern to /etc/hosts on ClusterA. Otherwise the MapReduce job will hang at "map 0% reduce 0%".
sudo vim /etc/hosts
192.168.1.100 worker1
192.168.1.101 worker2
....
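
Before launching the job, it may help to confirm that every destination host name actually resolves on ClusterA. The sketch below is one way to do that, assuming the `getent` utility is available (it ships with glibc-based Linux systems). The function name unresolved_hosts is hypothetical, and worker1 and worker2 stand in for the real host list.

```shell
#!/bin/sh
# Hypothetical helper: print each host name that does not resolve.
# Relies on `getent hosts`, which consults /etc/hosts and DNS; if getent
# is missing, every host is reported as unresolved.
unresolved_hosts() {
  for host in "$@"; do
    if ! getent hosts "$host" >/dev/null 2>&1; then
      echo "$host"
    fi
  done
}

# Placeholder host names; list the workers of the destination cluster here.
missing=$(unresolved_hosts worker1 worker2)
if [ -n "$missing" ]; then
  echo "Add these hosts to /etc/hosts before running CopyTable:"
  echo "$missing"
fi
```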