HBase Backup with ExportSnapshot or CopyTable

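The article "HBase Backup: Export and Import" (《HBase备份之导入导出》) described how to copy a table between a master cluster and a slave cluster with HBase's built-in Export and Import tools. This article covers a faster backup method, ExportSnapshot, and then looks at CopyTable.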

1. ExportSnapshot

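Like Export, ExportSnapshot uses MapReduce to copy a table. Unlike Export, what it exports is a snapshot of the table. We first export the snapshot data to the slave cluster with ExportSnapshot and then run the restore_snapshot command on the slave cluster to restore it, which replicates the table from the master cluster to the slave cluster. The steps are as follows: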

1) Create a snapshot of the table on the master cluster

 

 
  $ cd $HBASE_HOME/
  $ bin/hbase shell
  2014-08-13 15:59:12,495 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
  HBase Shell; enter 'help<RETURN>' for list of supported commands.
  Type "exit<RETURN>" to leave the HBase Shell
  Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

  hbase(main):001:0> snapshot 'test_table', 'test_table_snapshot'
  0 row(s) in 0.3370 seconds
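For a scheduled backup, the snapshot step has to run without an interactive session. The sketch below assumes that the HBase shell executes statements piped to its standard input; the helper name snapshot_stmt is hypothetical and only formats the statement used above.

```shell
# Print the HBase shell statement that snapshots a table.
# Usage: snapshot_stmt TABLE SNAPSHOT_NAME
# The function body runs in a subshell, so nothing leaks into the caller.
snapshot_stmt() (
  printf "snapshot '%s', '%s'\n" "$1" "$2"
)

# Example (not executed here):
#   snapshot_stmt test_table test_table_snapshot | bin/hbase shell
```

Piping the output into bin/hbase shell, as in the commented example, should take the same snapshot as the interactive session above.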

 

2) Export the snapshot data with the ExportSnapshot command

 

 
  $ cd $HBASE_HOME/
  $ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase

Here test_table_snapshot is the name of the snapshot just created, and hdfs://follow_cluster_namenode:8082/hbase is the full path of the HBase root directory on the slave cluster's HDFS.

 

ExportSnapshot can also limit the number of mappers with the -mappers option, where n is the number of mappers:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase -mappers n

The copy bandwidth can be limited as well:

$ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot test_table_snapshot -copy-to hdfs://follow_cluster_namenode:8082/hbase -mappers n -bandwidth 200

The example above limits the copy bandwidth to 200 MB/s.
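The ExportSnapshot invocations in this step differ only in their optional flags, so a backup script can assemble them in one place. This is a minimal sketch, assuming a hypothetical helper named export_snapshot_cmd that prints the command line instead of running it; it uses only the -snapshot, -copy-to, -mappers and -bandwidth options.

```shell
# Build an ExportSnapshot command line and print it (dry run).
# Usage: export_snapshot_cmd SNAPSHOT DEST_URL [MAPPERS] [BANDWIDTH_MB]
# The function body runs in a subshell, so its variables stay local.
export_snapshot_cmd() (
  snapshot="$1"
  dest="$2"
  mappers="${3:-}"
  bandwidth="${4:-}"
  cmd="bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot"
  cmd="$cmd -snapshot $snapshot -copy-to $dest"
  # Append each optional limit only when a value was supplied.
  if [ -n "$mappers" ]; then cmd="$cmd -mappers $mappers"; fi
  if [ -n "$bandwidth" ]; then cmd="$cmd -bandwidth $bandwidth"; fi
  printf '%s\n' "$cmd"
)
```

For instance, export_snapshot_cmd test_table_snapshot hdfs://follow_cluster_namenode:8082/hbase 4 200 prints the bandwidth-limited command with four mappers (the value 4 is only an illustration). Running the printed line from $HBASE_HOME starts the export.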

 

The ExportSnapshot command prints a long output. Part of it is shown below:

 

 
  2014-08-13 16:08:26,318 INFO [main] mapreduce.Job: Running job: job_1407910396081_0027
  2014-08-13 16:08:33,494 INFO [main] mapreduce.Job: Job job_1407910396081_0027 running in uber mode : false
  2014-08-13 16:08:33,495 INFO [main] mapreduce.Job: map 0% reduce 0%
  2014-08-13 16:08:41,567 INFO [main] mapreduce.Job: map 100% reduce 0%
  2014-08-13 16:08:42,581 INFO [main] mapreduce.Job: Job job_1407910396081_0027 completed successfully
  2014-08-13 16:08:42,677 INFO [main] mapreduce.Job: Counters: 30
    File System Counters
      FILE: Number of bytes read=0
      FILE: Number of bytes written=116030
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=1386
      HDFS: Number of bytes written=988
      HDFS: Number of read operations=7
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=3
    Job Counters
      Launched map tasks=1
      Rack-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=13518
      Total time spent by all reduces in occupied slots (ms)=0
    Map-Reduce Framework
      Map input records=1
      Map output records=0
      Input split bytes=174
      Spilled Records=0
      Failed Shuffles=0
      Merged Map outputs=0
      GC time elapsed (ms)=23
      CPU time spent (ms)=1860
      Physical memory (bytes) snapshot=323575808
      Virtual memory (bytes) snapshot=1867042816
      Total committed heap usage (bytes)=1029177344
    org.apache.hadoop.hbase.snapshot.ExportSnapshot$Counter
      BYTES_COPIED=988
      BYTES_EXPECTED=988
      FILES_COPIED=1
    File Input Format Counters
      Bytes Read=224
    File Output Format Counters
      Bytes Written=0
  2014-08-13 16:08:42,685 INFO [main] snapshot.ExportSnapshot: Finalize the Snapshot Export
  2014-08-13 16:08:42,697 INFO [main] snapshot.ExportSnapshot: Verify snapshot validity
  2014-08-13 16:08:42,698 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
  2014-08-13 16:08:42,713 INFO [main] snapshot.ExportSnapshot: Export Completed: test_table_snapshot
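The ExportSnapshot counters in this output offer a quick sanity check: BYTES_COPIED should equal BYTES_EXPECTED. The following sketch assumes the job output was saved to a file (the name export.log is hypothetical) and that both counters appear in the NAME=value form shown above; export_bytes_match is an illustrative helper, not part of HBase.

```shell
# Read ExportSnapshot job output on stdin and succeed only if the
# BYTES_COPIED and BYTES_EXPECTED counters are both present and equal.
# The function body runs in a subshell, so its variables stay local.
export_bytes_match() (
  log="$(cat)"
  # Take the first occurrence of each counter value.
  copied="$(printf '%s\n' "$log" | sed -n 's/.*BYTES_COPIED=\([0-9][0-9]*\).*/\1/p' | head -n 1)"
  expected="$(printf '%s\n' "$log" | sed -n 's/.*BYTES_EXPECTED=\([0-9][0-9]*\).*/\1/p' | head -n 1)"
  [ -n "$copied" ] && [ "$copied" = "$expected" ]
)

# Example (not executed here):
#   export_bytes_match < export.log && echo "byte counters match"
```

For the output above, both counters are 988, so the check succeeds. A missing counter or a mismatch makes the function return a non-zero status, which a backup script can use to stop before the restore step.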

3) Restore the snapshot on the slave cluster

 
  $ cd $HBASE_HOME/
  $ bin/hbase shell
  2014-08-13 16:16:13,817 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
  HBase Shell; enter 'help<RETURN>' for list of supported commands.
  Type "exit<RETURN>" to leave the HBase Shell
  Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

  hbase(main):001:0> restore_snapshot 'test_table_snapshot'
  0 row(s) in 16.4940 seconds

4) Check that the table was restored

 

 

 
  hbase(main):002:0> list
  TABLE
  test_table
  1 row(s) in 1.0460 seconds

  => ["test_table"]

You can also verify the result with the scan or count command.

 

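Restoring a snapshot is usually fast. Export and Import need two MapReduce jobs, one to export and one to import, to copy a table, whereas ExportSnapshot needs a single job followed by a quick restore, so it is considerably faster.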

2. CopyTable

First, look at the usage of the CopyTable command:

 

 
  $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
  Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

  Options:
   rs.class     hbase.regionserver.class of the peer cluster
                specify if different from current cluster
   rs.impl      hbase.regionserver.impl of the peer cluster
   startrow     the start row
   stoprow      the stop row
   starttime    beginning of the time range (unixtime in millis)
                without endtime means from starttime to forever
   endtime      end of the time range. Ignored if no starttime specified.
   versions     number of cell versions to copy
   new.name     new table's name
   peer.adr     Address of the peer cluster given in the format
                hbase.zookeeer.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
   families     comma-separated list of families to copy
                To copy from cf1 to cf2, give sourceCfName:destCfName.
                To keep the same name, just give "cfName"
   all.cells    also copy delete markers and deleted cells

  Args:
   tablename    Name of the table to copy

  Examples:
   To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
   $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
  For performance consider the following general options:
   -Dhbase.client.scanner.caching=100
   -Dmapred.map.tasks.speculative.execution=false

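As the usage shows, CopyTable can restrict the copy to a time range, set the number of cell versions to copy, select column families, and specify the address of the slave cluster.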
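The starttime and endtime options are Unix timestamps in milliseconds, so a time window is easiest to derive arithmetically. A small sketch, with copytable_window as a hypothetical helper: it takes the end of the window in milliseconds and a length in hours, and prints the two options.

```shell
# Print CopyTable time-range options for a window that ends at END_MS
# (Unix time in milliseconds) and spans HOURS hours.
# Usage: copytable_window END_MS HOURS
# The function body runs in a subshell, so its variables stay local.
copytable_window() (
  end_ms="$1"
  hours="$2"
  # One hour is 3600 seconds, i.e. 3,600,000 milliseconds.
  start_ms=$(( end_ms - hours * 3600 * 1000 ))
  printf '%s\n' "--starttime=${start_ms} --endtime=${end_ms}"
)
```

With an end time of 1265878794289 and a length of 1 hour, it prints --starttime=1265875194289 --endtime=1265878794289, which is the one-hour window used in the example from the usage text.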

 

For the test_table table above, we can copy it with the following command:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=slave1,slave2,slave3:2181:/hbase test_table

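Note: before running this command, create a table on the slave cluster with the same schema as test_table on the master cluster.

Part of the output of the command is shown below: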

 

 
  2014-08-13 16:18:21,812 INFO [main] mapreduce.Job: Running job: job_1407910396081_0062
  2014-08-13 16:18:29,955 INFO [main] mapreduce.Job: Job job_1407910396081_0062 running in uber mode : false
  2014-08-13 16:18:29,957 INFO [main] mapreduce.Job: map 0% reduce 0%
  2014-08-13 16:18:36,005 INFO [main] mapreduce.Job: map 100% reduce 0%
  2014-08-13 16:18:37,029 INFO [main] mapreduce.Job: Job job_1407910396081_0062 completed successfully
  2014-08-13 16:18:37,137 INFO [main] mapreduce.Job: Counters: 37
    File System Counters
      FILE: Number of bytes read=0
      FILE: Number of bytes written=117527
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=88
      HDFS: Number of bytes written=0
      HDFS: Number of read operations=1
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=0
    Job Counters
      Launched map tasks=1
      Rack-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=9740
      Total time spent by all reduces in occupied slots (ms)=0
    Map-Reduce Framework
      Map input records=1
      Map output records=1
      Input split bytes=88
      Spilled Records=0
      Failed Shuffles=0
      Merged Map outputs=0
      GC time elapsed (ms)=254
      CPU time spent (ms)=1810
      Physical memory (bytes) snapshot=345137152
      Virtual memory (bytes) snapshot=1841782784
      Total committed heap usage (bytes)=1029177344
    HBase Counters
      BYTES_IN_REMOTE_RESULTS=34
      BYTES_IN_RESULTS=34
      MILLIS_BETWEEN_NEXTS=254
      NOT_SERVING_REGION_EXCEPTION=0
      NUM_SCANNER_RESTARTS=0
      REGIONS_SCANNED=1
      REMOTE_RPC_CALLS=3
      REMOTE_RPC_RETRIES=0
      RPC_CALLS=3
      RPC_RETRIES=0
    File Input Format Counters
      Bytes Read=0
    File Output Format Counters
      Bytes Written=0

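After that, you can compare the table on the master cluster with the corresponding table on the slave cluster to check that their data matches.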
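One way to make this comparison, offered as an assumption beyond what this article uses, is to run HBase's RowCounter MapReduce job (org.apache.hadoop.hbase.mapreduce.RowCounter) against the table on each cluster and compare the ROWS counter it reports. The sketch below only parses that counter from saved job output; rowcounter_rows and the log file names are hypothetical, and equal row counts do not prove that the cell contents match.

```shell
# Extract the value of the ROWS counter from RowCounter job output on stdin.
# Only a line consisting of optional whitespace followed by ROWS=<number>
# is accepted, so counters such as ROWS_SCANNED are ignored.
rowcounter_rows() (
  sed -n 's/^[[:space:]]*ROWS=\([0-9][0-9]*\)[[:space:]]*$/\1/p' | head -n 1
)

# Example (not executed here), with one saved log per cluster:
#   [ "$(rowcounter_rows < master.log)" = "$(rowcounter_rows < slave.log)" ] \
#     && echo "row counts match"
```

Here master.log and slave.log would hold the output of bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter test_table from the two clusters. For a closer check, scan a sample of rows on both sides.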
 

 

Please credit the source when reposting: http://blog.csdn.net/iAm333
