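The previous article, "HBase Replication", described how to set up master/slave clusters for real-time data backup. However, HBase replication only covers data written after replication is configured: only rows inserted into the master cluster from that point on are replicated to the slave cluster, and replication can do nothing for the historical data that already existed. This article shows how to back up that historical data with HBase's import/export feature.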
1) Export the HBase table data to a specified directory in HDFS, using the following commands:
$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export test_table /data/test_table
Here, $HBASE_HOME is the HBase home directory, test_table is the name of the table to export, and /data/test_table is the destination directory in HDFS.
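When more than one table has to be backed up, the same Export invocation can be wrapped in a small helper. The following is only a minimal sketch under stated assumptions: the function name export_table, the variables EXPORT_ROOT and DRY_RUN, and the fallback path /opt/hbase are hypothetical and not part of HBase. Only the Export class name and its two arguments (table name, HDFS output directory) come from the command above.

```shell
#!/bin/sh
# Sketch only: export_table, EXPORT_ROOT and DRY_RUN are hypothetical names,
# and /opt/hbase is an assumed fallback; adjust all of them to your cluster.

HBASE_HOME="${HBASE_HOME:-/opt/hbase}"   # HBase home directory
EXPORT_ROOT="${EXPORT_ROOT:-/data}"      # parent directory in HDFS for exports
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the command, do not run it
EXPORT_CLASS="org.apache.hadoop.hbase.mapreduce.Export"

# Export one table to $EXPORT_ROOT/<table>, mirroring the command shown above.
export_table() {
    table="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show what would be executed.
        echo "$HBASE_HOME/bin/hbase $EXPORT_CLASS $table $EXPORT_ROOT/$table"
    else
        "$HBASE_HOME/bin/hbase" "$EXPORT_CLASS" "$table" "$EXPORT_ROOT/$table"
    fi
}

# Example usage (uncomment to try):
# for t in test_table; do export_table "$t"; done
```

With DRY_RUN left at 1 the helper only prints the command, which makes it easy to check the table name and target directory before launching the MapReduce job. Since Export runs as a MapReduce job, the output directory should not already exist when the job starts.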
The output of the Export command is long, so only the last part is shown here:
2014-08-11 16:49:44,484 INFO [main] mapreduce.Job: Running job: job_1407491918245_0021
2014-08-11 16:49:51,658 INFO [main] mapreduce.Job: Job job_1407491918245_0021 running in uber mode : false
2014-08-11 16:49:51,659 INFO [main] mapreduce.Job: map 0% reduce 0%
2014-08-11 16:49:57,706 INFO [main] mapreduce.Job: map 100% reduce 0%
2014-08-11 16:49:57,715 INFO [main] mapreduce.Job: Job job_1407491918245_0021 completed successfully
2014-08-11 16:49:57,789 INFO [main] mapreduce.Job: Counters: 37
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=118223
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=84
HDFS: Number of bytes written=243
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9152
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=84
Spilled Records=0
Failed Shuffles=0
Merged