HBase Backup: HBase Export and HBase Import

張萠飛

Article link: https://blog.csdn.net/zpf_940810653842/article/details/102582678

Export

mapreduce-based Export

endpoint-based Export

Comparison table

Import

Export dumps the contents of a table to HDFS on the same cluster. To restore the data, use Import.

Export

Export writes table data to HDFS as sequence files, either by running a coprocessor endpoint or by running a MapReduce job.

mapreduce-based Export

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
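The optional <starttime> and <endtime> arguments are HBase cell timestamps, i.e. milliseconds since the epoch. Below is a minimal sketch, assuming GNU date (for -d), that converts two UTC dates to that form and exports up to <versions> versions per cell from that window. The function names and the HBASE_BIN override (which defaults to bin/hbase and makes the command easy to dry-run) are hypothetical, not part of HBase.

```shell
#!/usr/bin/env bash
# Sketch: export cells whose timestamps fall inside a time window.
# Assumes GNU date. to_epoch_ms, export_window and HBASE_BIN are
# hypothetical names, not part of HBase.

# Convert a date string, interpreted as UTC, to epoch milliseconds.
to_epoch_ms() {
  local secs
  secs=$(date -u -d "$1" +%s) || return 1
  echo $(( secs * 1000 ))
}

# Usage: export_window <table> <outputdir> <versions> <from-date> <to-date>
# The scan time range includes starttime and excludes endtime.
export_window() {
  local table="$1" outdir="$2" versions="$3" start end
  start=$(to_epoch_ms "$4") || return 1
  end=$(to_epoch_ms "$5") || return 1
  "${HBASE_BIN:-bin/hbase}" org.apache.hadoop.hbase.mapreduce.Export \
    "$table" "$outdir" "$versions" "$start" "$end"
}

# Example (hypothetical table and path): one day of data, up to 3 versions.
# export_window my_table /backup/my_table/2019-10-15 3 2019-10-15 2019-10-16
```

Setting HBASE_BIN=echo prints the resulting command line instead of running it.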

endpoint-based Export

Note: make sure the endpoint-based Export is enabled by adding org.apache.hadoop.hbase.coprocessor.Export to hbase.coprocessor.region.classes.

$ bin/hbase org.apache.hadoop.hbase.coprocessor.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

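outputdir is an HDFS directory that must not exist before the export. When the export finishes, the exported files are owned by the user who invoked the export command.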
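Because outputdir must not exist beforehand, a backup script can check for it before launching the job. Below is a minimal sketch, assuming the hdfs CLI is on the PATH (`hdfs dfs -test -e PATH` exits 0 when PATH exists). The function name export_to_new_dir and the HBASE_BIN override (default bin/hbase) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: only start an Export when <outputdir> does not exist yet.
# export_to_new_dir and HBASE_BIN are hypothetical names, not part of HBase.

export_to_new_dir() {
  local table="$1" outdir="$2"
  # hdfs dfs -test -e returns 0 if the path already exists.
  if hdfs dfs -test -e "$outdir"; then
    echo "refusing to export: $outdir already exists" >&2
    return 1
  fi
  "${HBASE_BIN:-bin/hbase}" org.apache.hadoop.hbase.mapreduce.Export \
    "$table" "$outdir"
}
```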

Comparison table

	Endpoint-based Export	Mapreduce-based Export
HBase version requirement	2.0+	0.2.1+
Maven dependency	hbase-endpoint	hbase-mapreduce (2.0+), hbase-server (prior to 2.0)
Requirement before dump	mount the endpoint.Export on the target table	deploy the MapReduce framework
Read latency	low, reads data directly from the regions	normal, traditional RPC scan
Read scalability	depends on the number of regions	depends on the number of mappers (see TableInputFormatBase#getSplits)
Timeout	operation timeout, configured by hbase.client.operation.timeout	scan timeout, configured by hbase.client.scanner.timeout.period
Required permissions	READ, EXECUTE	READ
Fault tolerance	no	depends on MapReduce

Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

  Note: -D properties will be applied to the conf used. 
  For example: 
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following scan properties can be specified to control/limit what is exported:
   -D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
   -D hbase.client.scanner.caching=100
   -D hbase.export.visibility.labels=<labels>
For tables with very wide rows, consider setting the batch size as below:
   -D hbase.export.scanner.batch=10
   -D hbase.export.scanner.caching=100
   -D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance, consider the following properties:
   -D mapreduce.map.speculative=false
   -D mapreduce.reduce.speculative=false
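The options above can be combined in a single call. Below is a sketch, assuming a hypothetical table, column family, and output path: a gzip-compressed (BLOCK) Export of one column family with speculative execution disabled. As in the usage line, the -D options come before <tablename>. The function name export_family_gz and the HBASE_BIN override (default bin/hbase) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compressed Export of a single column family.
# export_family_gz and HBASE_BIN are hypothetical names, not part of HBase.

export_family_gz() {
  local table="$1" family="$2" outdir="$3"
  local opts=(
    # Compress the output sequence files with gzip, block-wise.
    -D mapreduce.output.fileoutputformat.compress=true
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK
    # Restrict the scan to one column family.
    -D "hbase.mapreduce.scan.column.family=$family"
    -D hbase.client.scanner.caching=100
    # Avoid duplicate speculative tasks scanning the same regions.
    -D mapreduce.map.speculative=false
    -D mapreduce.reduce.speculative=false
  )
  # Generic -D options must precede <tablename> <outputdir>.
  "${HBASE_BIN:-bin/hbase}" org.apache.hadoop.hbase.mapreduce.Export \
    "${opts[@]}" "$table" "$outdir"
}

# Example (hypothetical names): export_family_gz my_table cf /backup/my_table_cf
```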

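By default, Export exports only the newest version of a given cell, regardless of the number of versions stored. To export more than one version, replace <versions> with the desired number of versions.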

Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.

Import

Import is a utility that loads data that has been exported back into HBase. Invoke it via:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
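Import writes into an existing table rather than creating one, so when restoring into a fresh cluster the target table has to be created first with the column families present in the export. Below is a sketch, assuming a single hypothetical column family and an HBase shell that reads commands from stdin with -n (non-interactive mode). The function name restore_table and the HBASE_BIN override (default bin/hbase) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create the target table, then Import the exported files into it.
# restore_table and HBASE_BIN are hypothetical names, not part of HBase.

restore_table() {
  local table="$1" family="$2" inputdir="$3"
  local hbase="${HBASE_BIN:-bin/hbase}"
  # Non-interactive shell: stop here if the create command fails.
  echo "create '$table', '$family'" | "$hbase" shell -n || return 1
  "$hbase" org.apache.hadoop.hbase.mapreduce.Import "$table" "$inputdir"
}

# Example (hypothetical names): restore_table my_table cf /backup/my_table
```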

Usage: Import [options] <tablename> <inputdir>
 By default, Import loads data directly into HBase. To instead generate HFiles of data to prepare for a bulk data load, pass the option:
  -Dimport.bulk.output=/path/for/output
 If there is a large result that includes so many Cells that the in-memory sort in the reducer may cause an OOME, pass the option:
  -Dimport.bulk.hasLargeResult=true
 To apply a generic org.apache.hadoop.hbase.filter.Filter to the input, use:
  -Dimport.filter.class=<name of filter class>
  -Dimport.filter.args=<comma separated list of args for filter>
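 Note: the filter is applied BEFORE key renames done via the HBASE_IMPORTER_RENAME_CFS property.
 Further, filters only use the Filter#filterRowKey(byte[] buffer, int offset, int length) method to decide whether the current row should be ignored completely for processing, and the Filter#filterCell(Cell) method to decide whether the Cell should be added;
 Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL are treated as including the Cell.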
To import data exported from HBase 0.94, use
  -Dhbase.import.version=0.94
  -D mapreduce.job.name=jobName - use the specified mapreduce job name for the import
For performance, consider the following options:
  -Dmapreduce.map.speculative=false
  -Dmapreduce.reduce.speculative=false
  -Dimport.wal.durability=<Used while writing data to HBase. Allowed values are the supported durability values like SKIP_WAL/ASYNC_WAL/SYNC_WAL/…>
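With import.bulk.output, Import writes HFiles instead of writing to the table, and a second step loads those HFiles into the table. Below is a sketch, assuming HBase 2.x, where the loader class is org.apache.hadoop.hbase.tool.LoadIncrementalHFiles (on 1.x it lives in org.apache.hadoop.hbase.mapreduce). The target table must already exist. The staging path, the function name bulk_restore, and the HBASE_BIN override (default bin/hbase) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: two-step restore through HFiles instead of direct writes.
# bulk_restore and HBASE_BIN are hypothetical names, not part of HBase.

bulk_restore() {
  local table="$1" inputdir="$2" staging="$3"
  local hbase="${HBASE_BIN:-bin/hbase}"
  # Step 1: convert the exported sequence files into HFiles under $staging.
  "$hbase" org.apache.hadoop.hbase.mapreduce.Import \
    -Dimport.bulk.output="$staging" "$table" "$inputdir" || return 1
  # Step 2: move the HFiles into the table's regions (dir first, then table).
  "$hbase" org.apache.hadoop.hbase.tool.LoadIncrementalHFiles "$staging" "$table"
}

# Example (hypothetical names): bulk_restore my_table /backup/my_table /tmp/my_table_hfiles
```

Step 2 only runs if the Import job succeeded, so a failed conversion never triggers a partial load.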

To import files exported from 0.94 into a 0.96 or later cluster, set the system property "hbase.import.version" when running the import command, as below:

$ bin/hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
