HBase Export and Import tool usage examples, with a time range and a row-key prefix filter


1. The existing data in HBase

hbase(main):025:0> scan 'users'
ROW                                COLUMN+CELL                                                                                      
 TheRealMT                         column=cfInfo:name, timestamp=1475718775174, value=Twain2append                                  
 TheRealMT                         column=cfInfo:password, timestamp=1473735751989, value=example                                   
 row222                            column=cfInfo:name, timestamp=1475719789383, value=lily                                          
 row222                            column=cfInfo:password, timestamp=1475719813067, value=lilyipwd                                  
2 row(s) in 0.0150 seconds


hbase(main):026:0> 

2. Perform a conditional export

First, look at the usage of the hbase Export tool:

hbase org.apache.hadoop.hbase.mapreduce.Export  -help
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
ERROR: Wrong number of arguments: 1
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]


  Note: -D properties will be applied to the conf used. 
  For example: 
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<familyName>
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
For performance consider the following properties:
   -Dhbase.client.scanner.caching=100
   -Dmapreduce.map.speculative=false
   -Dmapreduce.reduce.speculative=false
For tables with very wide rows consider setting the batch size as below:
   -Dhbase.export.scanner.batch=10

Below, the users table is exported with versions=1, starttime=0 and endtime=999999999999999 (epoch milliseconds, so effectively no time restriction), plus a row-key filter as the last argument. Because '^^(?!row222)' starts with ^, Export treats the remainder '^(?!row222)' as a regex, which matches only row keys that do NOT start with row222; to keep only the row222 rows instead, pass the plain prefix row222 without the leading ^. When the table is very large, filtering inside the scan like this is far cheaper than exporting everything and filtering afterwards with a Hive WHERE clause.

[hdfs@test-hadoop-slave ~]$ hbase org.apache.hadoop.hbase.mapreduce.Export 'users'  /test/source/fromhbasetohdfs/users 1 0 999999999999999 '^^(?!row222)'

Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/06 10:18:10 INFO mapreduce.Export: Setting Scan Filter for Export.
16/10/06 10:18:10 INFO mapreduce.Export: versions=1, starttime=0, endtime=999999999999999, keepDeletedCells=false
16/10/06 10:18:12 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
16/10/06 10:18:12 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm223
16/10/06 10:18:14 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x59aa20b3 connecting to ZooKeeper ensemble=shjq-np-test-hadoop-node-srv3:2181,shjq-np-test-hadoop-node-srv4:2181,shjq-np-test-hadoop-node-srv5:2181

16/10/06 10:18:14 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.7.0--1, built on 03/23/2016 

16/10/06 10:18:16 INFO mapreduce.Job: Running job: job_1474531499351_0257

16/10/06 10:18:23 INFO mapreduce.Job: Job job_1474531499351_0257 running in uber mode : false
16/10/06 10:18:23 INFO mapreduce.Job:  map 0% reduce 0%
16/10/06 10:18:34 INFO mapreduce.Job:  map 100% reduce 0%
16/10/06 10:18:35 INFO mapreduce.Job: Job job_1474531499351_0257 completed successfully
16/10/06 10:18:36 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=157261
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=66
                HDFS: Number of bytes written=239
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8089
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8089
                Total vcore-seconds taken by all map tasks=8089
                Total megabyte-seconds taken by all map tasks=8283136
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=66
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=140
                CPU time spent (ms)=3920
                Physical memory (bytes) snapshot=242630656
                Virtual memory (bytes) snapshot=2809552896
                Total committed heap usage (bytes)=178782208
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=239
[hdfs@shjq-np-test-hadoop-slave ~]$ 

[hdfs@shjq-np-test-hadoop-slave ~]$ 
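
For comparison, here is a sketch of the other filter and -D forms that the Export usage above describes. These command lines were not part of the run recorded in this post, and the output paths (users_prefix, users_cf) are just placeholders:

# a plain last argument (no leading ^) is treated as a row-key prefix, so this keeps only the row222 rows
hbase org.apache.hadoop.hbase.mapreduce.Export 'users' /test/source/fromhbasetohdfs/users_prefix 1 0 999999999999999 row222

# restrict the export to one column family and compress the output, using the -D properties from the usage text
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.mapreduce.scan.column.family=cfInfo \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  'users' /test/source/fromhbasetohdfs/users_cf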


3. Inspect the exported data:

[hdfs@test-hadoop-slave ~]$ hadoop fs -ls /test/source/fromhbasetohdfs
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2016-10-06 10:18 /test/source/fromhbasetohdfs/users
[hdfs@test-hadoop-slave ~]$ hadoop fs -test  /test/source/fromhbasetohdfs/users
-test: No test flag given
Usage: hadoop fs [generic options] -test -[defsz] <path>
[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users  
text: `/test/source/fromhbasetohdfs/users': Is a directory
[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users/
text: `/test/source/fromhbasetohdfs/users': Is a directory
[hdfs@test-hadoop-slave ~]$ hadoop fs -ls  /test/source/fromhbasetohdfs/users/    
Found 2 items
-rw-r--r--   2 hdfs supergroup          0 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/_SUCCESS
-rw-r--r--   2 hdfs supergroup        239 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/part-m-00000
[hdfs@shjq-np-test-hadoop-slave ~]$ hadoop fs -ls  /test/source/fromhbasetohdfs/users/part-m-00000

-rw-r--r--   2 hdfs supergroup        239 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/part-m-00000


3.1 Surprisingly, this command throws an error, which shows that the export is not an ordinary text file such as CSV: it is a Hadoop SequenceFile whose key class is an HBase type that the plain hadoop client cannot load.

[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users/part-m-00000    
-text: Fatal internal error
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2023)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1954)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
        at org.apache.hadoop.fs.shell.Display$TextRecordInputStream.<init>(Display.java:222)
        at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:152)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.io.IOException: WritableName can't load class: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:77)
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2021)
        ... 16 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.io.ImmutableBytesWritable not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:75)
        ... 17 more


3.2 This command prints garbled output, because -cat dumps the raw bytes of the binary SequenceFile:

[hdfs@shjq-np-test-hadoop-slave ~]$ hadoop fs -cat   /test/source/fromhbasetohdfs/users/part-m-00000    

SEQ1orgTheRealMTgdoop.hbase.io.ImmutableBytesWritable%org.apache.hadoop.hbase.client.Resultj檥泖q|>Qw鮀H
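
The part-m-00000 file is a SequenceFile whose keys are ImmutableBytesWritable and whose values are HBase Result objects, so it is really meant to be read back with the Import tool rather than viewed as text. As an untested workaround (an assumption about this cluster, not something run in the original session), putting the HBase jars on the client classpath at least gets past the ClassNotFoundException above; depending on the HBase version, the Result values may still not render:

# assumption: 'hbase classpath' prints the HBase client jars on this node
export HADOOP_CLASSPATH=$(hbase classpath)
hadoop fs -text /test/source/fromhbasetohdfs/users/part-m-00000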


4. Create the destination HBase table and insert two records first

hbase(main):026:0> create 'usersfromhdfswithfilter','cfInfo'
0 row(s) in 1.2920 seconds
  
hbase(main):028:0> put 'usersfromhdfswithfilter','row333','cfInfo:name','row333value'
0 row(s) in 0.0200 seconds
 
hbase(main):029:0> put 'usersfromhdfswithfilter','row333','cfInfo:password','row333pwd'
0 row(s) in 0.0070 seconds


5. Import the data from HDFS into the HBase table

[hdfs@test-hadoop-slave ~]$ hbase org.apache.hadoop.hbase.mapreduce.Import 'usersfromhdfswithfilter' /test/source/fromhbasetohdfs/users
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/06 10:32:17 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
16/10/06 10:32:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm223
16/10/06 10:32:20 INFO input.FileInputFormat: Total input paths to process : 1
16/10/06 10:32:20 INFO mapreduce.JobSubmitter: number of splits:1
16/10/06 10:32:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474531499351_0258
16/10/06 10:32:21 INFO impl.YarnClientImpl: Submitted application application_1474531499351_0258
16/10/06 10:32:21 INFO mapreduce.Job: The url to track the job: http://shjq-np-test-hadoop-slave:8088/proxy/application_1474531499351_0258/
16/10/06 10:32:21 INFO mapreduce.Job: Running job: job_1474531499351_0258
16/10/06 10:32:29 INFO mapreduce.Job: Job job_1474531499351_0258 running in uber mode : false
16/10/06 10:32:29 INFO mapreduce.Job:  map 0% reduce 0%
16/10/06 10:32:37 INFO mapreduce.Job:  map 100% reduce 0%
16/10/06 10:32:37 INFO mapreduce.Job: Job job_1474531499351_0258 completed successfully
16/10/06 10:32:37 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=156683
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=366
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=3
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters 
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6627
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=6627
                Total vcore-seconds taken by all map tasks=6627
                Total megabyte-seconds taken by all map tasks=6786048
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=127
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=113
                CPU time spent (ms)=2180
                Physical memory (bytes) snapshot=226000896
                Virtual memory (bytes) snapshot=2785763328
                Total committed heap usage (bytes)=179306496
        File Input Format Counters 
                Bytes Read=239
        File Output Format Counters 
                Bytes Written=0
[hdfs@test-hadoop-slave ~]$ 
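
Import also accepts -D properties. One variant worth knowing about (a sketch only, not run in this post; the HFile output path is a placeholder) is to have Import write HFiles instead of issuing puts, and then bulk-load them, which puts much less write load on the RegionServers for large data sets:

# write HFiles instead of puts, then complete the bulk load
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.bulk.output=/test/source/fromhbasetohdfs/users_hfiles \
  'usersfromhdfswithfilter' /test/source/fromhbasetohdfs/users
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /test/source/fromhbasetohdfs/users_hfiles 'usersfromhdfswithfilter'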


6. Scan the destination table after the import: the filtered data has been loaded into the new table, alongside the row333 records that were already there.


hbase(main):030:0> scan 'usersfromhdfswithfilter'
ROW                                COLUMN+CELL                                                                                      
 TheRealMT                         column=cfInfo:name, timestamp=1475718775174, value=Twain2append                                  
 TheRealMT                         column=cfInfo:password, timestamp=1473735751989, value=example                                   
 row333                            column=cfInfo:name, timestamp=1475721070129, value=row333value                                   
 row333                            column=cfInfo:password, timestamp=1475721085801, value=row333pwd                                 
2 row(s) in 0.0480 seconds

hbase(main):033:0> 



============================================================================

Appendix:

Exporting HBase data to HDFS with a custom MapReduce job

1. Goal

Export a copy of the data from an HBase table to HDFS.

Two approaches are covered: writing your own MapReduce program, or using the class HBase ships with (the Export tool demonstrated above).

2. A custom MapReduce program that exports HBase data to HDFS

2.1 First, look at the data in the t1 table in HBase (the original screenshot is missing here; as the output in 2.4 shows, t1 has a column family f1 with columns name, age, gender and birthday).

2.2 The MapReduce code is listed below. The two most important statements are:

job.setNumReduceTasks(0); // the reducer count is 0 because this is a map-only job: each mapper already writes its final text records, so there is nothing to aggregate and skipping the shuffle/reduce phase saves time
TableMapReduceUtil.initTableMapperJob(args[0], new Scan(), HBaseToHdfsMapper.class, Text.class, Text.class, job); // this tells the job which HBase table to read; the Scan object can carry filters, column restrictions and time ranges to limit what the mappers see

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HBaseToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, HBaseToHdfs.class.getSimpleName());
        job.setJarByClass(HBaseToHdfs.class);

        job.setMapperClass(HBaseToHdfsMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Map-only job: the mappers write the final records themselves, no aggregation needed.
        job.setNumReduceTasks(0);

        // args[0] is the source HBase table; the Scan can be narrowed with filters, columns or a time range.
        TableMapReduceUtil.initTableMapperJob(args[0], new Scan(),
                HBaseToHdfsMapper.class, Text.class, Text.class, job);
        //TableMapReduceUtil.addDependencyJars(job);

        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }

    public static class HBaseToHdfsMapper extends TableMapper<Text, Text> {
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // key is the HBase rowkey
            byte[] name = latestValue(value, "f1", "name");
            byte[] age = latestValue(value, "f1", "age");
            byte[] gender = latestValue(value, "f1", "gender");
            byte[] birthday = latestValue(value, "f1", "birthday");

            outKey.set(key.get());
            // One tab-separated line per row; missing columns become the literal string NULL.
            outValue.set(asText(name) + "\t" + asText(age) + "\t" + asText(gender) + "\t" + asText(birthday));
            context.write(outKey, outValue);
        }

        // Returns the newest value of family:qualifier, or null if the row has no such cell.
        private static byte[] latestValue(Result result, String family, String qualifier) {
            Cell cell = result.getColumnLatestCell(family.getBytes(), qualifier.getBytes());
            return cell == null ? null : CellUtil.cloneValue(cell);
        }

        private static String asText(byte[] bytes) {
            return (bytes == null || bytes.length == 0) ? "NULL" : new String(bytes);
        }
    }
}

2.3 Package and run

hadoop jar hbaseToDfs.jar com.lanyun.hadoop2.HBaseToHdfs t1 /t1
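
If the submission fails with a ClassNotFoundException for HBase classes, the usual fixes are to re-enable the commented-out TableMapReduceUtil.addDependencyJars(job) call in the code above (so the HBase jars are shipped to the map tasks) and to put them on the client classpath when submitting. The lines below are a sketch of the latter, assuming the hbase launcher script is on the PATH of this node:

export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar hbaseToDfs.jar com.lanyun.hadoop2.HBaseToHdfs t1 /t1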

2.4 Check the output files on HDFS

[root@hadoop ~]# hadoop fs -cat /t1/part*
1    zhangsan    10    male    NULL
2    lisi    NULL    NULL    NULL
3    wangwu    NULL    NULL    NULL
4    zhaoliu    NULL    NULL    1993

With that, the export has completed successfully.

