HBase Export and Import tool usage examples, with a time range and a row-key prefix filter


1. The existing data in HBase

hbase(main):025:0> scan 'users'
ROW                                COLUMN+CELL                                                                                      
 TheRealMT                         column=cfInfo:name, timestamp=1475718775174, value=Twain2append                                  
 TheRealMT                         column=cfInfo:password, timestamp=1473735751989, value=example                                   
 row222                            column=cfInfo:name, timestamp=1475719789383, value=lily                                          
 row222                            column=cfInfo:password, timestamp=1475719813067, value=lilyipwd                                  
2 row(s) in 0.0150 seconds


hbase(main):026:0> 

2. Perform a conditional export

First, look at the usage of the hbase Export tool:

hbase org.apache.hadoop.hbase.mapreduce.Export  -help
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
ERROR: Wrong number of arguments: 1
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]


  Note: -D properties will be applied to the conf used. 
  For example: 
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<familyName>
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
For performance consider the following properties:
   -Dhbase.client.scanner.caching=100
   -Dmapreduce.map.speculative=false
   -Dmapreduce.reduce.speculative=false
For tables with very wide rows consider setting the batch size as below:
   -Dhbase.export.scanner.batch=10

Below, the users table is exported with versions=1, starttime=0 and endtime=999999999999999 (epoch milliseconds, so effectively no time restriction), plus a row-key filter as the last argument. Because '^^(?!row222)' starts with ^, Export treats the remainder '^(?!row222)' as a regex, which matches only row keys that do NOT start with row222; to keep only the row222 rows instead, pass the plain prefix row222 without the leading ^. When the table is very large, filtering inside the scan like this is far cheaper than exporting everything and filtering afterwards with a Hive WHERE clause.

[hdfs@test-hadoop-slave ~]$ hbase org.apache.hadoop.hbase.mapreduce.Export 'users'  /test/source/fromhbasetohdfs/users 1 0 999999999999999 '^^(?!row222)'

Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/06 10:18:10 INFO mapreduce.Export: Setting Scan Filter for Export.
16/10/06 10:18:10 INFO mapreduce.Export: versions=1, starttime=0, endtime=999999999999999, keepDeletedCells=false
16/10/06 10:18:12 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
16/10/06 10:18:12 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm223
16/10/06 10:18:14 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x59aa20b3 connecting to ZooKeeper ensemble=shjq-np-test-hadoop-node-srv3:2181,shjq-np-test-hadoop-node-srv4:2181,shjq-np-test-hadoop-node-srv5:2181

16/10/06 10:18:14 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.7.0--1, built on 03/23/2016 

16/10/06 10:18:16 INFO mapreduce.Job: Running job: job_1474531499351_0257

16/10/06 10:18:23 INFO mapreduce.Job: Job job_1474531499351_0257 running in uber mode : false
16/10/06 10:18:23 INFO mapreduce.Job:  map 0% reduce 0%
16/10/06 10:18:34 INFO mapreduce.Job:  map 100% reduce 0%
16/10/06 10:18:35 INFO mapreduce.Job: Job job_1474531499351_0257 completed successfully
16/10/06 10:18:36 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=157261
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=66
                HDFS: Number of bytes written=239
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8089
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=8089
                Total vcore-seconds taken by all map tasks=8089
                Total megabyte-seconds taken by all map tasks=8283136
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=66
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=140
                CPU time spent (ms)=3920
                Physical memory (bytes) snapshot=242630656
                Virtual memory (bytes) snapshot=2809552896
                Total committed heap usage (bytes)=178782208
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=239
[hdfs@shjq-np-test-hadoop-slave ~]$ 

[hdfs@shjq-np-test-hadoop-slave ~]$ 
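
For comparison, here is a sketch of the other filter and -D forms that the Export usage above describes. These command lines were not part of the run recorded in this post, and the output paths (users_prefix, users_cf) are just placeholders:

# a plain last argument (no leading ^) is treated as a row-key prefix, so this keeps only the row222 rows
hbase org.apache.hadoop.hbase.mapreduce.Export 'users' /test/source/fromhbasetohdfs/users_prefix 1 0 999999999999999 row222

# restrict the export to one column family and compress the output, using the -D properties from the usage text
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.mapreduce.scan.column.family=cfInfo \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  'users' /test/source/fromhbasetohdfs/users_cf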


3. Inspect the exported data:

[hdfs@test-hadoop-slave ~]$ hadoop fs -ls /test/source/fromhbasetohdfs
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2016-10-06 10:18 /test/source/fromhbasetohdfs/users
[hdfs@test-hadoop-slave ~]$ hadoop fs -test  /test/source/fromhbasetohdfs/users
-test: No test flag given
Usage: hadoop fs [generic options] -test -[defsz] <path>
[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users  
text: `/test/source/fromhbasetohdfs/users': Is a directory
[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users/
text: `/test/source/fromhbasetohdfs/users': Is a directory
[hdfs@test-hadoop-slave ~]$ hadoop fs -ls  /test/source/fromhbasetohdfs/users/    
Found 2 items
-rw-r--r--   2 hdfs supergroup          0 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/_SUCCESS
-rw-r--r--   2 hdfs supergroup        239 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/part-m-00000
[hdfs@shjq-np-test-hadoop-slave ~]$ hadoop fs -ls  /test/source/fromhbasetohdfs/users/part-m-00000

-rw-r--r--   2 hdfs supergroup        239 2016-10-06 10:18 /test/source/fromhbasetohdfs/users/part-m-00000


3.1 Surprisingly, this command throws an error, which shows that the export is not an ordinary text file such as CSV: it is a Hadoop SequenceFile whose key class is an HBase type that the plain hadoop client cannot load.

[hdfs@test-hadoop-slave ~]$ hadoop fs -text  /test/source/fromhbasetohdfs/users/part-m-00000    
-text: Fatal internal error
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2023)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1954)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
        at org.apache.hadoop.fs.shell.Display$TextRecordInputStream.<init>(Display.java:222)
        at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:152)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
Caused by: java.io.IOException: WritableName can't load class: org.apache.hadoop.hbase.io.ImmutableBytesWritable
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:77)
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:2021)
        ... 16 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.io.ImmutableBytesWritable not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:75)
        ... 17 more


3.2 This command prints garbled output, because -cat dumps the raw bytes of the binary SequenceFile:

[hdfs@shjq-np-test-hadoop-slave ~]$ hadoop fs -cat   /test/source/fromhbasetohdfs/users/part-m-00000    

SEQ1orgTheRealMTgdoop.hbase.io.ImmutableBytesWritable%org.apache.hadoop.hbase.client.Resultj檥泖q|>Qw鮀H
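
The part-m-00000 file is a SequenceFile whose keys are ImmutableBytesWritable and whose values are HBase Result objects, so it is really meant to be read back with the Import tool rather than viewed as text. As an untested workaround (an assumption about this cluster, not something run in the original session), putting the HBase jars on the client classpath at least gets past the ClassNotFoundException above; depending on the HBase version, the Result values may still not render:

# assumption: 'hbase classpath' prints the HBase client jars on this node
export HADOOP_CLASSPATH=$(hbase classpath)
hadoop fs -text /test/source/fromhbasetohdfs/users/part-m-00000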


4. Create the destination HBase table and insert two records first

hbase(main):026:0> create 'usersfromhdfswithfilter','cfInfo'
0 row(s) in 1.2920 seconds
  
hbase(main):028:0> put 'usersfromhdfswithfilter','row333','cfInfo:name','row333value'
0 row(s) in 0.0200 seconds
 
hbase(main):029:0> put 'usersfromhdfswithfilter','row333','cfInfo:password','row333pwd'
0 row(s) in 0.0070 seconds


5. Import the data from HDFS into the HBase table

[hdfs@test-hadoop-slave ~]$ hbase org.apache.hadoop.hbase.mapreduce.Import 'usersfromhdfswithfilter' /test/source/fromhbasetohdfs/users
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
16/10/06 10:32:17 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
16/10/06 10:32:18 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm223
16/10/06 10:32:20 INFO input.FileInputFormat: Total input paths to process : 1
16/10/06 10:32:20 INFO mapreduce.JobSubmitter: number of splits:1
16/10/06 10:32:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474531499351_0258
16/10/06 10:32:21 INFO impl.YarnClientImpl: Submitted application application_1474531499351_0258
16/10/06 10:32:21 INFO mapreduce.Job: The url to track the job: http://shjq-np-test-hadoop-slave:8088/proxy/application_1474531499351_0258/
16/10/06 10:32:21 INFO mapreduce.Job: Running job: job_1474531499351_0258
16/10/06 10:32:29 INFO mapreduce.Job: Job job_1474531499351_0258 running in uber mode : false
16/10/06 10:32:29 INFO mapreduce.Job:  map 0% reduce 0%
16/10/06 10:32:37 INFO mapreduce.Job:  map 100% reduce 0%
16/10/06 10:32:37 INFO mapreduce.Job: Job job_1474531499351_0258 completed successfully
16/10/06 10:32:37 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=156683
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=366
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=3
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters 
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6627
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=6627
                Total vcore-seconds taken by all map tasks=6627
                Total megabyte-seconds taken by all map tasks=6786048
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=127
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=113
                CPU time spent (ms)=2180
                Physical memory (bytes) snapshot=226000896
                Virtual memory (bytes) snapshot=2785763328
                Total committed heap usage (bytes)=179306496
        File Input Format Counters 
                Bytes Read=239
        File Output Format Counters 
                Bytes Written=0
[hdfs@test-hadoop-slave ~]$ 
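
Import also accepts -D properties. One variant worth knowing about (a sketch only, not run in this post; the HFile output path is a placeholder) is to have Import write HFiles instead of issuing puts, and then bulk-load them, which puts much less write load on the RegionServers for large data sets:

# write HFiles instead of puts, then complete the bulk load
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.bulk.output=/test/source/fromhbasetohdfs/users_hfiles \
  'usersfromhdfswithfilter' /test/source/fromhbasetohdfs/users
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /test/source/fromhbasetohdfs/users_hfiles 'usersfromhdfswithfilter'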


6. Scan the destination table after the import: the filtered data has been loaded into the new table, alongside the row333 records that were already there.


hbase(main):030:0> scan 'usersfromhdfswithfilter'
ROW                                COLUMN+CELL                                                                                      
 TheRealMT                         column=cfInfo:name, timestamp=1475718775174, value=Twain2append                                  
 TheRealMT                         column=cfInfo:password, timestamp=1473735751989, value=example                                   
 row333                            column=cfInfo:name, timestamp=1475721070129, value=row333value                                   
 row333                            column=cfInfo:password, timestamp=1475721085801, value=row333pwd                                 
2 row(s) in 0.0480 seconds

hbase(main):033:0> 



============================================================================

Appendix:

Exporting HBase data to HDFS with a custom MapReduce job

1. Goal

Export a copy of the data from an HBase table to HDFS.

Two approaches are covered: writing your own MapReduce program, or using the class HBase ships with (the Export tool demonstrated above).

2. A custom MapReduce program that exports HBase data to HDFS

2.1 First, look at the data in the t1 table in HBase (the original screenshot is missing here; as the output in 2.4 shows, t1 has a column family f1 with columns name, age, gender and birthday).

2.2 The MapReduce code is listed below. The two most important statements are:

job.setNumReduceTasks(0); // the reducer count is 0 because this is a map-only job: each mapper already writes its final text records, so there is nothing to aggregate and skipping the shuffle/reduce phase saves time
TableMapReduceUtil.initTableMapperJob(args[0], new Scan(), HBaseToHdfsMapper.class, Text.class, Text.class, job); // this tells the job which HBase table to read; the Scan object can carry filters, column restrictions and time ranges to limit what the mappers see

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class HBaseToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, HBaseToHdfs.class.getSimpleName());
        job.setJarByClass(HBaseToHdfs.class);

        job.setMapperClass(HBaseToHdfsMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Map-only job: the mappers write the final records themselves, no aggregation needed.
        job.setNumReduceTasks(0);

        // args[0] is the source HBase table; the Scan can be narrowed with filters, columns or a time range.
        TableMapReduceUtil.initTableMapperJob(args[0], new Scan(),
                HBaseToHdfsMapper.class, Text.class, Text.class, job);
        //TableMapReduceUtil.addDependencyJars(job);

        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }

    public static class HBaseToHdfsMapper extends TableMapper<Text, Text> {
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // key is the HBase rowkey
            byte[] name = latestValue(value, "f1", "name");
            byte[] age = latestValue(value, "f1", "age");
            byte[] gender = latestValue(value, "f1", "gender");
            byte[] birthday = latestValue(value, "f1", "birthday");

            outKey.set(key.get());
            // One tab-separated line per row; missing columns become the literal string NULL.
            outValue.set(asText(name) + "\t" + asText(age) + "\t" + asText(gender) + "\t" + asText(birthday));
            context.write(outKey, outValue);
        }

        // Returns the newest value of family:qualifier, or null if the row has no such cell.
        private static byte[] latestValue(Result result, String family, String qualifier) {
            Cell cell = result.getColumnLatestCell(family.getBytes(), qualifier.getBytes());
            return cell == null ? null : CellUtil.cloneValue(cell);
        }

        private static String asText(byte[] bytes) {
            return (bytes == null || bytes.length == 0) ? "NULL" : new String(bytes);
        }
    }
}

2.3 Package and run

hadoop jar hbaseToDfs.jar com.lanyun.hadoop2.HBaseToHdfs t1 /t1
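
If the submission fails with a ClassNotFoundException for HBase classes, the usual fixes are to re-enable the commented-out TableMapReduceUtil.addDependencyJars(job) call in the code above (so the HBase jars are shipped to the map tasks) and to put them on the client classpath when submitting. The lines below are a sketch of the latter, assuming the hbase launcher script is on the PATH of this node:

export HADOOP_CLASSPATH=$(hbase classpath)
hadoop jar hbaseToDfs.jar com.lanyun.hadoop2.HBaseToHdfs t1 /t1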

2.4 Check the output files on HDFS

[root@hadoop ~]# hadoop fs -cat /t1/part*
1    zhangsan    10    male    NULL
2    lisi    NULL    NULL    NULL
3    wangwu    NULL    NULL    NULL
4    zhaoliu    NULL    NULL    1993

With that, the export has completed successfully.

