hbase hdfs外部表_HBase从hdfs导入数据

最新推荐文章于 2023-05-05 09:38:57 发布

河湾

最新推荐文章于 2023-05-05 09:38:57 发布

阅读量286

点赞数

文章标签： hbase hdfs外部表

本文链接：https://blog.csdn.net/weixin_34033915/article/details/112832008

版权

需求：将HDFS上的文件中的数据导入到hbase中

实现上面的需求也有两种办法，一种是自定义mr，一种是使用hbase提供好的import工具

一、hdfs中的数据是这样的

每一行的数据是这样的id name age gender birthday

(my_python_env)[root@hadoop26 ~]# hadoop fs -cat /t1/*1 zhangsan 10 male NULL

2 lisi NULL NULL NULL

3 wangwu NULL NULL NULL

4 zhaoliu NULL NULL 1993

二、自定义mr

public classHdfsToHBase {public static void main(String[] args) throwsException{

Configuration conf=HBaseConfiguration.create();

conf.set("hbase.zookeeper.quorum", "hadoop26:2181");

conf.set("hbase.rootdir", "hdfs://hadoop26:9000/hbase");

conf.set(TableOutputFormat.OUTPUT_TABLE, args[1]);

Job job= Job.getInstance(conf, HdfsToHBase.class.getSimpleName());

TableMapReduceUtil.addDependencyJars(job);

job.setJarByClass(HdfsToHBase.class);

job.setMapperClass(HdfsToHBaseMapper.class);

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(Text.class);

job.setReducerClass(HdfsToHBaseReducer.class);

FileInputFormat.addInputPath(job,new Path(args[0]));

job.setOutputFormatClass(TableOutputFormat.class);

job.waitForCompletion(true);

}public static class HdfsToHBaseMapper extends Mapper{private Text outKey = newText();private Text outValue = newText();

@Overrideprotected void map(LongWritable key, Text value, Context context) throwsIOException, InterruptedException {

String[] splits= value.toString().split("\t");

outKey.set(splits[0]);

outValue.set(splits[1]+"\t"+splits[2]+"\t"+splits[3]+"\t"+splits[4]);

context.write(outKey, outValue);

}

}public static class HdfsToHBaseReducer extends TableReducer{

@Overrideprotected void reduce(Text k2, Iterable v2s, Context context) throwsIOException, InterruptedException {

Put put= newPut(k2.getBytes());for(Text v2 : v2s) {

String[] splis= v2.toString().split("\t");if(splis[0]!=null && !"NULL".equals(splis[0])){

put.add("f1".getBytes(), "name".getBytes(),splis[0].getBytes());

}if(splis[1]!=null && !"NULL".equals(splis[1])){

put.add("f1".getBytes(), "age".getBytes(),splis[1].getBytes());

}if(splis[2]!=null && !"NULL".equals(splis[2])){

put.add("f1".getBytes(), "gender".getBytes(),splis[2].getBytes());

}if(splis[3]!=null && !"NULL".equals(splis[3])){

put.add("f1".getBytes(), "birthday".getBytes(),splis[3].getBytes());

}

context.write(NullWritable.get(),put);

}

2.1打包运行

首先在hbase中创建一个表

hbase(main):006:0> create 'table1','f1'

0 row(s) in 0.4240seconds=> Hbase::Table - table1

然后运行

hadoop jar HdfsToHBase.jar com.lanyun.hadoop2.HdfsToHBase /t1/part* table1

最后查看table1中的数据

hbase(main):014:0* scan 'table1'ROW COLUMN+CELL1 column=f1:age, timestamp=1469069255119, value=10

1 column=f1:gender, timestamp=1469069255119, value=male1 column=f1:name, timestamp=1469069255119, value=zhangsan2 column=f1:name, timestamp=1469069255119, value=lisi3 column=f1:name, timestamp=1469069255119, value=wangwu4 column=f1:birthday, timestamp=1469069255119, value=1993

4 column=f1:name, timestamp=1469069255119, value=zhaoliu4 row(s) in 0.0430 seconds

三、使用habse提供的import工具

首先查看其用法

(my_python_env)[root@hadoop26 ~]# hbase org.apache.hadoop.hbase.mapreduce.Import

ERROR: Wrong number of arguments:0Usage: Import [options]By default Import will load data directly into HBase. To instead generate

HFiles of data to preparefora bulk data load, pass the option:-Dimport.bulk.output=/path/for/output

在hbase中创建表table2

hbase(main):016:0> create 'table2','f1'

0 row(s) in 0.4080seconds=> Hbase::Table - table2

在命令中中使用命令进行导入

hbase org.apache.hadoop.hbase.mapreduce.Import table2 /t2

查看table2中的数据

hbase(main):018:0> scan 'table2'ROW COLUMN+CELL1 column=f1:age, timestamp=1468824267106, value=10

1 column=f1:gender, timestamp=1468824289990, value=male1 column=f1:name, timestamp=1468824137463, value=zhangsan2 column=f1:name, timestamp=1468824236014, value=lisi3 column=f1:name, timestamp=1468824247109, value=wangwu4 column=f1:birthday, timestamp=1468825870158, value=1993

4 column=f1:name, timestamp=1468825659207, value=zhaoliu4 row(s) in 0.0440 seconds

四、注意

import工具很方便，但是只能导入Export导出的数据。

河湾

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
hbase hdfs外部表_HBase从hdfs导入数据

需求：将HDFS上的文件中的数据导入到hbase中实现上面的需求也有两种办法，一种是自定义mr，一种是使用hbase提供好的import工具一、hdfs中的数据是这样的每一行的数据是这样的id name age gender birthday(my_python_env)[root@hadoop26 ~]# hadoop fs -cat /t1/*1 zhangsan 10 m...
复制链接

扫一扫