There are several ways to load MapReduce output into HBase; TableOutputFormat is one of them.
1. Create the HBase table
- hbase(main):132:0* create 't1','f1'
- 0 row(s) in 1.4890 seconds
- hbase(main):133:0> scan 't1'
- ROW COLUMN+CELL
- 0 row(s) in 1.2330 seconds
2. Write the MR job
HBaseMapper.java
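package com.test;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class HBaseMapper extends MapReduceBase implements Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    public void map(LongWritable key, Text values,
            OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        output.collect(key, values); // identity map: pass each line through unchanged
    }
}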
HBaseReducer.java
package com.test;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class HBaseReducer extends MapReduceBase implements Reducer<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    public void reduce(LongWritable key, Iterator<Text> values,
            OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
            throws IOException {
        // The output key is left empty; each Put already carries its row key.
        ImmutableBytesWritable immutableBytesWritable = new ImmutableBytesWritable();
        while (values.hasNext()) {
            String value = values.next().toString();
            if (!value.isEmpty()) {
                Put put = createPut(value);
                if (put != null) {
                    output.collect(immutableBytesWritable, put);
                }
            }
        }
    }

    // str has the form row:family:qualifier:value (a simple simulation only)
    private Put createPut(String str) {
        String[] strs = str.split(":");
        if (strs.length < 4) {
            return null; // skip malformed lines
        }
        String row = strs[0];
        String family = strs[1];
        String qualifier = strs[2];
        String value = strs[3];
        Put put = new Put(Bytes.toBytes(row));
        // The timestamp is fixed at 1L, so the scan shows timestamp=1.
        put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), 1L, Bytes.toBytes(value));
        return put;
    }
}
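The parsing step inside createPut can be tried on its own, without a Hadoop or HBase classpath. The following is a minimal sketch under stated assumptions: `CellRecord` and its `parse` method are hypothetical names introduced only for illustration and are not part of the original job. Unlike createPut, it passes a limit of 4 to `split`, so any further ':' characters stay inside the value field, and an empty value is accepted.

```java
// Hypothetical helper, not part of the original job: holds the four fields
// of one input line of the form row:family:qualifier:value.
final class CellRecord {
    final String row;
    final String family;
    final String qualifier;
    final String value;

    private CellRecord(String row, String family, String qualifier, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }

    // Returns null for malformed lines, as createPut does.
    static CellRecord parse(String line) {
        if (line == null) {
            return null;
        }
        // A limit of 4 stops splitting after the third ':' so the value
        // field may itself contain ':' characters.
        String[] parts = line.split(":", 4);
        if (parts.length < 4) {
            return null;
        }
        return new CellRecord(parts[0], parts[1], parts[2], parts[3]);
    }
}
```

For example, `CellRecord.parse("r1:f1:c1:value1")` yields the fields r1, f1, c1 and value1, while a line with fewer than four fields yields null. Whether the value should be allowed to contain ':' depends on the real input data; the original createPut simply ignores anything after the fourth field.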
HbaseDriver.java
package com.test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class HbaseDriver {
    public static void main(String[] args) {
        JobConf conf = new JobConf(com.test.HbaseDriver.class);
        conf.setMapperClass(com.test.HBaseMapper.class);
        conf.setReducerClass(com.test.HBaseReducer.class);
        conf.setMapOutputKeyClass(LongWritable.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(ImmutableBytesWritable.class);
        conf.setOutputValueClass(Put.class);
        conf.setOutputFormat(TableOutputFormat.class);
        FileInputFormat.setInputPaths(conf, "/home/yinjie/input");
        FileOutputFormat.setOutputPath(conf, new Path("/home/yinjie/output"));
        // Target table and ZooKeeper connection settings for HBase
        conf.set(TableOutputFormat.OUTPUT_TABLE, "t1");
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The /home/yinjie/input directory contains one file, hbasedata.txt, with the following content:
- [root@localhost input]# cat hbasedata.txt
- r1:f1:c1:value1
- r2:f1:c2:value2
- r3:f1:c3:value3
Run the job in Eclipse using the MR plugin.
After the job succeeds, scan the HBase table again to verify that the data was written:
- hbase(main):135:0> scan 't1'
- ROW COLUMN+CELL
- r1 column=f1:c1, timestamp=1, value=value1
- r2 column=f1:c2, timestamp=1, value=value2
- r3 column=f1:c3, timestamp=1, value=value3
- 3 row(s) in 0.0580 seconds
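The data has been inserted ^_^. TableOutputFormat is not very efficient, however. To load a large volume of data into HBase, it is better to generate HFiles first and then import them into HBase. HFile is HBase's internal storage format, so loading is very fast.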
This article comes from the “炽天使” blog; please retain this source attribution: http://3199782.blog.51cto.com/3189782/652188