Continuing from the previous post.
3. Finally, reading from an HBase table as the data source and writing the output to HDFS. A simple driver looks like this:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "job name");
job.setJarByClass(test.class);
Scan scan = new Scan();
// The HBase table is the input source; each map() call receives one row as a Result
TableMapReduceUtil.initTableMapperJob(inputTable, scan, mapper.class,
        Writable.class, Writable.class, job);
// Register the reducer and declare the job's final output types
job.setReducerClass(reducer.class);
job.setOutputKeyClass(Writable.class);
job.setOutputValueClass(Writable.class);
// Output goes to a plain HDFS directory
FileOutputFormat.setOutputPath(job, new Path(outputPath));
job.waitForCompletion(true);
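One thing worth doing before submitting the job is tuning the Scan that feeds the map tasks. The usual advice for full-table MapReduce scans is to raise scanner caching and switch off block caching; this is only a sketch, and the caching value 500 is an illustrative choice, not anything fixed by this post:

Scan scan = new Scan();
scan.setCaching(500);        // rows fetched per RPC; fewer round trips to the RegionServer
scan.setCacheBlocks(false);  // don't fill the block cache with data that is read only once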
The mapper and reducer are roughly as follows:
public class mapper extends TableMapper<Writable, Writable> {
    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // mapper logic: extract what you need from the Result and emit it
        context.write(key, value);
    }
}
public class reducer extends Reducer<Writable, Writable, Writable, Writable> {
    @Override
    public void reduce(Writable key, Iterable<Writable> values, Context context)
            throws IOException, InterruptedException {
        // reducer logic: aggregate all values that share the same key
        for (Writable value : values) {
            context.write(key, value);
        }
    }
}
Finally, a note on what TableMapper and TableReducer really are: they exist only to cut down boilerplate. When a job reads from or writes to HBase, two of the four generic parameters are always the same fixed types, so these classes are just specialized versions of Mapper and Reducer; beyond that there is no difference. Their source is as follows:
public abstract class TableMapper<KEYOUT, VALUEOUT>
        extends Mapper<ImmutableBytesWritable, Result, KEYOUT, VALUEOUT> {
}

public abstract class TableReducer<KEYIN, VALUEIN, KEYOUT>
        extends Reducer<KEYIN, VALUEIN, KEYOUT, Writable> {
}
That's it; you can now go write your first WordCount MapReduce program against HBase.
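As a starting point, here is a minimal sketch of such a WordCount. It is not from the original post: the table name input_table, the column family cf, the qualifier content, and the output path are all placeholder assumptions for illustration. It reads text from one column of each row, counts the words, and writes the totals to HDFS.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HBaseWordCount {

    // Mapper: read the text stored in cf:content (placeholder names) and emit (word, 1)
    public static class WordMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            byte[] bytes = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("content"));
            if (bytes == null) {
                return; // row has no content cell, nothing to count
            }
            for (String token : Bytes.toString(bytes).split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sum the counts per word; the totals land in HDFS text files
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hbase wordcount");
        job.setJarByClass(HBaseWordCount.class);

        // Only scan the single column we actually need
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("content"));

        // HBase table in, plain HDFS files out
        TableMapReduceUtil.initTableMapperJob("input_table", scan, WordMapper.class,
                Text.class, IntWritable.class, job);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hbase_wordcount_out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}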