本blog将介绍利用MapReduce操作HBase,借助最熟悉的单词计数案例WordCount,将WordCount的统计结果存储到HBase,而不是HDFS。
开发环境
硬件环境:Centos 6.5 服务器4台(一台为Master节点,三台为Slave节点)
软件环境:Java 1.7.0_45、Eclipse Juno Service Release 2、hadoop-1.2.1、hbase-0.94.20。
1、 输入与输出
1)输入文件
file0.txt(WordCountHbaseWriter\input\file0.txt)
Hello World Bye World
file1.txt(WordCountHbaseWriter\input\file1.txt)
Hello Hadoop Goodbye Hadoop
2)输出HBase数据库
以下为输出数据库wordcount的数据库结构,以及预期的输出结果,如下图所示:
2、 Mapper函数实现
WordCountHbaseMapper程序和WordCount的Map程序一样,Map输入为每一行数据,例如”Hello World Bye World”,通过StringTokenizer类按空格分割成一个个单词,
通过context.write(word, one);输出为一系列< key,value>键值对:<”Hello”,1><”World”,1><”Bye”,1><”World”,1>。
详细源码请参考:WordCountHbaseWriter\src\com\zonesion\hbase\WordCountHbaseWriter.java
public static class WordCountHbaseMapper extends
Mapper<Object, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);// 输出<key,value>为<word,one>
}
}
}
3、 Reducer函数实现
WordCountHbaseReducer继承的是TableReducer类,在Hadoop中TableReducer继承Reducer类,它的原型为TableReducer< KeyIn,Values,KeyOut>,前两个参数必须对应Map过程的输出类型key/value类型,第三个参数为ImmutableBytesWritable,即为不可变类型。reduce(Text key, Iterable< IntWritable> values,Context context)具体处理过程分析如下表所示。
详细源码请参考:WordCountHbaseWriter\src\com\zonesion\hbase\WordCountHbaseWriter.java
public static class WordCountHbaseReducer extends
TableReducer<Text, IntWritable, ImmutableBytesWritable> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {// 遍历求和
sum += val.get();
}
Put put = new Put(key.getBytes());//put实例化,每一个词存一行