I. Preparing the MapReduce Example
Create a MapReduce project in IDEA that counts the occurrences of each word in the files under a given directory. Using the MapReduce framework requires writing three classes: CountDriver, CountMapper, and CountReducer. CountDriver is the driver class that is ultimately executed; CountMapper extends Mapper and overrides the map method to implement the Map-phase logic; CountReducer extends Reducer and overrides the reduce method to implement the Reduce-phase logic.
Create a new project with the Maven tool and add the three classes above under its main directory. The implementation code is as follows.
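The project needs the Hadoop client libraries on its classpath. A minimal sketch of the dependency section of pom.xml follows; the version number 3.3.6 is an assumption here, so match it to the Hadoop version of your cluster:

```xml
<dependencies>
    <!-- Hadoop client bundle: provides the MapReduce and HDFS APIs.
         The version below is an assumption; use the one matching your cluster. -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.6</version>
    </dependency>
</dependencies>
```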
1. The CountMapper class
package com.blog.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outKey = new Text();
    private IntWritable outValue = new IntWritable(1);

    /**
     * @param key     the byte offset of the current line in the input file
     * @param value   the text of the current line
     * @param context the context used to emit output key-value pairs
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Convert the current line of the file into a Java String
        String line = value.toString();
        // Split the line on spaces
        String[] words = line.split(" ");
        // Iterate over the array to get each word
        for (String word : words) {
            // Convert the Java String into a Text key
            outKey.set(word);
            // Emit the <K, V> pair
            context.write(outKey, outValue);
        }
    }
}
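To make the Map-phase logic concrete, the split-and-emit loop above can be sketched as a standalone Java program (the class name SplitDemo is made up for illustration and is not part of the project):

```java
// Standalone illustration of the mapper's tokenization: for the line
// "hello world hello", map() would emit <hello, 1>, <world, 1>, <hello, 1>.
public class SplitDemo {
    public static void main(String[] args) {
        String line = "hello world hello";
        // Same split-and-emit loop as in CountMapper.map()
        for (String word : line.split(" ")) {
            System.out.println("<" + word + ", 1>");
        }
    }
}
```

Note that the mapper does not aggregate anything: duplicate words each produce their own <word, 1> pair, and the counting happens later in the reducer.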
2. The CountReducer class
package com.blog.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {