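Problem: word count. Given a file, count how many times each word occurs.
Technology: MapReduce (map phase: read the input split line by line and break each line into words; reduce phase: sum the counts for each word)
Detailed walkthrough:
map: input: <byte offset of the line in the file (key), line content (value)>
LongWritable Text
output: <word, 1>
Text LongWritable(1)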
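package myMapreduceTest;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class myMapTest extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Reused output objects, so no new Writable is allocated per word
    private static final LongWritable ONE = new LongWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Get the whole line
        String str = value.toString();
        // Split on runs of whitespace (a single-space split yields empty tokens on repeated spaces)
        String[] words = str.trim().split("\\s+");
        // Emit <word, 1> for every non-empty token
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            outKey.set(word);
            context.write(outKey, ONE);
        }
    }
}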
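To see what the map step produces without running a cluster, here is a minimal plain-Java sketch. It is not Hadoop code: the class `LocalMapSketch` and its methods `mapLine` and `groupByWord` are hypothetical names made up for this illustration, and the grouping step only roughly imitates what the framework does between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Hadoop: imitates the map step locally.
class LocalMapSketch {

    // Imitates map(): split one line on whitespace and emit a <word, 1> pair per word.
    static List<Map.Entry<String, Long>> mapLine(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1L));
            }
        }
        return pairs;
    }

    // Imitates the grouping between map and reduce: values of identical keys go into one list.
    static Map<String, List<Long>> groupByWord(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = mapLine("hello world hello");
        System.out.println(pairs);              // [hello=1, world=1, hello=1]
        System.out.println(groupByWord(pairs)); // {hello=[1, 1], world=[1]}
    }
}
```

The grouped map has the same shape as the <word, [1,1,...]> input that the reduce step receives.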
reduce: input: <word, [1,1,1,1]> (all values emitted for the same word, grouped together)
Text Iterable<LongWritable>
{