[Bug Log] Hadoop WordCount results are not merged or summed

福气少侠

Categories: Bug Log, mapreduce. Tags: hadoop, bug

Article link: https://blog.csdn.net/qq_16018407/article/details/78894831

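While learning Hadoop MapReduce, I ran into a WordCount program whose output did not sum the counts for each word. Checking the source showed that the method in the Reducer implementation was misnamed: it should be reduce(), not reducer(). After the fix, the results were merged correctly.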
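The cause named in the summary above can be reproduced without a cluster. The sketch below is only an illustration, not the author's code: `BaseReducer`, `MisnamedReducer`, `SummingReducer` and `run` are hypothetical stand-ins, and `BaseReducer` merely imitates the fact that Hadoop's `Reducer.reduce()` default implementation writes every incoming value through unchanged. A method called `reducer()` does not override anything, so the framework keeps calling the inherited default and each word comes out once per occurrence with a count of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReduceNameDemo {

    /** Hypothetical stand-in for Hadoop's Reducer: the default reduce() passes every value through. */
    static class BaseReducer {
        protected void reduce(String key, List<Integer> values, List<String> out) {
            for (int v : values) {
                out.add(key + " " + v);
            }
        }
    }

    /** The bug: reducer() is a brand-new method, not an override, so nothing ever calls it. */
    static class MisnamedReducer extends BaseReducer {
        public void reducer(String key, List<Integer> values, List<String> out) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + " " + sum);
        }
    }

    /** The fix: the method is named reduce(), and @Override makes the compiler check the name. */
    static class SummingReducer extends BaseReducer {
        @Override
        protected void reduce(String key, List<Integer> values, List<String> out) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + " " + sum);
        }
    }

    /** Plays the role of the framework, which only ever calls reduce(). */
    static List<String> run(BaseReducer reducer) {
        List<String> out = new ArrayList<>();
        reducer.reduce("hello", Arrays.asList(1, 1, 1), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new MisnamedReducer())); // prints [hello 1, hello 1, hello 1]
        System.out.println(run(new SummingReducer()));  // prints [hello 3]
    }
}
```

Putting `@Override` on the misnamed `reducer()` method would turn the mistake into a compile error, which is why annotating `map()` and `reduce()` in real MapReduce jobs is a cheap safeguard.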

Background of the bug

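Anyone new to MapReduce programs starts with WordCount. I typed the example in by following along, only to find that after Hadoop ran WordCount the results were not summed. What was going on?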

Buggy source code

package mr;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {

    /**
     * WordCount test
     * @Biglucky
     */
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            StringTokenizer itr = new Stri