MapReduce Example (Part 2): Computing Averages - CSDN Blog

Article link: https://blog.csdn.net/u011109589/article/details/125064490

Computing Averages with MapReduce


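Hello everyone, I'm Fengyun. You are welcome to follow my blog or my WeChat official account 【笑看风云路】. In the days ahead, let's learn big data technologies together, work hard together, and meet a better version of ourselves!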

Approach

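Computing an average is a common MapReduce task, and the algorithm is simple. One approach: the map side reads the data and emits key/value pairs. Before those pairs reach the reducer, the shuffle groups all values that share the same key into a single value list, and that list is passed to the reducer. The reducer sums the values, counts the records, and divides the sum by the count. The process is illustrated in the accompanying figure.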
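To make the map, shuffle, and reduce steps concrete without a cluster, here is a minimal in-memory sketch in plain Java (9+), with no Hadoop dependency. The class name `AverageFlowSketch` and the sample lines are hypothetical. A `TreeMap` stands in for the shuffle (real Hadoop also sorts keys before reduce), and, like the article's reducer, the average uses integer division.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> shuffle -> reduce; not Hadoop code.
class AverageFlowSketch {
    // Map step: one "category\tclicks" line becomes a (category, clicks) pair
    static Map.Entry<String, Integer> map(String line) {
        String[] arr = line.split("\t");
        return Map.entry(arr[0], Integer.parseInt(arr[1].trim()));
    }

    // Shuffle step: collect every value emitted under the same key into one list.
    // TreeMap keeps keys sorted, loosely mirroring Hadoop's sort before reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce step: sum the value list, count the records, then divide.
    // Integer division, matching the article's reducer.
    static int reduce(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (int) (sum / values.size());
    }

    // Runs all three steps and returns category -> average
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the article's dataset
        List<String> lines = List.of("a\t10", "b\t4", "a\t20", "b\t5");
        System.out.println(run(lines)); // prints {a=15, b=4}
    }
}
```

Category `b` averages 4.5, but the integer division reports 4, the same truncation the article's reducer performs.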

Writing the Code

Mapper code

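public static class Map extends Mapper<Object, Text, Text, IntWritable> {
  // Output key object, reused across calls (one per mapper instance rather than static)
  private final Text newKey = new Text();

  // Implement the map function
  @Override
  public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Convert the incoming line of plain text into a String
    String line = value.toString();
    // Each line is expected to be: category<TAB>clicks
    String[] arr = line.split("\t");
    if (arr.length < 2) {
      return; // skip blank or malformed lines instead of throwing
    }
    newKey.set(arr[0]);
    // trim() also drops a trailing '\r' from Windows-style line endings
    int click = Integer.parseInt(arr[1].trim());
    context.write(newKey, new IntWritable(click));
  }
}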

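With Hadoop's default input format (TextInputFormat), map receives the file one line at a time as its value. The mapper splits that tab-separated line with split(), converts the product click-count field to an IntWritable and uses it as the value, uses the product category field as the key, and writes the key/value pair out directly.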
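With TextInputFormat, each record's key is the byte offset at which the line starts and its value is the line text, which is why the mapper can declare its key as `Object` and ignore it. The sketch below approximates those records in plain Java so the split step can be seen in isolation. It is hypothetical (`TextInputSketch` and the sample content are made up) and assumes UTF-8 text with `\n` line endings only; Hadoop's real line reader also copes with `\r\n`.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (offset, line) records TextInputFormat feeds to map()
class TextInputSketch {
    // One record per line: key = byte offset where the line starts,
    // value = the line text without its '\n'.
    // Assumes UTF-8 text with '\n' line endings; '\r\n' is not handled here.
    static List<Map.Entry<Long, String>> records(String fileContent) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            out.add(new SimpleImmutableEntry<Long, String>(offset, line));
            // Advance past this line's bytes plus the '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical file content: category<TAB>clicks per line
        String file = "a\t10\nb\t4\n";
        for (Map.Entry<Long, String> rec : records(file)) {
            // Like the article's mapper: ignore the offset key, split the value
            String[] arr = rec.getValue().split("\t");
            System.out.println("offset=" + rec.getKey() + " key=" + arr[0] + " value=" + arr[1]);
        }
    }
}
```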

Reducer code

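public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
  // Implement the reduce function
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    long num = 0;   // sum of all click counts for this key (long to reduce overflow risk)
    int count = 0;  // number of records for this key
    for (IntWritable val : values) {
      num += val.get(); // add each element to num
      count++;          // count the elements
    }
    // Compute the average; integer division truncates any fractional part
    int avg = (int) (num / count);

    context.write(key, new IntWritable(avg));
  }
}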

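The shuffle merges the map output <key, value> pairs into <key, values> pairs and hands them to reduce. The reducer copies the input key directly to the output key, loops over values to add every element into num and count the elements in count, then divides num by count to get the average avg. It sets avg as the value and writes the <key, value> pair out. Because num and count are integers, num / count is integer division, so any fractional part of the average is dropped.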
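The effect of integer division is easy to check outside Hadoop. This hypothetical sketch (`AverageDivisionSketch` and the sample counts are made up) puts the article's integer arithmetic next to a floating-point variant.

```java
// Hypothetical sketch contrasting the article's integer average with a floating-point one
class AverageDivisionSketch {
    // Same arithmetic as the article's reducer: the division truncates the fraction
    static int intAverage(int[] values) {
        long sum = 0; // long makes overflow far less likely than an int sum
        for (int v : values) {
            sum += v;
        }
        return (int) (sum / values.length);
    }

    // Floating-point variant: convert the sum to double before dividing
    static double doubleAverage(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        int[] clicks = {3, 4}; // hypothetical click counts for one category
        System.out.println(intAverage(clicks));    // prints 3
        System.out.println(doubleAverage(clicks)); // prints 3.5
    }
}
```

To keep the fraction in the job itself, one option (not the article's code) is to declare the reducer as `Reducer<Text, IntWritable, Text, DoubleWritable>` and write a `DoubleWritable`. Because the map output value type (IntWritable) would then differ from the final output type, the driver also needs `job.setMapOutputValueClass(IntWritable.class)` alongside `job.setOutputValueClass(DoubleWritable.class)`.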

Full code

package mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyAverage {

  // Mapper: "category\tclicks" line -> (category, clicks)
  public static class Map extends Mapper<Object, Text, Text, IntWritable> {
    private final Text newKey = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      String[] arr = value.toString().split("\t");
      if (arr.length < 2) {
        return; // skip blank or malformed lines
      }
      newKey.set(arr[0]);
      int click = Integer.parseInt(arr[1].trim());
      context.write(newKey, new IntWritable(click));
    }
  }

  // Reducer: (category, [clicks...]) -> (category, integer average)
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      long num = 0;
      int count = 0;
      for (IntWritable val : values) {
        num += val.get();
        count++;
      }
      int avg = (int) (num / count); // integer division truncates the fraction
      context.write(key, new IntWritable(avg));
    }
  }

  public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    System.out.println("start");
    // Job.getInstance replaces the deprecated new Job(conf, name) constructor
    Job job = Job.getInstance(conf, "MyAverage");
    job.setJarByClass(MyAverage.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // The output directory must not already exist, or the job fails at submission
    Path in = new Path("hdfs://localhost:9000/mymapreduce4/in/goods_click");
    Path out = new Path("hdfs://localhost:9000/mymapreduce4/out");
    FileInputFormat.addInputPath(job, in);
    FileOutputFormat.setOutputPath(job, out);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}