I recently studied the ideas behind MapReduce and picked up a few insights, so it is time to strike while the iron is hot with a small exercise to consolidate them; after all, practice is the only true test of theory.
This post builds a MapReduce example that computes average scores. I am borrowing an approach a veteran recommended, and I find it very sound: work out what the map stage takes as input, what the map stage does, and what the map stage outputs; then what the reduce stage takes as input, what it does, and what it outputs. Once those points are clear, the MapReduce program practically writes itself. Concretely:
Map: the input is a dataset in a fixed format (e.g. "张三 60"); for each record, split the line and write a key-value pair into the Context; the output is a typed key-value pair (e.g. (new Text("张三"), new IntWritable(60))).
Reduce: the input is the map output, grouped by key; sum each student's scores and divide by that student's number of courses; the output is a typed key-value pair holding the average.
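Before the Hadoop version, the whole flow can be sketched with plain Java collections. This is only a conceptual sketch: the names and scores below are made up for illustration, and the HashMap stands in for the shuffle's group-by-key step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AvgFlowSketch {
    // Simulate map -> shuffle -> reduce for "name score" lines.
    static Map<String, Integer> average(List<String> lines) {
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String line : lines) {                      // map: split each record
            String[] parts = line.split("\\s+");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));     // shuffle: group values by key
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int s : e.getValue()) sum += s;         // reduce: sum, then divide by count
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList("zhangsan 60", "zhangsan 80", "lisi 90")));
    }
}
```

With the sample input above, "zhangsan" averages to (60 + 80) / 2 = 70 and "lisi" to 90, which is exactly what the Hadoop reducer will compute per key.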
Given the map and reduce stages above, we arrive at the following code:
package com.linxiaosheng.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.GenericOptionsParser;
public class ScoreAvgTest {

    /**
     * @author hadoop
     * KEYIN:   the map input key, the byte offset at which each line starts (0, 11, ...)
     * VALUEIN: the map input value, the text of one line
     * KEYOUT:  the map output key (the student's name)
     * VALUEOUT:the map output value (the score)
     */
    public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable score = new IntWritable();
        private Text name = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lineText = value.toString();
            System.out.println("Before map: " + key + "," + lineText);
            // Each line looks like "name score"; split it into its two tokens.
            StringTokenizer stringTokenizer = new StringTokenizer(lineText);
            while (stringTokenizer.hasMoreTokens()) {
                name.set(stringTokenizer.nextToken());
                score.set(Integer.parseInt(stringTokenizer.nextToken()));
                System.out.println("After map: " + name + "," + score);
                // map() already declares IOException/InterruptedException,
                // so there is no need to catch and swallow them here.
                context.write(name, score);
            }
        }
    }
    /**
     * @author hadoop
     * KEYIN:   the reduce input key, a student's name
     * VALUEIN: the reduce input values, that student's scores
     * KEYOUT:  the output key, the student's name
     * VALUEOUT:the output value, the student's average score
     */
    public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text name, Iterable<IntWritable> scores, Context context)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            int sum = 0;
            int num = 0;
            for (IntWritable score : scores) {
                int s = score.get();
                sum += s;
                num++;
                sb.append(s).append(",");
            }
            // Integer division: the average is truncated, e.g. (60 + 71) / 2 = 65.
            int avg = sum / num;
            System.out.println("Reducer input: " + name + "," + sb);
            System.out.println("Reducer output: " + name + "," + avg);
            result.set(avg);
            context.write(name, result);
        }
    }
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        /*if (otherArgs.length < 3) {
            System.err.println("Usage: ScoreAvgTest <in1> <in2> <out>");
            System.exit(2);
        }*/
        Job job = new Job(conf, "ScoreAvgTest");
        job.setJarByClass(ScoreAvgTest.class);
        job.setMapperClass(MapperClass.class);
        // Note: no combiner here. Reusing ReducerClass as a combiner would
        // average partial averages, which gives the wrong overall average.
        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Two input files, one output directory.
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        boolean done = job.waitForCompletion(true);
        System.out.println("end");
        System.exit(done ? 0 : 1);
    }
}
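One pitfall is worth a quick demonstration: it is tempting to add job.setCombinerClass(ReducerClass.class) to this driver, as many WordCount examples do, but that breaks averages. A combiner runs on each map task's partial output, and an average of averages is not the overall average. A minimal sketch of the failure, in plain Java (the split of scores across map tasks is hypothetical):

```java
public class CombinerPitfall {
    // Suppose scores 60 and 80 land on one map task and 90 on another.
    static int correctAvg() {
        return (60 + 80 + 90) / 3;   // the reducer sees all raw scores: 76
    }

    static int combinedAvg() {
        int partial = (60 + 80) / 2; // a combiner pre-averages one split: 70
        return (partial + 90) / 2;   // the reducer then averages the averages: 80
    }

    public static void main(String[] args) {
        System.out.println(correctAvg() + " vs " + combinedAvg());
    }
}
```

Summing works as a combiner because addition is associative; averaging does not. If you want combiner savings for an average, the usual fix is to have the map/combine stages emit (sum, count) pairs and only divide in the reducer.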
The dataset: I created the data by hand myself, mainly because I wanted to watch the MapReduce job run, so I made two input files; naturally, no thought was given to whether the scores follow a normal distribution.
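For reference, a hand-made dataset like the one described could be generated as below. The file names, student names, and scores are all illustrative, not the originals from the post:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class MakeSampleData {
    // Write two small input files, one "name score" pair per line.
    static void write() {
        try {
            Files.write(Paths.get("score1.txt"),
                    Arrays.asList("zhangsan 60", "lisi 70", "wangwu 80"));
            Files.write(Paths.get("score2.txt"),
                    Arrays.asList("zhangsan 80", "lisi 90", "wangwu 100"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        write();
    }
}
```

With these two files passed as the first two arguments and an output directory as the third, each student appears once in both files, so the job's output would hold the average of their two scores.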