I recently studied the ideas behind MapReduce and picked up a few insights, so it is time to strike while the iron is hot with a small exercise to consolidate them; after all, practice is the only true test of theory.
This post builds a MapReduce example that computes average scores. I am borrowing an approach a veteran recommended, and I find it very sound: work out what the map stage takes as input, what the map stage does, and what the map stage outputs; then what the reduce stage takes as input, what it does, and what it outputs. Once those points are clear, the MapReduce program practically writes itself. Concretely:
Map: the input is a dataset in a fixed format (e.g. "张三 60"); for each record, split the line and write a key-value pair into the Context; the output is a typed key-value pair (e.g. (new Text("张三"), new IntWritable(60))).
Reduce: the input is the map output, grouped by key; sum each student's scores and divide by that student's number of courses; the output is a typed key-value pair holding the average.
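Before the Hadoop version, the whole flow can be sketched with plain Java collections. This is only a conceptual sketch: the names and scores below are made up for illustration, and the HashMap stands in for the shuffle's group-by-key step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AvgFlowSketch {
    // Simulate map -> shuffle -> reduce for "name score" lines.
    static Map<String, Integer> average(List<String> lines) {
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String line : lines) {                      // map: split each record
            String[] parts = line.split("\\s+");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1]));     // shuffle: group values by key
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int s : e.getValue()) sum += s;         // reduce: sum, then divide by count
            result.put(e.getKey(), sum / e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList("zhangsan 60", "zhangsan 80", "lisi 90")));
    }
}
```

With the sample input above, "zhangsan" averages to (60 + 80) / 2 = 70 and "lisi" to 90, which is exactly what the Hadoop reducer will compute per key.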
Given the map and reduce stages above, we arrive at the following code:
package com.linxiaosheng.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.GenericOptionsParser;
public class ScoreAvgTest {

    /**
     * @author hadoop
     * KEYIN:   the map input key, the byte offset at which each line starts (0, 11, ...)
     * VALUEIN: the map input value, the text of one line
     * KEYOUT:  the map output key (the student's name)
     * VALUEOUT:the map output value (the score)
     */
    public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable score = new IntWritable();
        private Text name = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lineText = value.toString();
            System.out.println("Before map: " + key + "," + lineText);
            // Each line looks like "name score"; split it into its two tokens.
            StringTokenizer stringTokenizer = new StringTokenizer(lineText);
            while (stringTokenizer.hasMoreTokens()) {
                name.set(stringTokenizer.nextToken());
                score.set(Integer.parseInt(stringTokenizer.nextToken()));
                System.out.println("After map: " + name + "," + score);
                // map() already declares IOException/InterruptedException,
                // so there is no need to catch and swallow them here.
                context.write(name, score);
            }
        }
    }
    /**
     * @author hadoop
     * KEYIN:   the reduce input key, a student's name
     * VALUEIN: the reduce input values, that student's scores
     * KEYOUT:  the output key, the student's name
     * VALUEOUT:the output value, the student's average score
     */
    public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text name, Iterable<IntWritable> scores, Context context)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            int sum = 0;
            int num = 0;
            for (IntWritable score : scores) {
                int s = score.get();
                sum += s;
                num++;
                sb.append(s).append(",");
            }
            // Integer division: the average is truncated, e.g. (60 + 71) / 2 = 65.
            int avg = sum / num;
            System.out.println("Reducer input: " + name + "," + sb);
            System.out.println("Reducer output: " + name + "," + avg);
            result.set(avg);
            context.write(name, result);
        }
    }
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        /*if (otherArgs.length < 3) {
            System.err.println("Usage: ScoreAvgTest <in1> <in2> <out>");
            System.exit(2);
        }*/
        Job job = new Job(conf, "ScoreAvgTest");
        job.setJarByClass(ScoreAvgTest.class);
        job.setMapperClass(MapperClass.class);
        // Note: no combiner here. Reusing ReducerClass as a combiner would
        // average partial averages, which gives the wrong overall average.
        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Two input files, one output directory.
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        boolean done = job.waitForCompletion(true);
        System.out.println("end");
        System.exit(done ? 0 : 1);
    }
}
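One pitfall is worth a quick demonstration: it is tempting to add job.setCombinerClass(ReducerClass.class) to this driver, as many WordCount examples do, but that breaks averages. A combiner runs on each map task's partial output, and an average of averages is not the overall average. A minimal sketch of the failure, in plain Java (the split of scores across map tasks is hypothetical):

```java
public class CombinerPitfall {
    // Suppose scores 60 and 80 land on one map task and 90 on another.
    static int correctAvg() {
        return (60 + 80 + 90) / 3;   // the reducer sees all raw scores: 76
    }

    static int combinedAvg() {
        int partial = (60 + 80) / 2; // a combiner pre-averages one split: 70
        return (partial + 90) / 2;   // the reducer then averages the averages: 80
    }

    public static void main(String[] args) {
        System.out.println(correctAvg() + " vs " + combinedAvg());
    }
}
```

Summing works as a combiner because addition is associative; averaging does not. If you want combiner savings for an average, the usual fix is to have the map/combine stages emit (sum, count) pairs and only divide in the reducer.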
The dataset: I created the data by hand myself, mainly because I wanted to watch the MapReduce job run, so I made two input files; naturally, no thought was given to whether the scores follow a normal distribution.
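For reference, a hand-made dataset like the one described could be generated as below. The file names, student names, and scores are all illustrative, not the originals from the post:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class MakeSampleData {
    // Write two small input files, one "name score" pair per line.
    static void write() {
        try {
            Files.write(Paths.get("score1.txt"),
                    Arrays.asList("zhangsan 60", "lisi 70", "wangwu 80"));
            Files.write(Paths.get("score2.txt"),
                    Arrays.asList("zhangsan 80", "lisi 90", "wangwu 100"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        write();
    }
}
```

With these two files passed as the first two arguments and an output directory as the third, each student appears once in both files, so the job's output would hold the average of their two scores.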