Inputs and Outputs
- The MapReduce framework operates exclusively on <key, value> pairs. The input to a job is a set of <key, value> pairs, and its output is also a set of <key, value> pairs, possibly of different types.
- The key classes must implement the WritableComparable interface so the framework can sort them.
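As a sketch of what the WritableComparable contract asks for, the hypothetical key class below shows the three required methods: serialization via write/readFields and ordering via compareTo. (The real interface lives in org.apache.hadoop.io; this sketch uses only the JDK so it runs without Hadoop on the classpath, and the class name WordKey is invented for illustration.)

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical key class mirroring the WritableComparable contract:
// write(DataOutput), readFields(DataInput), and compareTo(T).
class WordKey implements Comparable<WordKey> {
    private String word = "";

    WordKey() {}                        // Writables need a no-arg constructor
    WordKey(String word) { this.word = word; }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);             // serialize for the shuffle
    }

    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();            // deserialize on the reduce side
    }

    @Override
    public int compareTo(WordKey other) {
        return word.compareTo(other.word);  // sort order used by the framework
    }

    public String getWord() { return word; }
}
```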
Example: WordCount 1.0
```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
(WordCount) MapReduce Execution Process
```java
public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
  }
}
```
- After the input file is read, it is split into several parts that are processed in parallel by map tasks. The Mapper implementation splits each line on whitespace and emits each token as a <word, 1> pair.
For example:
Output of the first map:
< Hello, 1> < World, 1> < Bye, 1> < World, 1>
Output of the second map:
< Hello, 1> < Hadoop, 1> < Goodbye, 1> < Hadoop, 1>
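The per-map outputs above can be reproduced with plain StringTokenizer outside Hadoop; this is a minimal sketch of the map() body's tokenization, assuming the two input lines "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizeDemo {
    // Mimics the map() body: split a line on whitespace, emit <word, 1> pairs.
    static List<String> emitPairs(String line) {
        List<String> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add("<" + itr.nextToken() + ", 1>");
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(emitPairs("Hello World Bye World"));
        System.out.println(emitPairs("Hello Hadoop Goodbye Hadoop"));
    }
}
```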
- During the map phase, a local combiner aggregates the output pairs that share the same word on each map, before the data is sent over the network.
Combined output of the first map:
< Bye, 1> < Hello, 1> < World, 2>
Combined output of the second map:
< Goodbye, 1> < Hadoop, 2> < Hello, 1>
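What the combiner does locally can be sketched with a TreeMap, which also keeps the keys sorted as in the combined outputs above. This illustrates only the per-map aggregation, not the Hadoop runtime:

```java
import java.util.Map;
import java.util.TreeMap;

class CombinerDemo {
    // Local (per-map) aggregation: sum the counts of identical words,
    // keeping keys in sorted order like the framework's map-side sort.
    static Map<String, Integer> combine(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);  // add 1, or start at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Raw output keys of the first and second map from the walkthrough:
        System.out.println(combine(new String[] {"Hello", "World", "Bye", "World"}));
        System.out.println(combine(new String[] {"Hello", "Hadoop", "Goodbye", "Hadoop"}));
    }
}
```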
```java
public void reduce(Text key, Iterable<IntWritable> values,
                   Context context
                   ) throws IOException, InterruptedException {
  int sum = 0;
  for (IntWritable val : values) {
    sum += val.get();
  }
  result.set(sum);
  context.write(key, result);
}
```
- The Reducer simply sums up the values, which here are the occurrence counts of each word.
- The output of the job is:
< Bye, 1>
< Goodbye, 1>
< Hadoop, 2>
< Hello, 2>
< World, 2>
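The final job output above is the merge of both maps' combined outputs: for each key, the reduce phase sums the partial counts arriving from all maps. A plain-Java sketch of that data flow (not the Hadoop runtime):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceDemo {
    // Merge per-map partial counts: for each key, sum the values coming
    // from every map's combined output, keeping keys sorted.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((k, v) -> result.merge(k, v, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Combined outputs of the two maps from the walkthrough:
        Map<String, Integer> map1 = new TreeMap<>();
        map1.put("Bye", 1); map1.put("Hello", 1); map1.put("World", 2);
        Map<String, Integer> map2 = new TreeMap<>();
        map2.put("Goodbye", 1); map2.put("Hadoop", 2); map2.put("Hello", 1);
        System.out.println(reduce(List.of(map1, map2)));
        // {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```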
- The main method configures the job: input/output paths, key/value types, input/output formats, and so on. job.waitForCompletion(true) submits the job and monitors its progress.
MapReduce - User Interfaces
Mapper
- Maps are the individual tasks that transform input records into intermediate records. The intermediate records need not be of the same type as the input records: a given input <key, value> pair may map to zero or more output <key, value> pairs. In the Hadoop MapReduce framework, one map task is spawned for each InputSplit generated by the job's InputFormat.
- The Mapper implementation is passed to the job via job.setMapperClass, and the framework then calls map(