A Close Reading of MapReduce Code

lyn1539815919

Category: hadoop—mapreduce. Tag: mapreduce

Article link: https://blog.csdn.net/lyn1539815919/article/details/52254872

This walkthrough uses WordCount as its example; the code is the program from the first article in this series.

Program 1: WordCount.java
package com.wordcount.test;  
  
import java.io.IOException;  
import java.util.StringTokenizer;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.Path;  
import org.apache.hadoop.io.IntWritable;  
import org.apache.hadoop.io.Text;  
import org.apache.hadoop.mapreduce.Job;  
import org.apache.hadoop.mapreduce.Mapper;  
import org.apache.hadoop.mapreduce.Reducer;  
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;  
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;  
import org.apache.hadoop.util.GenericOptionsParser;  
  
public class WordCount {  
  
  public static class TokenizerMapper   
       extends Mapper<Object, Text, Text, IntWritable>{  
    // IntWritable is Hadoop's Writable wrapper for a Java int, much like Integer in plain Java.
    // It has two constructors: IntWritable() and IntWritable(int value), so new IntWritable(1)
    // creates an instance holding the value 1.
    // map() later calls context.write(word, one), which emits the pair <word, 1> for each token.
    private final static IntWritable one = new IntWritable(1);  
    private Text word = new Text();  
        
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {  
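      // With the default TextInputFormat, value is one line of input text. toString() converts
      // the Text to a Java String, and StringTokenizer splits it into whitespace-separated tokens.
      StringTokenizer itr = new StringTokenizer(value.toString());
      // hasMoreTokens() reports whether any tokens remain; nextToken() returns the next one.
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        // context is the mapper's link to the framework; write() emits the pair <word, 1>.
        context.write(word, one);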
      }  
    }  
  }

  public static class IntSumCombiner extends Reducer<Text,IntWritable,Text,IntWritable>{
      private IntWritable result = new IntWritable();  
        
      @Override  
    protected void reduce(Text key, Iterable<IntWritable> values,  
            Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {  
          int sum = 0;  
          for(IntWritable val : values){  
              sum += val.get();  
          }  
          result.set(sum); // set() stores the total in the reusable IntWritable
          context.write(key,result);  
      }  
  }  
    
  public static class IntSumReducer   
       extends Reducer<Text,IntWritable,IntWritable,Text> {  
    private IntWritable result = new IntWritable();  
  
    public void reduce(Text key, Iterable<IntWritable> values,   
                       Context context  
                       ) throws IOException, InterruptedException {  
      int sum = 0;  
      for (IntWritable val : values) {  
        sum += val.get();  
      }  
      result.set(sum);  
      context.write(result, key);  
    }  
  }  
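
  public static void main(String[] args) throws Exception {
    // Configuration holds the job's settings: values loaded from the Hadoop configuration
    // files plus any overrides supplied on the command line.
    Configuration conf = new Configuration();
    // GenericOptionsParser applies generic Hadoop options such as -D property=value to conf
    // and returns the remaining arguments, which here are the input and output paths.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();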
    if (otherArgs.length < 2) {  
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);  
    }  
    Job job = Job.getInstance(conf, "word count"); // replaces the deprecated new Job(conf, name)
    job.setJarByClass(WordCount.class);  
    job.setMapperClass(TokenizerMapper.class);  
    job.setCombinerClass(IntSumCombiner.class);  
    job.setReducerClass(IntSumReducer.class);  
  
    job.setMapOutputKeyClass(Text.class);  
    job.setMapOutputValueClass(IntWritable.class);  
      
    job.setOutputKeyClass(IntWritable.class);  
    job.setOutputValueClass(Text.class);  
      
    job.setOutputFormatClass(SequenceFileOutputFormat.class);  
     
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));  
    FileOutputFormat.setOutputPath(job,new Path(otherArgs[1]));  
    System.exit(job.waitForCompletion(true) ? 0 : 1);  
  }  
}
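
To make the data flow of the job above easier to follow, here is a minimal sketch that imitates the map, combine, and reduce steps in plain Java, with no Hadoop dependency. The class name `WordCountFlowSketch` and its methods are made up for illustration. It also simplifies heavily: each input line is treated as the output of one map task, and the result is rendered as tab-separated text, whereas the real job writes a SequenceFile.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the WordCount job; not Hadoop code.
class WordCountFlowSketch {
    // Map step: emit one (word, 1) pair per token, like TokenizerMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Group pairs by word (sorted by key) and sum their values.
    // Both the combiner and the reducer perform this same summation.
    static Map<String, Integer> sumByWord(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    static List<String> run(List<String> lines) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String line : lines) {
            // Combine step: pre-aggregate one map task's output before the shuffle.
            shuffled.addAll(sumByWord(map(line)).entrySet());
        }
        List<String> output = new ArrayList<>();
        // Reduce step: like IntSumReducer, write the pair as (count, word).
        for (Map.Entry<String, Integer> e : sumByWord(shuffled).entrySet()) {
            output.add(e.getValue() + "\t" + e.getKey());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("hello hadoop hello");
        lines.add("hadoop mapreduce");
        for (String row : run(lines)) {
            System.out.println(row);
        }
    }
}
```

For the two sample lines, the combine step turns the first line's pairs into hadoop=1 and hello=2 before anything is shuffled, which is the reason IntSumCombiner keeps the <Text, IntWritable> output types while only IntSumReducer swaps key and value.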

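Java's substring method returns a single substring of the original string, selected by index. To break a string into individual words or tokens, use StringTokenizer instead. For details, see: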

http://blog.csdn.net/andymu077/article/details/6753182
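
As a small illustration of the difference, the sketch below contrasts the two on the same string. The class name `TokenizerDemo` and the sample sentence are invented for this example; `tokens` uses the same loop as TokenizerMapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; only java.util is used.
class TokenizerDemo {
    // Collect every token of the line, using the same loop as TokenizerMapper.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        // With no delimiter argument, StringTokenizer splits on whitespace.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "hello hadoop  hello";
        // substring returns one slice, and the caller must know the indices.
        System.out.println(line.substring(0, 5)); // prints "hello"
        // StringTokenizer finds every token; repeated spaces yield no empty tokens.
        System.out.println(tokens(line));         // prints "[hello, hadoop, hello]"
    }
}
```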

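Rendering Context as the Chinese word 上下文 (literally "the text above and below") is not very intuitive, but it is not really wrong either. Consider what "context" means in everyday language. We say that a remark must not be quoted out of context and has to be read together with what surrounds it. For example, Xiaoli says to Wang Laowu, "I love you." Taken alone, that sounds like a declaration of love. Read in context, "Although I love you, you are too poor, so we had better break up," the meaning changes completely. In this sense "context" also means "environment": the environment in which something is said. For details, see the explanation below.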

Explanation of context in Java
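
In code terms, the context is the environment the framework hands to the mapper, and the mapper talks to that environment instead of deciding for itself where its output goes. The following is a rough sketch of that idea only. The `Collector` interface is a hypothetical stand-in with a single `write` method; it is not Hadoop's Mapper.Context, which offers much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch; Collector is NOT a Hadoop type.
class ContextSketch {
    // Stand-in for the one capability map() uses: handing a pair to its environment.
    interface Collector {
        void write(String key, int value);
    }

    // Same logic as TokenizerMapper.map, written against the stand-in interface.
    // The mapper does not know what happens to a pair after write() returns.
    static void map(String line, Collector context) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1);
        }
    }

    // One possible environment: record each emitted pair as text.
    static List<String> collect(String line) {
        List<String> emitted = new ArrayList<>();
        map(line, (key, value) -> emitted.add("<" + key + "," + value + ">"));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(collect("I love you")); // prints "[<I,1>, <love,1>, <you,1>]"
    }
}
```

Because `map` depends only on the interface, the same mapper logic works whether the environment stores the pairs in a list, as here, or sends them on to a combiner and reducer.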
