MapReduce Summary

1、MapReduce Provides:
      - Automatic parallelization and distribution
      - Fault tolerance
      - Status and monitoring tools
      - A clean abstraction for programmers
(1)map (in_key, in_value) -> (out_key, intermediate_value) list:
      - Records from the data source (lines of files, rows of a database, etc.) are fed into the map function as key/value pairs, e.g. (filename, line).
      - map() produces one or more intermediate values along with an output key from the input.
(2)reduce (out_key, intermediate_value list) -> out_value list:
      - After the map phase is over, all the intermediate values for a given output key are combined together into a list.
      - reduce() combines those intermediate values into one or more final values for that same output key (in practice, usually one final value per key); the word-count flow sketched below illustrates both steps.
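As a concrete illustration (a made-up two-line input, anticipating the word-count example of section 4), the key/value flow looks roughly like this:

map("file.txt", "the cat sat")  -> ("the", 1), ("cat", 1), ("sat", 1)
map("file.txt", "the dog ran")  -> ("the", 1), ("dog", 1), ("ran", 1)
(the framework then groups the intermediate values by key)
reduce("the", [1, 1])           -> ("the", 2)
reduce("cat", [1])              -> ("cat", 1)
...and likewise for the remaining keys.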
2、Parallelism
(1)map() functions run in parallel, creating different intermediate values from different input data sets
(2)reduce() functions also run in parallel, each working on a different output key (the number of parallel reduce tasks can be tuned, as sketched after this list)
(3)All values are processed independently
(4)Bottleneck: the reduce phase can’t start until the map phase is completely finished.
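In Hadoop's old JobConf API (the one used by the example in section 4), the degree of parallelism can be influenced roughly as follows; this is a minimal sketch, the task counts are only illustrative, and the map count is merely a hint to the framework:

JobConf conf = new JobConf();
conf.setNumMapTasks(100);   // hint only: the actual number of map tasks follows the input splits
conf.setNumReduceTasks(8);  // output keys are partitioned across 8 parallel reduce tasks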
3、MapReduce Conclusions
(1)MapReduce has proven to be a useful abstraction in many areas
(2)It greatly simplifies large-scale computations
(3)The functional programming paradigm can be applied to large-scale applications (see the single-machine analogy below)
(4)You focus on the “real” problem; the library deals with the messy details
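To make the functional-programming connection concrete on a single machine, the same word count can be written with Java 8 streams; this is only an analogy to show the map/group/reduce shape, not part of Hadoop or of the original example:

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class LocalWordCount {
  public static void main(String[] args) {
    String text = "the cat sat on the mat";
    // "map": split into words; "reduce": group equal words and count them
    Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
        .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    System.out.println(counts); // e.g. {the=2, cat=1, sat=1, on=1, mat=1} (order not guaranteed)
  }
}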
4、Example: Word Count
Map()
// Old Hadoop API (0.1x), as in the original example; MapClass, Reduce and main()
// below are members of a single driver class (not shown). Imports needed:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public static class MapClass extends MapReduceBase implements Mapper {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (word, 1) for every token of the input line.
  public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter) throws IOException {
    String line = ((Text) value).toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}
Reduce()
public static class Reduce extends MapReduceBase implements Reducer {
  // Sums all the counts emitted for one word and emits (word, total).
  public void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += ((IntWritable) values.next()).get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
public static void main(String[] args) throws IOException {
  JobConf conf = new JobConf();
  conf.setOutputKeyClass(Text.class);           // keys are words
  conf.setOutputValueClass(IntWritable.class);  // values are counts
  conf.setMapperClass(MapClass.class);
  conf.setCombinerClass(Reduce.class);          // local pre-aggregation; safe because summing is associative and commutative
  conf.setReducerClass(Reduce.class);
  conf.setInputPath(new Path(args[0]));         // newer releases: FileInputFormat.setInputPaths(conf, ...)
  conf.setOutputPath(new Path(args[1]));        // newer releases: FileOutputFormat.setOutputPath(conf, ...)
  JobClient.runJob(conf);                       // submits the job and blocks until it completes
}
5、One-time setup
      - Set hadoop-site.xml and slaves
      - Initialize (format) the namenode
      - Start the Hadoop MapReduce and DFS daemons
      - Upload your data to DFS
      - Run your process…
      - Download your results from DFS (a programmatic alternative for the two DFS steps is sketched below)
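The upload and download steps can also be done programmatically through Hadoop's FileSystem API; a minimal sketch, with made-up local and DFS paths:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class DfsCopy {
  public static void main(String[] args) throws Exception {
    // Connects to the DFS configured in hadoop-site.xml
    FileSystem fs = FileSystem.get(new JobConf());
    fs.copyFromLocalFile(new Path("/tmp/input"), new Path("input"));   // upload before running the job
    fs.copyToLocalFile(new Path("output"), new Path("/tmp/output"));   // download the results afterwards
  }
}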
*A simple programming model for processing large datasets on a large cluster of machines
*Fun to use: focus on the problem and let the library deal with the messy details
6、References
      - Original paper (http://labs.google.com/papers/mapreduce.html)
      - On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
      - Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)
      - Starfish – MapReduce in Ruby (http://rufy.com/starfish/)