MapReduce Summary

1、MapReduce Provides:
      - Automatic parallelization and distribution
      - Fault tolerance
      - Status and monitoring tools
      - A clean abstraction for programmers
(1)map (in_key, in_value) -> (out_key, intermediate_value) list:
      - Records from the data source (lines of files, rows of a database, etc.) are fed into the map function as key/value pairs, e.g. (filename, line).
      - map() produces one or more intermediate values along with an output key from the input.
(2)reduce (out_key, intermediate_value list) -> out_value list:
      - After the map phase is over, all the intermediate values for a given output key are combined together into a list.
      - reduce() combines those intermediate values into one or more final values for that same output key (in practice, usually one final value per key); the word-count flow sketched below illustrates both steps.
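As a concrete illustration (a made-up two-line input, anticipating the word-count example of section 4), the key/value flow looks roughly like this:

map("file.txt", "the cat sat")  -> ("the", 1), ("cat", 1), ("sat", 1)
map("file.txt", "the dog ran")  -> ("the", 1), ("dog", 1), ("ran", 1)
(the framework then groups the intermediate values by key)
reduce("the", [1, 1])           -> ("the", 2)
reduce("cat", [1])              -> ("cat", 1)
...and likewise for the remaining keys.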
2、Parallelism
(1)map() functions run in parallel, creating different intermediate values from different input data sets
(2)reduce() functions also run in parallel, each working on a different output key (the number of parallel reduce tasks can be tuned, as sketched after this list)
(3)All values are processed independently
(4)Bottleneck: the reduce phase can’t start until the map phase is completely finished.
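In Hadoop's old JobConf API (the one used by the example in section 4), the degree of parallelism can be influenced roughly as follows; this is a minimal sketch, the task counts are only illustrative, and the map count is merely a hint to the framework:

JobConf conf = new JobConf();
conf.setNumMapTasks(100);   // hint only: the actual number of map tasks follows the input splits
conf.setNumReduceTasks(8);  // output keys are partitioned across 8 parallel reduce tasks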
3、MapReduce Conclusions
(1)MapReduce has proven to be a useful abstraction in many areas
(2)It greatly simplifies large-scale computations
(3)The functional programming paradigm can be applied to large-scale applications (see the single-machine analogy below)
(4)You focus on the “real” problem; the library deals with the messy details
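To make the functional-programming connection concrete on a single machine, the same word count can be written with Java 8 streams; this is only an analogy to show the map/group/reduce shape, not part of Hadoop or of the original example:

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class LocalWordCount {
  public static void main(String[] args) {
    String text = "the cat sat on the mat";
    // "map": split into words; "reduce": group equal words and count them
    Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
        .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    System.out.println(counts); // e.g. {the=2, cat=1, sat=1, on=1, mat=1} (order not guaranteed)
  }
}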
4、Example: Word Count
Map()
// Old Hadoop API (0.1x), as in the original example; MapClass, Reduce and main()
// below are members of a single driver class (not shown). Imports needed:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public static class MapClass extends MapReduceBase implements Mapper {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emits (word, 1) for every token of the input line.
  public void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter) throws IOException {
    String line = ((Text) value).toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}
Reduce()
public static class Reduce extends MapReduceBase implements Reducer {
  // Sums all the counts emitted for one word and emits (word, total).
  public void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += ((IntWritable) values.next()).get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
public static void main(String[] args) throws IOException {
  JobConf conf = new JobConf();
  conf.setOutputKeyClass(Text.class);           // keys are words
  conf.setOutputValueClass(IntWritable.class);  // values are counts
  conf.setMapperClass(MapClass.class);
  conf.setCombinerClass(Reduce.class);          // local pre-aggregation; safe because summing is associative and commutative
  conf.setReducerClass(Reduce.class);
  conf.setInputPath(new Path(args[0]));         // newer releases: FileInputFormat.setInputPaths(conf, ...)
  conf.setOutputPath(new Path(args[1]));        // newer releases: FileOutputFormat.setOutputPath(conf, ...)
  JobClient.runJob(conf);                       // submits the job and blocks until it completes
}
5、One-time setup
      - Set hadoop-site.xml and slaves
      - Initialize (format) the namenode
      - Start the Hadoop MapReduce and DFS daemons
      - Upload your data to DFS
      - Run your process…
      - Download your results from DFS (a programmatic alternative for the two DFS steps is sketched below)
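The upload and download steps can also be done programmatically through Hadoop's FileSystem API; a minimal sketch, with made-up local and DFS paths:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class DfsCopy {
  public static void main(String[] args) throws Exception {
    // Connects to the DFS configured in hadoop-site.xml
    FileSystem fs = FileSystem.get(new JobConf());
    fs.copyFromLocalFile(new Path("/tmp/input"), new Path("input"));   // upload before running the job
    fs.copyToLocalFile(new Path("output"), new Path("/tmp/output"));   // download the results afterwards
  }
}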
*A simple programming model for processing large datasets on a large cluster of machines
*Fun to use: focus on the problem and let the library deal with the messy details
6、References
      - Original paper (http://labs.google.com/papers/mapreduce.html)
      - On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
      - Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)
      - Starfish – MapReduce in Ruby (http://rufy.com/starfish/)