map/reduce Example (Part 4)

Latest recommended article published 2020-05-22 14:08:40

mo_5201314

Category: hadoop

Article link: https://blog.csdn.net/mo_5201314/article/details/100100132

A classic map/reduce case: advanced wordcount

Input data

hello world
dog fish
hadoop 
spark
hello world
dog fish
hadoop 
spark
hello world
dog fish
hadoop 
spark

Requirement: output the three most frequent words, using a single map/reduce job.

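Key points: MapReduce groups and aggregates records by key, so the first step is to read out every word in map and aggregate in reduce to get the number of occurrences of each word. But how do we limit the output to the top three? map runs once per input line and reduce runs once per key group, so map and reduce alone cannot control how many records are written.

However, both Mapper and Reducer also have setup and cleanup methods, and each of these runs only once per task. So in the reduce phase we can treat each word as a key and its count as a value and store every group in a map, without writing anything out. Then, in the reducer's cleanup phase, we sort the map and output the top three. This gives a global top three only when the job runs a single reduce task (the default), because with several reduce tasks each one would see only part of the words.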
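The call order described above can be sketched without Hadoop. This is a minimal simulation under stated assumptions, not Hadoop's actual implementation: the class `LifecycleSketch` and its methods `reduce`, `cleanup`, and `runTask` are hypothetical names that only mimic the order in which a reduce task invokes its hooks (reduce once per key group, cleanup once at the end).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reduce task; it only mimics the call order of the hooks.
public class LifecycleSketch {
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();

    // Called once per key group: sum the values and store the total, write nothing.
    void reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        counts.put(key, sum);
    }

    // Called once, after the last group: only here is anything written.
    void cleanup() {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
    }

    // Drives the hooks in the same order a reduce task would.
    public static List<String> runTask(Map<String, List<Integer>> groups) {
        LifecycleSketch task = new LifecycleSketch();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            task.reduce(g.getKey(), g.getValue());
        }
        task.cleanup();
        return task.output;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        groups.put("dog", Arrays.asList(1, 1, 1));
        groups.put("hello", Arrays.asList(1, 1, 1));
        System.out.println(runTask(groups));
    }
}
```

Because nothing is written until `cleanup`, that single call is the only place where the number of output records can be capped.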

Code analysis: the reduce phase does not write to the context directly. Instead it puts each result into a HashMap, and cleanup then converts the map to a list and sorts the list.

Map phase

public static class wordcmapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for every non-empty token.
        String[] words = value.toString().trim().split("\\s+");
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // blank input line
            }
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

The map phase is the same as in the earlier examples.

Reduce phase

public static class wordcreducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // word -> total count for every key this reduce task sees; nothing is written in reduce().
    // Needs java.util.ArrayList and java.util.List in addition to the original imports.
    private final HashMap<String, Integer> map = new HashMap<String, Integer>();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable v : values) {
            count += v.get();
        }
        map.put(key.toString(), count);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Sort the entries by count, largest first.
        List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                return Integer.compare(o2.getValue(), o1.getValue());
            }
        });
        // Write only the first three entries (fewer if there are fewer distinct words).
        int size = Math.min(3, list.size());
        for (int i = 0; i < size; i++) {
            context.write(new Text(list.get(i).getKey()), new IntWritable(list.get(i).getValue()));
        }
    }
}

In the reducer class, override the cleanup method alongside the reduce method. In cleanup, convert the map to a list and sort the list.
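The sort-and-truncate step can be tried on its own, outside Hadoop. The following is a small sketch, assuming a hypothetical helper class `TopNSketch` with a method `topN`; the sample counts in `main` are made up for illustration and are not the article's input (in the input above all six words occur three times, so which three come first depends entirely on how ties are ordered). Breaking ties alphabetically is a choice made by this sketch so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the same map -> list -> sort -> first n steps as in cleanup.
public class TopNSketch {
    // Returns up to n entries with the largest counts.
    // Ties are broken alphabetically (an assumption of this sketch).
    public static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        // Keep at most n entries; a shorter list is returned whole.
        return new ArrayList<>(list.subList(0, Math.min(n, list.size())));
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hello", 5);
        counts.put("dog", 2);
        counts.put("fish", 2);
        counts.put("spark", 7);
        for (Map.Entry<String, Integer> e : topN(counts, 3)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Using `Integer.compare` rather than subtracting the two counts avoids integer overflow when the values are far apart.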

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated Job constructor and actually uses conf.
    Job job = Job.getInstance(conf);
    job.setJarByClass(wordcmapred.class);
    job.setMapperClass(wordcmapper.class);
    job.setReducerClass(wordcreducer.class);
    // A single reduce task sees every word, so its cleanup can pick the global top three.
    job.setNumReduceTasks(1);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Exit with a non-zero status if the job fails.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

The job class

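Summary: both Mapper and Reducer have setup and cleanup methods. Each runs only once per task, so they suit loading or releasing resources and doing lightweight work. Heavy work is better kept out of them, because it runs inside a single task and seriously undermines the benefit of distributed processing.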
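One way to keep the work done in cleanup small is to hold only N candidates at any time instead of every word. This is a different technique from the one used in this article (a bounded min-heap rather than a full sort), shown only as a sketch: the class `BoundedTopN`, its nested `WordCount` type, and the methods `offer` and `result` are hypothetical names, and breaking ties alphabetically is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: keeps at most n (word, count) pairs in a min-heap.
public class BoundedTopN {
    public static final class WordCount {
        public final String word;
        public final int count;

        public WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Weakest candidate first: smaller count, then alphabetically later word.
    private static final Comparator<WordCount> WEAKEST_FIRST = new Comparator<WordCount>() {
        public int compare(WordCount a, WordCount b) {
            int byCount = Integer.compare(a.count, b.count);
            return byCount != 0 ? byCount : b.word.compareTo(a.word);
        }
    };

    private final int n;
    private final PriorityQueue<WordCount> heap = new PriorityQueue<>(WEAKEST_FIRST);

    public BoundedTopN(int n) {
        this.n = n;
    }

    // Would be called once per word from reduce(); evicts the weakest when over capacity.
    public void offer(String word, int count) {
        heap.add(new WordCount(word, count));
        if (heap.size() > n) {
            heap.poll();
        }
    }

    // Would be called from cleanup(); returns the survivors, strongest first.
    public List<WordCount> result() {
        List<WordCount> out = new ArrayList<>(heap);
        Collections.sort(out, Collections.reverseOrder(WEAKEST_FIRST));
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        BoundedTopN top = new BoundedTopN(3);
        top.offer("hello", 5);
        top.offer("dog", 2);
        top.offer("fish", 2);
        top.offer("spark", 7);
        for (WordCount wc : top.result()) {
            System.out.println(wc.word + "\t" + wc.count);
        }
    }
}
```

With this approach memory stays proportional to N rather than to the number of distinct words, and the final sort touches only N entries.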
