Using Hadoop's MultipleOutputFormat

Author: dajuezhao

Article link: https://blog.csdn.net/dajuezhao/article/details/5799388

I. Background

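In Hadoop MapReduce, the default output format is TextOutputFormat, which writes one file per reduce task with sequentially numbered names such as part-r-00000 and part-r-00001 (under the older org.apache.hadoop.mapred API used in this article, the names are part-00000, part-00001, and so on). Hadoop also provides the MultipleOutputFormat class; by subclassing it and overriding its naming method, generateFileNameForKeyValue, you can write output to files with custom names.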
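
The naming hook can be illustrated without a cluster. The sketch below is a plain-Java stand-in, not Hadoop code: `FileNamer`, `FirstLetterNamer`, `MultiOutputSketch`, and `route` are hypothetical names invented for this illustration. Only the shape of the hook, `generateFileNameForKeyValue(key, value, name)` with `name` as the default leaf file name, mirrors `org.apache.hadoop.mapred.lib.MultipleOutputFormat`. Assuming the goal is one output file per first letter of the key, it might look like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the file-naming hook; not a Hadoop class.
abstract class FileNamer {
    // Mirrors the shape of generateFileNameForKeyValue: 'name' is the default
    // leaf name (for example "part-00000"); the default keeps it unchanged.
    String generateFileNameForKeyValue(String key, int value, String name) {
        return name;
    }
}

// Assumed naming policy: one file per lowercase first letter of the key.
class FirstLetterNamer extends FileNamer {
    @Override
    String generateFileNameForKeyValue(String key, int value, String name) {
        if (key.isEmpty()) {
            return name;
        }
        char c = Character.toLowerCase(key.charAt(0));
        return Character.isLetter(c) ? c + ".txt" : "other.txt";
    }
}

final class MultiOutputSketch {
    // Groups "key<TAB>value" lines by the file name the namer chooses.
    static Map<String, List<String>> route(Map<String, Integer> records,
                                           FileNamer namer, String defaultName) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : records.entrySet()) {
            String file = namer.generateFileNameForKeyValue(
                    e.getKey(), e.getValue(), defaultName);
            files.computeIfAbsent(file, f -> new ArrayList<>())
                 .add(e.getKey() + "\t" + e.getValue());
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("apple", 3);
        counts.put("avocado", 1);
        counts.put("banana", 2);
        System.out.println(route(counts, new FirstLetterNamer(), "part-00000"));
    }
}
```

In an actual job, the equivalent override would live in a subclass of a concrete class such as MultipleTextOutputFormat, registered on the job with JobConf.setOutputFormat.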

II. Technical Details

1. Environment: Hadoop 0.19 (at the time of writing, Hadoop 0.20.2 does not support MultipleOutputFormat well), Linux.

2. An example implementation using MultipleOutputFormat:

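import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {
    // Mapper (old org.apache.hadoop.mapred API): emits (word, 1) for every token.
    public static class TokenizerMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable count = new IntWritable(1);
        private final Text word = new Text(); // reused for every emitted token

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Split the input line on whitespace.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, count);
            }
        }
    }

    public static class IntSumReducer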
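
The listing stops at the reducer declaration. As a rough illustration of what the mapper above and a conventional word-count reducer compute together, here is a plain-Java sketch with no Hadoop dependency. `LocalWordCount` and `countWords` are hypothetical names, and the summing step is an assumption: it supposes IntSumReducer follows the usual WordCount pattern of adding up the 1s emitted for each word, which the source does not show.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCount {
    // Map step: split each line on whitespace, as TokenizerMapper does.
    // Reduce step (assumed): sum the 1s emitted for each distinct word.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        // TreeMap keeps keys sorted, similar to the order a reducer sees them.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

The resulting (word, count) pairs are what a MultipleOutputFormat subclass would receive as key and value when deciding which file each record goes to.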
