MultipleOutputs is, simply put, what you use when you want your Reducer to write results out to several different files.
Here are the example and explanation from the official Hadoop documentation.
The driver code below uses MultipleOutputs and configures two named outputs, with the prefixes seq and text:
Job job = new Job();
FileInputFormat.setInputPath(job, inDir);
FileOutputFormat.setOutputPath(job, outDir);
job.setMapperClass(MOMap.class);
job.setReducerClass(MOReduce.class);
...
// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
LongWritable.class, Text.class);
// Defines additional sequence-file based output 'seq' for the job
MultipleOutputs.addNamedOutput(job, "seq",
SequenceFileOutputFormat.class,
LongWritable.class, Text.class);
...
job.waitForCompletion(true);
...
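Each named output produces its own part files in the job's output directory, alongside the default part files: records written to "text" land in files like text-r-00000, records written to "seq" in seq-r-00000, and anything written through context.write() in part-r-00000. The helper below is an illustration only (it is not part of the Hadoop API) showing how those part-file names are formed:

```java
// Illustration only: how MultipleOutputs-style part-file names look in the
// output directory. The "{name}-{m|r}-{5-digit partition}" pattern matches
// what Hadoop produces; this helper class itself is our own sketch.
public class PartFileName {
    static String partFile(String namedOutput, char taskType, int partition) {
        // e.g. ("text", 'r', 0) -> "text-r-00000"
        return String.format("%s-%c-%05d", namedOutput, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFile("text", 'r', 0)); // the 'text' named output
        System.out.println(partFile("seq", 'r', 1));  // the 'seq' named output
        System.out.println(partFile("part", 'r', 0)); // the default job output
    }
}
```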
In the Reducer, the code's own logic decides whether each record goes to the seq output or the text output.
<K, V> String generateFileName(K k, V v) {
    return k.toString() + "_" + v.toString();
}
public class MOReduce extends
        Reducer<WritableComparable, Writable, WritableComparable, Writable> {

    private MultipleOutputs<WritableComparable, Writable> mos;

    public void setup(Context context) {
        ...
        mos = new MultipleOutputs<WritableComparable, Writable>(context);
    }

    public void reduce(WritableComparable key, Iterable<Writable> values,
            Context context)
            throws IOException, InterruptedException {
        ...
        mos.write("text", key, new Text("Hello"));
        mos.write("seq", new LongWritable(1), new Text("Bye"), "seq_a");
        mos.write("seq", new LongWritable(2), new Text("Chau"), "seq_b");
        mos.write(key, new Text("value"), generateFileName(key, new Text("value")));
        ...
    }

    public void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
        ...
    }
}
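Conceptually, MultipleOutputs keeps one RecordWriter per named output (or baseOutputPath) and routes each write(...) call to the matching writer. The toy class below mimics that routing with in-memory buffers instead of files; it has no Hadoop dependency, and its names and structure are our own invention, purely to make the dispatch visible:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of MultipleOutputs-style routing (not the Hadoop API):
// one buffer per named output, and write() appends to the matching buffer.
public class ToyMultipleOutputs {
    private final Map<String, StringBuilder> writers = new HashMap<>();

    // Route one key/value pair to the buffer for the given named output.
    public void write(String namedOutput, Object key, Object value) {
        writers.computeIfAbsent(namedOutput, n -> new StringBuilder())
               .append(key).append('\t').append(value).append('\n');
    }

    // What has accumulated in one named output so far.
    public String contents(String namedOutput) {
        StringBuilder sb = writers.get(namedOutput);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        ToyMultipleOutputs mos = new ToyMultipleOutputs();
        mos.write("text", 1L, "Hello");
        mos.write("seq", 1L, "Bye");
        mos.write("text", 2L, "Hello again");
        System.out.print(mos.contents("text")); // only the 'text' records
    }
}
```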
When used together with LazyOutputFormat, MultipleOutputs mimics the behavior of MultipleTextOutputFormat and MultipleSequenceFileOutputFormat from the old API, presumably for compatibility with older Hadoop code.
Note that, as the example above shows, you can also write to a file prefix of your own choosing with the method below. Be aware that if the baseOutputPath you pass resolves to a location outside the task's output directory, it breaks the output-commit semantics, so that is best avoided.
MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath)
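To make "inside the task's output directory" concrete: a relative baseOutputPath with no ".." segments (such as "seq_a" or "2023/11/part") stays under the directory the output committer manages, while an absolute path, or one that climbs upward, escapes it. The check below is our own sketch of that rule, not a Hadoop API:

```java
// Illustration only: a rough check for whether a baseOutputPath stays
// inside the task's output directory (and so is handled safely by the
// output committer). This is our own sketch, not part of Hadoop.
public class BasePathCheck {
    static boolean staysInOutputDir(String baseOutputPath) {
        if (baseOutputPath.startsWith("/")) {
            return false; // absolute path: escapes the output directory
        }
        for (String seg : baseOutputPath.split("/")) {
            if (seg.equals("..")) {
                return false; // climbs out of the output directory
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(staysInOutputDir("seq_a"));        // safe
        System.out.println(staysInOutputDir("2023/11/part")); // safe subdirectory
        System.out.println(staysInOutputDir("/tmp/out"));     // breaks output commit
    }
}
```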
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
http://www.lichun.cc/blog/2013/11/how-to-use-hadoop-multipleoutputs/