I. Background
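By default, Hadoop MapReduce writes its output with TextOutputFormat. The output files are named part-r-00000, part-r-00001, and so on, numbered sequentially with one file per reducer. Hadoop provides the MultipleOutputFormat class, and subclassing it lets you generate custom output file names.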
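The following is a rough illustration of the naming hook. It models, in plain Java, what a MultipleOutputFormat subclass computes. In the old org.apache.hadoop.mapred API, the usual class to extend for text output is MultipleTextOutputFormat, and the method to override is generateFileNameForKeyValue(K key, V value, String name). That method receives the default leaf name and returns the file name the record should be written to. The class OutputNaming, its two methods, and the "first letter of the word" rule are hypothetical and exist only for this illustration. The sketch does not depend on Hadoop, and the default names follow the part-r-NNNNN pattern described above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: models the file-naming logic only.
public class OutputNaming {

    // Default leaf name for a reducer partition, using the five-digit,
    // zero-padded part-r-NNNNN pattern described in the text.
    static String defaultLeafName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    // Models what an override of generateFileNameForKeyValue(key, value, name)
    // might return: the word's lower-cased first letter as a prefix, or
    // "other" when the word is empty or does not start with a letter.
    static String customFileName(String key, String defaultName) {
        if (key.isEmpty() || !Character.isLetter(key.charAt(0))) {
            return "other-" + defaultName;
        }
        char first = Character.toLowerCase(key.charAt(0));
        return first + "-" + defaultName;
    }

    public static void main(String[] args) {
        String leaf = defaultLeafName(1);
        System.out.println(leaf);
        System.out.println(customFileName("Hadoop", leaf));
        System.out.println(customFileName("42", leaf));
    }
}
```

Running main prints part-r-00001, h-part-r-00001 and other-part-r-00001, one per line. The sketch keeps the default leaf name as a suffix on purpose. Each reducer still writes to its own file, so two reducers that both see words starting with "h" do not collide on the same output path.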
II. Technical Details
1. Environment: Hadoop 0.19 on Linux. (Support for MultipleOutputFormat in Hadoop 0.20.2 is currently poor.)
2. The following example implements MultipleOutputFormat:
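import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    // Mapper (old org.apache.hadoop.mapred API): emits (word, 1) for every token in a line
    public static class TokenizerMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable count = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, count);
            }
        }
    }

    public static class IntSumReducer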