Writing multiple output files from a Hadoop reducer
Keywords: hadoop, mapreduce
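Sometimes we want a reducer to write several output files chosen by key (or value), so that all records sharing a key (or value) end up in the same file. As of Hadoop 0.17.x, this can be done by overriding generateFileNameForKeyValue in MultipleOutputFormat.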
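To make the routing idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class KeyRoutingSketch and its methods fileNameFor and route are hypothetical names used only for illustration, and the "key/leafName" naming rule is an assumption. In a real job, the naming rule would go into an override of generateFileNameForKeyValue(key, value, name) in a MultipleOutputFormat subclass, and Hadoop would do the actual writing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of per-key output file naming.
public class KeyRoutingSketch {

    // Assumed naming rule: put each key in its own directory and keep the
    // default leaf name (e.g. "part-00000") that the framework would supply.
    public static String fileNameFor(String key, String value, String leafName) {
        return key + "/" + leafName;
    }

    // Groups records by the file name the rule assigns to them.
    // Each record is a two-element array: {key, value}.
    public static Map<String, List<String>> route(List<String[]> records, String leafName) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] kv : records) {
            String file = fileNameFor(kv[0], kv[1], leafName);
            // Tab-separated "key<TAB>value" lines, similar to plain text output.
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"apple", "1"});
        records.add(new String[] {"banana", "2"});
        records.add(new String[] {"apple", "3"});
        // Print each simulated output file with the lines routed to it.
        route(records, "part-00000")
            .forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

Because the file name depends only on the key, every record with the same key is routed to the same file. Basing the rule on the value instead would group records by value.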
For example:
package org.apache.hadoop.mapred.lib;

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

public class MultipleTextOutputFormat<K extends WritableComparable, V extends Writable>
    extends MultipleOutputFormat<K, V> {

  private TextOutputFormat<K, V> theTextOutputFormat = null;

  @Override
  protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs, JobConf job,
      String name, Progressable arg3) throws IOException {
    if (theTextOutputFormat ==