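OutputFormat is the base class for MapReduce output; every MapReduce output implementation extends OutputFormat.
TextOutputFormat: the default output format. It writes each record as a line of text. Keys and values can be of any type, because TextOutputFormat calls toString() to convert them to strings.
SequenceFileOutputFormat: writes the output as sequence files, a compact binary key-value format that is well suited as the input of a subsequent MapReduce job.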
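To make the toString() behaviour of TextOutputFormat concrete, here is a minimal, Hadoop-free sketch of how one text line could be assembled from a key and a value. The class name TextLineSketch and the method formatLine are hypothetical. The tab separator is TextOutputFormat's default, and null stands in for NullWritable, which TextOutputFormat skips. This is a simplified imitation, not the library's code.

```java
// Hypothetical, Hadoop-free imitation of TextOutputFormat's line layout.
final class TextLineSketch {
    // Assumption: tab is the separator (TextOutputFormat's default).
    static final String SEPARATOR = "\t";

    // Builds one output line; null stands in for NullWritable and is skipped.
    static String formatLine(Object key, Object value) {
        if (key == null && value == null) {
            return ""; // nothing is written for an empty record
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key.toString()); // any type works via toString()
        }
        if (key != null && value != null) {
            line.append(SEPARATOR); // separator only when both parts exist
        }
        if (value != null) {
            line.append(value.toString());
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("www.github.com", 3));
        System.out.print(formatLine("www.csdn.net", null));
    }
}
```

With a NullWritable value, as in the example later in this post, only the key ends up on the line.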
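Custom OutputFormat
A custom OutputFormat lets a single MapReduce program route records to different output files depending on the data. For example, given the input below, the .com sites are written to com.txt and the .net sites to net.txt.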
www.alibaba.com
www.google.com
www.csdn.net
www.php.net
www.taobao.com
www.zhihu.com
www.youtube.com
www.github.com
www.cnki.net
www.minecraft.net
www.savefrom.net
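Before bringing in Hadoop, the routing rule itself can be sketched in plain Java. The class SiteRouter and its methods are hypothetical names used only for illustration. The sketch assumes a two-way split: anything that does not end in ".com" is sent to net.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that decides which file a site belongs to.
final class SiteRouter {
    // Two-way split: ".com" sites go to com.txt, everything else to net.txt.
    static String targetFor(String site) {
        return site.trim().endsWith(".com") ? "com.txt" : "net.txt";
    }

    // Groups sites by target file name, keeping the input order within each group.
    static Map<String, List<String>> group(List<String> sites) {
        Map<String, List<String>> byFile = new LinkedHashMap<>();
        for (String site : sites) {
            byFile.computeIfAbsent(targetFor(site), f -> new ArrayList<>()).add(site);
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList("www.alibaba.com", "www.csdn.net", "www.github.com");
        System.out.println(group(sites));
    }
}
```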
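In the mapper, the input value (one line of the file) is emitted as the output key. The reducer then writes each key once.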
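import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// Emits each input line as the output key; the value carries no data.
public class OutputFormatMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, NullWritable.get());
    }
}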
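import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
// Writes each distinct key once; duplicate input lines collapse into one record.
public class OutputFormatReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}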
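Because every line becomes a key, the shuffle phase sorts the lines and groups duplicates before the reducer sees them. A rough, Hadoop-free sketch of the net effect of this mapper and reducer pair is shown here. ShuffleSketch is a hypothetical name, and a TreeSet is only a stand-in for the shuffle; for ASCII input like these host names its ordering agrees with Hadoop's byte-wise ordering of Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for map -> shuffle -> reduce on this job.
final class ShuffleSketch {
    // Map: each line becomes a key. Shuffle: keys are sorted and grouped.
    // Reduce: each distinct key is emitted exactly once.
    static List<String> mapShuffleReduce(List<String> lines) {
        TreeSet<String> sortedDistinctKeys = new TreeSet<>(lines);
        return new ArrayList<>(sortedDistinctKeys);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "www.zhihu.com", "www.alibaba.com", "www.zhihu.com", "www.csdn.net");
        for (String key : mapShuffleReduce(lines)) {
            System.out.println(key);
        }
    }
}
```

This is why the records reach the output writer in sorted order even though the input file is unsorted.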
The custom OutputFormat class:
- Create a class that extends FileOutputFormat and overrides getRecordWriter(), and a second class that extends RecordWriter.
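import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Hands every task the custom RecordWriter that splits records into two files.
public class OutputFormatClass extends FileOutputFormat<Text, NullWritable> {
    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        return new OutPutFormatRecordWriter(job);
    }
}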
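import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class OutPutFormatRecordWriter extends RecordWriter<Text, NullWritable> {
    private final FSDataOutputStream cfos; // stream for com.txt
    private final FSDataOutputStream nfos; // stream for net.txt

    // Propagates IOException instead of swallowing it, so a failed create()
    // fails the task rather than causing a NullPointerException in write().
    public OutPutFormatRecordWriter(TaskAttemptContext job) throws IOException {
        // 1. Get the file system
        FileSystem fs = FileSystem.get(job.getConfiguration());
        // 2. Open the two output streams (paths are hard-coded for this demo)
        cfos = fs.create(new Path("e:/site/com.txt"));
        nfos = fs.create(new Path("e:/site/net.txt"));
    }

    @Override
    public void write(Text key, NullWritable value) throws IOException, InterruptedException {
        String site = key.toString();
        // Sites ending in ".com" go to com.txt; everything else goes to net.txt
        FSDataOutputStream out = site.endsWith(".com") ? cfos : nfos;
        // Encode explicitly as UTF-8 rather than the platform default charset
        out.write(site.getBytes(StandardCharsets.UTF_8));
        out.writeBytes("\r\n");
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        IOUtils.closeStream(cfos);
        IOUtils.closeStream(nfos);
    }
}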
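The write and close lifecycle of such a record writer can be tried without a cluster by swapping the HDFS streams for plain java.io streams. The sketch below is an assumption-laden stand-in, not Hadoop code: TwoFileWriterSketch and route are hypothetical names, and in-memory buffers replace com.txt and net.txt.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, Hadoop-free analogue of a RecordWriter with two targets.
final class TwoFileWriterSketch implements Closeable {
    private final OutputStream comOut;
    private final OutputStream netOut;

    TwoFileWriterSketch(OutputStream comOut, OutputStream netOut) {
        this.comOut = comOut;
        this.netOut = netOut;
    }

    // Writes one site followed by "\r\n" to the stream chosen by its suffix.
    void write(String site) throws IOException {
        OutputStream target = site.endsWith(".com") ? comOut : netOut;
        target.write((site + "\r\n").getBytes(StandardCharsets.UTF_8));
    }

    // Closes both streams; the second is closed even if the first close fails.
    @Override
    public void close() throws IOException {
        try {
            comOut.close();
        } finally {
            netOut.close();
        }
    }

    // Routes the given sites into two in-memory buffers and
    // returns {contents for com, contents for net}.
    static String[] route(String... sites) {
        ByteArrayOutputStream com = new ByteArrayOutputStream();
        ByteArrayOutputStream net = new ByteArrayOutputStream();
        try (TwoFileWriterSketch writer = new TwoFileWriterSketch(com, net)) {
            for (String site : sites) {
                writer.write(site);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] {
            new String(com.toByteArray(), StandardCharsets.UTF_8),
            new String(net.toByteArray(), StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        String[] routed = route("www.github.com", "www.php.net");
        System.out.print(routed[0]);
        System.out.print(routed[1]);
    }
}
```

Taking the streams as constructor arguments keeps the routing logic testable on its own; the Hadoop version opens them from the FileSystem instead.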
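In the driver class, set the output format with job.setOutputFormatClass(OutputFormatClass.class);. An output directory is still required even though the records go to our own files: OutputFormatClass extends FileOutputFormat, which requires an output directory to be set and whose committer writes a _SUCCESS marker file there when the job succeeds. Set it with FileOutputFormat.setOutputPath(job, new Path(args[1]));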
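import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputFormatDriver {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Hard-coded local paths for this demo; remove this line to use command-line arguments
        args = new String[] { "e:/site.txt", "e:/output" };
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(OutputFormatDriver.class);
        job.setMapperClass(OutputFormatMapper.class);
        job.setReducerClass(OutputFormatReducer.class);
        // Key/value types emitted by the mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        // Key/value types emitted by the reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Use the custom OutputFormat
        job.setOutputFormatClass(OutputFormatClass.class);
        // Still required: the _SUCCESS marker is written to this directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}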
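The output directory requirement is checked before the job starts: FileOutputFormat rejects a job whose output directory is not set or already exists, so rerunning the driver against an existing e:/output fails until the directory is removed. A Hadoop-free sketch of that kind of pre-flight check, which could be adapted for the driver, is shown here. OutputSpecSketch and problemWith are hypothetical names, the messages are illustrative, and java.nio.file is used in place of the Hadoop FileSystem API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check modelled on FileOutputFormat's validation.
final class OutputSpecSketch {
    // Returns a description of the problem, or null when the directory is usable.
    static String problemWith(String outputDir) {
        if (outputDir == null) {
            return "Output directory not set.";
        }
        Path dir = Paths.get(outputDir);
        if (Files.exists(dir)) {
            return "Output directory " + dir + " already exists";
        }
        return null; // not set -> rejected, exists -> rejected, otherwise fine
    }

    public static void main(String[] args) {
        String problem = problemWith(args.length > 0 ? args[0] : null);
        System.out.println(problem == null ? "Output directory is usable" : problem);
    }
}
```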
com.txt:
www.alibaba.com
www.github.com
www.google.com
www.taobao.com
www.youtube.com
www.zhihu.com
net.txt:
www.cnki.net
www.csdn.net
www.minecraft.net
www.php.net
www.savefrom.net