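1. Requirement
By default, the map and reduce output of an MR job is written under the path set with FileOutputFormat.setOutputPath(). Sometimes, however, the code needs to split its results by category, for example writing erroneous records to one file and valid records to another. To do that, you write your own FileOutputFormat subclass that decides which file each record goes to.
2. Custom code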
1. Custom OutputFormat code
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @author liufu
 */
public class MyOutPutFormat extends FileOutputFormat<Text, NullWritable> {

    @Override
    public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // One hard-coded target file per output category
        Path tocrawlPath = new Path("d:/flow/crawlout/tocrawl.log");
        Path enhancedPath = new Path("d:/flow/enhanced/enhanced.log");

        // Resolve the file system from the job configuration and open both streams
        FileSystem fs = FileSystem.get(context.getConfiguration());
        FSDataOutputStream tocrawlOs = fs.create(tocrawlPath);
        FSDataOutputStream enhancedOs = fs.create(enhancedPath);

        return new MyRecordWriter(tocrawlOs, enhancedOs);
    }

    static class MyRecordWriter extends RecordWriter<Text, NullWritable> {
        FSDataOutputStream tocrawlOs = null;
        FSDataOutputStream enhancedOs = null;

        public MyRecordWriter(FSDataOutputStream tocrawlOs, FSDataOutputStream enhancedOs) {
            this.tocrawlOs = tocrawlOs;
            this.enhancedOs = enhancedOs;
        }

        /**
         * write() is where the final key/value output of the MR program
         * is written to the external storage system.
         */
        @Override
        public void write(Text key, NullWritable value) throws IOException, InterruptedException {
            // Route by content: keys containing "tocrawl" go to one file, everything else to the other.
            // The key is written as-is; no record separator is appended here.
            if (key.toString().contains("tocrawl")) {
                tocrawlOs.write(key.toString().getBytes());
            } else {
                enhancedOs.write(key.toString().getBytes());
            }
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            if (tocrawlOs != null) tocrawlOs.close();
            if (enhancedOs != null) enhancedOs.close();
        }
    }
}
2. How do you configure a job to use it?
Implementation (see TextOutputFormat.class for reference):
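The driver-side setup is not shown here, so the following is only a minimal sketch of what it might look like, not the author's original code. It assumes the MyOutPutFormat class from the listing above is on the classpath. The class name MyOutPutFormatDriver, the pass-through LineMapper, and taking the input and output directories from args[0] and args[1] are all hypothetical choices made for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

// Hypothetical driver; only the two statements marked below relate to the custom OutputFormat.
public class MyOutPutFormatDriver {

    /** Hypothetical pass-through mapper: emits every input line as the output key. */
    public static class LineMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(MyOutPutFormatDriver.class);

        job.setMapperClass(LineMapper.class);
        // No reducer class is set, so the default identity reducer runs as a single
        // reduce task; only that one task opens the hard-coded files.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // Statement 1: plug in the custom OutputFormat.
        job.setOutputFormatClass(MyOutPutFormat.class);
        // Statement 2: still set a default output directory. FileOutputFormat checks
        // that one is configured, and the _SUCCESS marker is written there.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setup the records themselves land in the two files hard-coded inside MyOutPutFormat, while the directory passed as args[1] only receives the job's bookkeeping output.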
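Important note
The second statement, the FileOutputFormat.setOutputPath() call, is still required, because FileOutputFormat also produces a _SUCCESS file at the end of the job and needs a default directory to write it to.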