* <code>OutputFormat</code> describes the output-specification for a
* Map-Reduce job.
* <p>The Map-Reduce framework relies on the <code>OutputFormat</code> of the
* job to:</p>
* <ol>
* <li>
* Validate the output-specification of the job. For example, check that the
* output directory doesn't already exist.
* </li>
* <li>
* Provide the {@link RecordWriter} implementation to be used to write out
* the output files of the job. Output files are stored in a
* {@link FileSystem}.
* </li>
* </ol>
* @see RecordWriter
public abstract class OutputFormat<K, V>
public synchronized
OutputCommitter getOutputCommitter(TaskAttemptContext context
                                   ) throws IOException {
  // Create the committer lazily, once per output format instance
  if (committer == null) {
    Path output = getOutputPath(context);
    committer = new FileOutputCommitter(output, context);
  }
  return committer;
}
* <code>OutputCommitter</code> describes the commit of task output for a
* Map-Reduce job.
* <p>The Map-Reduce framework relies on the <code>OutputCommitter</code> of
* the job to:</p>
* <ol>
* <li>
* Set up the job during initialization. For example, create the temporary
* output directory for the job.
* </li>
* <li>
* Clean up the job after it completes. For example, remove the
* temporary output directory.
* </li>
* <li>
* Set up the task's temporary output.
* </li>
* <li>
* Check whether a task needs a commit, so that the commit procedure is
* skipped for tasks that have nothing to commit.
* </li>
* <li>
* Commit the task output.
* </li>
* <li>
* Discard the task commit.
* </li>
* </ol>
* The methods in this class can be called from several different processes and
* from several different contexts. It is important to know which process and
* which context each is called from. Each method should be marked accordingly
* in its documentation. Note also that not all methods are guaranteed to be
* called exactly once. If a method does not have this guarantee, the output
* committer must handle repeated calls appropriately. Such repeated calls for
* the same task happen only in rare situations.
* @see org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
* @see JobContext
* @see TaskAttemptContext
public abstract class OutputCommitter
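The lifecycle listed in the `OutputCommitter` Javadoc (job setup, task setup, commit check, task commit, task abort, job cleanup) can be sketched without Hadoop at all. The following is a minimal, hypothetical illustration using only `java.nio.file`. The class name `CommitLifecycleSketch`, the `_temporary` directory layout, and the attempt IDs are assumptions made for this example; it is not Hadoop's `FileOutputCommitter`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, Hadoop-free sketch of the commit lifecycle; not FileOutputCommitter.
public class CommitLifecycleSketch {
    private final Path outputDir;
    private final Path tempDir;

    public CommitLifecycleSketch(Path outputDir) {
        this.outputDir = outputDir;
        this.tempDir = outputDir.resolve("_temporary"); // assumed layout
    }

    // Job setup: create the temporary output directory.
    public void setupJob() throws IOException {
        Files.createDirectories(tempDir);
    }

    // Task setup: give each task attempt a private working directory.
    public Path setupTask(String attemptId) throws IOException {
        return Files.createDirectories(tempDir.resolve(attemptId));
    }

    // A task needs a commit only if it actually wrote something.
    public boolean needsTaskCommit(String attemptId) throws IOException {
        return !listFiles(tempDir.resolve(attemptId)).isEmpty();
    }

    // Task commit: promote the attempt's files into the final output directory.
    // Safe to call twice: a second call finds nothing left to move.
    public void commitTask(String attemptId) throws IOException {
        Path work = tempDir.resolve(attemptId);
        for (Path file : listFiles(work)) {
            Files.move(file, outputDir.resolve(file.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
        deleteRecursively(work);
    }

    // Task abort: discard whatever the attempt wrote.
    public void abortTask(String attemptId) throws IOException {
        deleteRecursively(tempDir.resolve(attemptId));
    }

    // Job cleanup: remove the temporary output directory.
    public void cleanupJob() throws IOException {
        deleteRecursively(tempDir);
    }

    private static List<Path> listFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("commit-sketch");
        CommitLifecycleSketch committer = new CommitLifecycleSketch(out);
        committer.setupJob();
        Path work = committer.setupTask("attempt_0");
        Files.write(work.resolve("part-0"), "k\tv\n".getBytes(StandardCharsets.UTF_8));
        if (committer.needsTaskCommit("attempt_0")) {
            committer.commitTask("attempt_0");
        }
        committer.cleanupJob();
        System.out.println(Files.exists(out.resolve("part-0")));
    }
}
```

Writing into a per-attempt directory and moving files only at commit time means a failed or duplicate attempt never leaves partial files in the final output directory. Making `commitTask` and `abortTask` tolerate a missing working directory is one way to handle the repeated calls that the Javadoc warns about.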
public void checkOutputSpecs(JobContext job
                             ) throws FileAlreadyExistsException, IOException {
  // Ensure that the output directory is set and does not already exist
  Path outDir = getOutputPath(job);
  if (outDir == null) {
    throw new InvalidJobConfException("Output directory not set.");
  }
  // Get delegation tokens for outDir's file system
  TokenCache.obtainTokensForNamenodes(job.getCredentials(),
      new Path[] { outDir }, job.getConfiguration());
  if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
    throw new FileAlreadyExistsException("Output directory " + outDir +
        " already exists");
  }
}
public RecordWriter<K, V>
getRecordWriter(TaskAttemptContext job
                ) throws IOException, InterruptedException {
  Configuration conf = job.getConfiguration();
  boolean isCompressed = getCompressOutput(job);
  // Separator written between key and value (tab by default)
  String keyValueSeparator = conf.get(SEPERATOR, "\t");
  CompressionCodec codec = null;
  String extension = "";
  if (isCompressed) { // compressed output requested
    Class<? extends CompressionCodec> codecClass =
        getOutputCompressorClass(job, GzipCodec.class);
    codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    extension = codec.getDefaultExtension();
  }
  Path file = getDefaultWorkFile(job, extension);
  FileSystem fs = file.getFileSystem(conf);
  if (!isCompressed) { // plain output: write straight to the file
    FSDataOutputStream fileOut = fs.create(file, false);
    return new LineRecordWriter<K, V>(fileOut, keyValueSeparator);
  } else { // compressed output: wrap the file stream in the codec's stream
    FSDataOutputStream fileOut = fs.create(file, false);
    return new LineRecordWriter<K, V>(new DataOutputStream
        (codec.createOutputStream(fileOut)), keyValueSeparator);
  }
}
</pre><pre name="code" class="java">
* <code>RecordWriter</code> writes the output &lt;key, value&gt; pairs
* to an output file.
* <p><code>RecordWriter</code> implementations write the job outputs to the
* {@link FileSystem}.
* @see OutputFormat
public abstract class RecordWriter<K, V>
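public synchronized void write(K key, V value) throws IOException {
  boolean nullKey = key == null || key instanceof NullWritable;
  boolean nullValue = value == null || value instanceof NullWritable;
  if (nullKey && nullValue) {
    return; // nothing to write
  }
  if (!nullKey) {
    writeObject(key);
  }
  if (!(nullKey || nullValue)) {
    out.write(keyValueSeparator); // separator only when both parts are present
  }
  if (!nullValue) {
    writeObject(value);
  }
  out.write(newline);
}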
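To make the effect of the `nullKey` and `nullValue` flags in `write` concrete, here is a minimal, Hadoop-free sketch. The class `LineWriteSketch` and its `render` helper are hypothetical names introduced for illustration, and a plain `null` stands in for `NullWritable`. It is not the `LineRecordWriter` implementation itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a line-oriented record writer; null plays the role of NullWritable.
public class LineWriteSketch {
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private final DataOutputStream out;
    private final byte[] separator;

    public LineWriteSketch(DataOutputStream out, String separator) {
        this.out = out;
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    public synchronized void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return; // an empty record produces no line at all
        }
        if (!nullKey) {
            out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        }
        if (!(nullKey || nullValue)) {
            out.write(separator); // only between a real key and a real value
        }
        if (!nullValue) {
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        }
        out.write(NEWLINE);
    }

    // Helper for experiments: returns the text that one write() call produces.
    public static String render(Object key, Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new LineWriteSketch(new DataOutputStream(buffer), "\t").write(key, value);
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.print(render("k", "v"));
        System.out.print(render("k", null));
        System.out.print(render(null, "v"));
    }
}
```

With a tab separator, a key-only record yields `k` followed by a newline, a value-only record yields `v` followed by a newline, and a record with neither key nor value writes nothing.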