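Version:
2.2
Description:
OutputFormat defines how the results of an MR job are written: how to write them and where to write them. In other words, it defines the write rules.
Class code:
Abstract class definition: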
public abstract RecordWriter<K, V> getRecordWriter(
    TaskAttemptContext context) throws IOException,
    InterruptedException;
public abstract void checkOutputSpecs(JobContext context)
    throws IOException, InterruptedException;
public abstract OutputCommitter getOutputCommitter(
    TaskAttemptContext context) throws IOException,
    InterruptedException;
getRecordWriter returns the RecordWriter, which defines the concrete write operations. Its abstract methods are as follows:
public abstract void write(K key, V value) throws IOException,
    InterruptedException;
public abstract void close(TaskAttemptContext context) throws IOException,
    InterruptedException;
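These are the concrete write and resource-closing operations. For example, LineRecordWriter writes each record directly as the key and the value joined by a separator.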
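The write/close contract can be illustrated with a minimal sketch in plain Java. SimpleRecordWriter and SimpleLineRecordWriter are hypothetical stand-ins written for this note, not Hadoop classes: they drop the TaskAttemptContext and InterruptedException from the real signatures and write to a java.io.Writer. The tab separator and the null handling are assumptions modeled loosely on how a line-oriented writer is usually described.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for RecordWriter<K, V>; the real class also declares
// InterruptedException and takes a TaskAttemptContext in close().
abstract class SimpleRecordWriter<K, V> {
    abstract void write(K key, V value) throws IOException;
    abstract void close() throws IOException;
}

// Writes one record per line as key + separator + value.
class SimpleLineRecordWriter<K, V> extends SimpleRecordWriter<K, V> {
    private final Writer out;
    private final String separator;

    SimpleLineRecordWriter(Writer out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    @Override
    void write(K key, V value) throws IOException {
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (!hasKey && !hasValue) {
            return; // nothing to write for an empty record
        }
        if (hasKey) {
            out.write(key.toString());
        }
        if (hasKey && hasValue) {
            out.write(separator); // separator only when both parts exist
        }
        if (hasValue) {
            out.write(value.toString());
        }
        out.write("\n");
    }

    @Override
    void close() throws IOException {
        out.close(); // release the underlying resource
    }
}

class LineRecordWriterDemo {
    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        SimpleRecordWriter<String, Integer> writer =
                new SimpleLineRecordWriter<>(sink, "\t");
        writer.write("hadoop", 3);
        writer.write("mapreduce", null);
        writer.close();
        System.out.print(sink);
    }
}
```

Keeping the sink behind a Writer means the same write logic works for a file, a socket, or an in-memory buffer. That is the separation the RecordWriter abstraction is after.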
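OutputCommitter defines operations tied to the execution state of the MR job, such as job setup and job failure. Its operations, some abstract and some with default implementations, are as follows: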
public abstract void setupJob(JobContext jobContext) throws IOException;
@Deprecated
public void cleanupJob(JobContext jobContext) throws IOException {
}
public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
}
public void abortJob(JobContext jobContext, JobStatus.State state)
    throws IOException {
    cleanupJob(jobContext);
}
public abstract void setupTask(TaskAttemptContext taskContext)
    throws IOException;
public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
    throws IOException;
public abstract void commitTask(TaskAttemptContext taskContext)
    throws IOException;
public abstract void abortTask(TaskAttemptContext taskContext)
    throws IOException;
public boolean isRecoverySupported() {
    return false;
}
public void recoverTask(TaskAttemptContext taskContext) throws IOException {
}
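To make the lifecycle easier to follow, here is a rough sketch of the order in which a framework might drive these callbacks. StagingCommitter and CommitterLifecycleDemo are hypothetical classes written for this note, not Hadoop code. The sketch assumes that each task attempt stages its output privately and that the output becomes visible only once commitTask promotes it. The string attempt ids and in-memory lists are simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified committer: output is staged per task attempt and
// becomes visible in 'committed' only after commitTask.
class StagingCommitter {
    final Map<String, List<String>> staged = new HashMap<>();
    final List<String> committed = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void setupJob() {
        log.add("setupJob");
    }

    void setupTask(String attempt) {
        staged.put(attempt, new ArrayList<>());
        log.add("setupTask:" + attempt);
    }

    void write(String attempt, String record) {
        staged.get(attempt).add(record);
    }

    // Only attempts that produced output need a commit.
    boolean needsTaskCommit(String attempt) {
        return !staged.get(attempt).isEmpty();
    }

    void commitTask(String attempt) {
        committed.addAll(staged.remove(attempt));
        log.add("commitTask:" + attempt);
    }

    void abortTask(String attempt) {
        staged.remove(attempt); // discard the failed attempt's output
        log.add("abortTask:" + attempt);
    }

    void commitJob() {
        log.add("commitJob");
    }
}

class CommitterLifecycleDemo {
    // Drives one task attempt; 'fail' simulates an error after writing.
    static void runTask(StagingCommitter c, String attempt,
                        List<String> records, boolean fail) {
        c.setupTask(attempt);
        for (String r : records) {
            c.write(attempt, r);
        }
        if (fail) {
            c.abortTask(attempt);
            return;
        }
        if (c.needsTaskCommit(attempt)) {
            c.commitTask(attempt);
        }
    }

    public static void main(String[] args) {
        StagingCommitter c = new StagingCommitter();
        c.setupJob();
        runTask(c, "attempt_0", Arrays.asList("a", "b"), false);
        runTask(c, "attempt_1", Arrays.asList("x"), true);
        c.commitJob();
        System.out.println(c.committed);
        System.out.println(c.log);
    }
}
```

In this sketch the failed attempt never reaches the committed output, which is the point of separating setupTask, commitTask, and abortTask.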
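The checks needed before writing, such as whether the output resources are available and whether they are specified and used correctly, are all performed in checkOutputSpecs. For example, FileOutputFormat uses it to reject a job whose output directory is not set or already exists.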
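A check of this kind can be sketched in plain Java as follows. OutputSpecCheck is a hypothetical helper written for this note. It uses java.nio.file on the local file system instead of Hadoop's FileSystem API, and the two rules it enforces (the output directory must be set and must not already exist) are assumptions about what a file-based output format would typically validate.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputSpecCheck {
    // Hypothetical helper: fails fast, before any task runs, if the output
    // directory is missing from the configuration or already present.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("out");
        Path fresh = existing.resolve("job-output");

        checkOutputSpecs(fresh); // passes: the directory does not exist yet

        try {
            checkOutputSpecs(existing);
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected existing directory: " + e.getFile());
        }
    }
}
```

Running the validation before the job starts avoids spending cluster time on work whose results could not be written safely.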