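JobSubmitter, as its name suggests, is the job submitter in MapReduce. In fact, apart from its constructor, the only non-private member variable or method that JobSubmitter exposes is submitJobInternal(). This is the internal method for submitting a Job, and it implements all of the job-submission logic. In this article we take a close look at JobSubmitter, the MapReduce component used to submit a Job.
First, let's look at JobSubmitter's member variables: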
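// FileSystem instance
private FileSystem jtFs;
// ClientProtocol instance (client communication protocol)
private ClientProtocol submitClient;
// Host name of the machine submitting the job
private String submitHostName;
// Host address of the machine submitting the job
private String submitHostAddress;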
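It has four member variables in total:
1. jtFs, a FileSystem instance: used to operate on the various files the job needs in order to run;
2. submitClient, a ClientProtocol instance: used to interact with the cluster, for example to submit jobs and query job status;
3. submitHostName, the host name of the machine submitting the job;
4. submitHostAddress, the host address of the machine submitting the job.
Of these, submitClient is assigned from client, the ClientProtocol instance held by Cluster. As mentioned in the earlier article "MapReduce Source Code Analysis: New-API Job Submission (Part 2): Connecting to the Cluster", the concrete type depends on whether the MapReduce parameter mapreduce.framework.name is set to yarn or local: it is a YARNRunner in Yarn mode and a LocalJobRunner in Local mode.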
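The choice between the two implementations can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop classes: ClientProtocolStub, select(), and the plain Map standing in for the job configuration are hypothetical names introduced only for illustration, and falling back to local when the parameter is unset is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class FrameworkSelectionSketch {

    /** Hypothetical stand-in for ClientProtocol; only reports which runner was chosen. */
    public interface ClientProtocolStub {
        String name();
    }

    /**
     * Picks a runner from the value of mapreduce.framework.name.
     * Assumption of this sketch: an unset value is treated as "local".
     */
    public static ClientProtocolStub select(Map<String, String> conf) {
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        switch (framework) {
            case "yarn":
                // Yarn mode: the article says the client is a YARNRunner
                return () -> "YARNRunner";
            case "local":
                // Local mode: the article says the client is a LocalJobRunner
                return () -> "LocalJobRunner";
            default:
                throw new IllegalArgumentException(
                        "Unsupported mapreduce.framework.name: " + framework);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(select(conf).name());

        conf.put("mapreduce.framework.name", "local");
        System.out.println(select(conf).name());
    }
}
```

Whichever object is selected is what ends up in submitClient, so JobSubmitter itself never needs to know which mode it is running in.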
Next, let's look at JobSubmitter's constructor:
JobSubmitter(FileSystem submitFs, ClientProtocol submitClient)
throws IOException {
// Assign the member variables submitClient and jtFs from the arguments
this.submitClient = submitClient;
this.jtFs = submitFs;
}
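It is very simple: it just assigns the member variables submitClient and jtFs from its arguments.
Now for the key part: submitJobInternal(), the only core method that JobSubmitter exposes, which is used to submit a job to the cluster. The code is as follows: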
/**
* Internal method for submitting jobs to the system.
*
* <p>The job submission process involves:
* <ol>
* <li>
* Checking the input and output specifications of the job.
* </li>
* <li>
* Computing the {@link InputSplit}s for the job.
* </li>
* <li>
* Setup the requisite accounting information for the
* {@link DistributedCache} of the job, if necessary.
* </li>
* <li>
* Copying the job's jar and configuration to the map-reduce system
* directory on the distributed file-system.
* </li>
* <li>
* Submitting the job to the <code>JobTracker</code> and optionally
* monitoring it's status.
* </li>
* </ol></p>
* @param job the configuration to submit
* @param cluster the handle to the Cluster
* @throws ClassNotFoundException
* @throws InterruptedException
* @throws IOException
*/
JobStatus submitJobInternal(Job job, Cluster cluster)
throws ClassNotFoundException, InterruptedException, IOException {
//validate the jobs output spe