Job Initialization
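The submission part of this MapReduce source-code analysis series (job submission, initialization, assignment, and computation) ended with the client making a remote RPC call to the JobTracker's submitJob method. That call is the entry point for MapReduce job initialization.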
The submitJob method in JobTracker:
public synchronized JobStatus submitJob(JobID jobId) throws IOException {
  if (jobs.containsKey(jobId)) {
    // job already running, don't start twice
    return jobs.get(jobId).getStatus();
  }

  // Create a JobInProgress object to track the job's runtime information
  JobInProgress job = new JobInProgress(jobId, this, this.conf);

  String queue = job.getProfile().getQueueName();
  if (!(queueManager.getQueues().contains(queue))) {
    new CleanupQueue().addToQueue(conf, getSystemDirectoryForJob(jobId));
    throw new IOException("Queue \"" + queue + "\" does not exist");
  }

  // check for access: verify that the user is allowed to submit jobs
  // to the specified queue
  try {
    checkAccess(job, QueueManager.QueueOperation.SUBMIT_JOB);
  } catch (IOException ioe) {
    LOG.warn("Access denied for user " + job.getJobConf().getUser()
        + ". Ignoring job " + jobId, ioe);
    new CleanupQueue().addToQueue(conf, getSystemDirectoryForJob(jobId));
    throw ioe;
  }

  // Check the job if it cannot run in the cluster because of invalid memory
  // requirements.
  // When submitting a job, the user can set mapred.job.map.memory.mb and
  // mapred.job.reduce.memory.mb to specify the memory used by map and reduce
  // tasks. An administrator can cap these values with
  // mapred.cluster.max.map.memory.mb and mapred.cluster.max.reduce.memory.mb;
  // if a job exceeds the cap, submission fails.
  try {
    checkMemoryRequirements(job);
  } catch (IOException ioe) {
    new CleanupQueue().addToQueue(conf, getSystemDirectoryForJob(jobId));
    throw ioe;
  }

  // Notify the TaskScheduler to add the job to the job queue and
  // initialize the job
  return addJob(jobId, job);
}
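To make the memory check described in the comments above more concrete, here is a minimal sketch of comparing a job's requested memory against cluster-wide caps. It is not the actual Hadoop implementation of checkMemoryRequirements: the class MemoryCheckSketch, the method withinLimits, and the DISABLED sentinel are hypothetical names invented for illustration, and the sketch assumes that an unset cap simply means no limit is enforced.

```java
import java.io.IOException;

// Hypothetical, simplified illustration; not Hadoop's real code.
public class MemoryCheckSketch {
    // Assumed sentinel meaning "the administrator did not configure a cap".
    static final long DISABLED = -1L;

    // Returns true when the requested map/reduce memory (in MB) fits
    // under the configured caps, or when either cap is not configured.
    static boolean withinLimits(long mapMb, long reduceMb,
                                long maxMapMb, long maxReduceMb) {
        if (maxMapMb == DISABLED || maxReduceMb == DISABLED) {
            return true; // assumption: no caps configured, nothing to enforce
        }
        return mapMb <= maxMapMb && reduceMb <= maxReduceMb;
    }

    // Mirrors the shape of the check in submitJob: fail submission by
    // throwing an IOException when the request exceeds the caps.
    static void checkMemoryRequirements(long mapMb, long reduceMb,
                                        long maxMapMb, long maxReduceMb)
            throws IOException {
        if (!withinLimits(mapMb, reduceMb, maxMapMb, maxReduceMb)) {
            throw new IOException("Job exceeds cluster memory limits: map="
                + mapMb + "MB (max " + maxMapMb + "MB), reduce="
                + reduceMb + "MB (max " + maxReduceMb + "MB)");
        }
    }

    public static void main(String[] args) {
        try {
            // Fits under the caps, so no exception is thrown.
            checkMemoryRequirements(1024, 2048, 2048, 4096);
            System.out.println("job 1 accepted");
            // Map memory exceeds the cap, so this call throws.
            checkMemoryRequirements(4096, 2048, 2048, 4096);
            System.out.println("job 2 accepted");
        } catch (IOException ioe) {
            System.out.println("job rejected: " + ioe.getMessage());
        }
    }
}
```

In submitJob the same pattern appears with one extra step: before rethrowing the exception, the JobTracker hands the job's system directory to a CleanupQueue so that files staged by the rejected job are removed.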
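The listeners in the JobTracker's listeners list then run their add method, which triggers the EagerTaskInitializationListener class to run init (using wait/notify). That calls the JobTracker's initJob, which in turn calls JobInProgress's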