Tracing the Job Submission Source Code
A recap of the job submission process:
1. First, job.waitForCompletion() is called. This method, defined on the Job class, submits the job and then monitors and prints the submission progress.
Inside waitForCompletion(), the job's state is checked first: if it is DEFINE, submit() is invoked.
submit() is the core of this method.
Within waitForCompletion(), the method responsible for monitoring and printing the submission progress is monitorAndPrintJob().
2. Looking at the source of submit(), the method does the following:
It checks the job's state and switches to the new API, among other setup steps.
The key step is creating the submitter object via getJobSubmitter(). The important input to that method is the Cluster object, since which submitter gets created depends on it. The Cluster object is created in connect(), the method that links the client to the server.
3. connect() simply checks whether the Cluster object (an RPC client object) is null; if so, it creates a Cluster object and assigns it.
How is the Cluster object created?
connect() calls Cluster(), the Cluster constructor, and the important call inside that constructor is initialize().
4. Looking at the source of initialize():
The method iterates over the values in frameworkLoader, and this is where local mode vs. YARN cluster mode is decided:
depending on the provider found in frameworkLoader, a different create() method is called. There are two create() implementations.
When LocalClientProtocolProvider is chosen (local mode), create() returns new LocalJobRunner(conf);
when YarnClientProtocolProvider is chosen (YARN cluster mode), create() returns new YARNRunner(conf).
At this point we have seen how the runner is created.
Back in submit(), the submitJobInternal() method is called; it checks whether the job's output path already exists, obtains a job ID, and creates the submission paths.
Finally, the job's state is set to RUNNING: state = JobState.RUNNING;
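The control flow just summarized, where submit() only runs when the state is DEFINE and the last step flips the state to RUNNING, can be sketched as a plain-Java toy. MiniJob and everything in it are hypothetical stand-ins, not Hadoop classes:

```java
// Toy model of the state check in Job.waitForCompletion()/submit().
// MiniJob is a hypothetical stand-in for org.apache.hadoop.mapreduce.Job.
public class MiniJob {
    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    private boolean submitted = false;

    void submit() {
        ensureState(JobState.DEFINE);   // same guard the real submit() uses
        submitted = true;               // the real code builds the submitter here
        state = JobState.RUNNING;       // last step of the real submit()
    }

    boolean waitForCompletion() {
        if (state == JobState.DEFINE) { // only DEFINE jobs get submitted
            submit();
        }
        return submitted;               // the real code polls isComplete() instead
    }

    private void ensureState(JobState required) {
        if (state != required) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + required);
        }
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob();
        System.out.println(job.waitForCompletion()); // prints "true"
        System.out.println(job.state);               // prints "RUNNING"
    }
}
```

Calling waitForCompletion() a second time is harmless in this toy (the state is no longer DEFINE, so submit() is skipped), which mirrors why the real method guards the submit() call with the state check.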
Below is the source-code trace of submitting a job to the YARN cluster:
Job submission is driven by job.waitForCompletion(), which produces a submitter. Setting a breakpoint there lets you step through the whole process.
// Source of waitForCompletion:
public boolean waitForCompletion(boolean verbose
                                 ) throws IOException, InterruptedException,
                                          ClassNotFoundException {
  // Check the job's state (in the browser at port 8088, the All Applications page
  // shows finished jobs as successful, but at this point the state should be DEFINE)
  if (state == JobState.DEFINE) {
    // Submit the job
    submit();
  }
  // If the verbose flag is true, print the job's progress information
  if (verbose) {
    // Monitor and print the job submission progress
    monitorAndPrintJob();
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
      Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();
}
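The else branch above is a plain poll-and-sleep loop. Stripped of Hadoop types, it looks like this; pollUntilComplete and the interval used in main are illustrative choices, not Hadoop values:

```java
import java.util.function.BooleanSupplier;

// Minimal sketch of the poll loop in waitForCompletion()'s else branch.
public class PollLoop {
    // Repeatedly check isComplete, sleeping between checks, like the
    // while (!isComplete()) { Thread.sleep(...); } loop in the source.
    static void pollUntilComplete(BooleanSupplier isComplete, long intervalMillis) {
        while (!isComplete.getAsBoolean()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ie) {
                // the real code also swallows the interrupt and keeps polling
            }
        }
    }

    public static void main(String[] args) {
        // Simulate a job that "completes" after three polls.
        int[] polls = {0};
        pollUntilComplete(() -> ++polls[0] >= 3, 10);
        System.out.println("polled " + polls[0] + " times"); // prints "polled 3 times"
    }
}
```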
Tracing submit(); the source is as follows:
public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  // Check the job's state
  ensureState(JobState.DEFINE);
  // Switch to the new API (due to version updates)
  setUseNewAPI();
  // Connect the client to the server
  connect();
  // Create the submitter; which submitter is created depends on the arguments to
  // getJobSubmitter(), which come from the cluster produced by connect()
  final JobSubmitter submitter = getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException,
        ClassNotFoundException {
      // Submit the job; behavior differs by cluster (see the connect() implementation below)
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  // Set the job's state to RUNNING
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
// Source of connect():
private synchronized void connect()
        throws IOException, InterruptedException, ClassNotFoundException {
  // If cluster is null, create one and assign it
  if (cluster == null) {
    cluster =
      ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
        public Cluster run()
            throws IOException, InterruptedException,
                   ClassNotFoundException {
          // Create a new Cluster object via its constructor
          return new Cluster(getConfiguration());
        }
      });
  }
}
The Cluster constructor shows:
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  // The key part of the constructor is this call
  initialize(jobTrackAddr, conf);
}
Source of initialize():
// clientProtocolProvider is a dynamically loaded provider representing a protocol;
// it exists to create the client object.
// The important parts are the creation of the client and fs objects.
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  synchronized (frameworkLoader) {
    // Iterate over the providers; each provider creates a different client.
    // frameworkLoader yields LocalClientProtocolProvider and YarnClientProtocolProvider.
    for (ClientProtocolProvider provider : frameworkLoader) {
      LOG.debug("Trying ClientProtocolProvider : "
          + provider.getClass().getName());
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          // Enter create() here; its source is shown below
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          // Assign clientProtocol to client
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName()
            + " due to error: " + e.getMessage());
      }
    }
  }
  if (null == clientProtocolProvider || null == client) {
    throw new IOException(
        "Cannot initialize Cluster. Please check your configuration for "
            + MRConfig.FRAMEWORK_NAME
            + " and the correspond server addresses.");
  }
}
// clientProtocol = provider.create(conf); — entering create():
public class LocalClientProtocolProvider extends ClientProtocolProvider {
  public ClientProtocol create(Configuration conf) throws IOException {
    // Read mapreduce.framework.name from the configuration into framework;
    // if it is unset, default to "local"
    String framework =
        conf.get(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
    // If framework is not "local", this provider does not apply: return null
    if (!MRConfig.LOCAL_FRAMEWORK_NAME.equals(framework)) {
      return null;
    }
    conf.setInt(JobContext.NUM_MAPS, 1);
    // Otherwise, create and return a LocalJobRunner
    return new LocalJobRunner(conf);
  }
}
// Analogous to LocalClientProtocolProvider:
public class YarnClientProtocolProvider extends ClientProtocolProvider {
  @Override
  public ClientProtocol create(Configuration conf) throws IOException {
    // Again compare the configured mapreduce.framework.name against the expected value.
    // The constants are handy to know when reading the source:
    // public static final String YARN_FRAMEWORK_NAME = "yarn";
    // public static final String LOCAL_FRAMEWORK_NAME = "local";
    if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
      // Create a YARNRunner
      return new YARNRunner(conf);
    }
    return null;
  }

  @Override
  public ClientProtocol create(InetSocketAddress addr, Configuration conf)
      throws IOException {
    return create(conf);
  }
}
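The selection pattern used by initialize() — ask each provider to create() a client and take the first non-null answer — can be reproduced without Hadoop. The Provider interface and both toy providers below are hypothetical; only the configuration key and the "local"/"yarn" values come from the source above:

```java
import java.util.List;
import java.util.Map;

// Toy version of the ClientProtocolProvider selection in Cluster.initialize().
public class ProviderPick {
    interface Provider {
        String create(Map<String, String> conf); // returns null if it does not apply
    }

    // Mirrors LocalClientProtocolProvider: only applies when the framework is "local"
    // (which is also the default when the key is unset).
    static final Provider LOCAL = conf ->
        "local".equals(conf.getOrDefault("mapreduce.framework.name", "local"))
            ? "LocalJobRunner" : null;

    // Mirrors YarnClientProtocolProvider: only applies when the framework is "yarn".
    static final Provider YARN = conf ->
        "yarn".equals(conf.get("mapreduce.framework.name"))
            ? "YARNRunner" : null;

    // Mirrors the loop in initialize(): the first provider returning non-null wins.
    static String pick(Map<String, String> conf) {
        for (Provider p : List.of(LOCAL, YARN)) {
            String client = p.create(conf);
            if (client != null) {
                return client;
            }
        }
        throw new IllegalStateException("Cannot initialize Cluster");
    }

    public static void main(String[] args) {
        System.out.println(pick(Map.of()));                                   // prints "LocalJobRunner"
        System.out.println(pick(Map.of("mapreduce.framework.name", "yarn"))); // prints "YARNRunner"
    }
}
```

In real Hadoop, frameworkLoader is a java.util.ServiceLoader over ClientProtocolProvider implementations, so the set of candidate providers is discovered from the classpath rather than hard-coded as it is here.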
Looking at submitJobInternal():
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {
  // Verify the job's output directory
  checkSpecs(job);
  // Get the job's configuration
  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);
  // Create the staging path, e.g. /tmp/hadoop-yarn/staging/hadoop/.staging
  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
  // Determine the submitting client's IP address and host name (who is submitting)
  InetAddress ip = InetAddress.getLocalHost();
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();
    submitHostName = ip.getHostName();
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  // Obtain a job ID. submitClient (here a YARNRunner) talks to the
  // ResourceManager to get the new job ID.
  JobID jobId = submitClient.getNewJobID();
  // Store the job ID on the Job object
  job.setJobID(jobId);
  // Use the job ID to build a unique submit directory under the staging area
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);

    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen;
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }

    // Copy resources (including the job jar) into the jobId-based submit
    // directory; replication settings made on the client side take
    // precedence over the server-side defaults here.
    copyAndConfigureFiles(job, submitJobDir);

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    // Compute the input splits, which determine the number of map tasks;
    // the split metadata is written to a file and submitted as well
    int maps = writeSplits(job, submitJobDir);
    // Record the number of maps in conf
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME,
        JobConf.DEFAULT_QUEUE_NAME);
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(
        MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile);

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    if (status != null) {
      return status;
    } else {
      throw new IOException("Could not launch job");
    }
  } finally {
    if (status == null) {
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}
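The shuffle-secret step above uses the standard javax.crypto API. Here is the key generation in isolation; "HmacSHA1" and 64 bits are assumed stand-ins for Hadoop's SHUFFLE_KEYGEN_ALGORITHM and SHUFFLE_KEY_LENGTH constants, which are not shown in the excerpt:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

// Isolated sketch of the shuffle-secret generation in submitJobInternal().
public class ShuffleKeySketch {
    static byte[] newShuffleSecret() throws NoSuchAlgorithmException {
        // The real code reads SHUFFLE_KEYGEN_ALGORITHM / SHUFFLE_KEY_LENGTH;
        // "HmacSHA1" and 64 bits are assumed values for this sketch.
        KeyGenerator keyGen = KeyGenerator.getInstance("HmacSHA1");
        keyGen.init(64);
        SecretKey shuffleKey = keyGen.generateKey();
        // These raw bytes are what TokenCache.setShuffleSecretKey() stores
        // in the job's Credentials.
        return shuffleKey.getEncoded();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] secret = newShuffleSecret();
        System.out.println("secret length in bytes: " + secret.length); // 64 bits -> 8 bytes
    }
}
```

Because the secret travels with the job's Credentials, every task of the job can authenticate its shuffle transfers without any extra round trip to the client.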