Source-code walkthrough of MapReduce job submission

A review of the job submission process:
1. Execution starts with job.waitForCompletion(), a method of the Job class that submits the job, then monitors it and prints its progress.

In waitForCompletion(), the job's state is checked first; if it is DEFINE, submit() is called. submit() is the core of the method.
The monitoring and printing of the job's progress is handled by monitorAndPrintJob().
2. The submit() method does the following:
The key step is creating a JobSubmitter via getJobSubmitter(). Which submitter is created depends on the Cluster object passed to that call. The Cluster object is created inside connect(), the method that connects the client to the server.
submit() also re-checks the job state and switches the job to the new API (setUseNewAPI()).
3. connect() simply checks whether the Cluster object (essentially an RPC client) is null; if so, a new Cluster is created and assigned. How is the Cluster created?
connect() calls Cluster(), the Cluster constructor, whose key step is the initialize() method.
4. The initialize() method iterates over frameworkLoader (a ServiceLoader of ClientProtocolProvider implementations); this is also where local mode versus YARN cluster mode is decided. Each provider supplies its own create() method, and there are two implementations:
When LocalClientProtocolProvider is chosen (local mode), create() returns new LocalJobRunner(conf);
When YarnClientProtocolProvider is chosen (YARN cluster mode), create() returns new YARNRunner(conf);
This is how the runner gets created.
submit() then calls submitJobInternal(), which checks the job's output path, obtains a job ID, and creates the staging paths.
Finally the job state is set to RUNNING: state = JobState.RUNNING;
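The mode selection in steps 3 and 4 can be sketched without Hadoop. The hypothetical pickRunner() below mimics how the value of mapreduce.framework.name decides which runner is created (the class and method names are illustrative, not Hadoop's):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch (not Hadoop code): the value of mapreduce.framework.name
// decides which runner the matching ClientProtocolProvider creates.
public class RunnerSelectionSketch {
    static String pickRunner(Map<String, String> conf) {
        // Default to "local" when the key is absent, as LocalClientProtocolProvider does
        String framework = conf.getOrDefault("mapreduce.framework.name", "local");
        if (framework.equals("yarn")) {
            return "YARNRunner";     // YarnClientProtocolProvider path
        }
        if (framework.equals("local")) {
            return "LocalJobRunner"; // LocalClientProtocolProvider path
        }
        return null;                 // no provider matched
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(pickRunner(conf));           // local mode by default
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(pickRunner(conf));           // yarn cluster mode
    }
}
```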

The rest of this post traces the source as a job is submitted to a YARN cluster:
Submission is driven by job.waitForCompletion(), which produces a submitter. Setting a breakpoint there lets you step through the process.
// Source of waitForCompletion:
    
public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
// Check the job state (the All Applications page on port 8088 shows finished jobs as SUCCEEDED; at this point the state should still be DEFINE)
    if (state == JobState.DEFINE) {
// Submit the job
      submit();
    }
// If the verbose flag is true, print the job's progress information
    if (verbose) {
// Monitor the running job and print its status
      monitorAndPrintJob();
    } else {
      // get the completion poll interval from the client.
      int completionPollIntervalMillis = 
        Job.getCompletionPollInterval(cluster.getConf());
      while (!isComplete()) {
        try {
          Thread.sleep(completionPollIntervalMillis);
        } catch (InterruptedException ie) {
        }
      }
    }
    return isSuccessful();
  }
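The non-verbose branch above is just a polling loop: sleep for the completion poll interval, check isComplete(), repeat. A minimal Hadoop-free sketch of that loop (PollSketch and its countdown field are invented for illustration):

```java
// Hadoop-free sketch of the non-verbose branch of waitForCompletion():
// poll isComplete() at a fixed interval until the job finishes.
public class PollSketch {
    private int remainingPolls = 3;          // stand-in for real job progress

    boolean isComplete() {
        return --remainingPolls <= 0;        // "completes" after a few checks
    }

    int waitForCompletion(int pollIntervalMillis) {
        int polls = 0;
        while (!isComplete()) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
            polls++;
        }
        return polls;                        // how many times we slept
    }

    public static void main(String[] args) {
        System.out.println(new PollSketch().waitForCompletion(10));
    }
}
```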


  Stepping into submit(), the source is:
  public void submit() throws IOException, InterruptedException, ClassNotFoundException {

    // Check the job state
    ensureState(JobState.DEFINE);
    // Switch to the new API (needed because of API version changes)
    setUseNewAPI();
    // Connect the client to the server
    connect();
    // Create the JobSubmitter; which submitter is created depends on the
    // cluster passed to getJobSubmitter() -- the client produced by connect()
    final JobSubmitter submitter = getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, 
      ClassNotFoundException {
// Submit the job; behavior differs per cluster (connect() is shown next)
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
// Set the job state to RUNNING
    state = JobState.RUNNING;
    LOG.info("The url to track the job: " + getTrackingURL());
   }
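Note that submit() wraps the actual submission in a PrivilegedExceptionAction so that it runs with the submitting user's credentials. A minimal sketch of that pattern, using the JDK's AccessController.doPrivileged as a stand-in for Hadoop's ugi.doAs (DoAsSketch and its return value are invented for illustration):

```java
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

// Sketch of the ugi.doAs(...) pattern in submit(): the submission logic is
// wrapped in a PrivilegedExceptionAction and executed on the user's behalf.
public class DoAsSketch {
    static String submitAs() {
        try {
            return AccessController.doPrivileged(
                // In Hadoop this body would be submitter.submitJobInternal(...)
                (PrivilegedExceptionAction<String>) () -> "RUNNING");
        } catch (PrivilegedActionException e) {
            // doAs wraps checked exceptions; unwrap the original cause
            throw new RuntimeException(e.getException());
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAs());
    }
}
```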
// Source of connect():
private synchronized void connect()
          throws IOException, InterruptedException, ClassNotFoundException {
               
  // If cluster is null, create one and assign it
    if (cluster == null) {
      cluster = 
        ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
                   public Cluster run()
                          throws IOException, InterruptedException, 
                                 ClassNotFoundException {
// Create a new Cluster object via its constructor
                     return new Cluster(getConfiguration());
                   }
                 });
    }
  }
 
The Cluster constructor shows:
 
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) 
      throws IOException {
    this.conf = conf;
    this.ugi = UserGroupInformation.getCurrentUser();
// The key step in the constructor is this call
    initialize(jobTrackAddr, conf);
  }

The source of initialize() follows; the important part is the creation of the client and fs objects:
// clientProtocolProvider is discovered dynamically (via a ServiceLoader) and represents a protocol
// clientProtocolProvider exists to create the client object
     
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)throws IOException {


    synchronized (frameworkLoader) {
// Iterate over the providers; each creates a different kind of client. frameworkLoader yields LocalClientProtocolProvider and YarnClientProtocolProvider
      for (ClientProtocolProvider provider : frameworkLoader) {
        LOG.debug("Trying ClientProtocolProvider : "
            + provider.getClass().getName());
        ClientProtocol clientProtocol = null; 
        try {
          if (jobTrackAddr == null) {
            clientProtocol = provider.create(conf); // step into create(); its source appears below
          } else {
            clientProtocol = provider.create(jobTrackAddr, conf); // step into create(); its source appears below
          }


          if (clientProtocol != null) {
            clientProtocolProvider = provider;
// Assign clientProtocol to the client field
            client = clientProtocol;
            LOG.debug("Picked " + provider.getClass().getName()
                + " as the ClientProtocolProvider");
            break;
          }
          else {
            LOG.debug("Cannot pick " + provider.getClass().getName()
                + " as the ClientProtocolProvider - returned null protocol");
          }
        } 
        catch (Exception e) {
          LOG.info("Failed to use " + provider.getClass().getName()
              + " due to error: " + e.getMessage());
        }
      }
    }


    if (null == clientProtocolProvider || null == client) {
      throw new IOException(
          "Cannot initialize Cluster. Please check your configuration for "
              + MRConfig.FRAMEWORK_NAME
              + " and the correspond server addresses.");
    }
  }
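The provider loop in initialize() can be sketched without Hadoop: try each provider in turn and keep the first one that returns a non-null client. In Hadoop the providers come from java.util.ServiceLoader; in this illustrative sketch a plain list of functions stands in for it:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hadoop-free sketch of the loop in initialize(): each provider may return
// null ("cannot pick"); the first non-null client wins ("picked ... as the
// ClientProtocolProvider"); if none matches, initialization fails.
public class ProviderLoopSketch {
    static String firstNonNull(List<Function<String, String>> providers, String conf) {
        for (Function<String, String> provider : providers) {
            String client = provider.apply(conf);   // provider.create(conf)
            if (client != null) {
                return client;
            }
        }
        // Mirrors the "Cannot initialize Cluster" IOException
        throw new IllegalStateException("Cannot initialize Cluster");
    }

    public static void main(String[] args) {
        // Each lambda mimics one ClientProtocolProvider.create(conf)
        List<Function<String, String>> providers = Arrays.asList(
            conf -> conf.equals("local") ? "LocalJobRunner" : null,
            conf -> conf.equals("yarn") ? "YARNRunner" : null);
        System.out.println(firstNonNull(providers, "yarn"));
    }
}
```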
//  clientProtocol = provider.create(conf); -- stepping into create():
    
public class LocalClientProtocolProvider extends ClientProtocolProvider {
public ClientProtocol create(Configuration conf) throws IOException {
// Read mapreduce.framework.name from the configuration; default to "local" if unset
    String framework =
        conf.get(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
// If the configured framework is not "local", this provider does not apply: return null
    if (!MRConfig.LOCAL_FRAMEWORK_NAME.equals(framework)) {
      return null;
    }
    conf.setInt(JobContext.NUM_MAPS, 1);
// Otherwise create and return a LocalJobRunner
    return new LocalJobRunner(conf);
  }


// Same pattern as in LocalClientProtocolProvider
    
public class YarnClientProtocolProvider extends ClientProtocolProvider {


  @Override
  public ClientProtocol create(Configuration conf) throws IOException {
// Again compare the configured mapreduce.framework.name against the expected value
// (the constant values below make the source easier to follow):
//  public static final String YARN_FRAMEWORK_NAME  = "yarn";
//  public static final String LOCAL_FRAMEWORK_NAME = "local";
    if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
// Create a YARNRunner
      return new YARNRunner(conf);
    }
    return null;
  }


  @Override
  public ClientProtocol create(InetSocketAddress addr, Configuration conf)
      throws IOException {
    return create(conf);
  }


Now examine submitJobInternal():
 
JobStatus submitJobInternal(Job job, Cluster cluster) 
  throws ClassNotFoundException, InterruptedException, IOException {


    // Check the job's output specification (e.g. the output directory must not already exist)
    checkSpecs(job);

// Get the job's configuration
    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);

// Create the staging path, e.g. /tmp/hadoop-yarn/staging/hadoop/.staging
    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    // Get the client's IP address to record who is submitting
    InetAddress ip = InetAddress.getLocalHost();

    if (ip != null) {
      submitHostAddress = ip.getHostAddress();
      submitHostName = ip.getHostName();
      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
    }
// Obtain a job ID: submitClient (here a YARNRunner) talks to the ResourceManager
    JobID jobId = submitClient.getNewJobID();
// Store the job ID on the Job object
    job.setJobID(jobId);
// Use the job ID to build a unique submit directory
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers", 
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
          + " as the submit dir");
      // get delegation token for the dir
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);
      
      populateTokenCache(conf, job.getCredentials());


      // generate a secret to authenticate shuffle transfers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }

// Copy resources (including the job jar) into the job-ID directory; the replication factor set here lets the client override the server-side default
      copyAndConfigureFiles(job, submitJobDir);
      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
      
      // Create the splits for the job
      LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
// Compute the input splits (one map task per split) and write the split metadata to a file that is submitted with the job
      int maps = writeSplits(job, submitJobDir);
// Record the number of maps in the configuration
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      LOG.info("number of splits:" + maps);


      // write "queue admins of the queue to which job is being submitted"
      // to job file.
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());


      // removing jobtoken referrals before copying the jobconf to HDFS
      // as the tasks don't need this setting, actually they may break
      // because of it if present as the referral will point to a
      // different job.
      TokenCache.cleanUpTokenReferral(conf);


      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // Add HDFS tracking ids
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }


      // Write job file to submit dir
      writeConf(conf, submitJobFile);
      
      //
      // Now, actually submit the job (using the submit name)
      //
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());
      if (status != null) {
        return status;
      } else {
        throw new IOException("Could not launch job");
      }
    } finally {
      if (status == null) {
        LOG.info("Cleaning up the staging area " + submitJobDir);
        if (jtFs != null && submitJobDir != null)
          jtFs.delete(submitJobDir, true);


      }
    }
  }
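How the per-job submit directory is derived can be shown with a small sketch: joining the staging area with the job ID gives a unique directory per submission, just as `new Path(jobStagingArea, jobId.toString())` does above (the staging path and job ID below are illustrative values, and java.nio stands in for Hadoop's Path):

```java
import java.nio.file.Paths;

// Sketch of submitJobDir = new Path(jobStagingArea, jobId.toString()):
// the staging area plus the job id yields a unique per-job directory.
public class SubmitDirSketch {
    static String submitDir(String stagingArea, String jobId) {
        return Paths.get(stagingArea, jobId).toString();
    }

    public static void main(String[] args) {
        // Example values only; real ones come from the cluster and the RM
        System.out.println(submitDir("/tmp/hadoop-yarn/staging/hadoop/.staging",
                                     "job_1700000000000_0001"));
    }
}
```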