Hadoop 2.8.5 Job Submission

Hadoop RPC is built from a protocol stack and a protocol engine on each of the client and server ends, so when walking through the job-submission flow we always look at it from both ends: the Client end that submits the job and the Service end that receives it. This splits job submission into two phases. The first phase is the Client end: how the job is prepared and how it communicates with the Service over the RPC protocol. The second phase is how the Service end receives and handles the RPC request. Over its history Hadoop has provided an old API and a new API, plus one further workaround:

  1. JobClient.runJob(): call the method runJob() provided by the JobClient class; this is the so-called old API.
  2. Job.waitForCompletion(): call the method waitForCompletion() provided by the Job class; this belongs to the new API.
  3. ToolRunner.run(): call the method run() provided by the ToolRunner class.

All three routes converge on Job.submit(). That is where we cut in to examine Hadoop's job-submission flow.


1. Job Submission on the Client Side

hadoop-mapreduce-client\hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce\Job.java

public class Job extends JobContextImpl implements JobContext {
  ......
  public void submit() 
         throws IOException, InterruptedException, ClassNotFoundException {
    ensureState(JobState.DEFINE);
    setUseNewAPI(); // pick the old or new API
    connect(); // set up the communication link
    final JobSubmitter submitter = 
        getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    // ugi: user identity for authentication
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, 
      ClassNotFoundException {
        // actually start the submission
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
    state = JobState.RUNNING;
   }
  
  // prepare the connection to the cluster
  private synchronized void connect()
          throws IOException, InterruptedException, ClassNotFoundException {
    if (cluster == null) {
      cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
                   public Cluster run()
                          throws IOException, InterruptedException, 
                                 ClassNotFoundException {
                     return new Cluster(getConfiguration());
                   }
                 });
    }
  }
  ......
}

hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce\Cluster.java

public class Cluster {
  // YarnClientProtocolProvider in cluster mode; LocalClientProtocolProvider in local mode
  private ClientProtocolProvider clientProtocolProvider;
  private ClientProtocol client; // the protocol for talking to the outside world
  // the JDK ServiceLoader loads the ClientProtocolProvider implementations
  static Iterable<ClientProtocolProvider> frameworkLoader =
      ServiceLoader.load(ClientProtocolProvider.class);
  static {
    // load the cluster configuration files mapred-default.xml, mapred-site.xml, yarn-default.xml and yarn-site.xml
    ConfigUtil.loadResources(); 
   }
   
  public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) 
      throws IOException {
    this.conf = conf;
    this.ugi = UserGroupInformation.getCurrentUser();
    initialize(jobTrackAddr, conf);
  }
  
  // initialization
  private void initialize(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
    initProviderList(); // load all the providers
    for (ClientProtocolProvider provider : providerList) {
      ClientProtocol clientProtocol = null;
      try {
        // create the ClientProtocol, i.e. LocalJobRunner or YARNRunner, depending on configuration
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }

        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          break;
        }
      } catch (Exception e) {
         ......
      }
    }
  }
}
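The selection loop in Cluster.initialize() comes down to "first provider to return a non-null client wins". The sketch below mimics just that pattern with toy stand-ins; the Provider interface, the framework strings, and the defaultProviders() helper are hypothetical illustrations, not Hadoop APIs:

```java
import java.util.Arrays;
import java.util.List;

public class ProviderSelection {
    // Toy stand-in for ClientProtocolProvider: returns a client name
    // for the framework it supports, or null otherwise.
    interface Provider { String create(String framework); }

    // Mirrors Cluster.initialize(): try each provider in turn and keep
    // the first one that yields a non-null client.
    static String pickClient(List<Provider> providers, String framework) {
        for (Provider p : providers) {
            String client = p.create(framework);
            if (client != null) {
                return client;
            }
        }
        return null; // no provider could serve this configuration
    }

    // Two hypothetical providers, modeled on YarnClientProtocolProvider
    // and LocalClientProtocolProvider.
    static List<Provider> defaultProviders() {
        Provider yarn  = f -> "yarn".equals(f)  ? "YARNRunner"     : null;
        Provider local = f -> "local".equals(f) ? "LocalJobRunner" : null;
        return Arrays.asList(yarn, local);
    }

    public static void main(String[] args) {
        System.out.println(pickClient(defaultProviders(), "yarn"));
        System.out.println(pickClient(defaultProviders(), "local"));
    }
}
```

In the real Cluster the provider list comes from ServiceLoader.load(ClientProtocolProvider.class), which discovers implementations through META-INF/services entries on the classpath rather than a hard-coded list.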

hadoop-mapreduce-client-jobclient\src\main\java\org\apache\hadoop\mapred\YarnClientProtocolProvider.java

public class YarnClientProtocolProvider extends ClientProtocolProvider {

  @Override
  public ClientProtocol create(Configuration conf) throws IOException {
    if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
      return new YARNRunner(conf); // create the YARNRunner
    }
    return null;
  }
  
}

hadoop-mapreduce-client-core\src\main\java\org\apache\hadoop\mapreduce\JobSubmitter.java

class JobSubmitter {
  // handed in from Job.submit(); in cluster mode this is the YARNRunner
  private ClientProtocol submitClient;
  // submit the job
  JobStatus submitJobInternal(Job job, Cluster cluster) 
    throws ClassNotFoundException, InterruptedException, IOException {
    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);
    // get the staging directory path
    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
    // get the IP address of the local node (host)
    InetAddress ip = InetAddress.getLocalHost();
    if (ip != null) {
      // string form of the local node's IP address
      submitHostAddress = ip.getHostAddress();
      // name of the local node
      submitHostName = ip.getHostName();
      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
    }
    // generate a job ID
    JobID jobId = submitClient.getNewJobID();
    // store the job ID in the Job object
    job.setJobID(jobId);
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers", 
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      // set up the access credentials
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);
      populateTokenCache(conf, job.getCredentials());
      // generate the secret key used for the data flow between Mappers and Reducers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }
      if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
        conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      }
      // copy the executables (the compiled JAR and the like) into HDFS
      copyAndConfigureFiles(job, submitJobDir);
      // path of the job configuration file
      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
      // compute the input splits; the number of splits determines the number of Mappers
      int maps = writeSplits(job, submitJobDir);
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      // job scheduling queue
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());
      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // add the HDFS tracking IDs, if tracking is enabled
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }
      // write the contents of conf into an .xml file
      writeConf(conf, submitJobFile);
      // submit the job via YARNRunner.submitJob() or LocalJobRunner.submitJob()
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());
      if (status != null) {
        return status;
      } else {
        throw new IOException("Could not launch job");
      }
    } finally { // clean up
      ......
    }
  }
}
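The shuffle-key step above can be reproduced with nothing but the JDK. The sketch below follows the same pattern as submitJobInternal(); the algorithm name "HmacSHA1" and the 64-bit key length correspond to JobSubmitter's SHUFFLE_KEYGEN_ALGORITHM and SHUFFLE_KEY_LENGTH constants in 2.8.5, but treat those literal values here as an assumption and verify them against your source tree:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class ShuffleKeySketch {
    // Generate a shuffle secret the way submitJobInternal() does:
    // an HmacSHA1 key generator initialized to a 64-bit key size.
    static byte[] newShuffleKey() throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("HmacSHA1");
        keyGen.init(64); // key length in bits
        SecretKey shuffleKey = keyGen.generateKey();
        return shuffleKey.getEncoded(); // raw bytes, as stored via TokenCache
    }

    public static void main(String[] args) throws Exception {
        System.out.println(newShuffleKey().length + " bytes"); // 64 bits = 8 bytes
    }
}
```

The encoded bytes are what the real code passes to TokenCache.setShuffleSecretKey(), so every task of the job can later verify the shuffle traffic with the same key.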

hadoop-mapreduce-client-jobclient\src\main\java\org\apache\hadoop\mapred\YARNRunner.java

public class YARNRunner implements ClientProtocol {
  
  // submit the job
  public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
  throws IOException, InterruptedException {
    // cache the history token
    addHistoryToken(ts);
    
    // create an ApplicationSubmissionContext and move the relevant information from conf into it
    ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

    // hand the job over to the resource manager
    try {
      // the ResourceMgrDelegate forwards the ApplicationSubmissionContext to the RM
      ApplicationId applicationId =
          resMgrDelegate.submitApplication(appContext);
      ApplicationReport appMaster = resMgrDelegate
          .getApplicationReport(applicationId);
      // return the job status
      return clientCache.getClient(jobId).getJobStatus(jobId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }
}

hadoop-mapreduce-client-jobclient\src\main\java\org\apache\hadoop\mapred\ResourceMgrDelegate.java

public class ResourceMgrDelegate extends YarnClient {
  // delegate to the YarnClient
  public ApplicationId  submitApplication(ApplicationSubmissionContext appContext)
          throws YarnException, IOException {
    return client.submitApplication(appContext);
  }
}

hadoop-yarn-client\src\main\java\org\apache\hadoop\yarn\client\api\impl\YarnClientImpl.java

public class YarnClientImpl extends YarnClient {
 @Override
  protected void serviceStart() throws Exception {
    try {
      // obtain the proxy; see the previous article
      rmClient = ClientRMProxy.createRMProxy(getConfig(),
          ApplicationClientProtocol.class);
      if (historyServiceEnabled) {
        historyClient.start();
      }
    } catch (IOException e) {
      throw new YarnRuntimeException(e);
    }
    super.serviceStart();
  }
  
  // submit the application
  public ApplicationId
      submitApplication(ApplicationSubmissionContext appContext)
          throws YarnException, IOException {
    ApplicationId applicationId = appContext.getApplicationId();
    SubmitApplicationRequest request =
        Records.newRecord(SubmitApplicationRequest.class);
    request.setApplicationSubmissionContext(appContext);
    // submit through the proxy created above
    rmClient.submitApplication(request);
    int pollCount = 0;
    long startTime = System.currentTimeMillis();
    EnumSet<YarnApplicationState> waitingStates = 
                                 EnumSet.of(YarnApplicationState.NEW,
                                 YarnApplicationState.NEW_SAVING,
                                 YarnApplicationState.SUBMITTED);
    EnumSet<YarnApplicationState> failToSubmitStates = 
                                  EnumSet.of(YarnApplicationState.FAILED,
                                  YarnApplicationState.KILLED);		
    while (true) {
      try {
        // fetch the application report from the RM node and read this application's current state
        ApplicationReport appReport = getApplicationReport(applicationId);
        YarnApplicationState state = appReport.getYarnApplicationState();
        // leave the loop once the state leaves the waiting set;
        // FAILED/KILLED mean the submission itself failed
        if (!waitingStates.contains(state)) {
          ......
        }
        try {
          Thread.sleep(submitPollIntervalMillis); // sleep for a while before the next poll
        } catch (InterruptedException ie) {
          throw new YarnException(msg, ie);
        }
      } catch (ApplicationNotFoundException ex) {
        // resubmit after a failure (application not found on the RM)
        rmClient.submitApplication(request);
      }
    }

    return applicationId;
  }
} 
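The submit-then-poll loop above boils down to: ask for a report, leave the loop once the state is outside the waiting set, otherwise sleep and ask again. Here is a self-contained sketch of just that control flow; the State enum and the Iterator standing in for getApplicationReport() round-trips are toy substitutes, not YARN types:

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;

public class SubmitPollSketch {
    // Toy stand-in for YarnApplicationState.
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, FAILED, KILLED }

    // Mirrors the loop in YarnClientImpl.submitApplication(): poll the
    // reported state until it leaves the "still submitting" set.
    static State pollUntilSettled(Iterator<State> reports, long pollMillis)
            throws InterruptedException {
        EnumSet<State> waiting =
            EnumSet.of(State.NEW, State.NEW_SAVING, State.SUBMITTED);
        while (true) {
            State s = reports.next(); // one getApplicationReport() round-trip
            if (!waiting.contains(s)) {
                return s; // settled: ACCEPTED on success, FAILED/KILLED otherwise
            }
            Thread.sleep(pollMillis); // back off before the next poll
        }
    }

    public static void main(String[] args) throws Exception {
        Iterator<State> reports =
            Arrays.asList(State.NEW, State.SUBMITTED, State.ACCEPTED).iterator();
        System.out.println(pollUntilSettled(reports, 1));
    }
}
```

The real loop additionally enforces a timeout and resubmits on ApplicationNotFoundException, but the waiting-set check and the sleep between polls are the heart of it.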

hadoop-yarn-common\src\main\java\org\apache\hadoop\yarn\api\impl\pb\client\ApplicationClientProtocolPBClientImpl.java

proxy.submitApplication() is in fact ApplicationClientProtocolService.BlockingInterface.submitApplication(), generated by protoc. At the ProtoBuf layer, the peer's request is dispatched straight to the submitApplication() method of ApplicationClientProtocolPBServiceImpl, which likewise implements ApplicationClientProtocolPB.

In this way, a Client-side call to a function provided by ApplicationClientProtocolPBClientImpl turns into a Server-side call to the corresponding function provided by ApplicationClientProtocolPBServiceImpl, and the return value of the Server-side call turns back into the Client-side return value; this is what realizes the remote procedure call (RPC). It goes without saying that these two objects on the Client and Server sides must implement the same interface, which here is ApplicationClientProtocolPB.

@Private
public class ApplicationClientProtocolPBClientImpl implements ApplicationClientProtocol, Closeable {
  // the actual proxy
  private ApplicationClientProtocolPB proxy;
  // register the protocol with the ProtocolEngine
  public ApplicationClientProtocolPBClientImpl(long clientVersion,
      InetSocketAddress addr, Configuration conf) throws IOException {
    RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
      ProtobufRpcEngine.class);
    // see the previous article; not repeated in detail here
    proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
  }
  @Override
  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException,
      IOException {
    SubmitApplicationRequestProto requestProto =
        ((SubmitApplicationRequestPBImpl) request).getProto();
    try {
      // let the proxy send the message out and wait for the server's response
      return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
        requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }
}
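Setting the ProtoBuf machinery aside, the client-stub-to-server-impl hand-off can be pictured with a plain JDK dynamic proxy: both sides share one interface, the client holds a proxy, and every call on the proxy is dispatched to the server-side implementation. Everything here (the Protocol interface, ServerImpl) is a toy stand-in for ApplicationClientProtocolPB and its two implementations:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcStubSketch {
    // Shared interface: plays the role of ApplicationClientProtocolPB.
    interface Protocol { String submitApplication(String request); }

    // "Server side": plays ApplicationClientProtocolPBServiceImpl.
    static class ServerImpl implements Protocol {
        public String submitApplication(String request) {
            return "accepted:" + request;
        }
    }

    // "Client side": a proxy whose invocation handler forwards every
    // call to the server object, standing in for the ProtoBuf engine's
    // serialize / send / dispatch path.
    static Protocol clientProxy(Protocol server) {
        InvocationHandler handler =
            (proxy, method, args) -> method.invoke(server, args);
        return (Protocol) Proxy.newProxyInstance(
            Protocol.class.getClassLoader(),
            new Class<?>[] { Protocol.class }, handler);
    }

    public static void main(String[] args) {
        Protocol client = clientProxy(new ServerImpl());
        System.out.println(client.submitApplication("app_001")); // prints accepted:app_001
    }
}
```

In real Hadoop RPC the invocation handler does not call the server object directly: it serializes the request with ProtoBuf, ships it over the wire, and the server-side engine dispatches to the matching method. The shared-interface contract, though, is exactly the same.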

2. Job Submission on the Service Side

hadoop-yarn-common\src\main\java\org\apache\hadoop\yarn\api\impl\pb\service\ApplicationClientProtocolPBServiceImpl.java
This class implements the ApplicationClientProtocolPB protocol and is invoked by the stub code that Protocol Buffers generates from the protocol files.

public class ApplicationClientProtocolPBServiceImpl implements ApplicationClientProtocolPB {
  private ApplicationClientProtocol real; // in practice this is the ClientRMService
   @Override
  public SubmitApplicationResponseProto submitApplication(RpcController arg0,
      SubmitApplicationRequestProto proto) throws ServiceException {
    SubmitApplicationRequestPBImpl request = new SubmitApplicationRequestPBImpl(proto);
    try {
      SubmitApplicationResponse response = real.submitApplication(request);
      return ((SubmitApplicationResponsePBImpl)response).getProto();
    } catch (YarnException e) {
      throw new ServiceException(e);
    } catch (IOException e) {
      throw new ServiceException(e);
    }
  }
}

hadoop-yarn-server-resourcemanager\src\main\java\org\apache\hadoop\yarn\server\resourcemanager\ClientRMService.java
The request is finally handed over to the RMAppManager.

public class ClientRMService extends AbstractService implements ApplicationClientProtocol {
  private final RMAppManager rmAppManager;

  // the server side sets up its RPC server; see the previous two articles
  protected void serviceStart() throws Exception {
    Configuration conf = getConfig();
    YarnRPC rpc = YarnRPC.create(conf);
    this.server = rpc.getServer(ApplicationClientProtocol.class, this,
            clientBindAddress,
            conf, this.rmDTSecretManager,
            conf.getInt(YarnConfiguration.RM_CLIENT_THREAD_COUNT, 
                YarnConfiguration.DEFAULT_RM_CLIENT_THREAD_COUNT));
    this.server.start();
    clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                               YarnConfiguration.RM_ADDRESS,
                                               YarnConfiguration.DEFAULT_RM_ADDRESS,
                                               server.getListenerAddress());
    super.serviceStart();
  }

  @Override
  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException {
    ApplicationSubmissionContext submissionContext = request
        .getApplicationSubmissionContext();
    ApplicationId applicationId = submissionContext.getApplicationId();
    CallerContext callerContext = CallerContext.getCurrent();
    // the App is already in the queue, so this is a duplicate submission
    if (rmContext.getRMApps().get(applicationId) != null) {
      LOG.info("This is an earlier submitted application: " + applicationId);
      return SubmitApplicationResponse.newInstance();
    }
    // if no target queue was specified
    if (submissionContext.getQueue() == null) {
      submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
    }
    // if no App name was supplied
    if (submissionContext.getApplicationName() == null) {
      submissionContext.setApplicationName(
          YarnConfiguration.DEFAULT_APPLICATION_NAME);
    }
    // if no App type was supplied
    if (submissionContext.getApplicationType() == null) {
      submissionContext
        .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
    } else {
    // check whether the supplied App type name is too long
      if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
        submissionContext.setApplicationType(submissionContext
          .getApplicationType().substring(0,
            YarnConfiguration.APPLICATION_TYPE_LENGTH));
      }
    }
    // hand the App over to the RMAppManager
    try {
      // call RMAppManager to submit application directly
      rmAppManager.submitApplication(submissionContext,
          System.currentTimeMillis(), user);

      LOG.info("Application with id " + applicationId.getId() + 
          " submitted by user " + user);
      RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
          "ClientRMService", applicationId, callerContext);
    } catch (YarnException e) {
      LOG.info("Exception in submitting application with id " +
          applicationId.getId(), e);
      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
          e.getMessage(), "ClientRMService",
          "Exception in submitting application", applicationId, callerContext);
      throw e;
    }

    SubmitApplicationResponse response = recordFactory
        .newRecordInstance(SubmitApplicationResponse.class);
    return response;
  }
}
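The default-filling and truncation logic in ClientRMService.submitApplication() can be distilled into a few lines. The constant values below mirror what YarnConfiguration defines in 2.8.5 (DEFAULT_QUEUE_NAME "default", DEFAULT_APPLICATION_NAME "N/A", DEFAULT_APPLICATION_TYPE "YARN", APPLICATION_TYPE_LENGTH 20), but treat them as assumptions and verify them against your source tree:

```java
public class SubmissionDefaults {
    // Assumed values, modeled on YarnConfiguration.
    static final String DEFAULT_QUEUE = "default";
    static final String DEFAULT_NAME = "N/A";
    static final String DEFAULT_TYPE = "YARN";
    static final int TYPE_LENGTH = 20;

    // Mirrors the checks in ClientRMService.submitApplication():
    // fill in missing fields and clamp an over-long application type.
    static String[] normalize(String queue, String name, String type) {
        if (queue == null) queue = DEFAULT_QUEUE;
        if (name == null) name = DEFAULT_NAME;
        if (type == null) {
            type = DEFAULT_TYPE;
        } else if (type.length() > TYPE_LENGTH) {
            type = type.substring(0, TYPE_LENGTH); // silently truncated
        }
        return new String[] { queue, name, type };
    }

    public static void main(String[] args) {
        String[] ctx = normalize(null, null, "MAPREDUCE-BATCH-VERY-LONG-TYPE");
        System.out.println(ctx[0] + " / " + ctx[1] + " / " + ctx[2]);
    }
}
```

Note the design choice: an over-long application type is truncated rather than rejected, so a submission never fails on that field alone.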

The RMAppManager object is like one department of the "central government". The ResourceManager has a great deal to manage, and managing Apps (jobs) is only one part of it; the RMAppManager is the component dedicated to exactly that. Leaving aside the replicas needed for fault tolerance, the whole Hadoop system has only one ResourceManager and hence only one RMAppManager, created by the ResourceManager itself. So when a job, i.e. an App, is submitted to the "center", it ends up in the hands of the RMAppManager object. From the standpoint of job submission, once execution reaches RMAppManager.submitApplication() on the RM node, the submission is complete; whatever happens afterwards is the RM's business.
