Hadoop: How the YARN Client Submits a Job to the ResourceManager

Overview of the Job Submission Flow
1. Preparation
  1. Write the job: implement the application with the MapReduce API or another supported framework (such as Spark or Flink).
  2. Package the job: bundle the application and its dependencies into a JAR file or another archive format.
  3. Configure parameters: set the job configuration, for example resource requirements and the target queue.
2. Submitting the job
  1. Submit the job to the ResourceManager (RM): submit it with the command-line tools or through the API (a minimal driver sketch follows this list). The ResourceManager is the YARN component responsible for cluster-wide resource management and scheduling.
  2. Create an Application ID: the ResourceManager assigns a unique Application ID to every submitted job.
  3. Allocate a Container: the ResourceManager allocates the first Container for the ApplicationMaster.
3. Starting the ApplicationMaster (AM)
  1. ApplicationMaster startup: the ApplicationMaster is launched inside the allocated Container.
  2. ApplicationMaster registration: the ApplicationMaster registers with the ResourceManager and reports its address so that clients can query it.
  3. Obtaining work: the ApplicationMaster asks the ResourceManager for resources to run its tasks.
4. Resource request and allocation
  1. Request resources: the ApplicationMaster requests Containers from the ResourceManager to run Map or Reduce tasks.
  2. Allocate resources: the ResourceManager allocates Containers according to its scheduling policy and notifies the NodeManagers.
  3. Launch tasks: once Containers are allocated, the ApplicationMaster launches the tasks on the corresponding NodeManagers.
5. Task execution
  1. Task initialization: the NodeManager sets up the environment according to the ApplicationMaster's instructions and starts the Container.
  2. Task run: the task inside the Container executes its Map or Reduce logic.
  3. Progress reporting: each Task reports its progress to the ApplicationMaster.
6. Monitoring and completion
  1. Monitor tasks: the ApplicationMaster tracks task progress and handles failed tasks.
  2. Job completion: when all tasks have finished, the ApplicationMaster unregisters from the ResourceManager and releases all resources.
  3. Cleanup: the ResourceManager cleans up the resources associated with the job.
7. Retrieving results
  1. Query status: the client can query the job status through the ResourceManager or the ApplicationMaster.
  2. Fetch output: after completion, the job output can be read from HDFS or another storage system.
8. Logging and debugging
  1. Log aggregation: YARN collects the logs of every Container, which can be used for debugging.
  2. View ApplicationMaster logs: the ResourceManager web UI provides access to the ApplicationMaster logs.
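
As a concrete reference for the preparation and submission steps above, here is a minimal sketch of a MapReduce driver. The WordCountMapper and WordCountReducer classes are hypothetical placeholders (simple versions are sketched near the end of this article); the input and output paths come from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);     // identifies the JAR to ship to the cluster
    job.setMapperClass(WordCountMapper.class);    // hypothetical Mapper implementation
    job.setReducerClass(WordCountReducer.class);  // hypothetical Reducer implementation
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion() triggers the whole submission flow analyzed below
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The packaged JAR is typically launched with yarn jar wordcount.jar WordCountDriver /input /output. After submission, yarn application -status <ApplicationId> covers the status-query step and yarn logs -applicationId <ApplicationId> covers the log-collection step above.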

Job.java: the waitForCompletion and submit methods

public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException {
  if (state == JobState.DEFINE) {
    submit();
  }
  if (verbose) {
    monitorAndPrintJob();
  } else {
    int completionPollIntervalMillis = Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      Thread.sleep(completionPollIntervalMillis);
    }
  }
  return isSuccessful();
}

public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter = getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
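
When verbose is false, waitForCompletion simply polls isComplete() at the interval returned by Job.getCompletionPollInterval, which reads a client-side property. If the polling frequency matters in your environment, it can be tuned in the client's mapred-site.xml; 5000 ms is the usual default, but verify it against your Hadoop version:

<!-- how often Job.waitForCompletion polls the cluster for completion, in milliseconds -->
<property>
  <name>mapreduce.client.completion.pollinterval</name>
  <value>5000</value>
</property>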

JobSubmitter.java: the submitJobInternal method

JobStatus submitJobInternal(Job job, Cluster cluster) throws ClassNotFoundException, InterruptedException, IOException {
  ... ...
  status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());
  ... ...
}
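
The elided portion of submitJobInternal prepares the job's staging directory before the final submitJob call. Roughly, it performs the following steps; this is a paraphrased outline of JobSubmitter, abridged and not the exact source:

// Paraphrased outline of the elided steps in JobSubmitter.submitJobInternal:
// checkSpecs(job);                            // validate the output specification (output dir must not already exist)
// JobID jobId = submitClient.getNewJobID();   // obtain a new job id / application id from the cluster
// copyAndConfigureFiles(job, submitJobDir);   // upload the job JAR, -libjars and -files to the staging directory
// int maps = writeSplits(job, submitJobDir);  // compute input splits and write the split metadata
// writeConf(conf, submitJobFile);             // write job.xml into the staging directory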
Creating the Submission Environment

YARNRunner.java: the submitJob method

public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts) throws IOException, InterruptedException {
  addHistoryToken(ts);
  ApplicationSubmissionContext appContext = createApplicationSubmissionContext(conf, jobSubmitDir, ts);
  ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);
  ApplicationReport appMaster = resMgrDelegate.getApplicationReport(applicationId);
  String diagnostics = (appMaster == null ? "application report is null" : appMaster.getDiagnostics());
  if (appMaster == null || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
    throw new IOException("Failed to run job : " + diagnostics);
  }
  return clientCache.getClient(jobId).getJobStatus(jobId);
}

public ApplicationSubmissionContext createApplicationSubmissionContext(Configuration jobConf, String jobSubmitDir, Credentials ts) throws IOException {
  ApplicationId applicationId = resMgrDelegate.getApplicationId();
  Map<String, LocalResource> localResources = setupLocalResources(jobConf, jobSubmitDir);
  ByteBuffer securityTokens = setupSecurityTokens(jobConf, ts);
  List<String> vargs = setupAMCommand(jobConf);
  ContainerLaunchContext amContainer = setupContainerLaunchContextForAM(jobConf, localResources, securityTokens, vargs);
  ... ...

  return appContext;
}

The setupAMCommand method

private List<String> setupAMCommand(Configuration jobConf) {
  List<String> vargs = new ArrayList<>(8);
  vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");
  Path amTmpDir = new Path(MRApps.crossPlatformifyMREnv(conf, Environment.PWD), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
  vargs.add("-Djava.io.tmpdir=" + amTmpDir);
  MRApps.addLog4jSystemProperties(null, vargs, conf);
  ... ...

  String mrAppMasterAdminOptions = jobConf.get(MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
  vargs.add(mrAppMasterAdminOptions);
  String mrAppMasterUserOptions = jobConf.get(MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
  vargs.add(mrAppMasterUserOptions);
  ... ...

  vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
  vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT);
  vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);
  return vargs;
}
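
Joined together, the vargs list becomes the ApplicationMaster launch command stored in the ContainerLaunchContext. On a typical installation it expands to something like the line below (illustrative only; the exact JVM options depend on yarn.app.mapreduce.am.admin-command-opts and yarn.app.mapreduce.am.command-opts, whose shipped default is -Xmx1024m):

$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp \
  <log4j system properties> <admin opts> -Xmx1024m \
  org.apache.hadoop.mapreduce.v2.app.MRAppMaster \
  1><LOG_DIR>/stdout 2><LOG_DIR>/stderr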
Submitting to YARN

The submitApplication method (called from YARNRunner.java through ResourceMgrDelegate; the implementation below lives in YarnClientImpl.java)

ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);

public ApplicationId submitApplication(ApplicationSubmissionContext appContext) throws YarnException, IOException {
  ApplicationId applicationId = appContext.getApplicationId();
  if (applicationId == null) {
    throw new ApplicationIdNotProvidedException("ApplicationId is not provided in ApplicationSubmissionContext");
  }
  SubmitApplicationRequest request = Records.newRecord(SubmitApplicationRequest.class);
  request.setApplicationSubmissionContext(appContext);
  rmClient.submitApplication(request);

  int pollCount = 0;
  long startTime = System.currentTimeMillis();
  EnumSet<YarnApplicationState> waitingStates = EnumSet.of(YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING, YarnApplicationState.SUBMITTED);
  EnumSet<YarnApplicationState> failToSubmitStates = EnumSet.of(YarnApplicationState.FAILED, YarnApplicationState.KILLED);
  while (true) {
    try {
      ApplicationReport appReport = getApplicationReport(applicationId);
      YarnApplicationState state = appReport.getYarnApplicationState();
      ... ...
    } catch (ApplicationNotFoundException ex) {
      LOG.info("Re-submit application " + applicationId + "with the same ApplicationSubmissionContext");
      rmClient.submitApplication(request);
    }
  }

  return applicationId;
}
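
For comparison, a non-MapReduce client submits through the same YarnClient API that ResourceMgrDelegate wraps. A minimal sketch, assuming the ContainerLaunchContext for your own ApplicationMaster has already been prepared as amContainerSpec:

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleYarnSubmitter {
  public static ApplicationId submit(ContainerLaunchContext amContainerSpec) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the RM for a new application id and an empty submission context
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("simple-yarn-app");
    appContext.setQueue("default");
    appContext.setResource(Resource.newInstance(1024, 1)); // memory (MB) and vcores for the AM container
    appContext.setAMContainerSpec(amContainerSpec);        // command + local resources for the AM

    // Hands the context to ClientRMService.submitApplication on the RM side
    return yarnClient.submitApplication(appContext);
  }
}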

ClientRMService.java: the submitApplication method (ResourceManager side)

public SubmitApplicationResponse submitApplication(SubmitApplicationRequest request) throws YarnException, IOException {
  ApplicationSubmissionContext submissionContext = request.getApplicationSubmissionContext();
  ApplicationId applicationId = submissionContext.getApplicationId();
  CallerContext callerContext = CallerContext.getCurrent();

  ... ...

  try {
    rmAppManager.submitApplication(submissionContext, System.currentTimeMillis(), user);
    LOG.info("Application with id " + applicationId.getId() + " submitted by user " + user);
    RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST, "ClientRMService", applicationId, callerContext, submissionContext.getQueue());
  } catch (YarnException e) {
    LOG.info("Exception in submitting " + applicationId, e);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST, e.getMessage(), "ClientRMService", "Exception in submitting application", applicationId, callerContext, submissionContext.getQueue());
    throw e;
  }

  return recordFactory.newRecordInstance(SubmitApplicationResponse.class);
}

ResourceManager Starts the MRAppMaster

Once the application has been submitted, the ResourceManager's scheduler allocates the first Container for it, and the selected NodeManager launches the MRAppMaster process using the command assembled in setupAMCommand above.

Adding the dependency

First, add the following dependency to the project's pom.xml (this artifact contains the MRAppMaster class examined below):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-app</artifactId>
    <version>3.1.3</version>
</dependency>
The main method

Next, locate the main method in the MRAppMaster class:

public static void main(String[] args) {
  try {
    ContainerId containerId = ContainerId.fromString(args[0]);
    ApplicationAttemptId applicationAttemptId = containerId.getApplicationAttemptId();
    if (applicationAttemptId != null) {
      CallerContext.setCurrent(new CallerContext.Builder(
          "mr_appmaster_" + applicationAttemptId.toString()).build());
    }
    long appSubmitTime = Long.parseLong(args[1]);

    MRAppMaster appMaster = new MRAppMaster(
        applicationAttemptId, containerId, args[2],
        Integer.parseInt(args[3]),
        Integer.parseInt(args[4]), appSubmitTime);

    // (elided above) conf is built from the job.xml shipped into the container's
    // working directory, and jobUserName is read from the container environment
    initAndStartAppMaster(appMaster, conf, jobUserName);
  } catch (Throwable t) {
    LOG.error("Error starting MRAppMaster", t);
    ExitUtil.terminate(1, t);
  }
}
Initializing and starting the AppMaster
protected static void initAndStartAppMaster(final MRAppMaster appMaster,
    final JobConf conf, String jobUserName) throws IOException,
    InterruptedException {
  ... ...
  conf.getCredentials().addAll(credentials);
  appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws Exception {
      appMaster.init(conf);
      appMaster.start();
      if (appMaster.errorHappenedShutDown) {
        throw new IOException("Was asked to shut down.");
      }
      return null;
    }
  });
}
The init method (inherited from AbstractService)
public void init(Configuration conf) {
  ... ...
  synchronized (stateChangeLock) {
    if (enterState(STATE.INITED) != STATE.INITED) {
      setConfig(conf);
      try {
        serviceInit(config);
        if (isInState(STATE.INITED)) {
          notifyListeners();
        }
      } catch (Exception e) {
        noteFailure(e);
        ServiceOperations.stopQuietly(LOG, this);
        throw ServiceStateException.convert(e);
      }
    }
  }
}
Service initialization (MRAppMaster.serviceInit)
protected void serviceInit(final Configuration conf) throws Exception {
  ... ...
  clientService = createClientService(context);
  clientService.init(conf);
  containerAllocator = createContainerAllocator(clientService, context);
  ... ...
}
The start method (inherited from AbstractService)
public void start() {
  if (isInState(STATE.STARTED)) {
    return;
  }
  synchronized (stateChangeLock) {
    if (stateModel.enterState(STATE.STARTED) != STATE.STARTED) {
      try {
        startTime = System.currentTimeMillis();
        serviceStart();
        if (isInState(STATE.STARTED)) {
          LOG.debug("Service {} is started", getName());
          notifyListeners();
        }
      } catch (Exception e) {
        noteFailure(e);
        ServiceOperations.stopQuietly(LOG, this);
        throw ServiceStateException.convert(e);
      }
    }
  }
}
Service start (MRAppMaster.serviceStart)
protected void serviceStart() throws Exception {
  ... ...
  if (initFailed) {
    JobEvent initFailedEvent = new JobEvent(job.getID(), JobEventType.JOB_INIT_FAILED);
    jobEventDispatcher.handle(initFailedEvent);
  } else {
    startJobs();
  }
}
Starting the job (startJobs)
protected void startJobs() {
  JobEvent startJobEvent = new JobStartEvent(job.getID(),
      recoveredJobStartTime);
  dispatcher.getEventHandler().handle(startJobEvent);
}
Handling the event (AsyncDispatcher.GenericEventHandler.handle)
class GenericEventHandler implements EventHandler<Event> {
  public void handle(Event event) {
    ... ...
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      ... ...
    }
  };
}
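
The handle call above only enqueues the event. On the consumer side, the AsyncDispatcher's event-processing thread drains the queue and routes each event to the handler registered for its event-type enum. A paraphrased sketch of that loop (abridged, not the exact Hadoop source):

// Paraphrased sketch of AsyncDispatcher's event-processing thread:
while (!stopped && !Thread.currentThread().isInterrupted()) {
  Event event = eventQueue.take();   // blocks until an event, e.g. JOB_START, arrives
  dispatch(event);                   // looks up the EventHandler registered for
                                     // event.getType().getDeclaringClass() and calls handle(event)
}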

Task Execution Flow (YarnChild)

Starting the MapTask

Locate the main method in the YarnChild class:

public static void main(String[] args) throws Throwable {
  Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
  LOG.debug("Child starting");

  ... ...

  task = myTask.getTask();
  YarnChild.taskid = task.getTaskID();
  ... ...

  final Task taskFinal = task;
  childUGI.doAs(new PrivilegedExceptionAction<Object>() {
    @Override
    public Object run() throws Exception {
      setEncryptedSpillKeyIfRequired(taskFinal);
      FileSystem.get(job).setWorkingDirectory(job.getWorkingDirectory());
      taskFinal.run(job, umbilical); // run the task (MapTask or ReduceTask)
      return null;
    }
  });

  ... ...
}
MapTask execution logic
public void run(final JobConf job, final TaskUmbilicalProtocol umbilical)
  throws IOException, ClassNotFoundException, InterruptedException {
  this.umbilical = umbilical;

  if (isMapTask()) {
    if (conf.getNumReduceTasks() == 0) {
      mapPhase = getProgress().addPhase("map", 1.0f);
    } else {
      mapPhase = getProgress().addPhase("map", 0.667f);
      sortPhase = getProgress().addPhase("sort", 0.333f);
    }
  }

  if (useNewApi) {
    runNewMapper(job, splitMetaInfo, umbilical, reporter);
  } else {
    runOldMapper(job, splitMetaInfo, umbilical, reporter);
  }
  done(umbilical, reporter);
}

void runNewMapper(final JobConf job,
                  final TaskSplitIndex splitIndex,
                  final TaskUmbilicalProtocol umbilical,
                  TaskReporter reporter
                  ) throws IOException, ClassNotFoundException,
                           InterruptedException {
  ... ...

  try {
    input.initialize(split, mapperContext);
    mapper.run(mapperContext);
    mapPhase.complete();
    setPhase(TaskStatus.Phase.SORT);
    statusUpdate(umbilical);
    input.close();
    output.close(mapperContext);
  } finally {
    closeQuietly(input);
    closeQuietly(output, mapperContext);
  }
}
Mapper execution logic
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    cleanup(context);
  }
}
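
The run template above calls setup once, map once per input record, and cleanup at the end. A minimal hypothetical subclass that plugs into it (a WordCount-style tokenizing Mapper, matching the driver sketched earlier):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // invoked once per input record by Mapper.run()
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}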
Starting the ReduceTask

For a reduce task, the Task handed to YarnChild is a ReduceTask, whose run method is in ReduceTask.java:

public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
  throws IOException, InterruptedException, ClassNotFoundException {
  job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());

  ... ...

  if (useNewApi) {
    runNewReducer(job, umbilical, reporter, rIter, comparator, 
                  keyClass, valueClass);
  } else {
    runOldReducer(job, umbilical, reporter, rIter, comparator, 
                  keyClass, valueClass);
  }

  shuffleConsumerPlugin.close();
  done(umbilical, reporter);
}

void runNewReducer(JobConf job,
                   final TaskUmbilicalProtocol umbilical,
                   final TaskReporter reporter,
                   RawKeyValueIterator rIter,
                   RawComparator<INKEY> comparator,
                   Class<INKEY> keyClass,
                   Class<INVALUE> valueClass
                   ) throws IOException,InterruptedException, 
                            ClassNotFoundException {
  ... ...
  try {
    reducer.run(reducerContext);
  } finally {
    trackedRW.close(reducerContext);
  }
}
Reducer execution logic
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      Iterator<VALUEIN> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}
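
Symmetrically, run calls reduce once per distinct key with an Iterable over its grouped values. A minimal hypothetical subclass matching the Mapper above:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // invoked once per key by Reducer.run()
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}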