YARN DistributedShell: Source Code Analysis and Modification

http://www.cnblogs.com/BYRans/p/5118891.html


YARN version: 2.6.0


Please credit the source when reposting: http://www.cnblogs.com/BYRans/

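1 Overview

The Hadoop YARN project ships with a very simple example application, DistributedShell. DistributedShell is a sample non-MapReduce application built on top of YARN. Its main purpose is to run a user-supplied shell command or shell script in parallel on multiple nodes of a Hadoop cluster: the ApplicationMaster takes the command string or script submitted by the user and distributes it to different containers for execution.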

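2 Why YARN DistributedShell Does Not Meet Our Requirements

2.1 Functional requirements

The project I work on combines open-source big-data components such as Hive, MapReduce, Spark, and Kafka into a data analysis platform.
The platform needs a new feature:

  • Pick one node in the cluster and run a jar submitted by the user.
  • The feature must integrate with the platform's existing Hive-, MR-, and Spark-based jobs and with YARN.
  • In short, after analysis and investigation, we decided to build the feature on YARN's DistributedShell.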

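The feature must support:

  • Running a user-submitted jar on a single machine
  • Jars that depend on other jars
  • Running the submitted jar on exactly one node
  • A cache-data directory for the submitted jar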

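2.2 How well YARN DistributedShell supports these requirements

YARN's DistributedShell:

  • Supports running a user-supplied shell command or script
  • Lets the number of containers that run the command be set with the num_containers parameter (default 1)
  • Does not support running jars
  • Does not support submitting dependency jars either
  • Does not support setting a cache directory for a jar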

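2.3 Changes required to YARN DistributedShell

  • Add support for running jars
  • Add support for setting a cache directory
  • Remove the container-count setting: users may not set it, and the count is fixed at 1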

3 Getting the YARN DistributedShell source

The YARN DistributedShell source is available in the apache/hadoop repository on GitHub. Within the hadoop repository, the DistributedShell source lives at:
hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/
The modifications described here are made to the 2.6.0 source.

4 YARN DistributedShell source analysis and modification

YARN DistributedShell consists of four Java classes:

<code class="sourceCode xml hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">DistributedShell
    ├── Client.java
    ├── ApplicationMaster.java
    ├── DSConstants.java
    ├── Log4jPropertyHelper.java</code>
  • Client: the client that submits the application
  • ApplicationMaster: registers the AM, requests containers, and launches them
  • DSConstants: constants used by the Client and ApplicationMaster classes
  • Log4jPropertyHelper: loads the Log4j configuration

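4.1 The Client class

4.1.1 Client source logic

The Client class is the client that submits the DistributedShell application to YARN. The Client launches the application master, and the application master then launches the containers that run the shell command or script. The Client works as follows:

  1. It connects to the ResourceManager (also known as the ApplicationsManager, or ASM) over the ApplicationClientProtocol and obtains a new ApplicationId. (ApplicationClientProtocol also gives the Client a way to query cluster information.)
  2. To submit the job, the Client first creates an ApplicationSubmissionContext, which carries the application's details, such as the ApplicationId, the application name, its priority, and the queue it is assigned to. The ApplicationSubmissionContext also holds a ContainerLaunchContext, which describes the container in which the ApplicationMaster will be launched.
  3. The ContainerLaunchContext must be initialized with everything needed to launch the ApplicationMaster:
    • the resources of the container that runs the ApplicationMaster
    • jars (e.g. AppMaster.jar) and configuration files (e.g. log4j.properties)
    • the execution environment (e.g. the Hadoop-specific classpath and the Java classpath)
    • the command that launches the ApplicationMaster
  4. The Client submits the application to the ResourceManager with the ApplicationSubmissionContext, then monitors it by periodically requesting an ApplicationReport from the ResourceManager.
  5. If the application runs longer than the timeout (600000 ms by default, configurable with -timeout), the client sends a KillApplicationRequest to the ResourceManager to kill the application.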
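
Steps 4 and 5 amount to a poll-until-done loop with a deadline. The following is a minimal sketch in plain Java with no YARN dependency. `MonitorSketch`, `AppState`, `Outcome`, and `monitor` are hypothetical names: the supplier stands in for the periodic ApplicationReport request and the `Runnable` for the kill request, whereas the real client works with YARN's own report and state types.

```java
import java.util.function.Supplier;

public class MonitorSketch {
    /** Simplified stand-in for the application states a report can carry. */
    public enum AppState { RUNNING, FINISHED, FAILED, KILLED }

    /** Outcome of the monitoring loop. */
    public enum Outcome { SUCCEEDED, FAILED, TIMED_OUT }

    /**
     * Polls reportSource until the application reaches a terminal state or
     * timeoutMillis elapses. On timeout, killAction runs once (the analogue
     * of sending a kill request to the ResourceManager).
     */
    public static Outcome monitor(Supplier<AppState> reportSource,
                                  Runnable killAction,
                                  long timeoutMillis,
                                  long pollIntervalMillis) {
        long start = System.currentTimeMillis();
        while (true) {
            AppState state = reportSource.get();
            if (state == AppState.FINISHED) {
                return Outcome.SUCCEEDED;
            }
            if (state == AppState.FAILED || state == AppState.KILLED) {
                return Outcome.FAILED;
            }
            if (System.currentTimeMillis() - start > timeoutMillis) {
                killAction.run();
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up monitoring.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while monitoring", e);
            }
        }
    }
}
```

Passing the report source and the kill action as parameters keeps the loop testable without a cluster; in a real client both would be calls against the ResourceManager.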

The code is as follows (based on YARN 2.6.0):

  • The Client's entry point, the main method:
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">public static void main(String[] args) {
        boolean result = false;
        try {
            DshellClient client = new DshellClient();
            LOG.info("Initializing Client");
            try {
                boolean doRun = client.init(args);
                if (!doRun) {
                    System.exit(0);
                }
            } catch (IllegalArgumentException e) {
                System.err.println(e.getLocalizedMessage());
                client.printUsage();
                System.exit(-1);
            }
            result = client.run();
        } catch (Throwable t) {
            LOG.fatal("Error running Client", t);
            System.exit(1);
        }
        if (result) {
            LOG.info("Application completed successfully");
            System.exit(0);
        }
        LOG.error("Application failed to complete successfully");
        System.exit(2);
    }</code>

The main method:

  • Its input is the command the user runs from the CLI, for example: hadoop jar hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar hadoop-yarn-applications-distributedshell-2.0.5-alpha.jar -shell_command '/bin/date' -num_containers 10. This command starts 10 containers, each of which runs the date command.
  • main calls init; if init returns true, it then calls run.
  • init parses the command submitted by the user and extracts the argument values.
  • run carries out the steps described under Client source logic.

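4.1.2 Changes to the Client source

The changes made on top of the stock YARN DistributedShell are:

  • Two CLI parameters are added for the user: container_files and container_archives
    • container_files lists the jars that the user's jar depends on, separated by commas
    • container_archives lists the cache directories for the user's jar, separated by commas
  • The num_containers parameter is removed
    • Users may not set the number of containers; the default of 1 is used

The changes to the Client source are:

  • Fields
    • Add fields that hold the values of the container_files and container_archives parameters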
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// Two new fields that hold the values of container_files and container_archives ↓↓↓↓↓↓↓
private String[] containerJarPaths = new String[0];
private String[] containerArchivePaths = new String[0];
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
  • Client constructor
    • Remove the registration of the num_containers option and add the container_files and container_archives options

      <code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// Remove num_containers: users may not set the container count, which defaults to 1 ↓↓↓↓↓↓↓
      //opts.addOption("num_containers", true, "No. of containers on which the shell command needs to be executed");
      // ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
      // Register container_files and container_archives with their descriptions ↓↓↓↓↓↓↓
      this.opts.addOption("container_files", true, "The files that containers will run. Separated by comma");
      this.opts.addOption("container_archives", true, "The archives that containers will unzip. Separated by comma");
      // ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
  • init method
    • Add parsing of the container_files and container_archives parameters
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// Initialize the container_files and container_archives options ↓↓↓↓↓↓↓
this.opts.addOption("container_files", true, "The files that containers will run. Separated by comma");
this.opts.addOption("container_archives", true, "The archives that containers will unzip. Separated by comma");
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
  • run method
    • Upload the dependency jars and cache directories named by container_files and container_archives to HDFS
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// Upload the jars named by container_files to HDFS ↓↓↓↓↓↓↓
if (this.containerJarPaths.length != 0)
    for (int i = 0; i < this.containerJarPaths.length; i++) {
        String hdfsJarLocation = "";
        // The file name is the last component of the local path.
        String[] jarNameSplit = this.containerJarPaths[i].split("/");
        String jarName = jarNameSplit[(jarNameSplit.length - 1)];

        long hdfsJarLen = 0L;
        long hdfsJarTimestamp = 0L;
        if (!this.containerJarPaths[i].isEmpty()) {
            Path jarSrc = new Path(this.containerJarPaths[i]);
            // Destination: <HDFS home dir>/<appName>/<appId>/<jarName>
            String jarPathSuffix = this.appName + "/" + appId.toString() +
                    "/" + jarName;
            Path jarDst = new Path(fs.getHomeDirectory(), jarPathSuffix);
            fs.copyFromLocalFile(false, true, jarSrc, jarDst);
            hdfsJarLocation = jarDst.toUri().toString();
            FileStatus jarFileStatus = fs.getFileStatus(jarDst);
            hdfsJarLen = jarFileStatus.getLen();
            hdfsJarTimestamp = jarFileStatus.getModificationTime();
            // Publish location, timestamp, and length to the AM through env, keyed by index i.
            env.put(DshellDSConstants.DISTRIBUTEDJARLOCATION + i,
                    hdfsJarLocation);
            env.put(DshellDSConstants.DISTRIBUTEDJARTIMESTAMP + i,
                    Long.toString(hdfsJarTimestamp));
            env.put(DshellDSConstants.DISTRIBUTEDJARLEN + i,
                    Long.toString(hdfsJarLen));
        }
    }
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
// Upload the archives named by container_archives to HDFS ↓↓↓↓↓↓↓
long hdfsArchiveLen;
String archivePathSuffix;
Path archiveDst;
FileStatus archiveFileStatus;
if (this.containerArchivePaths.length != 0) {
    for (int i = 0; i < this.containerArchivePaths.length; i++) {
        String hdfsArchiveLocation = "";
        String[] archiveNameSplit = this.containerArchivePaths[i].split("/");
        String archiveName = archiveNameSplit[(archiveNameSplit.length - 1)];
        hdfsArchiveLen = 0L;
        long hdfsArchiveTimestamp = 0L;
        if (!this.containerArchivePaths[i].isEmpty()) {
            Path archiveSrc = new Path(this.containerArchivePaths[i]);
            archivePathSuffix = this.appName + "/" + appId.toString() +
                    "/" + archiveName;
            archiveDst = new Path(fs.getHomeDirectory(),
                    archivePathSuffix);
            fs.copyFromLocalFile(false, true, archiveSrc, archiveDst);
            hdfsArchiveLocation = archiveDst.toUri().toString();
            archiveFileStatus = fs.getFileStatus(archiveDst);
            hdfsArchiveLen = archiveFileStatus.getLen();
            hdfsArchiveTimestamp = archiveFileStatus
                    .getModificationTime();
            env.put(DshellDSConstants.DISTRIBUTEDARCHIVELOCATION + i,
                    hdfsArchiveLocation);
            env.put(DshellDSConstants.DISTRIBUTEDARCHIVETIMESTAMP + i,
                    Long.toString(hdfsArchiveTimestamp));
            env.put(DshellDSConstants.DISTRIBUTEDARCHIVELEN + i,
                    Long.toString(hdfsArchiveLen));
        }
    }
}
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
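
The snippets above register the two options and upload the files, but the step that splits the comma-separated option values into containerJarPaths and containerArchivePaths is not shown. The following sketch, in plain Java with no Hadoop dependency, illustrates how that split and the HDFS destination suffix could be computed. `UploadPlanSketch` and its method names are hypothetical and are not taken from the post's code.

```java
import java.util.ArrayList;
import java.util.List;

public class UploadPlanSketch {
    /** Splits a comma-separated option value, dropping blank entries. */
    public static String[] splitPaths(String optionValue) {
        List<String> paths = new ArrayList<>();
        if (optionValue != null) {
            for (String part : optionValue.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    paths.add(trimmed);
                }
            }
        }
        return paths.toArray(new String[0]);
    }

    /** Returns the last path component, as the upload loop does with split("/"). */
    public static String baseName(String path) {
        String[] parts = path.split("/");
        return parts[parts.length - 1];
    }

    /** Builds the suffix placed under the user's HDFS home: appName/appId/fileName. */
    public static String destinationSuffix(String appName, String appId, String localPath) {
        return appName + "/" + appId + "/" + baseName(localPath);
    }
}
```

For example, `splitPaths("/opt/a.jar, /opt/b.jar")` yields two entries, and `destinationSuffix("DistributedShell", "application_1_0001", "/opt/lib/a.jar")` yields `DistributedShell/application_1_0001/a.jar`. Dropping blank entries up front would also make the `isEmpty()` guard inside the upload loop unnecessary.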

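4.2 The ApplicationMaster class

4.2.1 ApplicationMaster source logic

An ApplicationMaster launches one or more containers and runs the shell command or script in them. The ApplicationMaster works as follows:

  1. The ResourceManager launches a container to run the ApplicationMaster.
  2. The ApplicationMaster connects to the ResourceManager and registers itself.
    • The registration can carry:
      • the ApplicationMaster's ip:port
      • the hostname of the machine the ApplicationMaster runs on
      • the ApplicationMaster's tracking URL, which clients can use to follow the job's status and history
    • Note that DistributedShell does not need to register a tracking URL or appMasterHost:appMasterRpcPort; it only sets the hostname.
  3. The ApplicationMaster sends heartbeats to the ResourceManager at a configured interval. Each time the ResourceManager's ApplicationMasterService receives a heartbeat, it also updates the time of that AM's last heartbeat in the AMLivelinessMonitor.
  4. The ApplicationMaster builds ContainerRequest objects and sends them to the ResourceManager to request the required number of containers. Each request must be initialized with:
    • Priority: the priority of the request
    • capability: currently CPU and memory
    • nodes: the hosts on which the containers should run (null if there is no preference)
    • racks: the racks on which the containers should run (null if there is no preference)
  5. The ResourceManager returns information about the containers the ApplicationMaster asked for, and the ApplicationMaster uses each container's status (ContainerStatus) to update the counts of containers already obtained and still outstanding.
  6. For each container that is granted, the ApplicationMaster initializes its launch information through a ContainerLaunchContext and then launches it. The information to initialize is:
    • Container id
    • execution resources (the shell script or command, the data to process)
    • execution environment
    • launch command
  7. While the containers run, the ApplicationMaster monitors them.
  8. When the job ends, the ApplicationMaster sends a FinishApplicationMasterRequest to the ResourceManager to unregister itself.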
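
Steps 4 through 8 depend on the ApplicationMaster keeping track of how many containers it has requested, been granted, and seen finish. The class below is a simplified model of that bookkeeping in plain Java. It is not the ApplicationMaster's actual code: `ContainerBookkeepingSketch` and all of its members are hypothetical, and the handling of lost containers is an assumption about how re-requesting could work.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerBookkeepingSketch {
    private final int numTotalContainers;
    private final AtomicInteger requested = new AtomicInteger();
    private final AtomicInteger allocated = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    public ContainerBookkeepingSketch(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    /** Number of containers that still have to be requested from the RM. */
    public int outstandingAsk() {
        return numTotalContainers - requested.get();
    }

    /** Records that count container requests were sent. */
    public void onRequested(int count) {
        requested.addAndGet(count);
    }

    /** Records that the RM granted count containers. */
    public void onAllocated(int count) {
        allocated.addAndGet(count);
    }

    /**
     * Records one finished container. A lost container is handed back so
     * that it gets requested again; otherwise the container counts as
     * completed, and a non-zero exit status also counts as a failure.
     */
    public void onCompleted(int exitStatus, boolean lost) {
        if (lost) {
            allocated.decrementAndGet();
            requested.decrementAndGet();
            return;
        }
        completed.incrementAndGet();
        if (exitStatus != 0) {
            failed.incrementAndGet();
        }
    }

    /** True once every required container has completed. */
    public boolean isDone() {
        return completed.get() == numTotalContainers;
    }

    /** True if all containers completed and none failed. */
    public boolean isSuccessful() {
        return isDone() && failed.get() == 0;
    }
}
```

Atomic counters are used because, in an asynchronous client, allocation and completion callbacks may arrive on a different thread from the one that checks for completion.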

The code is as follows (based on YARN 2.6.0):

  • The ApplicationMaster's entry point, the main method:
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">public static void main(String[] args) {
       boolean result = false;
       try {
           DshellApplicationMaster appMaster = new DshellApplicationMaster();
           LOG.info("Initializing ApplicationMaster");
           boolean doRun = appMaster.init(args);
           if (!doRun) {
               System.exit(0);
           }
           appMaster.run();
           result = appMaster.finish();
       } catch (Throwable t) {
           LOG.fatal("Error running ApplicationMaster", t);
           LogManager.shutdown();
           ExitUtil.terminate(1, t);
       }
       if (result) {
           LOG.info("Application Master completed successfully. exiting");
           System.exit(0);
       } else {
           LOG.info("Application Master failed. exiting");
           System.exit(2);
       }
   }</code>

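The main method:

  • Its input is the launch command submitted by the Client.
  • init parses the command and extracts the values of its arguments.
  • run starts the ApplicationMaster, registers it, and kicks off the requesting, allocation, and monitoring of containers.
    • run creates the handle used to talk to the ResourceManager, an AMRMClientAsync, whose CallbackHandler is implemented by the RMCallbackHandler class.
      • RMCallbackHandler implements the callbacks that handle container allocation, completion, and so on.
      • The allocation callback, onContainersAllocated, launches each container through the run method of the LaunchContainerRunnable class.
  • finish stops the containers and unregisters the ApplicationMaster.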
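
The relationship between the allocation callback, the per-container launcher, and finish can be sketched as one launcher thread per granted container, joined at shutdown. This is a simplified model in plain Java, not the YARN classes themselves: `LaunchThreadsSketch` is a hypothetical name, container ids are plain strings, and "launching" is reduced to recording the id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LaunchThreadsSketch {
    // Only touched by the thread that delivers callbacks and calls finish().
    private final List<Thread> launchThreads = new ArrayList<>();
    // Written by launcher threads, so it must be thread-safe.
    private final List<String> launched =
            Collections.synchronizedList(new ArrayList<String>());

    /** Called when containers are granted: start one launcher thread per container. */
    public void onContainersAllocated(List<String> containerIds) {
        for (final String containerId : containerIds) {
            // Stand-in for the work LaunchContainerRunnable.run() would do.
            Thread t = new Thread(() -> launched.add(containerId));
            launchThreads.add(t);
            t.start();
        }
    }

    /** Waits for every launcher thread, then reports which containers were launched. */
    public List<String> finish() {
        for (Thread t : launchThreads) {
            try {
                t.join(10_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        synchronized (launched) {
            return new ArrayList<>(launched);
        }
    }
}
```

Launching on separate threads keeps the callback itself short, so a slow container start does not hold up the processing of later callbacks.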

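4.2.2 Changes to the ApplicationMaster source

The changes made on top of the stock YARN DistributedShell are:

  • When the ApplicationMaster initializes, add support for the values given by container_files and container_archives; that is, load the HDFS metadata of the runtime resources they name.
  • When a container is launched, load the resources named by container_files and container_archives from HDFS.

The changes to the ApplicationMaster source are:

  • Fields
    • Add fields that hold the HDFS metadata of the runtime resources named by container_files and container_archives.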
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// New fields for the values of the container_files and container_archives options ↓↓↓↓↓↓↓
private ArrayList<DshellFile> scistorJars = new ArrayList<DshellFile>();
private ArrayList<DshellArchive> scistorArchives = new ArrayList<DshellArchive>();
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
  • ApplicationMaster's init method
    • Initialize the information for the values given by container_files and container_archives.
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);">// Walk envs and store the HDFS path, timestamp, and length of every jar and archive in scistorJars and scistorArchives ↓↓↓↓↓↓↓
for (String key : envs.keySet()) {
    if (key.contains(DshellDSConstants.DISTRIBUTEDJARLOCATION)) {
        DshellFile scistorJar = new DshellFile();
        scistorJar.setJarPath((String) envs.get(key));
        // The index appended by the Client is whatever follows the key prefix.
        String num = key
                .split(DshellDSConstants.DISTRIBUTEDJARLOCATION)[1];
        scistorJar.setTimestamp(Long.valueOf(Long.parseLong(
                (String) envs
                        .get(DshellDSConstants.DISTRIBUTEDJARTIMESTAMP + num))));
        scistorJar.setSize(Long.valueOf(Long.parseLong(
                (String) envs
                        .get(DshellDSConstants.DISTRIBUTEDJARLEN + num))));
        this.scistorJars.add(scistorJar);
    }
}

for (String key : envs.keySet()) {
    if (key.contains(DshellDSConstants.DISTRIBUTEDARCHIVELOCATION)) {
        DshellArchive scistorArchive = new DshellArchive();
        scistorArchive.setArchivePath((String) envs.get(key));
        String num = key
                .split(DshellDSConstants.DISTRIBUTEDARCHIVELOCATION)[1];
        scistorArchive.setTimestamp(Long.valueOf(Long.parseLong(
                (String) envs
                        .get(DshellDSConstants.DISTRIBUTEDARCHIVETIMESTAMP +
                                num))));
        scistorArchive.setSize(Long.valueOf(Long.parseLong(
                (String) envs
                        .get(DshellDSConstants.DISTRIBUTEDARCHIVELEN + num))));
        this.scistorArchives.add(scistorArchive);
    }
}
// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</code>
  • The run method of LaunchContainerRunnable (the run method of the container thread)
    • Loads the resources specified by container_files and container_archives from HDFS.
<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);"><span class="co"><span class="hljs-comment" style="color: green;">// Register the jars and archives stored in HDFS as the container's LocalResources, i.e. distribute them from HDFS to the container's node ↓↓↓↓↓↓↓↓↓↓↓↓↓</span></span>
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">for</span></span> (DshellFile perJar : DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">scistorJars</span>) {
    LocalResource jarRsrc = (LocalResource) Records.<span class="fu">newRecord</span>(LocalResource.<span class="fu">class</span>);
    jarRsrc.<span class="fu">setType</span>(LocalResourceType.<span class="fu">FILE</span>);
    jarRsrc.<span class="fu">setVisibility</span>(LocalResourceVisibility.<span class="fu">APPLICATION</span>);
    <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">try</span></span> {
        jarRsrc.<span class="fu">setResource</span>(
                ConverterUtils.<span class="fu">getYarnUrlFromURI</span>(<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">new</span></span> URI(perJar.<span class="fu">getJarPath</span>()
                        .<span class="fu">toString</span>())));
    } <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">catch</span></span> (URISyntaxException e1) {
        DshellApplicationMaster.<span class="fu">LOG</span>.<span class="fu">error</span>(<span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"Error when trying to use JAR path specified in env, path="</span></span> +
                perJar.<span class="fu">getJarPath</span>(), e1);
        DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">numCompletedContainers</span>.<span class="fu">incrementAndGet</span>();
        DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">numFailedContainers</span>.<span class="fu">incrementAndGet</span>();
        <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">return</span></span>;
    }
    jarRsrc.<span class="fu">setTimestamp</span>(perJar.<span class="fu">getTimestamp</span>().<span class="fu">longValue</span>());
    jarRsrc.<span class="fu">setSize</span>(perJar.<span class="fu">getSize</span>().<span class="fu">longValue</span>());
    String[] tmp = perJar.<span class="fu">getJarPath</span>().<span class="fu">split</span>(<span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"/"</span></span>);
    localResources.<span class="fu">put</span>(tmp[(tmp.<span class="fu">length</span> - <span class="dv">1</span>)], jarRsrc);
}
String[] tmp;
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">for</span></span> (DshellArchive perArchive : DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">scistorArchives</span>) {
    LocalResource archiveRsrc =
            (LocalResource) Records.<span class="fu">newRecord</span>(LocalResource.<span class="fu">class</span>);
    archiveRsrc.<span class="fu">setType</span>(LocalResourceType.<span class="fu">ARCHIVE</span>);
    archiveRsrc.<span class="fu">setVisibility</span>(LocalResourceVisibility.<span class="fu">APPLICATION</span>);
    <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">try</span></span> {
        archiveRsrc.<span class="fu">setResource</span>(
                ConverterUtils.<span class="fu">getYarnUrlFromURI</span>(<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">new</span></span> URI(perArchive
                        .<span class="fu">getArchivePath</span>().<span class="fu">toString</span>())));
    } <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">catch</span></span> (URISyntaxException e1) {
        DshellApplicationMaster.<span class="fu">LOG</span>.<span class="fu">error</span>(<span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"Error when trying to use ARCHIVE path specified in env, path="</span></span> +
                        perArchive.<span class="fu">getArchivePath</span>(),
                e1);
        DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">numCompletedContainers</span>.<span class="fu">incrementAndGet</span>();
        DshellApplicationMaster.<span class="fu"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">this</span></span>.<span class="fu">numFailedContainers</span>.<span class="fu">incrementAndGet</span>();
        <span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">return</span></span>;
    }
    archiveRsrc.<span class="fu">setTimestamp</span>(perArchive.<span class="fu">getTimestamp</span>().<span class="fu">longValue</span>());
    archiveRsrc.<span class="fu">setSize</span>(perArchive.<span class="fu">getSize</span>().<span class="fu">longValue</span>());
    tmp = perArchive.<span class="fu">getArchivePath</span>().<span class="fu">split</span>(<span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"/"</span></span>);
    String[] tmptmp = tmp[(tmp.<span class="fu">length</span> - <span class="dv">1</span>)].<span class="fu">split</span>(<span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"[.]"</span></span>);
    localResources.<span class="fu">put</span>(tmptmp[<span class="dv">0</span>], archiveRsrc);
}
<span class="co"><span class="hljs-comment" style="color: green;">// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</span></span></code>
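
The key under which each resource is put into localResources becomes the name that the file or directory has in the container's working directory. The standalone sketch below reproduces only that naming logic with plain string operations and no Hadoop classes. The class LinkNameSketch and its method names are hypothetical and are not part of the modified DistributedShell.

```java
final class LinkNameSketch {

    /** Returns the text after the last '/' of a path, as the loops above do with split("/"). */
    static String lastSegment(String path) {
        String[] parts = path.split("/");
        return parts[parts.length - 1];
    }

    /** A jar keeps its full file name, for example "app.jar". */
    static String jarLinkName(String jarPath) {
        return lastSegment(jarPath);
    }

    /** An archive is named by the part of its file name before the first dot. */
    static String archiveLinkName(String archivePath) {
        return lastSegment(archivePath).split("[.]")[0];
    }

    public static void main(String[] args) {
        // The HDFS paths below are made-up examples.
        System.out.println(jarLinkName("hdfs://nn/apps/app.jar"));          // prints "app.jar"
        System.out.println(archiveLinkName("hdfs://nn/cache/data.tar.gz")); // prints "data"
    }
}
```

Because the archive name is cut at the first dot, data.tar.gz and data.zip would both map to the key data. Assuming localResources is a Map, as in the upstream DistributedShell, the later put would replace the earlier one, so it is safer to keep archive base names distinct.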

4.3 The DSConstants class

The DSConstants class holds the constants shared by Client and ApplicationMaster. The only change to it is the addition of the constants for container_files and container_archives. The modified code is:

<code class="sourceCode java hljs" style="display: block; padding: 5px !important; color: rgb(0, 0, 0); margin: auto; vertical-align: top; overflow-x: auto; font-family: 'Courier New', sans-serif !important; font-size: 12px !important; line-height: 1.5 !important; border: 1px solid rgb(204, 204, 204) !important; background: rgb(255, 255, 255);"><span class="co"><span class="hljs-comment" style="color: green;">// Add the constants for container_files and container_archives ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓</span></span>
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDJARLOCATION = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDJARLOCATION"</span></span>;
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDJARTIMESTAMP = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDJARTIMESTAMP"</span></span>;
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDJARLEN = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDJARLEN"</span></span>;

<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDARCHIVELOCATION = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDARCHIVELOCATION"</span></span>;
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDARCHIVETIMESTAMP = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDARCHIVETIMESTAMP"</span></span>;
<span class="kw"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">public</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">static</span></span> <span class="dt"><span class="hljs-keyword" style="color: rgb(0, 0, 255);">final</span></span> String DISTRIBUTEDARCHIVELEN = <span class="st"><span class="hljs-string" style="color: rgb(163, 21, 21);">"DISTRIBUTEDARCHIVELEN"</span></span>;
<span class="co"><span class="hljs-comment" style="color: green;">// ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑</span></span></code>
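
In the ApplicationMaster code of section 4.2, these constants serve as key prefixes: a suffix is appended to each one so that the location, timestamp, and length of the same file can be matched up. The sketch below illustrates that round trip for the jar constants, assuming the client writes the three keys with a shared numeric suffix. The publish method, the JarEntry holder, and the class name EnvKeySketch are hypothetical illustrations. It uses startsWith and substring instead of contains and split, so a key that carries no suffix does not lead to an out-of-bounds array access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EnvKeySketch {
    // Same values as the constants added to DSConstants.
    static final String LOCATION = "DISTRIBUTEDJARLOCATION";
    static final String TIMESTAMP = "DISTRIBUTEDJARTIMESTAMP";
    static final String LEN = "DISTRIBUTEDJARLEN";

    /** Hypothetical holder for one jar's metadata. */
    static final class JarEntry {
        final String path;
        final long timestamp;
        final long size;

        JarEntry(String path, long timestamp, long size) {
            this.path = path;
            this.timestamp = timestamp;
            this.size = size;
        }
    }

    /** Assumed client side: write the three keys for one jar under a shared index suffix. */
    static void publish(Map<String, String> env, int index, String path, long timestamp, long size) {
        env.put(LOCATION + index, path);
        env.put(TIMESTAMP + index, Long.toString(timestamp));
        env.put(LEN + index, Long.toString(size));
    }

    /** Reading side: rebuild one JarEntry per location key by reusing its suffix. */
    static List<JarEntry> collect(Map<String, String> env) {
        List<JarEntry> jars = new ArrayList<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(LOCATION)) {
                continue;
            }
            String suffix = key.substring(LOCATION.length());
            String timestamp = env.get(TIMESTAMP + suffix);
            String size = env.get(LEN + suffix);
            if (timestamp == null || size == null) {
                continue; // skip entries whose metadata is incomplete
            }
            jars.add(new JarEntry(e.getValue(), Long.parseLong(timestamp), Long.parseLong(size)));
        }
        return jars;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        // Paths, timestamps, and sizes are made-up example values.
        publish(env, 0, "hdfs://nn/apps/app.jar", 1452400000000L, 2048L);
        publish(env, 1, "hdfs://nn/apps/dep.jar", 1452400000001L, 4096L);
        for (JarEntry jar : collect(env)) {
            System.out.println(jar.path + " " + jar.timestamp + " " + jar.size);
        }
    }
}
```

The same pattern would apply to the three DISTRIBUTEDARCHIVE constants.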

4.4 The Log4jPropertyHelper class

The Log4jPropertyHelper class is left unchanged.
