Cloud Computing (25) - Hadoop MapReduce Next Generation - Writing YARN Applications

Concepts and Flow

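The client submits an application to the ResourceManager. First, the client connects to the ResourceManager through ApplicationClientProtocol and obtains an ApplicationId by calling ApplicationClientProtocol#getNewApplication (the notation is ClassName#methodName). It then submits the application to run through ApplicationClientProtocol#submitApplication. As part of the ApplicationClientProtocol#submitApplication call, the client must give the ResourceManager enough information to launch the application's first container, which hosts the ApplicationMaster. This information includes the local files or jars the application needs, the actual command line to execute (with any necessary arguments), optional environment settings, and so on. In effect, you describe the Unix process that runs your ApplicationMaster.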

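The YARN ResourceManager then launches the ApplicationMaster as specified. The ApplicationMaster communicates with the ResourceManager through ApplicationMasterProtocol. It first registers itself with the ResourceManager by calling ApplicationMasterProtocol#registerApplicationMaster. To complete the task assigned to it, the ApplicationMaster then requests and receives containers through ApplicationMasterProtocol#allocate. After a container is allocated to it, the ApplicationMaster uses ContainerManager#startContainer to talk to the NodeManager and launch its task in that container. As part of launching the container, the ApplicationMaster must specify a ContainerLaunchContext which, like the ApplicationSubmissionContext, carries the launch information such as the command line and the environment. Once the task is complete, the ApplicationMaster notifies the ResourceManager through ApplicationMasterProtocol#finishApplicationMaster.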

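Meanwhile, the client monitors the application's status by querying the ResourceManager, or by querying the ApplicationMaster directly if it supports such a service. If needed, the client can also kill the application through ApplicationClientProtocol#forceKillApplication.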

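Interfaces

The interfaces you will most likely deal with are:

  • ApplicationClientProtocol - Client<-->ResourceManager
    The protocol a client uses to talk to the ResourceManager to launch a new application (i.e. its ApplicationMaster), check the application's status, or kill it. For example, a job-client (a job-launching program run from a gateway machine) would use this protocol.
  • ApplicationMasterProtocol - ApplicationMaster<-->ResourceManager
    The protocol the ApplicationMaster uses to register and unregister itself with the ResourceManager, and to request resources from the Scheduler to complete its tasks.
  • ContainerManager - ApplicationMaster<-->NodeManager
    The protocol the ApplicationMaster uses to talk to a NodeManager to start and stop containers and to get their status.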

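Writing a Simple YARN Application

Writing a Simple Client
  • The first step for a client is to connect to the ResourceManager, or more specifically to its ApplicationsManager (ASM) interface.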
        ApplicationClientProtocol applicationsManager; 
        YarnConfiguration yarnConf = new YarnConfiguration(conf);
        InetSocketAddress rmAddress = 
            NetUtils.createSocketAddr(yarnConf.get(
                YarnConfiguration.RM_ADDRESS,
                YarnConfiguration.DEFAULT_RM_ADDRESS));             
        LOG.info("Connecting to ResourceManager at " + rmAddress);
        Configuration appsManagerServerConf = new Configuration(conf);
        appsManagerServerConf.setClass(
            YarnConfiguration.YARN_SECURITY_INFO,
            ClientRMSecurityInfo.class, SecurityInfo.class);
        applicationsManager = ((ApplicationClientProtocol) rpc.getProxy(
            ApplicationClientProtocol.class, rmAddress, appsManagerServerConf));    
  • Once a handle to the ASM is obtained, the client asks the ResourceManager for a new ApplicationId.
        GetNewApplicationRequest request = 
            Records.newRecord(GetNewApplicationRequest.class);              
        GetNewApplicationResponse response = 
            applicationsManager.getNewApplication(request);
        LOG.info("Got new ApplicationId=" + response.getApplicationId());
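  • The response from the ASM also contains information about the cluster, such as its minimum and maximum resource capabilities. You need these to correctly set the specification of the container in which the ApplicationMaster will be launched. See GetNewApplicationResponse for details.
  • The client launches the ApplicationMaster by filling in an ApplicationSubmissionContext and sending it to the ResourceManager. The client needs to set the following in the context:
    • Application info: id, name
    • Queue and priority info: the queue the application is submitted to and the priority assigned to it
    • User: the user submitting the application
    • ContainerLaunchContext: describes the container in which the ApplicationMaster is launched and run. As mentioned earlier, it defines everything needed to run the ApplicationMaster: local resources (binaries, jars, files etc.), security tokens, environment settings (CLASSPATH etc.), and the command to execute.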
        // Create a new ApplicationSubmissionContext
        ApplicationSubmissionContext appContext = 
            Records.newRecord(ApplicationSubmissionContext.class);
        // Set the ApplicationId 
        appContext.setApplicationId(appId);
        // Set the application name
        appContext.setApplicationName(appName);
        
        // Create a new container launch context for the ApplicationMaster
        ContainerLaunchContext amContainer = 
            Records.newRecord(ContainerLaunchContext.class);
    
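        // Define the local resources required
        Map<String, LocalResource> localResources = 
            new HashMap<String, LocalResource>();
        // Assume the jar needed to run the ApplicationMaster is available at a
        // known path in HDFS, and that we want to make it available to the
        // ApplicationMaster in the launched container
        Path jarPath; // a known path to the jar
        FileStatus jarStatus = fs.getFileStatus(jarPath);
        LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
        // Set the type of resource - file or archive
        // archives are unpacked at the destination
        amJarRsrc.setType(LocalResourceType.FILE);
        // Set the visibility of the resource 
        // APPLICATION is the most private option: the file is visible only
        // to this instance of the running application
        amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);          
        // Set the location the resource is to be copied from
        amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); 
        // Set the timestamp and length of the file so that basic
        // consistency checks can be done once the file has been copied
        amJarRsrc.setTimestamp(jarStatus.getModificationTime());
        amJarRsrc.setSize(jarStatus.getLen());
        // The framework creates a symlink called AppMaster.jar in the working
        // directory that points to the actual file. The ApplicationMaster can
        // use this symlink name whenever it needs to refer to the jar
        localResources.put("AppMaster.jar",  amJarRsrc);    
        // Set the local resources into the launch context    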
        amContainer.setLocalResources(localResources);
    
        // Set up the environment needed for the launch context
        Map<String, String> env = new HashMap<String, String>();    
        // For example, we could set up the classpath. Assuming our classes
        // and jars are available in the working directory, we add "." to the
        // path. By default, the Hadoop-specific classpaths are already
        // available in $CLASSPATH, so we must take care not to overwrite it
        String classPathEnv = "$CLASSPATH:./*:";    
        env.put("CLASSPATH", classPathEnv);
        amContainer.setEnvironment(env);
        
        // Construct the command to be executed on the launched container 
        String command = 
            "${JAVA_HOME}" + "/bin/java" +
            " MyAppMaster" + 
            " arg1 arg2 arg3" + 
            " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
            " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";                     
    
        List<String> commands = new ArrayList<String>();
        commands.add(command);
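        // Add further commands here if needed
        // Set the command array into the container spec
        amContainer.setCommands(commands);
        
        // Define the resource requirements for the container. For now, YARN
        // only supports memory. If the process uses more than its allocated
        // memory, the framework kills it. The memory requested must be less
        // than the cluster's maximum capability
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(amMemory);
        amContainer.setResource(capability);
        
        // Set the container launch context into the ApplicationSubmissionContext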
        appContext.setAMContainerSpec(amContainer);
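  • After the setup process is complete, the client is ready to submit the application to the ASM.
        // Create the request to send to the ApplicationsManager 
        SubmitApplicationRequest appRequest = 
            Records.newRecord(SubmitApplicationRequest.class);
        appRequest.setApplicationSubmissionContext(appContext);
    
        // Submit the application to the ApplicationsManager
        // Ignore the response: getting a response object back means success,
        // while a failure is signaled by a thrown exception
        applicationsManager.submitApplication(appRequest);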
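  • At this point, the ResourceManager has accepted the application. In the background it allocates a container with the required specification, then sets up and launches the ApplicationMaster on that container.
  • There are multiple ways for a client to track the progress of the actual task.
    • It can contact the ResourceManager and request a report of the application through ApplicationClientProtocol#getApplicationReport.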
            GetApplicationReportRequest reportRequest = 
                Records.newRecord(GetApplicationReportRequest.class);
            reportRequest.setApplicationId(appId);
            GetApplicationReportResponse reportResponse = 
                applicationsManager.getApplicationReport(reportRequest);
            ApplicationReport report = reportResponse.getApplicationReport();

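      The ApplicationReport received from the ResourceManager consists of the following:

      • General application information: the ApplicationId, the queue the application was submitted to, the user who submitted it, and the application's start time.
      • ApplicationMaster details: the host on which the ApplicationMaster is running and the rpc port on which it listens for client requests, which the client uses to communicate with the ApplicationMaster.
      • Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL. The client obtains it through ApplicationReport#getTrackingUrl and uses it to monitor progress.
      • Application status: the state of the application as seen by the ResourceManager is available through ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should call ApplicationReport#getFinalApplicationStatus to learn whether the application's task itself succeeded or failed. On failure, ApplicationReport#getDiagnostics may provide more detail.
    • If the ApplicationMaster supports it, the client can query the ApplicationMaster directly for progress updates at the host:rpcport obtained from the report. It can also use the tracking URL.
  • In certain situations, for example when the application has been running too long, the client may wish to kill the application. The client can call ApplicationClientProtocol#forceKillApplication to send a kill signal to the ResourceManager, which then notifies the ApplicationMaster. An ApplicationMaster may also be designed to support an abort call through its rpc layer, which the client can use instead.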
        KillApplicationRequest killRequest = 
            Records.newRecord(KillApplicationRequest.class);                
        killRequest.setApplicationId(appId);
        applicationsManager.forceKillApplication(killRequest);      
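The monitoring and kill steps above are often combined into a single polling loop on the client. The following is a minimal plain-Java sketch of that loop, not real YARN client code: `AppState`, `Outcome`, `monitor`, and the `Supplier` standing in for a getApplicationReport call are all hypothetical names, and giving up after a fixed number of polls is an assumed policy.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Plain-Java model of a client polling loop; every name here is hypothetical. */
final class ClientMonitorSketch {

    /** Simplified stand-in for the application states a report might show. */
    enum AppState { RUNNING, FINISHED, FAILED, KILLED }

    /** What the client decides once the loop ends. */
    enum Outcome { COMPLETED, FAILED, KILL_REQUESTED }

    /**
     * Polls reportSource until a terminal state is seen or maxPolls is reached.
     * In a real client, reportSource would wrap getApplicationReport, a FINISHED
     * state would still need a final-status check, and the KILL_REQUESTED
     * branch would be followed by a forceKillApplication call.
     */
    static Outcome monitor(Supplier<AppState> reportSource, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            AppState state = reportSource.get();
            if (state == AppState.FINISHED) {
                return Outcome.COMPLETED;
            }
            if (state == AppState.FAILED || state == AppState.KILLED) {
                return Outcome.FAILED;
            }
            // A real loop would sleep between polls; omitted to keep the sketch fast.
        }
        return Outcome.KILL_REQUESTED;
    }

    public static void main(String[] args) {
        Iterator<AppState> states =
            Arrays.asList(AppState.RUNNING, AppState.RUNNING, AppState.FINISHED).iterator();
        System.out.println(monitor(states::next, 10));          // COMPLETED
        System.out.println(monitor(() -> AppState.RUNNING, 3)); // KILL_REQUESTED
    }
}
```

Keeping the report source behind a `Supplier` makes the loop easy to test without a cluster.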
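Writing an ApplicationMaster
  • The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager with the information and resources the client provided, and it is responsible for overseeing the job and seeing it through to completion.
  • The ApplicationMaster is launched in a container that may share a physical host with other containers. Given this multi-tenant nature, it cannot make assumptions about things such as pre-configured ports it can listen on.
  • When the ApplicationMaster starts, several parameters are made available to it through the environment. These include the ContainerId of the ApplicationMaster container, the application submission time, and details about the NodeManager host running the ApplicationMaster. See ApplicationConstants for the parameter names.
  • All interactions with the ResourceManager require an ApplicationAttemptId. The ApplicationAttemptId can be obtained from the ApplicationMaster's ContainerId. Helper APIs convert the values read from the environment into objects.
        Map<String, String> envs = System.getenv();
        String containerIdString = 
            envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
        if (containerIdString == null) {
          // the container id should always be set in the environment by the framework 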
          throw new IllegalArgumentException(
              "ContainerId not set in the environment");
        }
        ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
        ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
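To make the relationship between the two ids concrete, here is a plain-Java sketch that derives an attempt id string from a container id string. It assumes the textual layout `container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>`, which may differ between Hadoop versions; `ContainerIdSketch` and `toAttemptIdString` are hypothetical names, and real code should rely on the ConverterUtils helpers shown above rather than on string parsing.

```java
/** Plain-Java sketch; the id layout below is an assumption, not a YARN API. */
final class ContainerIdSketch {

    /**
     * Assumes container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq> and
     * returns appattempt_<clusterTimestamp>_<appSeq>_<attempt>, zero-padded.
     */
    static String toAttemptIdString(String containerIdString) {
        String[] parts = containerIdString.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException(
                "Unexpected container id: " + containerIdString);
        }
        long clusterTimestamp = Long.parseLong(parts[1]);
        int appSeq = Integer.parseInt(parts[2]);
        int attempt = Integer.parseInt(parts[3]);
        return String.format("appattempt_%d_%04d_%06d", clusterTimestamp, appSeq, attempt);
    }

    public static void main(String[] args) {
        // The id below is made up for illustration.
        System.out.println(toAttemptIdString("container_1410901177871_0001_01_000005"));
    }
}
```

The point of the sketch is only that the container id already embeds the application and attempt it belongs to, which is why no extra environment variable is needed for the attempt id.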
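  • After the ApplicationMaster has initialized itself completely, it registers with the ResourceManager by calling ApplicationMasterProtocol#registerApplicationMaster. The ApplicationMaster always communicates with the ResourceManager through its Scheduler interface.
        // Connect to the Scheduler interface of the ResourceManager 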
        YarnConfiguration yarnConf = new YarnConfiguration(conf);
        InetSocketAddress rmAddress = 
            NetUtils.createSocketAddr(yarnConf.get(
                YarnConfiguration.RM_SCHEDULER_ADDRESS,
                YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));           
        LOG.info("Connecting to ResourceManager at " + rmAddress);
        ApplicationMasterProtocol resourceManager = 
            (ApplicationMasterProtocol) rpc.getProxy(ApplicationMasterProtocol.class, rmAddress, conf);
    
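        // Register the AM with the RM
        // Set the required info into the registration request: 
        // ApplicationAttemptId, 
        // host on which the AM is running
        // rpc port on which the AM accepts requests from the client 
        // tracking URL for the client to track progress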
        RegisterApplicationMasterRequest appMasterRequest = 
            Records.newRecord(RegisterApplicationMasterRequest.class);
        appMasterRequest.setApplicationAttemptId(appAttemptID);     
        appMasterRequest.setHost(appMasterHostname);
        appMasterRequest.setRpcPort(appMasterRpcPort);
        appMasterRequest.setTrackingUrl(appMasterTrackingUrl);
    
        // The response carries information about the cluster, similar to GetNewApplicationResponse. 
        RegisterApplicationMasterResponse response = 
            resourceManager.registerApplicationMaster(appMasterRequest);
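The registration request above needs an rpc port, yet the ApplicationMaster cannot rely on a pre-configured one because it shares its host with other containers. One common approach, sketched here with the standard `java.net.ServerSocket` API, is to bind to port 0 so that the operating system picks a free port, and then report that port. `AmEndpointSketch` and `bindToFreePort` are hypothetical names, and a real ApplicationMaster would hand the port to its own rpc server rather than a bare socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Sketch of choosing a listening port at runtime instead of pre-configuring one. */
final class AmEndpointSketch {

    /** Binds to port 0, which asks the operating system for any free port. */
    static ServerSocket bindToFreePort() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bindToFreePort()) {
            // This is the value an AM would pass as its rpc port when registering.
            int appMasterRpcPort = socket.getLocalPort();
            System.out.println("Listening on port " + appMasterRpcPort);
        }
    }
}
```

The socket stays open while the port is in use, so no other process on the host can grab it between selection and registration.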
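  • The ApplicationMaster has to emit heartbeats to the ResourceManager to show that it is alive and still running. The expiry interval on the ResourceManager side is defined by the YARN configuration setting YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, which defaults to YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. Calls to ApplicationMasterProtocol#allocate count as heartbeats, and the same call carries progress updates. An allocate call that requests no containers, with updated progress information if any, is therefore a valid way to send a heartbeat to the ResourceManager.
  • Based on the task requirements, the ApplicationMaster can ask for a set of containers to run its tasks on. It describes what it needs using ResourceRequest objects, each of which should define:
    • Hostname: set if containers must be hosted on a particular rack or a specific host. "*" is a special value meaning any host will do.
    • Resource capability: currently YARN only supports memory-based resource requirements, so the request only needs to define how much memory is required, in MB. The value must be less than the cluster's max capability and an exact multiple of the min capability. Memory resources correspond to physical memory limits imposed on the task containers.
    • Priority: when asking for sets of containers, the ApplicationMaster may assign a different priority to each set. For example, the Map-Reduce ApplicationMaster may assign a higher priority to containers for Map tasks and a lower priority to containers for Reduce tasks.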
        // Resource Request
        ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
    
        // setup requirements for hosts 
        // whether a particular rack/host is needed 
        // useful for applications that are sensitive
        // to data locality 
        rsrcRequest.setHostName("*");
    
        // set the priority for the request
        Priority pri = Records.newRecord(Priority.class);
        pri.setPriority(requestPriority);
        rsrcRequest.setPriority(pri);           
    
        // Set up resource type requirements
        // For now, only memory is supported so we set memory requirements
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(containerMemory);
        rsrcRequest.setCapability(capability);
    
        // set no. of containers needed
        // matching the specifications
        rsrcRequest.setNumContainers(numContainers);
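The capability rule above (a multiple of the min capability, bounded by the max capability) is plain arithmetic. Below is a minimal sketch of how an ApplicationMaster might adjust a desired memory size before putting it into a request. `MemoryRequestSketch` and `normalize` are hypothetical names, the limits in `main` are made-up example values, and treating the max capability as an inclusive bound is an assumption.

```java
/** Sketch of fitting a memory request to assumed cluster min/max capabilities. */
final class MemoryRequestSketch {

    /**
     * Rounds requestedMb up to the next multiple of minMb, then caps it at the
     * largest multiple of minMb that does not exceed maxMb.
     */
    static int normalize(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("Invalid limits: " + minMb + ", " + maxMb);
        }
        int wanted = Math.max(requestedMb, minMb);
        int roundedUp = ((wanted + minMb - 1) / minMb) * minMb;
        int cap = (maxMb / minMb) * minMb;
        return Math.min(roundedUp, cap);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1500, 1024, 8192));  // 2048
        System.out.println(normalize(100, 1024, 8192));   // 1024
        System.out.println(normalize(20000, 1024, 8192)); // 8192
    }
}
```

In practice the min and max values would come from the cluster information returned by the ResourceManager, as described earlier for GetNewApplicationResponse.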
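  • After defining the container requirements, the ApplicationMaster constructs an AllocateRequest to send to the ResourceManager. The AllocateRequest consists of:
    • Requested containers: the container specifications and the number of containers the ApplicationMaster is asking the ResourceManager for.
    • Released containers: the ApplicationMaster may have requested more containers than it needs or, for example because of failures, may decide to use other containers instead. In such cases it can release those containers back to the ResourceManager so that other applications can use them.
    • ResponseId: the response id that will be sent back in the response to the allocate call.
    • Progress update information: the ApplicationMaster can send its progress (a value between 0 and 1) to the ResourceManager.
        List<ResourceRequest> requestedContainers;
        List<ContainerId> releasedContainers;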
        AllocateRequest req = Records.newRecord(AllocateRequest.class);
    
        // The response id set in the request will be sent back in 
        // the response so that the ApplicationMaster can 
        // match it to its original ask and act appropriately.
        req.setResponseId(rmRequestID);
        
        // Set ApplicationAttemptId 
        req.setApplicationAttemptId(appAttemptID);
        
        // Add the list of containers being asked for 
        req.addAllAsks(requestedContainers);
        
        // If the ApplicationMaster has no need for certain 
        // containers due to over-allocation or for any other
        // reason, it can release them back to the ResourceManager
        req.addAllReleases(releasedContainers);
        
        // Assuming the ApplicationMaster can track its progress
        req.setProgress(currentProgress);
        
        AllocateResponse allocateResponse = resourceManager.allocate(req);               
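Because every allocate call doubles as a heartbeat, it helps to picture the liveness check the ResourceManager performs. The sketch below is a simplified plain-Java model of such an expiry check, not the ResourceManager's actual implementation: `HeartbeatSketch` and its methods are hypothetical names, and the 10-second interval in `main` is an arbitrary example value.

```java
/** Simplified model of an expiry check driven by allocate-style heartbeats. */
final class HeartbeatSketch {
    private final long expiryIntervalMs;
    private long lastHeartbeatMs;

    HeartbeatSketch(long expiryIntervalMs, long startMs) {
        this.expiryIntervalMs = expiryIntervalMs;
        this.lastHeartbeatMs = startMs;
    }

    /** Records a heartbeat, as an allocate call would. */
    void onAllocateCall(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    /** True once more than expiryIntervalMs has passed since the last heartbeat. */
    boolean isExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > expiryIntervalMs;
    }

    public static void main(String[] args) {
        HeartbeatSketch tracker = new HeartbeatSketch(10_000, 0);
        tracker.onAllocateCall(4_000);
        System.out.println(tracker.isExpired(12_000)); // false: 8 s since last heartbeat
        System.out.println(tracker.isExpired(15_000)); // true: 11 s since last heartbeat
    }
}
```

An ApplicationMaster that has nothing to request should therefore still call allocate periodically, at an interval comfortably below the configured expiry.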
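  • The AllocateResponse sent back by the ResourceManager provides the following information:
    • Reboot flag: for scenarios in which the ApplicationMaster has gotten out of sync with the ResourceManager.
    • Allocated containers: the containers that have been allocated to the ApplicationMaster.
    • Headroom: the free resources in the cluster. Based on this, the ApplicationMaster can make decisions such as re-prioritizing sub-tasks to make use of the containers it already holds, or bailing out faster when enough resources are not becoming available.
    • Completed containers: once the ApplicationMaster launches an allocated container, it receives an update from the ResourceManager when that container completes. The ApplicationMaster can inspect the status of completed containers and take appropriate action, such as retrying a failed sub-task.
    • Number of cluster nodes: the number of hosts available in the cluster.

    Note that containers are not necessarily allocated to the ApplicationMaster immediately. This does not mean the ApplicationMaster should keep asking for the containers it has already requested. Once an allocate request has been sent, the ResourceManager assigns containers based on cluster capacity, priorities, and the scheduling policy in place. The ApplicationMaster should request containers again only if its original estimate changes and it needs additional containers.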

        // Retrieve the list of allocated containers from the response
        // and launch the same job on each of them
        List<Container> allocatedContainers = allocateResponse.getAllocatedContainers();
        for (Container allocatedContainer : allocatedContainers) {
          LOG.info("Launching shell command on a new container."
              + ", containerId=" + allocatedContainer.getId()
              + ", containerNode=" + allocatedContainer.getNodeId().getHost() 
              + ":" + allocatedContainer.getNodeId().getPort()
              + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
              + ", containerState=" + allocatedContainer.getState()
              + ", containerResourceMemory="  
              + allocatedContainer.getResource().getMemory());
              
              
          // Launch and start the container on a separate thread to keep the main thread unblocked
          LaunchContainerRunnable runnableLaunchContainer = 
              new LaunchContainerRunnable(allocatedContainer);
          Thread launchThread = new Thread(runnableLaunchContainer);        
          launchThreads.add(launchThread);
          launchThread.start();
        }
    
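        // Check what resources are currently available in the cluster
        Resource availableResources = allocateResponse.getAvailableResources();
        // Based on this information, the ApplicationMaster can make decisions that use resources more effectively
    
        // Check the statuses of completed containers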
        // Let's assume we are keeping a count of total completed containers, 
        // containers that failed and ones that completed successfully.                     
        List<ContainerStatus> completedContainers = 
            allocateResponse.getCompletedContainersStatuses();
        for (ContainerStatus containerStatus : completedContainers) {                               
          LOG.info("Got container status for containerID= " 
              + containerStatus.getContainerId()
              + ", state=" + containerStatus.getState()     
              + ", exitStatus=" + containerStatus.getExitStatus() 
              + ", diagnostics=" + containerStatus.getDiagnostics());
    
          int exitStatus = containerStatus.getExitStatus();
          if (0 != exitStatus) {
            // container failed 
            // -100 is a special case where the container 
            // was aborted/pre-empted for some reason 
            if (-100 != exitStatus) {
              // application job on container returned a non-zero exit code
              // counts as completed 
              numCompletedContainers.incrementAndGet();
              numFailedContainers.incrementAndGet();                                                        
            }
            else { 
              // something else bad happened 
              // app job did not complete for some reason 
              // we should re-try as the container was lost for some reason
              // decrementing the requested count so that we ask for an
              // additional one in the next allocate call.          
              numRequestedContainers.decrementAndGet();
              // we do not need to release the container as that has already 
              // been done by the ResourceManager/NodeManager. 
            }
          }
          else { 
            // nothing to do 
            // container completed successfully 
            numCompletedContainers.incrementAndGet();
            numSuccessfulContainers.incrementAndGet();
          }
        }
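The counters in the loop above also determine how many containers to ask for in the next allocate call. The following plain-Java sketch isolates that bookkeeping so it can be run without a cluster. `ContainerAccountingSketch` and its members are hypothetical names; the special exit status -100 is taken from the snippet above, and asking only for the difference between the total needed and the number already requested is an assumed policy.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the request/completion bookkeeping an ApplicationMaster might keep. */
final class ContainerAccountingSketch {
    // Exit status used in the snippet above for aborted/pre-empted containers.
    static final int ABORTED_EXIT_STATUS = -100;

    final int numTotalContainers;
    final AtomicInteger numRequested = new AtomicInteger();
    final AtomicInteger numCompleted = new AtomicInteger();
    final AtomicInteger numFailed = new AtomicInteger();

    ContainerAccountingSketch(int numTotalContainers) {
        this.numTotalContainers = numTotalContainers;
    }

    /** Containers still to ask for in the next allocate call. */
    int containersToAsk() {
        return numTotalContainers - numRequested.get();
    }

    void markRequested(int count) {
        numRequested.addAndGet(count);
    }

    /** Updates the counters for one completed container. */
    void onCompleted(int exitStatus) {
        if (exitStatus == 0) {
            numCompleted.incrementAndGet();
        } else if (exitStatus == ABORTED_EXIT_STATUS) {
            // The container was lost, so its task must be requested again.
            numRequested.decrementAndGet();
        } else {
            // The task ran and failed: counts as completed and as failed.
            numCompleted.incrementAndGet();
            numFailed.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        ContainerAccountingSketch acct = new ContainerAccountingSketch(3);
        acct.markRequested(acct.containersToAsk());
        acct.onCompleted(0);
        acct.onCompleted(ABORTED_EXIT_STATUS);
        acct.onCompleted(1);
        System.out.println("toAsk=" + acct.containersToAsk()
            + ", completed=" + acct.numCompleted.get()
            + ", failed=" + acct.numFailed.get());
    }
}
```

With this bookkeeping the ApplicationMaster re-requests only the lost container, which matches the note above about not repeating requests that are still outstanding.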
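  • After a container has been allocated to the ApplicationMaster, it follows a process similar to the one the client followed: it sets up a ContainerLaunchContext for the eventual task that will run on the allocated container. Once the ContainerLaunchContext is defined, the ApplicationMaster can communicate with the ContainerManager to start its allocated container.

        // Assume we have a Container obtained from the AllocateResponse
        Container container;   
        // Connect to the ContainerManager on the node hosting the allocated container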
        String cmIpPortStr = container.getNodeId().getHost() + ":" 
            + container.getNodeId().getPort();              
        InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);               
        ContainerManager cm = 
            (ContainerManager)rpc.getProxy(ContainerManager.class, cmAddress, conf);     
    
        // Set up a ContainerLaunchContext  
        ContainerLaunchContext ctx = 
            Records.newRecord(ContainerLaunchContext.class);
    
        ctx.setContainerId(container.getId());
        ctx.setResource(container.getResource());
    
        try {
          ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
        } catch (IOException e) {
          LOG.info(
              "Getting current user failed when trying to launch the container: "
              + e.getMessage());
        }
    
        // Set the environment 
        Map<String, String> unixEnv;
        // The launched container does not inherit the ApplicationMaster's
        // environment, so everything it needs must be set here
        ctx.setEnvironment(unixEnv);
    
        // Set the local resources
        Map<String, LocalResource> localResources = 
            new HashMap<String, LocalResource>();
        // Local resources are likewise not copied over from the ApplicationMaster
        // by default to the allocated container. Thus, it is the responsibility 
        // of the ApplicationMaster to setup all the necessary local resources 
        // needed by the job that will be executed on the allocated container. 
          
        // Assume that we are executing a shell script on the allocated container 
        // and the shell script's location in the filesystem is known to us. 
        Path shellScriptPath; 
        LocalResource shellRsrc = Records.newRecord(LocalResource.class);
        shellRsrc.setType(LocalResourceType.FILE);
        shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);          
        shellRsrc.setResource(
            ConverterUtils.getYarnUrlFromURI(shellScriptPath.toUri()));
        shellRsrc.setTimestamp(shellScriptPathTimestamp);
        shellRsrc.setSize(shellScriptPathLen);
        localResources.put("MyExecShell.sh", shellRsrc);
    
        ctx.setLocalResources(localResources);                      
    
        // Set the necessary command to execute on the allocated container 
        String command = "/bin/sh ./MyExecShell.sh"
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
    
        List<String> commands = new ArrayList<String>();
        commands.add(command);
        ctx.setCommands(commands);
    
        // Send the start request to the ContainerManager
        StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
        startReq.setContainerLaunchContext(ctx);
        cm.startContainer(startReq);
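  • As mentioned previously, the ApplicationMaster receives updates on completed containers as part of the response to ApplicationMasterProtocol#allocate. It can also monitor its launched containers proactively by querying the ContainerManager for their status.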
        GetContainerStatusRequest statusReq = 
            Records.newRecord(GetContainerStatusRequest.class);
        statusReq.setContainerId(container.getId());
        GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
        LOG.info("Container Status"
            + ", id=" + container.getId()
            + ", status=" + statusResp.getStatus());

FAQ

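How do I distribute my application's jars to all of the nodes?

You can use LocalResource to add resources to your application request. This causes YARN to distribute the resource to the ApplicationMaster's node. If the resource is a tgz or zip archive, you can have YARN unpack it for you, after which the unpacked files can be put on your classpath. For example, when creating your application request: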

    File packageFile = new File(packagePath);
    URL packageUrl = ConverterUtils.getYarnUrlFromPath(
        FileContext.getFileContext().makeQualified(new Path(packagePath)));

    packageResource.setResource(packageUrl);
    packageResource.setSize(packageFile.length());
    packageResource.setTimestamp(packageFile.lastModified());
    // ARCHIVE tells YARN to unpack the resource on the node
    packageResource.setType(LocalResourceType.ARCHIVE);
    packageResource.setVisibility(LocalResourceVisibility.APPLICATION);

    resource.setMemory(memory);
    containerCtx.setResource(resource);
    containerCtx.setCommands(ImmutableList.of(
        "java -cp './package/*' some.class.to.Run "
        + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
        + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
    // The map key "package" becomes the link name in the working directory
    containerCtx.setLocalResources(
        Collections.singletonMap("package", packageResource));
    appCtx.setApplicationId(appId);
    appCtx.setUser(user.getShortUserName());
    appCtx.setAMContainerSpec(containerCtx);
    request.setApplicationSubmissionContext(appCtx);
    applicationsManager.submitApplication(request);

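As you can see, the setLocalResources call takes a map of names to resources. Each name becomes a link in the working directory, so you can refer to the artifacts inside it using ./package/*.

Note: Java's classpath (cp) argument is very sensitive. Make sure you get the syntax exactly right.

Once your package is distributed to your ApplicationMaster, you need to follow the same process whenever your ApplicationMaster starts a new container (assuming you want the resources sent to your container). The code for this is the same. You just need to make sure that you give your ApplicationMaster the package path (either HDFS or local), so that it can send the resource URL along with the container ctx.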
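Since the classpath syntax is easy to get wrong, it can help to build the launch command in one small helper. The sketch below is a plain-Java illustration under stated assumptions: `LaunchCommandSketch` and `javaCommand` are hypothetical names, and the `<LOG_DIR>` string in `main` is only a placeholder standing in for ApplicationConstants.LOG_DIR_EXPANSION_VAR.

```java
/** Sketch of assembling a container launch command with a wildcard classpath. */
final class LaunchCommandSketch {

    /** Builds: java -cp './<linkName>/*' <mainClass> 1><logDir>/stdout 2><logDir>/stderr */
    static String javaCommand(String linkName, String mainClass, String logDir) {
        // The single quotes stop the shell from expanding the wildcard, so the
        // JVM receives "./<linkName>/*" and expands it to the jars in that directory.
        String classpath = "'./" + linkName + "/*'";
        return String.join(" ",
            "java", "-cp", classpath, mainClass,
            "1>" + logDir + "/stdout",
            "2>" + logDir + "/stderr");
    }

    public static void main(String[] args) {
        System.out.println(javaCommand("package", "some.class.to.Run", "<LOG_DIR>"));
    }
}
```

Note that a classpath entry ending in /* picks up jar files only, so loose class files in the unpacked directory would need the directory itself on the classpath as well.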

How do I get the ApplicationMaster's ApplicationAttemptId?

The ApplicationAttemptId will be passed to the ApplicationMaster via the environment and the value from the environment can be converted into an ApplicationAttemptId object via the ConverterUtils helper function.

My container is being killed by the Node Manager

This is likely due to high memory usage exceeding your requested container memory size. There are a number of reasons that can cause this. First, look at the process tree that the node manager dumps when it kills your container. The two things you're interested in are physical memory and virtual memory. If you have exceeded physical memory limits your app is using too much physical memory. If you're running a Java app, you can use -hprof to look at what is taking up space in the heap. If you have exceeded virtual memory, you may need to increase the value of the cluster-wide configuration variable yarn.nodemanager.vmem-pmem-ratio.
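The relationship between the two limits is simple arithmetic: the virtual memory allowance is the container's physical memory multiplied by the configured ratio. The sketch below is a simplified model of that calculation, not the node manager's actual monitoring code. `VmemLimitSketch` and its methods are hypothetical names, and the ratio 2.1 is used only as an example value, so check your cluster's setting.

```java
/** Simplified model of the virtual-memory allowance derived from vmem-pmem-ratio. */
final class VmemLimitSketch {

    /** Virtual memory allowance in MB for a container of containerMb physical memory. */
    static long virtualLimitMb(long containerMb, double vmemPmemRatio) {
        return (long) (containerMb * vmemPmemRatio);
    }

    /** True if the observed virtual memory exceeds the allowance. */
    static boolean overVirtualLimit(long usedVirtualMb, long containerMb, double vmemPmemRatio) {
        return usedVirtualMb > virtualLimitMb(containerMb, vmemPmemRatio);
    }

    public static void main(String[] args) {
        System.out.println(virtualLimitMb(1024, 2.1));         // 2150
        System.out.println(overVirtualLimit(2400, 1024, 2.1)); // true
        System.out.println(overVirtualLimit(2400, 2048, 2.1)); // false
    }
}
```

As the last two lines show, requesting a larger container raises the virtual memory allowance too, which is an alternative to changing the cluster-wide ratio.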

How do I include native libraries?

Setting -Djava.library.path on the command line while launching a container can cause native libraries used by Hadoop to not be loaded correctly and can result in errors. It is cleaner to use LD_LIBRARY_PATH instead.
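One way to follow this advice is to add LD_LIBRARY_PATH to the environment map that is later passed to the container launch context, as was done for CLASSPATH earlier. The sketch below shows only the map handling in plain Java. `NativeLibEnvSketch` and `addNativeLibDir` are hypothetical names, and the directories in `main` are made-up examples.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of preparing LD_LIBRARY_PATH in a container environment map. */
final class NativeLibEnvSketch {

    /** Appends dir to LD_LIBRARY_PATH in env, keeping any existing entries first. */
    static void addNativeLibDir(Map<String, String> env, String dir) {
        env.merge("LD_LIBRARY_PATH", dir, (existing, added) -> existing + ":" + added);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<String, String>();
        addNativeLibDir(env, "./native");
        addNativeLibDir(env, "/opt/mylib/lib");
        // The resulting map is what would be handed to setEnvironment.
        System.out.println(env.get("LD_LIBRARY_PATH")); // ./native:/opt/mylib/lib
    }
}
```

Appending rather than overwriting keeps whatever library path was already placed in the map, in the same spirit as the earlier advice not to overwrite $CLASSPATH.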


