Writing YARN Applications

Concepts and Flow

The general concept is that an "Application Submission Client" submits an "Application" to the YARN ResourceManager. The client communicates with the ResourceManager via the ClientRMProtocol. If needed, the client first obtains a new ApplicationId via ClientRMProtocol#getNewApplication and then submits the application via ClientRMProtocol#submitApplication. As part of the ClientRMProtocol#submitApplication call, the client must provide the ResourceManager with enough information to launch the application's first container, the ApplicationMaster. This includes the local files and jars the application needs in order to run, the actual command to execute (with the necessary command-line arguments), and, optionally, Unix environment settings. In effect, you need to describe the Unix process that launches the ApplicationMaster.

The YARN ResourceManager then launches the specified ApplicationMaster on the first allocated container. The ApplicationMaster communicates with the ResourceManager via the AMRMProtocol. First, the ApplicationMaster registers itself with the ResourceManager. To accomplish its task, it then requests and receives containers via AMRMProtocol#allocate. Once a container is allocated, the ApplicationMaster communicates with the NodeManager via ContainerManager#startContainer to launch it. As part of launching a container, the ApplicationMaster specifies a ContainerLaunchContext which, similar to the ApplicationSubmissionContext, contains the information needed to launch the container, such as the command line, environment variables, and so on. Once its work is done, the ApplicationMaster notifies the ResourceManager via AMRMProtocol#finishApplicationMaster.

Meanwhile, the client can monitor the application's status by querying the ResourceManager or, if the ApplicationMaster supports such a service, by querying the ApplicationMaster directly. If needed, the client can also kill the application via ClientRMProtocol#forceKillApplication.

Interfaces

The interfaces you will most need are:

  • ClientRMProtocol – Client ↔ ResourceManager
    The protocol a client uses to talk to the ResourceManager to launch a new application (i.e. an ApplicationMaster), check application status, or kill an application. For example, a job-submission client running on a gateway machine typically uses this protocol.
  • AMRMProtocol – ApplicationMaster ↔ ResourceManager
    The protocol an ApplicationMaster uses to register itself with (and unregister itself from) the ResourceManager, and to request resources from the Scheduler to complete its tasks.
  • ContainerManager – ApplicationMaster ↔ NodeManager
    The protocol an ApplicationMaster uses to talk to a NodeManager to start or stop containers and, if needed, to get status updates for its containers.

Writing a Simple YARN Application

Writing a Simple Client

    • The first step a client needs to take is to connect to the ResourceManager, specifically to its ApplicationsManager (AsM) interface.
ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
Configuration appsManagerServerConf = new Configuration(conf);
appsManagerServerConf.setClass(
    YarnConfiguration.YARN_SECURITY_INFO,
    ClientRMSecurityInfo.class, SecurityInfo.class);
applicationsManager = ((ClientRMProtocol) rpc.getProxy(
    ClientRMProtocol.class, rmAddress, appsManagerServerConf));
    • Once a handle to the ASM has been obtained, the client needs to request a new ApplicationId from the ResourceManager.
GetNewApplicationRequest request =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
    applicationsManager.getNewApplication(request);
LOG.info("Got new ApplicationId=" + response.getApplicationId());
    • The response from the ASM also includes information about the cluster, such as its minimum/maximum resource capabilities. This information is needed to correctly set the parameters of the container in which the ApplicationMaster will be launched. Refer to GetNewApplicationResponse for more details.
    • The key work for the client is to set up the ApplicationSubmissionContext, which defines all the information the ResourceManager needs to launch the ApplicationMaster:
      1). Application info: id, name
      2). Queue, priority info: the queue to which the application will be submitted, and the priority the application will be given.
      3). User: the user submitting the application
      4). ContainerLaunchContext: the information defining the container in which the ApplicationMaster will be launched and run. As described earlier, the ContainerLaunchContext defines everything needed to launch the ApplicationMaster, such as local resources (binaries, jars, files, etc.), security tokens, environment variables (CLASSPATH etc.), and the command to be executed.
// Create a new ApplicationSubmissionContext
ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
// set the ApplicationId
appContext.setApplicationId(appId);
// set the application name
appContext.setApplicationName(appName);

// Create a new container launch context for the AM's container
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);

// Define the local resources required
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Lets assume the jar we need for our ApplicationMaster is available in
// HDFS at a certain known path to us and we want to make it available to
// the ApplicationMaster in the launched container
Path jarPath; // <- known path to jar file
FileStatus jarStatus = fs.getFileStatus(jarPath);
LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
// Set the type of resource - file or archive
// archives are untarred at the destination by the framework
amJarRsrc.setType(LocalResourceType.FILE);
// Set visibility of the resource
// Setting to most private option i.e. this file will only
// be visible to this instance of the running application
amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
// Set the location of resource to be copied over into the
// working directory
amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
// Set timestamp and length of file so that the framework
// can do basic sanity checks for the local resource
// after it has been copied over to ensure it is the same
// resource the client intended to use with the application
amJarRsrc.setTimestamp(jarStatus.getModificationTime());
amJarRsrc.setSize(jarStatus.getLen());
// The framework will create a symlink called AppMaster.jar in the
// working directory that will be linked back to the actual file.
// The ApplicationMaster, if needs to reference the jar file, would
// need to use the symlink filename.
localResources.put("AppMaster.jar", amJarRsrc);
// Set the local resources into the launch context
amContainer.setLocalResources(localResources);

// Set up the environment needed for the launch context
Map<String, String> env = new HashMap<String, String>();
// For example, we could setup the classpath needed.
// Assuming our classes or jars are available as local resources in the
// working directory from which the command will be run, we need to append
// "." to the path.
// By default, all the hadoop specific classpaths will already be available
// in $CLASSPATH, so we should be careful not to overwrite it.
String classPathEnv = "$CLASSPATH:./*:";
env.put("CLASSPATH", classPathEnv);
amContainer.setEnvironment(env);

// Construct the command to be executed on the launched container
String command =
    "${JAVA_HOME}" + "/bin/java" +
    " MyAppMaster" +
    " arg1 arg2 arg3" +
    " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
    " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

List<String> commands = new ArrayList<String>();
commands.add(command);
// add additional commands if needed

// Set the command array into the container spec
amContainer.setCommands(commands);

// Define the resource requirements for the container
// For now, YARN only supports memory so we set the memory
// requirements.
// If the process takes more than its allocated memory, it will
// be killed by the framework.
// Memory being requested for should be less than max capability
// of the cluster and all asks should be a multiple of the min capability.
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);
amContainer.setResource(capability);

// Set the container launch context into the ApplicationSubmissionContext
appContext.setAMContainerSpec(amContainer);
    • After the process spec has been set up, the client is finally ready to submit the application to the ASM.
// Create the request to send to the ApplicationsManager
SubmitApplicationRequest appRequest =
    Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);

// Submit the application to the ApplicationsManager
// Ignore the response as either a valid response object is returned on
// success or an exception thrown to denote the failure
applicationsManager.submitApplication(appRequest);
  • At this point, the ResourceManager will accept the application and, in the background, allocate a container meeting the given requirements and then launch the ApplicationMaster on that container.
  • The client has several ways to track the actual progress of its application.

1). The client can communicate with the ResourceManager via ClientRMProtocol#getApplicationReport to request a report on the application's status.

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
    applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();

The ApplicationReport received from the ResourceManager consists of the following:

(1.1). General application information: the ApplicationId, the queue to which the application was submitted, the user who submitted the application, and the application's start time.
(1.2). ApplicationMaster details: the host on which the ApplicationMaster is running, the RPC port (if any) on which it listens for requests from clients, and the token the client needs in order to communicate with the ApplicationMaster.
(1.3). Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL, which the client can obtain via ApplicationReport#getTrackingUrl and then use to monitor progress.
(1.4). ApplicationStatus: the application's state as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should check ApplicationReport#getFinalApplicationStatus to determine whether the application actually succeeded or failed. On failure, ApplicationReport#getDiagnostics may shed some light on the cause.
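The success/failure check described in (1.4) can be sketched as follows. This is a minimal illustration, not framework code: the nested enums stand in for YARN's real YarnApplicationState and FinalApplicationStatus types.

```java
public class AppStatusCheck {
    // Local stand-ins for YARN's YarnApplicationState / FinalApplicationStatus.
    enum YarnApplicationState { NEW, SUBMITTED, RUNNING, FINISHED, FAILED, KILLED }
    enum FinalApplicationStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

    // The application is done once the RM reports a terminal state.
    static boolean isDone(YarnApplicationState state) {
        return state == YarnApplicationState.FINISHED
            || state == YarnApplicationState.FAILED
            || state == YarnApplicationState.KILLED;
    }

    // Only a FINISHED state with a SUCCEEDED final status counts as success;
    // FINISHED alone does not distinguish success from failure.
    static boolean succeeded(YarnApplicationState state, FinalApplicationStatus fs) {
        return state == YarnApplicationState.FINISHED
            && fs == FinalApplicationStatus.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(succeeded(YarnApplicationState.FINISHED,
                                     FinalApplicationStatus.SUCCEEDED)); // true
    }
}
```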

2). If the ApplicationMaster supports it, the client can query the ApplicationMaster directly for progress updates via the host:rpcport information contained in the ApplicationReport. If a tracking URL is available, it can also be used to obtain status information.

    • In certain situations, if the application is taking too long or for other reasons, the client may wish to kill the application. The ClientRMProtocol supports the forceKillApplication call, which lets a client send a kill signal to the ApplicationMaster via the ResourceManager. An ApplicationMaster could also be designed to support an abort call in its own RPC layer, which clients could invoke directly.
KillApplicationRequest killRequest =
    Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

Writing an ApplicationMaster

    • The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager on the client's behalf, and the client supplies all the information and resources the job needs to run. The ApplicationMaster is responsible for monitoring the job and driving it to completion.
    • Because the ApplicationMaster is launched inside a container that, in a multi-tenant environment, may share a physical host with other containers, it cannot assume that a pre-configured port will be available for it to listen on.
    • When the ApplicationMaster starts up, several parameters are made available to it through the environment, such as the ContainerId of the ApplicationMaster's own container, the application submission time, and details about the NodeManager host on which it is running. See ApplicationConstants for the parameter names.
    • All interaction with the ResourceManager requires an ApplicationAttemptId (there may be multiple attempts if the application fails). The ApplicationAttemptId can be obtained from the ApplicationMaster's ContainerId, and helper APIs exist to convert the value read from the environment into the corresponding objects.
Map<String, String> envs = System.getenv();
String containerIdString =
    envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
if (containerIdString == null) {
  // container id should always be set in the env by the framework
  throw new IllegalArgumentException(
      "ContainerId not set in the environment");
}
ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
    • After initializing itself, the ApplicationMaster can register with the ResourceManager via AMRMProtocol#registerApplicationMaster. The ApplicationMaster always communicates with the Scheduler interface of the ResourceManager.
// Connect to the Scheduler of the ResourceManager.
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
AMRMProtocol resourceManager =
    (AMRMProtocol) rpc.getProxy(AMRMProtocol.class, rmAddress, conf);

// Register the AM with the RM
// Set the required info into the registration request:
// ApplicationAttemptId,
// host on which the app master is running
// rpc port on which the app master accepts requests from the client
// tracking url for the client to track app master progress
RegisterApplicationMasterRequest appMasterRequest =
    Records.newRecord(RegisterApplicationMasterRequest.class);
appMasterRequest.setApplicationAttemptId(appAttemptID);
appMasterRequest.setHost(appMasterHostname);
appMasterRequest.setRpcPort(appMasterRpcPort);
appMasterRequest.setTrackingUrl(appMasterTrackingUrl);

// The registration response is useful as it provides information about the
// cluster.
// Similar to the GetNewApplicationResponse in the client, it provides
// information about the min/max resource capabilities of the cluster that
// would be needed by the ApplicationMaster when requesting for containers.
RegisterApplicationMasterResponse response =
    resourceManager.registerApplicationMaster(appMasterRequest);
  • The ApplicationMaster must emit heartbeats to the ResourceManager to indicate that it is alive and running. The timeout interval configured on the ResourceManager side is accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, with a default of YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. The AMRMProtocol#allocate call serves as a heartbeat, and it can also carry progress updates. Therefore, an allocate call that requests no containers and carries no progress update is, from the ResourceManager's point of view, a valid way to send a heartbeat.
  • Based on the job's requirements, the ApplicationMaster can request a set of containers to run its tasks on. It uses the ResourceRequest class to define the container specifications:

1). Hostname: set this if the container needs to be hosted on a specific rack or host; "*" means the container can be allocated on any host.
2). Resource capability: currently, YARN only supports memory-based resource requirements, so the request need only define how much memory the task requires. The value is specified in MB, must be less than the cluster's maximum capability, and must be a multiple of its minimum capability. Memory limits are enforced against the sub-task's physical memory usage.
3). Priority: when requesting sets of containers, an ApplicationMaster can assign different priorities to each set. For example, a Map-Reduce ApplicationMaster could assign a higher priority to the containers needed for map tasks and a lower priority to those for reduce tasks.

// Resource Request
ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);

// setup requirements for hosts
// whether a particular rack/host is needed
// useful for applications that are sensitive
// to data locality
rsrcRequest.setHostName("*");

// set the priority for the request
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(requestPriority);
rsrcRequest.setPriority(pri);

// Set up resource type requirements
// For now, only memory is supported so we set memory requirements
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability);

// set no. of containers needed
// matching the specifications
rsrcRequest.setNumContainers(numContainers);
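The constraint on memory asks (a multiple of the cluster's minimum capability, no more than its maximum, both available from GetNewApplicationResponse or the registration response) can be sketched as a small normalization helper. The method is illustrative, not a YARN API:

```java
public class MemoryNormalizer {
    // Round a memory request (in MB) up to a multiple of the scheduler's
    // minimum allocation, capped at the maximum allocation.
    static int normalize(int requestedMb, int minMb, int maxMb) {
        if (requestedMb <= 0) {
            return minMb;
        }
        int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;
        return Math.min(rounded, maxMb);
    }

    public static void main(String[] args) {
        // e.g. a 1500 MB ask against a 1024 MB minimum rounds up to 2048 MB
        System.out.println(normalize(1500, 1024, 8192)); // 2048
    }
}
```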
  • After defining the container requirements, the ApplicationMaster constructs an AllocateRequest to send to the ResourceManager. The AllocateRequest consists of:

1). Requested containers: the container specifications and the number of containers the ApplicationMaster is requesting from the ResourceManager.
2). Released containers: in some situations, the ApplicationMaster may have requested more containers than it needs, or, due to failures, may decide to use other containers already allocated to it. In such cases, it can release unneeded containers back to the ResourceManager, which can then allocate them to other applications.
3). ResponseId: the response id that will be echoed back in the response to the allocate call.
4). Progress update information: the ApplicationMaster can send a progress update to the ResourceManager as a value between 0 and 1.

List<ResourceRequest> requestedContainers;
List<ContainerId> releasedContainers;
AllocateRequest req = Records.newRecord(AllocateRequest.class);

// The response id set in the request will be sent back in
// the response so that the ApplicationMaster can
// match it to its original ask and act appropriately.
req.setResponseId(rmRequestID);

// Set ApplicationAttemptId
req.setApplicationAttemptId(appAttemptID);

// Add the list of containers being asked for
req.addAllAsks(requestedContainers);

// If the ApplicationMaster has no need for certain
// containers due to over-allocation or for any other
// reason, it can release them back to the ResourceManager
req.addAllReleases(releasedContainers);

// Assuming the ApplicationMaster can track its progress
req.setProgress(currentProgress);

AllocateResponse allocateResponse = resourceManager.allocate(req);
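Since the progress value sent with the allocate call must lie between 0 and 1, a small defensive clamp can prevent invalid values from reaching the ResourceManager. This helper is illustrative, not part of the YARN API:

```java
public class ProgressClamp {
    // Clamp a progress estimate into the [0, 1] range expected by
    // AllocateRequest#setProgress; NaN is treated as "no progress yet".
    static float clampProgress(float p) {
        if (Float.isNaN(p)) {
            return 0f;
        }
        return Math.max(0f, Math.min(1f, p));
    }

    public static void main(String[] args) {
        System.out.println(clampProgress(1.2f)); // 1.0
    }
}
```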
    • The AllocateResponse returned by the ResourceManager, via its AMResponse object, contains the following information:

1). Reboot flag: covers the scenario in which the ApplicationMaster has fallen out of sync with the ResourceManager.
2). Allocated containers: the containers that have been allocated to the ApplicationMaster.
3). Headroom: the headroom of resources in the cluster. Based on this information and its own resource needs, the ApplicationMaster can make intelligent decisions, such as re-prioritizing sub-tasks to make the most of the containers it already holds, or bailing out faster when no more resources can be obtained.
4). Completed containers: once the ApplicationMaster launches an allocated container, it will receive an update from the ResourceManager when that container completes. The ApplicationMaster can inspect the status of each completed container and take appropriate action, such as retrying a failed task.

One thing to note is that containers will not be allocated to the ApplicationMaster immediately. This does not mean that the ApplicationMaster should keep re-requesting the containers it has not yet received. Once an allocate request has been sent, the ApplicationMaster will eventually be allocated containers, subject to cluster capacity, priorities, and the scheduling policy. The ApplicationMaster should only send another request if its original estimate changed and it needs additional containers.

// Get AMResponse from AllocateResponse
AMResponse amResp = allocateResponse.getAMResponse();

// Retrieve list of allocated containers from the response
// and on each allocated container, lets assume we are launching
// the same job.
List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
  LOG.info("Launching shell command on a new container."
      + ", containerId=" + allocatedContainer.getId()
      + ", containerNode=" + allocatedContainer.getNodeId().getHost()
      + ":" + allocatedContainer.getNodeId().getPort()
      + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
      + ", containerState=" + allocatedContainer.getState()
      + ", containerResourceMemory="
      + allocatedContainer.getResource().getMemory());

  // Launch and start the container on a separate thread to keep the main
  // thread unblocked as all containers may not be allocated at one go.
  LaunchContainerRunnable runnableLaunchContainer =
      new LaunchContainerRunnable(allocatedContainer);
  Thread launchThread = new Thread(runnableLaunchContainer);
  launchThreads.add(launchThread);
  launchThread.start();
}

// Check what the current available resources in the cluster are
Resource availableResources = amResp.getAvailableResources();
// Based on this information, an ApplicationMaster can make appropriate
// decisions

// Check the completed containers
// Let's assume we are keeping a count of total completed containers,
// containers that failed and ones that completed successfully.
List<ContainerStatus> completedContainers =
    amResp.getCompletedContainersStatuses();
for (ContainerStatus containerStatus : completedContainers) {
  LOG.info("Got container status for containerID="
      + containerStatus.getContainerId()
      + ", state=" + containerStatus.getState()
      + ", exitStatus=" + containerStatus.getExitStatus()
      + ", diagnostics=" + containerStatus.getDiagnostics());

  int exitStatus = containerStatus.getExitStatus();
  if (0 != exitStatus) {
    // container failed
    // -100 is a special case where the container
    // was aborted/pre-empted for some reason
    if (-100 != exitStatus) {
      // application job on container returned a non-zero exit code
      // counts as completed
      numCompletedContainers.incrementAndGet();
      numFailedContainers.incrementAndGet();
    } else {
      // something else bad happened
      // app job did not complete for some reason
      // we should re-try as the container was lost for some reason
      // decrementing the requested count so that we ask for an
      // additional one in the next allocate call.
      numRequestedContainers.decrementAndGet();
      // we do not need to release the container as that has already
      // been done by the ResourceManager/NodeManager.
    }
  } else {
    // nothing to do
    // container completed successfully
    numCompletedContainers.incrementAndGet();
    numSuccessfulContainers.incrementAndGet();
  }
}
    • After a container has been allocated to it, the ApplicationMaster follows a process similar to the client's to set up the ContainerLaunchContext for the task that will ultimately run on the allocated container. Once the ContainerLaunchContext is defined, the ApplicationMaster communicates with the ContainerManager to launch the allocated container.
// Assuming an allocated Container obtained from AMResponse
Container container;
// Connect to ContainerManager on the allocated container
String cmIpPortStr = container.getNodeId().getHost() + ":"
    + container.getNodeId().getPort();
InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);
ContainerManager cm =
    (ContainerManager)rpc.getProxy(ContainerManager.class, cmAddress, conf);

// Now we setup a ContainerLaunchContext
ContainerLaunchContext ctx =
    Records.newRecord(ContainerLaunchContext.class);

ctx.setContainerId(container.getId());
ctx.setResource(container.getResource());

try {
  ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
} catch (IOException e) {
  LOG.info("Getting current user failed when trying to launch the container: "
      + e.getMessage());
}

// Set the environment
Map<String, String> unixEnv;
// Setup the required env.
// Please note that the launched container does not inherit
// the environment of the ApplicationMaster so all the
// necessary environment settings will need to be re-setup
// for this allocated container.
ctx.setEnvironment(unixEnv);

// Set the local resources
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Again, the local resources from the ApplicationMaster is not copied over
// by default to the allocated container. Thus, it is the responsibility
// of the ApplicationMaster to setup all the necessary local resources
// needed by the job that will be executed on the allocated container.

// Assume that we are executing a shell script on the allocated container
// and the shell script's location in the filesystem is known to us.
Path shellScriptPath;
LocalResource shellRsrc = Records.newRecord(LocalResource.class);
shellRsrc.setType(LocalResourceType.FILE);
shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
shellRsrc.setResource(
    ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath)));
shellRsrc.setTimestamp(shellScriptPathTimestamp);
shellRsrc.setSize(shellScriptPathLen);
localResources.put("MyExecShell.sh", shellRsrc);

ctx.setLocalResources(localResources);

// Set the necessary command to execute on the allocated container
String command = "/bin/sh ./MyExecShell.sh"
    + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
    + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";

List<String> commands = new ArrayList<String>();
commands.add(command);
ctx.setCommands(commands);

// Send the start request to the ContainerManager
StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
startReq.setContainerLaunchContext(ctx);
cm.startContainer(startReq);
    • As mentioned earlier, the ApplicationMaster receives job completion information via the returns of its AMRMProtocol#allocate calls. It can also actively monitor launched containers by querying the ContainerManager for their status.
GetContainerStatusRequest statusReq =
    Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(container.getId());
GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
LOG.info("Container Status"
    + ", id=" + container.getId()
    + ", status=" + statusResp.getStatus());

FAQ

How can I distribute my application's jars to all the nodes in the YARN cluster that need them?

You can use LocalResource to add resources to your application request. This will cause YARN to distribute the resource to the ApplicationMaster's node. If the resource is a tgz, zip, or jar, you can have YARN unzip it; all you then need to do is add the unzipped folder to your classpath. For example, when creating your application request:

File packageFile = new File(packagePath);
URL packageUrl = ConverterUtils.getYarnUrlFromPath(
    FileContext.getFileContext().makeQualified(new Path(packagePath)));

packageResource.setResource(packageUrl);
packageResource.setSize(packageFile.length());
packageResource.setTimestamp(packageFile.lastModified());
packageResource.setType(LocalResourceType.ARCHIVE);
packageResource.setVisibility(LocalResourceVisibility.APPLICATION);

resource.setMemory(memory);
containerCtx.setResource(resource);
containerCtx.setCommands(ImmutableList.of(
    "java -cp './package/*' some.class.to.Run "
    + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
    + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
containerCtx.setLocalResources(
    Collections.singletonMap("package", packageResource));
appCtx.setApplicationId(appId);
appCtx.setUser(user.getShortUserName());
appCtx.setAMContainerSpec(containerCtx);
request.setApplicationSubmissionContext(appCtx);
applicationsManager.submitApplication(request);

As you can see, setLocalResources takes a map of names to resources. The name becomes a symlink in your application's working directory, so you can refer to the artifacts inside via ./package/*.

Note: Java's classpath parameter is very sensitive. Make sure you get the syntax exactly right.

Once your package has been distributed to the ApplicationMaster's node, you need to follow the same process whenever the ApplicationMaster launches a new container (assuming you want the resources distributed to that container's node as well). The code for this is identical; you just need to make sure the ApplicationMaster knows the package path (HDFS or local), so the resource URL can be sent along with the container's launch context.

How do I get the ApplicationMaster's ApplicationAttemptId?

The ApplicationAttemptId is passed to the ApplicationMaster via an environment variable; the value read from the environment can be converted into an ApplicationAttemptId object via the ConverterUtils helper functions.

My container is being killed by the NodeManager

This is likely due to high memory usage exceeding your requested container memory size. There are a number of possible causes. First, look at the process tree that the NodeManager dumps when it kills your container. The two things of interest are physical memory and virtual memory. If you have exceeded the physical memory limit, your application is using too much physical memory; if you are running a Java application, you can use -hprof to look at what is taking up space in the heap. If you have exceeded the virtual memory limit, you may need to increase the cluster-wide configuration value yarn.nodemanager.vmem-pmem-ratio.
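If virtual memory limits are the problem, the ratio can be raised in yarn-site.xml; the value shown here is illustrative (the default is 2.1):

```xml
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <!-- Virtual memory allowed per unit of physical memory; default is 2.1 -->
  <value>4</value>
</property>
```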

How do I include native libraries?

Setting -Djava.library.path on the command line when launching a container can cause native libraries used by Hadoop to be loaded incorrectly and can result in failures. It is cleaner to use LD_LIBRARY_PATH instead.
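For example, rather than passing -Djava.library.path in the container command, the native library directory can be added to the environment map set via ContainerLaunchContext#setEnvironment. A minimal sketch, where ./native is an assumed symlink name for a localized resource:

```java
import java.util.HashMap;
import java.util.Map;

public class NativeLibEnv {
    // Build a container environment that extends LD_LIBRARY_PATH instead of
    // using -Djava.library.path. The launched container does not inherit the
    // ApplicationMaster's environment, so this must be set explicitly.
    static Map<String, String> buildEnv(String nativeLibDir) {
        Map<String, String> env = new HashMap<String, String>();
        env.put("LD_LIBRARY_PATH", nativeLibDir + ":$LD_LIBRARY_PATH");
        return env;
    }

    public static void main(String[] args) {
        // The literal $LD_LIBRARY_PATH is expanded by the shell when the
        // container launch script runs.
        System.out.println(buildEnv("./native").get("LD_LIBRARY_PATH"));
    }
}
```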
