Hadoop Source Code Analysis: distributedshell

liushahe2012

Categories: Big Data, hadoop, yarn. Tags: hadoop, source code, yarn

Article link: https://blog.csdn.net/liushahe2012/article/details/54606700

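1. Overview

This article walks through distributedshell, a very simple example application that ships with YARN. It can be regarded as the "hello world" of YARN programming: its main job is to run a user-supplied shell command or shell script in parallel. The focus here is on how distributedshell is implemented.

The version is hadoop-2.5.2.

The distributedshell source code is in the folder

hadoop-2.5.2-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-applications\hadoop-yarn-applications-distributedshell

Distributedshell is written in exactly the same way as any ordinary YARN application.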

2. Client Analysis

The entry point of the distributedshell Client is the following main function:

public static void main(String[] args) {
  boolean result = false;
  try {
    Client client = new Client();
    LOG.info("Initializing Client");
    try {
      // init returns false when there is nothing to run (for example, help was requested)
      boolean doRun = client.init(args);
      if (!doRun) {
        System.exit(0);
      }
    } catch (IllegalArgumentException e) {
      System.err.println(e.getLocalizedMessage());
      client.printUsage();
      System.exit(-1);
    }
    result = client.run();
  } catch (Throwable t) {
    LOG.fatal("Error running CLient", t);
    System.exit(1);
  }
  …
}

2.1 Constructing the YARN client object yarnClient

The constructor specifies the AM that this Client will use and then creates yarnClient. YARN abstracts the interaction between a client and the RM into the YarnClient library, which covers application submission, status queries, and control, and so simplifies application code.

public Client(Configuration conf) throws Exception {
  this( // specify the AM
      "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster",
      conf);
}

The YarnClient class is used to create yarnClient, a client that talks to the ResourceManager directly.

Client(String appMasterMainClass, Configuration conf) {
  …
  yarnClient = YarnClient.createYarnClient(); // create yarnClient
  yarnClient.init(conf);
  opts = new Options();
  opts.addOption("appname", true, "Application Name. Default value - DistributedShell");
  opts.addOption("priority", true, "Application Priority. Default 0");
}

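2.2 Initialization

init parses the command-line arguments, such as the jar to use, the memory size, and the number of CPUs. The code uses GnuParser: all options are first defined in opts (which can be thought of as a template), then opts and the actual args are passed to the parser, which returns a CommandLine object. Later lookups operate on that CommandLine object directly, for example cliParser.hasOption("help") and cliParser.getOptionValue("jar").

public boolean init(String[] args) throws ParseException {
  CommandLine cliParser = new GnuParser().parse(opts, args);
  // the second argument of getOptionValue is the default used when the option is absent
  amMemory = Integer.parseInt(cliParser.getOptionValue("master_memory", "10"));
  amVCores = Integer.parseInt(cliParser.getOptionValue("master_vcores", "1"));
  …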
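The "look up an option, fall back to a default" step can be illustrated without Commons CLI. The sketch below is a simplified stand-in and not the GnuParser implementation: the class SimpleOptions and its methods are hypothetical, and it only understands "-name value" pairs and bare "-flag" switches (so a negative number cannot be used as a value).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the Options/CommandLine pair.
final class SimpleOptions {
    private final Map<String, String> values = new HashMap<>();

    // Parses "-name value" pairs; a "-name" followed by another option
    // (or by nothing) is stored as a flag with an empty value.
    static SimpleOptions parse(String[] args) {
        SimpleOptions parsed = new SimpleOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-")) {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
            String name = arg.replaceFirst("^-+", "");
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            parsed.values.put(name, hasValue ? args[++i] : "");
        }
        return parsed;
    }

    boolean hasOption(String name) {
        return values.containsKey(name);
    }

    // Returns the parsed value, or defaultValue when the option is absent or a bare flag.
    String getOptionValue(String name, String defaultValue) {
        String value = values.get(name);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        SimpleOptions cli = parse(new String[] {
            "-jar", "app.jar", "-master_memory", "256", "-debug"});
        int amMemory = Integer.parseInt(cli.getOptionValue("master_memory", "10"));
        int amVCores = Integer.parseInt(cli.getOptionValue("master_vcores", "1"));
        System.out.println(amMemory + " " + amVCores + " " + cli.hasOption("debug"));
    }
}
```

Here master_vcores is not on the command line, so the lookup falls back to "1", which is the same pattern the two parseInt calls in init rely on.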

2.3 Running

The client is started in the run method. run() is the most important function in the distributedshell Client; it works as follows:

public boolean run() throws IOException, YarnException {
  …
  // Start yarnClient first. This sets up the RPC connection to the RM, after which
  // calls look just like local method calls. yarnClient is then used to query the
  // number of NMs and per-NM details (ID, address, number of Containers, and so on).
  yarnClient.start();
  YarnClusterMetrics clusterMetrics = yarnClient.getYarnClusterMetrics();

  // Get the reports of all running nodes from the ASM via yarnClient
  List<NodeReport> clusterNodeReports = yarnClient.getNodeReports(NodeState.RUNNING);

  // Collect the information needed to submit the AM
  YarnClientApplication app = yarnClient.createApplication(); // create the app
  GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
  …
  // Build the ApplicationSubmissionContext used to submit the ApplicationMaster
  ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

  // Build the launch context of the AM's container: local resources,
  // environment variables, and the actual command
  ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);

  // Local resources the AM needs, such as the jar and the log4j properties file
  Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
  FileSystem fs = FileSystem.get(conf);
  addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(), localResources, null);

  // Set the log4j properties if needed
  if (!log4jPropFile.isEmpty()) {
    addToLocalResources(fs, log4jPropFile, log4jPath, appId.toString(), localResources, null);
  }

  // Attach the local resources to amContainer
  amContainer.setLocalResources(localResources);

  // Set the environment variables
  Map<String, String> env = new HashMap<String, String>();
  env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation);
  env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp));
  env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen));

  // Attach env to amContainer
  amContainer.setEnvironment(env);

  // Attach the command line to amContainer
  List<String> commands = new ArrayList<String>();
  commands.add(command.toString());
  amContainer.setCommands(commands);

  // Attach the credentials (tokens) to amContainer
  DataOutputBuffer dob = new DataOutputBuffer();
  credentials.writeTokenStorageToStream(dob);
  ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
  amContainer.setTokens(fsTokens);

  // Attach amContainer to appContext
  appContext.setAMContainerSpec(amContainer);

  // Set the priority
  appContext.setPriority(pri);

  // Set the queue
  appContext.setQueue(amQueue);

  // Finally, submit the AM through yarnClient
  yarnClient.submitApplication(appContext);

  // Start monitoring. The Client only cares whether the AM it submitted to the RM is
  // running normally; the tasks inside the AM are managed by the AM itself. A Client that
  // wants task-level information has to design its own interaction with the AM.
  return monitorApplication(appId);
}

Overall, the Client's job is fairly simple: establish a connection to the RM, submit the AM, and monitor the AM's running state.
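The monitoring step boils down to polling the application's state until it reaches a terminal value. The following is a minimal sketch of that loop using only the Java standard library; the AppState enum, the Supplier that stands in for a status query to the RM, and the method name monitor are all assumptions made for illustration, not the code of monitorApplication.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

final class MonitorSketch {
    // Hypothetical states; the real client would read them from an application report.
    enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    // Polls reportSource until a terminal state is seen.
    // Returns true only when the application finished; false on failure, kill, or interrupt.
    static boolean monitor(Supplier<AppState> reportSource, long pollMillis) {
        while (true) {
            AppState state = reportSource.get();
            switch (state) {
                case FINISHED:
                    return true;
                case FAILED:
                case KILLED:
                    return false;
                default:
                    try {
                        Thread.sleep(pollMillis); // wait before asking again
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
            }
        }
    }

    public static void main(String[] args) {
        // A canned sequence of states stands in for successive queries to the RM.
        Iterator<AppState> fake = Arrays.asList(
            AppState.ACCEPTED, AppState.RUNNING, AppState.FINISHED).iterator();
        System.out.println(monitor(fake::next, 10L));
    }
}
```

Keeping the state source behind a Supplier makes the loop testable without a cluster; a real client would plug its status query in at that point.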

3. ApplicationMaster Analysis

A simplified skeleton of the AM:

public static void main(String[] args) {
  boolean result = false;
  ApplicationMaster appMaster = new ApplicationMaster();
  boolean doRun = appMaster.init(args);
  if (!doRun) {
    System.exit(0);
  }
  appMaster.run();
  result = appMaster.finish();
}

YARN provides two programming libraries, AMRMClient and NMClient (usable by both the AM and the RM), to simplify AM development.

3.1 Setting up asynchronous handlers for RM and NM messages

// Set up and start RMCallbackHandler, the class that responds to RM messages
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();

// Set up and start NMCallbackHandler, the class that responds to NM messages
containerListener = createNMCallbackHandler();
nmClientAsync = new NMClientAsyncImpl(containerListener);
nmClientAsync.init(conf);
nmClientAsync.start();
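The asynchronous style above (register a handler, start the client, and let a background heartbeat deliver events) can be modeled with a scheduled executor. This is only a sketch of the pattern under stated assumptions: HeartbeatClient, its CallbackHandler interface, and the Supplier that plays the role of a heartbeat response are hypothetical names, and nothing here reflects how AMRMClientAsync is implemented internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class AsyncCallbackSketch {
    // Hypothetical callback interface with a single event type.
    interface CallbackHandler {
        void onContainersAllocated(List<String> containerIds);
    }

    // Hypothetical client: a timer thread "heartbeats" and forwards results to the handler.
    static final class HeartbeatClient {
        private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        private final long intervalMillis;
        private final Supplier<List<String>> heartbeat;
        private final CallbackHandler handler;

        HeartbeatClient(long intervalMillis, Supplier<List<String>> heartbeat,
                        CallbackHandler handler) {
            this.intervalMillis = intervalMillis;
            this.heartbeat = heartbeat;
            this.handler = handler;
        }

        void start() {
            timer.scheduleAtFixedRate(() -> {
                List<String> allocated = heartbeat.get();
                if (!allocated.isEmpty()) {
                    handler.onContainersAllocated(allocated); // runs on the timer thread
                }
            }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        }

        void stop() {
            timer.shutdownNow();
        }
    }

    // Feeds two canned heartbeat responses and returns what the handler received.
    static List<String> runDemo() {
        Queue<List<String>> responses = new ConcurrentLinkedQueue<>();
        responses.add(Collections.<String>emptyList());
        responses.add(Arrays.asList("container_1", "container_2"));
        List<String> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        HeartbeatClient client = new HeartbeatClient(10L,
            () -> {
                List<String> next = responses.poll();
                return next == null ? Collections.<String>emptyList() : next;
            },
            ids -> { received.addAll(ids); done.countDown(); });
        client.start();
        try {
            done.await(2, TimeUnit.SECONDS); // the main thread stays free until the event arrives
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            client.stop();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The point of the design is that the caller never blocks on the remote side: it hands over a handler once and reacts to events as they are delivered.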

3.2 Registering with the RM

RegisterApplicationMasterResponse response = amRMClient
    .registerApplicationMaster(appMasterHostname, appMasterRpcPort, appMasterTrackingUrl);

3.3 Computing the required Containers and requesting them from the RM

for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}

private ContainerRequest setupContainerAskForRM() {
  Priority pri = Records.newRecord(Priority.class);
  Resource capability = Records.newRecord(Resource.class);
  // Specify the required memory and CPU capability
  capability.setMemory(containerMemory);
  capability.setVirtualCores(containerVirtualCores);
  // The two null arguments leave the node and rack unconstrained
  ContainerRequest request = new ContainerRequest(capability, null, null, pri);
  return request;
}

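3.4 The RM allocates Containers to the AM, and the AM starts tasks

Responses to RM messages are handled by RMCallbackHandler. The example mainly handles the first two kinds of message.

private class RMCallbackHandler implements AMRMClientAsync.CallbackHandler {
  // Message: Containers have completed. Carried in the heartbeat response from the RM.
  // If a heartbeat response contains both completed and newly allocated Containers,
  // the completed ones are handled first.
  public void onContainersCompleted(List<ContainerStatus> completedContainers) { ... }
  ...
  // Message: the RM has allocated new Containers. Carried in the heartbeat response from the RM.
  public void onContainersAllocated(List<Container> allocatedContainers) { ... }
  public void onShutdownRequest() { done = true; }
  // Node status changes
  public void onNodesUpdated(List<NodeReport> updatedNodes) { ... }
  public float getProgress() { ... }
}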

After onContainersAllocated receives the allocated Containers, it submits the tasks to the NM.

public void onContainersAllocated(List<Container> allocatedContainers) {
  for (Container allocatedContainer : allocatedContainers) {
    // Create the Runnable that will launch this Container
    LaunchContainerRunnable runnableLaunchContainer =
        new LaunchContainerRunnable(allocatedContainer, containerListener);
    // Create a new thread
    Thread launchThread = new Thread(runnableLaunchContainer);
    // launch and start the container on a separate thread to keep
    // the main thread unblocked
    // as all containers may not be allocated at one go.
    launchThreads.add(launchThread);
    // The Container is submitted to the NM inside the thread, so the main flow is not affected
    launchThread.start();
  }
}
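The one-thread-per-Container launch can be tried out in isolation. In the sketch below container ids are plain strings and "launching" just records the id in a set; launchAll and joinAll are hypothetical helper names, and the join step is an assumption about what a caller would do before shutting down rather than something shown in the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class LaunchThreadsSketch {
    // Starts one thread per allocated container id and keeps the threads for a later join.
    static List<Thread> launchAll(List<String> allocated, Set<String> started) {
        List<Thread> launchThreads = new ArrayList<>();
        for (String containerId : allocated) {
            // Stand-in for the real launch work: record that this container was started.
            Thread launchThread =
                new Thread(() -> started.add(containerId), "launch-" + containerId);
            launchThreads.add(launchThread);
            launchThread.start(); // the calling thread continues immediately
        }
        return launchThreads;
    }

    // Waits up to timeoutMillis per thread; returns true when every thread has finished.
    static boolean joinAll(List<Thread> threads, long timeoutMillis) {
        try {
            for (Thread thread : threads) {
                thread.join(timeoutMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread thread : threads) {
            if (thread.isAlive()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> started = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        List<Thread> threads = launchAll(Arrays.asList("c1", "c2", "c3"), started);
        System.out.println(joinAll(threads, 1000L) + " " + started.size());
    }
}
```

Because allocations can arrive in several batches, keeping every launch thread in a list lets the caller wait for all of them at the end instead of blocking the callback.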

A brief look at LaunchContainerRunnable: the class implements Runnable, and its run method prepares the task command.

private class LaunchContainerRunnable implements Runnable {
  public LaunchContainerRunnable(
      Container lcontainer, NMCallbackHandler containerListener) {
    this.container = lcontainer; // remember the Container to use
    this.containerListener = containerListener;
  }

  public void run() {
    // Build the Container launch context from the command, environment variables,
    // local resources, and so on
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setEnvironment(shellEnv);
    ctx.setLocalResources(localResources);
    ctx.setCommands(commands);
    ctx.setTokens(allTokens.duplicate());
    containerListener.addContainer(container.getId(), container);
    // Start the Container asynchronously
    nmClientAsync.startContainerAsync(container, ctx);
  }
}

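onContainersCompleted is fairly simple: when it receives a Container-completed message it checks the execution result, and if execution failed it issues a new request, until everything has completed.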
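The bookkeeping behind "re-request until everything has completed" can be sketched with a few counters. This is a simplified model, not the AM's code: the class and field names are hypothetical, exit statuses are plain integers, and the sketch assumes that only Containers lost for framework reasons (modeled by the ABORTED constant) are requested again, while a command that exits non-zero counts as completed and failed. That split is an assumption of the sketch and should be checked against the source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class CompletionSketch {
    // Assumption: stands for "container aborted by the framework", not a command failure.
    static final int ABORTED = -100;

    final int total;
    final AtomicInteger requested = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    volatile boolean done;

    CompletionSketch(int total) {
        this.total = total;
        this.requested.set(total); // all containers are requested up front
    }

    // Processes one batch of exit statuses and returns how many containers to request again.
    int onContainersCompleted(List<Integer> exitStatuses) {
        for (int status : exitStatuses) {
            if (status == 0) {
                completed.incrementAndGet();
            } else if (status == ABORTED) {
                requested.decrementAndGet(); // lost container: free its slot for a new request
            } else {
                completed.incrementAndGet(); // the command ran but failed
                failed.incrementAndGet();
            }
        }
        int askCount = total - requested.get();
        requested.addAndGet(askCount);
        if (completed.get() == total) {
            done = true;
        }
        return askCount;
    }

    public static void main(String[] args) {
        CompletionSketch am = new CompletionSketch(3);
        int again = am.onContainersCompleted(Arrays.asList(0, ABORTED, 1));
        System.out.println(again + " " + am.done);
        am.onContainersCompleted(Arrays.asList(0));
        System.out.println(am.completed.get() + " " + am.failed.get() + " " + am.done);
    }
}
```

Returning the re-request count from the handler keeps the counters and the follow-up request in one place, which is what makes the loop converge on total completed Containers.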

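Responses to NM messages are handled by NMCallbackHandler.

In the example, the callback handler deals with the events reported by the NM in a simple way: it only updates the counts of completed and failed Containers that the AM maintains, so that once Containers have finished, requests can be issued again. Failure handling is similar to the handling of the Container-completed message above, which gives the loopback effect.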

static class NMCallbackHandler
    implements NMClientAsync.CallbackHandler {
  @Override
  public void onContainerStopped(ContainerId containerId) { ...
  @Override
  public void onContainerStatusReceived(ContainerId containerId, ...
  @Override
  public void onContainerStarted(ContainerId containerId, ...
  ...
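A handler of this kind mostly keeps a concurrent map of live Containers plus a couple of counters. The sketch below models that bookkeeping with the standard library only; the class name, the string ids, and every method body are assumptions for illustration (the method names merely echo the callback names in the listing).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of an NM-side callback handler; not the NMCallbackHandler source.
final class NodeCallbackSketch {
    // container id -> host, shared between launch threads and callback threads
    private final ConcurrentMap<String, String> containers = new ConcurrentHashMap<>();
    final AtomicInteger completed = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();

    // Called by the launcher before the start request is sent.
    void addContainer(String containerId, String host) {
        containers.putIfAbsent(containerId, host);
    }

    // A started container stays tracked; a real handler could query its status here.
    void onContainerStarted(String containerId) {
    }

    // A stopped container no longer needs tracking.
    void onContainerStopped(String containerId) {
        containers.remove(containerId);
    }

    // A container that could not be started counts as completed and failed.
    void onStartContainerError(String containerId, Throwable cause) {
        containers.remove(containerId);
        completed.incrementAndGet();
        failed.incrementAndGet();
    }

    int tracked() {
        return containers.size();
    }

    public static void main(String[] args) {
        NodeCallbackSketch listener = new NodeCallbackSketch();
        listener.addContainer("c1", "host-a");
        listener.addContainer("c2", "host-b");
        listener.onContainerStarted("c1");
        listener.onContainerStopped("c1");
        listener.onStartContainerError("c2", new RuntimeException("launch failed"));
        System.out.println(listener.tracked() + " " + listener.failed.get());
    }
}
```

Using a ConcurrentMap and atomic counters matters because the launch threads and the callback thread touch the same state at the same time.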

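Overall, the AM registers callbacks with the RM and NM and then requests Containers. Once it has Containers, it submits tasks and tracks their execution, resubmitting any that fail until all tasks have completed.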

Reference:

http://www.datastart.cn/tech/2015/05/05/yarn-dist-shell.html
