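Prerequisites
- Hadoop version: 3.1.1
- Logging statements were added to the source code to make it easier to study
A log statement was added to RetryInvocationHandler so that every RPC call produces a log line (the "==>hander 被执行method:..." lines below, roughly "handler invoked method:"). This makes it easy to trace the flow.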
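This works because RetryInvocationHandler is the java.lang.reflect.InvocationHandler behind a JDK dynamic proxy (created through RetryProxy), so every method called on the RPC proxy passes through its invoke() method. Below is a minimal sketch of that interception idea, not the Hadoop code: ClusterProtocol is a hypothetical interface, and the sketch has none of RetryInvocationHandler's retry or failover logic.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoggingProxyDemo {

  // Hypothetical stand-in for a YARN/HDFS RPC protocol interface.
  public interface ClusterProtocol {
    int getClusterMetrics();
    String getQueueInfo(String queueName);
  }

  // Every intercepted call is recorded here (and printed), like the extra log line.
  public static final List<String> CALLS = new ArrayList<>();

  // Wraps target in a JDK dynamic proxy whose handler logs the method, then delegates.
  public static <T> T withLogging(Class<T> iface, T target) {
    InvocationHandler handler = (proxy, method, args) -> {
      Object[] shown = (args == null) ? new Object[0] : args;
      String line = "==> method:" + method.getName() + "," + Arrays.toString(shown);
      CALLS.add(line);
      System.out.println(line);
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        throw e.getCause(); // rethrow the exception raised by the real implementation
      }
    };
    return iface.cast(Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }

  public static void main(String[] args) {
    ClusterProtocol real = new ClusterProtocol() {
      @Override public int getClusterMetrics() { return 1; }
      @Override public String getQueueInfo(String queueName) { return "queue:" + queueName; }
    };
    ClusterProtocol client = withLogging(ClusterProtocol.class, real);
    client.getClusterMetrics();      // prints ==> method:getClusterMetrics,[]
    client.getQueueInfo("default");  // prints ==> method:getQueueInfo,[default]
  }
}
```

Because the log line is emitted in the handler rather than in each protocol method, one added statement covers every RPC on every proxied protocol.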
Now to the main topic:
Run the following command (a simple shell command that prints the current directory):
hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.3.0.1.0-187.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1.3.0.1.0-187.jar -shell_command 'echo ---- %cd%' -num_containers 2 -container_memory 300 -master_memory 400
Log output:
08/13 14:22:37 [main] Client(254):Initializing Client
08/13 14:22:37 [main] Client(632):Running Client
08/13 14:22:37 [main] RMProxy(133):Connecting to ResourceManager at /0.0.0.0:8032
08/13 14:31:55 [main] RetryInvocationHandler(355):==>hander 被执行method:getResourceTypeInfo
08/13 14:31:55 [main] RetryInvocationHandler(355):==>hander 被执行method:getClusterMetrics,{}
08/13 14:31:55 [main] Client(636):Got Cluster metric info from ASM, numNodeManagers=1
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getClusterNodes,{nodeStates: NS_RUNNING}
08/13 14:31:56 [main] Client(641):Got Cluster node info from ASM
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getQueueInfo,{queueName: "default" includeApplications: true includeChildQueues: false recursive: false}
08/13 14:31:56 [main] Client(651):Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getQueueUserAcls,{}
08/13 14:31:56 [main] Client(661):User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
08/13 14:31:56 [main] Client(661):User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
08/13 14:31:56 [main] Client(661):User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
08/13 14:31:56 [main] Client(661):User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getResourceProfiles,{org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetAllResourceProfilesRequestPBImpl@5c86a017}
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getNewApplication,{}
08/13 14:31:56 [main] Client(706):Max mem capability of resources in this cluster 8192
08/13 14:31:56 [main] Client(717):Max virtual cores capability of resources in this cluster 4
08/13 14:31:56 [main] RetryInvocationHandler(355):==>hander 被执行method:getResourceTypeInfo,{org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetAllResourceTypeInfoRequestPBImpl@da6efc73}
08/13 14:31:56 [main] Client(1184):AM vcore not specified, use 1 mb as AM vcores
08/13 14:31:56 [main] Client(1191):AM Resource capability=<memory:400, vCores:1>
08/13 14:31:56 [main] Client(765):Copy App Master jar from local filesystem and add to local environment
08/13 14:31:56 [main] FileSystem(3296):Loading filesystems[viewfs=class org.apache.hadoop.fs.viewfs.ViewFileSystem, swebhdfs=class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem, file=class org.apache.hadoop.fs.LocalFileSystem, har=class org.apache.hadoop.fs.HarFileSystem, http=class org.apache.hadoop.fs.http.HttpFieSystem, hdfs=class org.apache.hadoop.hdfs.DistributedFileSystem, webhdfs=class org.apache.hadoop.hdfs.web.WebHdfsFileSystem, https=class org.apache.hadoop.fs.http.HttpsFileSystem],8
08/13 14:31:57 [main] RetryInvocationHandler(355):==>hander 被执行method:toString,{}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:getFileInfo,{/user/***/DistributedShell/application_1597216395121_0007/AppMaster.jar}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:toString,{}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:create,{/user/***/DistributedShell/application_1597216395121_0007/AppMaster.jar,{ masked: rw-r--r--, unmasked: rw-rw-rw- },DFSClient_NONMAPREDUCE_1707719962_1,[CREATE, OVERWRITE],true,1,134217728,{CryptoProtocolVersion{description='Encry
08/13 14:31:58 [Thread-6] RetryInvocationHandler(355):==>hander 被执行method:addBlock,{/user/***/DistributedShell/application_1597216395121_0007/AppMaster.jar,DFSClient_NONMAPREDUCE_1707719962_1,<null>,<null>,17203,<null>,[]}
08/13 14:31:58 [Thread-6] RetryInvocationHandler(355):==>hander 被执行method:getServerDefaults,{}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:complete,{/user/***/DistributedShell/application_1597216395121_0007/AppMaster.jar,DFSClient_NONMAPREDUCE_1707719962_1,BP-1150083184-10.180.201.39-1587535791595:blk_1073742314_1492,17203}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:getFileInfo,{/user/***/DistributedShell/application_1597216395121_0007/AppMaster.jar}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:toString,{}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:create,{/user/***/DistributedShell/application_1597216395121_0007/shellCommands,{ masked: rw-r--r--, unmasked: rw-rw-rw- },DFSClient_NONMAPREDUCE_1707719962_1,[CREATE, OVERWRITE],true,1,134217728,{CryptoProtocolVersion{description='Encry
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:setPermission,{/user/***/DistributedShell/application_1597216395121_0007/shellCommands,rwx--x---}
08/13 14:31:58 [Thread-9] RetryInvocationHandler(355):==>hander 被执行method:addBlock,{/user/***/DistributedShell/application_1597216395121_0007/shellCommands,DFSClient_NONMAPREDUCE_1707719962_1,<null>,<null>,17204,<null>,[]}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:complete,{/user/***/DistributedShell/application_1597216395121_0007/shellCommands,DFSClient_NONMAPREDUCE_1707719962_1,BP-1150083184-10.180.201.39-1587535791595:blk_1073742315_1493,17204}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:getFileInfo,{/user/***/DistributedShell/application_1597216395121_0007/shellCommands}
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:toString,{}
08/13 14:31:58 [main] Client(814):Set the environment for the application master
08/13 14:31:58 [main] Client(856):Setting up app master command
08/13 14:31:58 [main] Client(918):Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx400m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type GUARANTEED --container_memory 300 --container_vcores 1 --num_containers 2 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG
08/13 14:31:58 [main] Client(985):Submitting application to ASM
08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander 被执行method:submitApplication,
08/13 14:31:59 [main] YarnClientImpl(306):Submitted application application_1597216395121_0007
08/13 14:32:00 [main] RetryInvocationHandler(355):==>hander 被执行method:getApplicationReport,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:00 [main] Client(1021):Got application report from ASM for, appId=7, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1597300318994, yarnAppState=ACCEPTED, distributedFinalSate=UNDEFINED, appTrackingUrl=http://***.home.langchao.com:8088/proxy/application_1597216395121_0007/, appUser=***
08/13 14:32:01 [main] RetryInvocationHandler(355):==>hander 被执行method:getApplicationReport,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:01 [main] Client(1021):Got application report from ASM for, appId=7, clientToAMToken=null, appDiagnostics=AM container is launched, waiting for AM container to Register with RM, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1597300318994, yarnAppState=ACCEPTED, distributedFinalSate=UNDEFINED, appTrackingUrl=http://***.home.langchao.com:8088/proxy/application_1597216395121_0007/, appUser=***
08/13 14:32:28 [main] RetryInvocationHandler(355):==>hander 被执行method:getApplicationReport,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:28 [main] Client(1021):Got application report from ASM for, appId=7, clientToAMToken=null, appDiagnostics=, appMasterHost=***/10.180.201.39, appQueue=default, appMasterRpcPort=-1, appStartTime=1597300318994, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://***home.langchao.com:8088/proxy/application_1597216395121_0007/, appUser=***
08/13 14:32:29 [main] RetryInvocationHandler(355):==>hander 被执行method:getApplicationReport,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:37 [main] Client(1021):Got application report from ASM for, appId=7, clientToAMToken=null, appDiagnostics=, appMasterHost=***/10.180.201.39, appQueue=default, appMasterRpcPort=-1, appStartTime=1597300318994, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://***home.langchao.com:8088/proxy/application_1597216395121_0007/, appUser=***
08/13 14:32:37 [main] Client(1057):Reached client specified timeout for application. Killing application
08/13 14:32:37 [main] RetryInvocationHandler(355):==>hander 被执行method:forceKillApplication,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:37 [main] RetryInvocationHandler(355):==>hander 被执行method:forceKillApplication,{application_id { id: 7 cluster_timestamp: 1597216395121 }}
08/13 14:32:37 [main] YarnClientImpl(474):Killed application application_1597216395121_0007
08/13 14:32:37 [main] Client(274):Application failed to complete successfully
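The Client(918) line in the log above shows the assembled ApplicationMaster launch command: -master_memory 400 becomes -Xmx400m, -container_memory 300 and -num_containers 2 are passed through, and the unspecified container vcores show up as 1. {{JAVA_HOME}} and <LOG_DIR> are placeholders expanded on the NodeManager side when the container is launched. A minimal sketch of building such a command string; AmCommandSketch and buildAmCommand are hypothetical names, and because the log line is truncated, only the arguments visible in it are reproduced.

```java
import java.util.ArrayList;
import java.util.List;

public class AmCommandSketch {

  // Collects the AM launch arguments in order and joins them with spaces.
  // Only the arguments visible in the truncated Client(918) log line are included.
  public static String buildAmCommand(long amMemoryMb, String containerType,
      long containerMemoryMb, int containerVcores, int numContainers, int priority) {
    List<String> vargs = new ArrayList<>();
    vargs.add("{{JAVA_HOME}}/bin/java");   // placeholder expanded by the NodeManager
    vargs.add("-Xmx" + amMemoryMb + "m");   // AM heap from -master_memory
    vargs.add("org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
    vargs.add("--container_type " + containerType);
    vargs.add("--container_memory " + containerMemoryMb);
    vargs.add("--container_vcores " + containerVcores);
    vargs.add("--num_containers " + numContainers);
    vargs.add("--priority " + priority);
    vargs.add("1><LOG_DIR>/AppMaster.stdout"); // stdout redirected into the container log dir
    return String.join(" ", vargs);
  }

  public static void main(String[] args) {
    System.out.println(buildAmCommand(400, "GUARANTEED", 300, 1, 2, 0));
  }
}
```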
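From the log, the overall flow breaks down into the following steps:
- 1. Get cluster metric info
- 2. Get cluster node info from ASM
- 3. Get queue info
- 4. Get user ACL info for the queue
- 5. Get the resource profiles available in the RM
- 6. Get a new application ID
- 7. Get the resource types supported by the RM
- 8. Prepare the files and launch parameters
- 9. Submit the application
- 10. Poll the application report in a loop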
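Since every RPC carries the same marker, the call sequence behind these steps can be extracted from the log mechanically. A minimal sketch, assuming the marker format shown in the log above; RpcTraceExtractor and rpcMethods are hypothetical names. It drops toString entries: a JDK proxy also routes java.lang.Object methods such as toString through its handler, so those lines are not real RPCs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RpcTraceExtractor {

  // Matches the custom marker added to RetryInvocationHandler and captures
  // the method name that follows "method:".
  private static final Pattern MARKER = Pattern.compile("==>hander .*?method:(\\w+)");

  // Returns the RPC method names in call order, skipping toString.
  public static List<String> rpcMethods(List<String> logLines) {
    List<String> methods = new ArrayList<>();
    for (String line : logLines) {
      Matcher m = MARKER.matcher(line);
      if (m.find() && !"toString".equals(m.group(1))) {
        methods.add(m.group(1));
      }
    }
    return methods;
  }

  public static void main(String[] args) {
    // The Chinese marker text is written as Unicode escapes to keep the source ASCII.
    List<String> sample = List.of(
        "08/13 14:31:55 [main] RetryInvocationHandler(355):==>hander \u88ab\u6267\u884cmethod:getClusterMetrics,{}",
        "08/13 14:31:55 [main] Client(636):Got Cluster metric info from ASM, numNodeManagers=1",
        "08/13 14:31:57 [main] RetryInvocationHandler(355):==>hander \u88ab\u6267\u884cmethod:toString,{}",
        "08/13 14:31:58 [main] RetryInvocationHandler(355):==>hander \u88ab\u6267\u884cmethod:submitApplication,");
    System.out.println(rpcMethods(sample)); // [getClusterMetrics, submitApplication]
  }
}
```

Applied to the whole log, it also picks up the HDFS calls (getFileInfo, create, addBlock, complete) made while uploading AppMaster.jar and shellCommands in step 8.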
Source code
1. Entry point: main
public static void main(String[] args) {
  boolean result = false;
  try {
    Client client = new Client();
    LOG.info("Initializing Client");
    try {
      // init() parses the command-line arguments
      boolean doRun = client.init(args);
      if (!doRun) {
        System.exit(0);
      }
    } catch (IllegalArgumentException e) {
      System.err.println(e.getLocalizedMessage());
      client.printUsage();
      System.exit(-1);
    }
    // run() submits the application and monitors it until it ends
    result = client.run();
  } catch (Throwable t) {
    LOG.error("Error running Client", t);
    System.exit(1);
  }
  if (result) {
    LOG.info("Application completed successfully");
    System.exit(0);
  }
  LOG.error("Application failed to complete successfully");
  System.exit(2);
}
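main() maps each outcome to a distinct exit status. This run ends with the Client(274) line "Application failed to complete successfully", so the process exits with status 2. A small sketch that tabulates the mapping; ClientExitCodes and the Outcome names are hypothetical labels for the branches in main(), not Hadoop types.

```java
public class ClientExitCodes {

  // Hypothetical labels for the branches of Client.main().
  public enum Outcome { INIT_SAID_NO_RUN, BAD_ARGUMENTS, UNEXPECTED_ERROR, APP_SUCCEEDED, APP_FAILED }

  // Status passed to System.exit() for each branch.
  public static int exitStatus(Outcome o) {
    switch (o) {
      case INIT_SAID_NO_RUN: return 0;   // init() returned false (for example, help was requested)
      case BAD_ARGUMENTS:    return -1;  // IllegalArgumentException thrown by init()
      case UNEXPECTED_ERROR: return 1;   // any other Throwable from init() or run()
      case APP_SUCCEEDED:    return 0;   // run() returned true
      default:               return 2;   // run() returned false
    }
  }

  // A Unix shell sees only the low 8 bits, so System.exit(-1) appears as 255 in $?.
  public static int shellStatus(int exitStatus) {
    return exitStatus & 0xFF;
  }

  public static void main(String[] args) {
    for (Outcome o : Outcome.values()) {
      System.out.println(o + " -> " + exitStatus(o) + " (shell: " + shellStatus(exitStatus(o)) + ")");
    }
  }
}
```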
2. The main logic: run()
The method is long, so omitted parts are marked with "....":
public boolean run() throws IOException, YarnException {
  LOG.info("Running Client");
  yarnClient.start();

  // 1. Get cluster metric info
  YarnClusterMetrics clusterMetrics = yarnClient.getYarnClusterMetrics();
  // ....

  // 2. Get cluster node info from ASM
  List<NodeReport> clusterNodeReports = yarnClient.getNodeReports(
      NodeState.RUNNING);
  LOG.info("Got Cluster node info from ASM");
  // ....

  // 3. Get queue info
  QueueInfo queueInfo = yarnClient.getQueueInfo(this.amQueue);
  // ....

  // 4. Get user ACL info for the queue
  List<QueueUserACLInfo> listAclInfo = yarnClient.getQueueAclsInfo();
  // ....

  // 5. Get the resource profiles available in the RM
  Map<String, Resource> profiles;
  try {
    profiles = yarnClient.getResourceProfiles();
  } catch (YARNFeatureNotEnabledException re) {
    // resource profiles are not enabled on this cluster
    profiles = null;
  }
  List<String> appProfiles = new ArrayList<>(2);
  appProfiles.add(amResourceProfile);
  appProfiles.add(containerResourceProfile);
  // ....

  // 6. Get a new application ID
  YarnClientApplication app = yarnClient.createApplication();
  GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
  // ....
  ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
  ApplicationId appId = appContext.getApplicationId();

  // Set up resource type requirements
  // For now, both memory and vcores are supported, so we set memory and
  // vcores requirements
  // 7. Get the resource types supported by the RM
  List<ResourceTypeInfo> resourceTypes = yarnClient.getResourceTypeInfo();
  // ....

  // 8. Prepare the launch parameters: local resources, commands and
  //    environment, plus handling of security tokens
  // ....

  // 9. Submit the application
  yarnClient.submitApplication(appContext);

  // 10. Poll the application report in a loop
  return monitorApplication(appId);
}
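Step 10 is where this run ended. The log shows the reports moving from ACCEPTED to RUNNING, then "Reached client specified timeout for application. Killing application". The first log line is at 14:22:37 and the kill at 14:32:37, exactly ten minutes later, which is consistent with the Client's default -timeout of 600000 ms; the gap between 14:22:37 and 14:31:55 used up most of that budget. A minimal sketch of such a polling loop under stated assumptions: MonitorSketch, AppState, FinalStatus and Report are simplified hypothetical stand-ins for the YARN report types, and the clock and kill action are injected so the loop can be tested without a cluster.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class MonitorSketch {

  // Trimmed, hypothetical stand-ins for YarnApplicationState / FinalApplicationStatus.
  public enum AppState { ACCEPTED, RUNNING, FINISHED, KILLED, FAILED }
  public enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

  public static final class Report {
    final AppState state;
    final FinalStatus finalStatus;
    public Report(AppState state, FinalStatus finalStatus) {
      this.state = state;
      this.finalStatus = finalStatus;
    }
  }

  // Polls until a terminal state or until the client-side timeout expires.
  // Returns true only for FINISHED + SUCCEEDED; on timeout it runs killApp and returns false.
  public static boolean monitor(Supplier<Report> getReport, Runnable killApp,
      LongSupplier clockMs, long startMs, long timeoutMs, long pollIntervalMs) {
    while (true) {
      try {
        Thread.sleep(pollIntervalMs);   // pause between report requests
      } catch (InterruptedException e) {
        // ignored: this sketch keeps polling, like the loop it illustrates
      }
      Report r = getReport.get();       // stands in for yarnClient.getApplicationReport(appId)
      if (r.state == AppState.FINISHED) {
        return r.finalStatus == FinalStatus.SUCCEEDED;
      }
      if (r.state == AppState.KILLED || r.state == AppState.FAILED) {
        return false;
      }
      if (clockMs.getAsLong() > startMs + timeoutMs) {
        killApp.run();                  // stands in for forceKillApplication(appId)
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long[] now = {0};
    boolean result = monitor(
        () -> new Report(AppState.RUNNING, FinalStatus.UNDEFINED),
        () -> System.out.println("timeout reached, killing application"),
        () -> now[0] += 1000,  // fake clock: each check advances 1 s
        0, 3000, 0);
    System.out.println("monitor returned " + result); // monitor returned false
  }
}
```

Checking the terminal states before the timeout means an application that finishes on the last poll is still reported by its real final status rather than killed.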