MapReduce series (3): submitting an MR job from Windows to a remote cluster

In the previous post we ran a local-mode MapReduce program on Windows without any problems.
Now I want to go a step further and run the program from IDEA directly on the Linux cluster, so that my local machine effectively becomes a MapReduce client.
Following that idea, we set the configuration as follows:

conf.set("mapreduce.framework.name","yarn");
conf.set("yarn.resourcemanager.hostname","mini01");
conf.set("fs.defaultFS","hdfs://mini01:9000/");

Running it fails with the following error:

17/03/17 19:02:22 INFO client.RMProxy: Connecting to ResourceManager at mini01/192.168.153.11:8032
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/03/17 19:02:22 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
17/03/17 19:02:22 INFO input.FileInputFormat: Total input paths to process : 1
17/03/17 19:02:22 INFO mapreduce.JobSubmitter: number of splits:1
17/03/17 19:02:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489496419130_0002
17/03/17 19:02:37 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
17/03/17 19:05:18 INFO impl.YarnClientImpl: Submitted application application_1489496419130_0002
17/03/17 19:09:19 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002
Exception in thread "main" java.io.IOException: Failed to run job : Application application_1489496419130_0002 failed 2 times due to AM Container for appattempt_1489496419130_0002_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://mini01:8088/proxy/application_1489496419130_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1489496419130_0002_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control

Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:241)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1315)
    at wc.WordCountRunner.main(WordCountRunner.java:78)

From the stack trace we can tell the failure comes from the shell command that YARNRunner generates for the AM container.
Single-stepping through the source into YARNRunner's submitJob(), we reach:

  // Construct necessary information to start the MR AM
  ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

The resulting appContext contains:

application_id { id: 2 cluster_timestamp: 1489496419130 } application_name: "N/A" queue: "default" am_container_spec { localResources { key: "jobSubmitDir/job.splitmetainfo" value { resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.splitmetainfo" } size: 27 timestamp: 1489698635869 type: FILE visibility: APPLICATION } } localResources { key: "jobSubmitDir/job.split" value { resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.split" } size: 112 timestamp: 1489698635836 type: FILE visibility: APPLICATION } } localResources { key: "job.xml" value { resource { scheme: "hdfs" host: "mini01" port: 9000 file: "/tmp/hadoop-yarn/staging/root/.staging/job_1489496419130_0002/job.xml" } size: 88715 timestamp: 1489698636066 type: FILE visibility: APPLICATION } } tokens: "HDTS\000\000\001\025MapReduceShuffleToken\b\213\023`\302+\213\302`" environment { key: "HADOOP_CLASSPATH" value: "%PWD%;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*;null" } environment { key: "SHELL" value: "/bin/bash" } environment { key: "CLASSPATH" value: "%PWD%;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%/share/hadoop/common/*;%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*;%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/*;%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\lib\\*;job.jar/job.jar;job.jar/classes/;job.jar/lib/*;%PWD%/*" } environment { key: "LD_LIBRARY_PATH" value: "%PWD%" } command: "%JAVA_HOME%/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA  -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr " application_ACLs { accessType: 
APPACCESS_VIEW_APP acl: " " } application_ACLs { accessType: APPACCESS_MODIFY_APP acl: " " } } cancel_tokens_when_complete: true maxAppAttempts: 2 resource { memory: 1536 virtual_cores: 1 } applicationType: "MAPREDUCE"

Notice that the Windows-style "%" and ";" in the environment values, and the "\" path separators, are passed straight through to Linux, where bash cannot interpret them (hence the "/bin/bash: fg: no job control" failure). The fix is to copy org.apache.hadoop.mapred.YARNRunner.java verbatim into our own src tree and patch the paths there; the package name and file path must stay exactly the same, so that our copy shadows the one in the Hadoop jar.
The following changes are enough.
First change, the command used to launch the AM:

// Setup the command to run the AM
List<String> vargs = new ArrayList<String>(8);
// Original line, commented out:
// vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) + "/bin/java");

// Modified: hard-code the Unix-style JAVA_HOME reference
vargs.add("$JAVA_HOME/bin/java");

Second change, converting every environment value to Unix form before it goes into the ContainerLaunchContext:

  // Modified: convert each environment value from Windows to Linux syntax
  for (String key : environment.keySet()) {
      String org = environment.get(key);
      String linux = getLinux(org);
      environment.put(key, linux);
  }

// Setup ContainerLaunchContext for AM container
ContainerLaunchContext amContainer =
        ContainerLaunchContext.newInstance(localResources, environment,
                vargsFinal, null, securityTokens, acls);

Finally, add the getLinux() helper used above:

// Added helper: convert a Windows-style value ("%VAR%", ';', '\') to
// the Unix equivalent ("$VAR", ':', '/')
private String getLinux(String org) {
    StringBuilder sb = new StringBuilder();
    int c = 0;
    for (int i = 0; i < org.length(); i++) {
        if (org.charAt(i) == '%') {
            // "%VAR%" -> "$VAR": emit '$' for the opening '%', drop the closing one
            c++;
            if (c % 2 == 1) {
                sb.append("$");
            }
        } else {
            switch (org.charAt(i)) {
                case ';':   // Windows path-list separator -> Unix
                    sb.append(":");
                    break;
                case '\\':  // Windows file separator -> Unix
                    sb.append("/");
                    break;
                default:
                    sb.append(org.charAt(i));
                    break;
            }
        }
    }
    return sb.toString();
}
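To sanity-check the conversion, here is a small standalone sketch of the same logic; the sample input is a made-up fragment of the CLASSPATH value seen in the appContext dump above:

```java
public class GetLinuxDemo {

    // Same conversion logic as the getLinux() patch above:
    // "%VAR%" -> "$VAR", ';' -> ':', '\' -> '/'
    static String getLinux(String org) {
        StringBuilder sb = new StringBuilder();
        int percentCount = 0;
        for (int i = 0; i < org.length(); i++) {
            char ch = org.charAt(i);
            if (ch == '%') {
                percentCount++;
                if (percentCount % 2 == 1) {
                    sb.append('$');
                }
            } else if (ch == ';') {
                sb.append(':');
            } else if (ch == '\\') {
                sb.append('/');
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up CLASSPATH fragment in Windows syntax
        String windows = "%PWD%;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*";
        // Prints: $PWD:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*
        System.out.println(getLinux(windows));
    }
}
```

Running it confirms that "%PWD%;...\\*" becomes "$PWD:.../*", which bash on the NodeManager can evaluate.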

One more thing to watch out for:
in the driver class you must call setJar() with the absolute path of the job jar. setJarByClass() ultimately relies on the classpath set up by the `hadoop jar` launcher script to locate the jar, and since our client runs on Windows rather than on the Linux cluster, setJarByClass() would fail with a "mapper class not found" error.

 wcjob.setJar("F:/myWorkPlace/java/dubbo/demo/dubbo-demo/mr-demo1/target/mr.demo-1.0-SNAPSHOT.jar");

// setJarByClass does not work from a Windows client; use setJar instead
// wcjob.setJarByClass(WordCountRunner.class);
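Putting the configuration pieces together, a minimal driver might look like the sketch below. It is an outline under the assumptions of this post, not a drop-in driver: the mapper/reducer classes and the input/output paths are hypothetical, while the host name mini01 and the jar path follow the examples above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit to the remote YARN cluster instead of running locally
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "mini01");
        conf.set("fs.defaultFS", "hdfs://mini01:9000/");

        Job wcjob = Job.getInstance(conf);
        // From Windows, point at the built jar by absolute path (not setJarByClass)
        wcjob.setJar("F:/myWorkPlace/java/dubbo/demo/dubbo-demo/mr-demo1/target/mr.demo-1.0-SNAPSHOT.jar");

        wcjob.setMapperClass(WordCountMapper.class);    // hypothetical mapper class
        wcjob.setReducerClass(WordCountReducer.class);  // hypothetical reducer class
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(IntWritable.class);

        // Hypothetical HDFS paths
        FileInputFormat.setInputPaths(wcjob, new Path("/wordcount/input"));
        FileOutputFormat.setOutputPath(wcjob, new Path("/wordcount/output"));

        System.exit(wcjob.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that this only works together with the patched YARNRunner described above, since the stock class still emits Windows-style paths into the container launch context.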