The development environment is Eclipse on Windows 7; the Hadoop cluster is deployed on remote Linux servers.
First, create a configuration file named hadoop-cluster.xml with the following content:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.32:8900</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Then load the configuration file into a Configuration object:
Configuration conf = new Configuration();
conf.addResource("hadoop-cluster.xml");
Alternatively, the same property can be set programmatically:
conf.set("fs.defaultFS", "hdfs://192.168.1.32:8900");
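As a sketch of how this configuration is typically used, a minimal driver class might look like the following. Note this assumes a Hadoop cluster is reachable, and the WordCountMapper/WordCountReducer classes and the /input and /output HDFS paths are hypothetical placeholders, not part of the original article:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmitDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load the cluster settings from the hadoop-cluster.xml classpath resource.
        conf.addResource("hadoop-cluster.xml");

        Job job = Job.getInstance(conf, "remote-mr-demo");
        // The job jar must be resolvable so the remote NodeManagers can load
        // our classes; setJarByClass infers it from this driver's enclosing jar.
        job.setJarByClass(RemoteSubmitDriver.class);
        // WordCountMapper/WordCountReducer are hypothetical, for illustration only.
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because this sketch submits a real job, it can only run against a live cluster; compiling it requires the hadoop-client dependency on the classpath.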
After configuring as above, running the MR program produces the following error:
16:32:18,725 | INFO | main | RMProxy | org.apache.hadoop.yarn.client.RMProxy 92 | Connecting to ResourceManager at /0.0.0.0:8032
16:32:21,005 | INFO | main | Client | org.apache.hadoop.ipc.Client$Connection 842 | Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s);
This happens because we are submitting remotely, and the ResourceManager address defaults to 0.0.0.0, which only works when the client runs on the cluster itself; we need to specify the host explicitly.
Add the following to hadoop-cluster.xml:
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.32</value>
  </property>
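Alternatively, mirroring the conf.set style shown earlier, the same property can be set in code (a fragment, assuming the Configuration object named conf from above):

```java
// Point the client at the remote ResourceManager instead of the 0.0.0.0 default.
conf.set("yarn.resourcemanager.hostname", "192.168.1.32");
```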
Running the MR program again produces this error:
Job job_1468550321415_0015 failed with state FAILED due to: Application application_1468550321415_0015 failed 2 times due to AM Container for appattempt_1468550321415_0015_000002 exited with exitCode: 1 due to: Exception from container-launch: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
This is because cross-platform (Windows-to-Linux) job submission is disabled by default in MapReduce, so the container launch command is generated with the wrong environment-variable syntax: Windows uses %JAVA_HOME% while Linux uses $JAVA_HOME, and the Linux shell cannot interpret the Windows form.
The fix is to add the following to hadoop-cluster.xml:
  <property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
  </property>
With this in place, MapReduce jobs can be submitted remotely from Windows successfully.
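Putting the three fixes together, the complete hadoop-cluster.xml used throughout this walkthrough reads:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.32:8900</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.32</value>
  </property>
  <property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
  </property>
</configuration>
```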