Submitting Jobs from Eclipse on Windows 7 to a Hadoop Cluster

Reference: http://zy19982004.iteye.com/blog/2031172

My earlier Eclipse-to-Hadoop 2.2.0 setup turned out to be running jobs in local mode, which is why nothing ever showed up in the web UI.

1. The Problem

1.1 Configuration code in the main function

Configuration conf = new Configuration();
// HDFS NameNode of the target cluster
conf.set("fs.defaultFS", "hdfs://192.168.178.181:9000");
// Jar containing the job's classes, exported from the current project
conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");
// Local copies of the cluster's configuration files
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\hdfs-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\mapred-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\core-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\yarn-site.xml"));

1.2 Error message

application_1386170530016_0001 failed 2 times due to AM Container for appattempt_1386170530016_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

1.3 Cause

Searching online shows this is a known bug in Hadoop 2.2.0 and 2.3.0, triggered by submitting across operating systems: the client-side YARNRunner builds the ApplicationMaster launch command using the conventions of the submitting OS, so a Windows client emits a command with Windows-style environment references such as %JAVA_HOME% and ';' classpath separators. When bash on the Linux NodeManager sees a command line beginning with %, it interprets it as a job-control reference, which is exactly the "fg: no job control" failure above. See the bug report on the Apache JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-5655
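
For context, Hadoop 2.4.0 and later expose this behavior as a configuration switch (added via MAPREDUCE-4052), so no patching is needed on those versions; on 2.2.0 the patches below are still required. A minimal sketch of the later-release approach (note this property does not exist in 2.2.0):

import org.apache.hadoop.conf.Configuration;

// Hadoop 2.4.0+ only: ask the client to generate Unix-style container
// launch commands even when submitting from Windows.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.app-submission.cross-platform", true);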

2. The Fix

2.1 Download the patches

Download the patches from the official JIRA issue.

2.2 Apply the patches

1) Connect to the master node with SecureCRT and change into the hadoop-src source directory.

2) Run patch -p0 < MRApps.patch. When prompted for the file to patch, enter:

hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java

3) Run patch -p0 < YARNRunner.patch. When prompted, enter:

hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java

4) Similarly, apply HADOOP-10110.patch, whose effect is simply the replacement shown below.

In <source root>/hadoop-common-project/hadoop-auth/pom.xml, find:

<dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jetty</artifactId>
    <scope>test</scope>
</dependency>

After it, add:

<dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <scope>test</scope>
</dependency>

2.3 Rebuild Hadoop 2.2.0

# mvn clean package -Pdist,native -DskipTests -Dtar -e -X

The -e and -X flags enable verbose error and debug output.

The build commonly fails with network errors; just rerun the command until it succeeds. A full build usually takes around an hour.

Screenshot of the successful build (image not reproduced here).

2.4 Locate the built Hadoop tarball

The rebuilt Hadoop 2.2.0 distribution we want ends up at:

<source root>/hadoop-dist/target/hadoop-2.2.0.tar.gz

2.5 Extract the bug-fixed jar files

1) Unpack hadoop-2.2.0.tar.gz and go into the \share\hadoop\mapreduce directory. The two jars hadoop-mapreduce-client-jobclient-2.2.0.jar and hadoop-mapreduce-client-common-2.2.0.jar are the ones whose bug prevented Eclipse from submitting jobs to the cluster.

2) Replace the same-named files under <hadoop dir>\share\hadoop\mapreduce on every node of the cluster with these two jars.

3) Also replace them in the Hadoop installation directory on Windows 7, i.e. the one configured in Eclipse under Window -> Preferences -> Hadoop Map/Reduce -> Hadoop installation directory.

2.6 Update mapred-site.xml

1) On every node of the Hadoop cluster, add the following to mapred-site.xml.

<property>
    <name>mapreduce.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/share/hadoop/common/*,
        $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
    </value>
</property>

2) In the Windows-side mapred-site.xml, add:

<property>
    <name>mapred.remote.os</name>
    <value>Linux</value>
    <description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
</property>
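
If you would rather not maintain this local XML entry, the same value can be set from the driver code; values set with Configuration.set() take precedence over those loaded through addResource(). A minimal equivalent sketch:

import org.apache.hadoop.conf.Configuration;

// Equivalent to the XML above: the remote MapReduce framework runs on Linux.
Configuration conf = new Configuration();
conf.set("mapred.remote.os", "Linux");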

3. Summary

3.1 Configuration steps

How to submit a job from Windows to a Linux Hadoop cluster:

1) Set up the Hadoop Eclipse plugin.

2) In the job configuration, set mapreduce.framework.name to yarn; the remaining settings must also be correct (a consolidated sketch follows this list).

3) Run On Hadoop.
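
Putting the pieces together, a consolidated driver-side configuration might look like the sketch below. The IP address, jar path, and ResourceManager port 8032 are the example values that appear elsewhere in this post; the explicit set() calls can stand in for the addResource() calls of section 1.1 if you do not want to keep local copies of the cluster's XML files:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.178.181:9000");          // NameNode
conf.set("mapreduce.framework.name", "yarn");                     // submit to YARN, not local
conf.set("yarn.resourcemanager.address", "192.168.178.181:8032"); // ResourceManager (port as in the log of 4.3)
conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");          // exported project jar (see 3.2)
conf.set("mapred.remote.os", "Linux");                            // read by the patched YARNRunner (see 2.6)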

3.2 Other issues

If you see the error below, the job's jar has not been set.

Fix: conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");

Here Qing_WordCount.jar is the current project exported as a jar, and the conf entry tells Hadoop where that jar lives. (Equivalently, you can call job.setJar("D:\\Qing_WordCount.jar") on the Job instance; that is the Job#setJar(String) the error message refers to. job.setJarByClass(...) alone is not enough when running from Eclipse, because the classes live in unpacked directories rather than in a jar.)

 

Error message:

No job jar file set. User classes may not be found. See Job or Job#setJar(String).

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

4. A Successful Run

4.1 Code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // value is already one line of the input file
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.178.181:9000");
        // Required when submitting from Eclipse (see 1.1 and 3.2); the
        // submission log below confirms the job jar was set for this run.
        conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\hdfs-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\mapred-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\core-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\yarn-site.xml"));
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

4.2 Eclipse Run Configuration
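
The original screenshot is not reproduced here. The run configuration essentially just supplies the two program arguments that main() expects: an HDFS input path and an output path that does not yet exist. Hypothetical example values:

hdfs://192.168.178.181:9000/user/root/input hdfs://192.168.178.181:9000/user/root/output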

4.3 Results

1) Output in Eclipse:

2014-04-26 18:53:45,881 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at master/192.168.178.181:8032
2014-04-26 18:53:46,347 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-04-26 18:53:46,429 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(394)) - number of splits:1
2014-04-26 18:53:46,444 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-26 18:53:46,444 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-04-26 18:53:46,445 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-26 18:53:46,446 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-26 18:53:46,446 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-26 18:53:46,449 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-26 18:53:46,571 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(477)) - Submitting tokens for job: job_1398500824037_0005
2014-04-26 18:53:46,802 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(174)) - Submitted application application_1398500824037_0005 to ResourceManager at master/192.168.178.181:8032
2014-04-26 18:53:46,862 INFO  [main] mapreduce.Job (Job.java:submit(1272)) - The url to track the job: http://master:8088/proxy/application_1398500824037_0005/
2014-04-26 18:53:46,863 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1398500824037_0005
2014-04-26 18:53:55,827 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1338)) - Job job_1398500824037_0005 running in uber mode : false
2014-04-26 18:53:55,829 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 0% reduce 0%
2014-04-26 18:54:02,899 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 0%
2014-04-26 18:54:12,982 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 100%
2014-04-26 18:54:12,992 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Job job_1398500824037_0005 completed successfully
2014-04-26 18:54:13,097 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1363)) - Counters: 43

    File System Counters
        FILE: Number of bytes read=6216483
        FILE: Number of bytes written=12593661
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=6144044
        HDFS: Number of bytes written=6090981
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5026
        Total time spent by all reduces in occupied slots (ms)=7932
    Map-Reduce Framework
        Map input records=38038
        Map output records=19349
        Map output bytes=6189081
        Map output materialized bytes=6216483
        Input split bytes=121
        Combine input records=19349
        Combine output records=18984
        Reduce input groups=18984
        Reduce shuffle bytes=6216483
        Reduce input records=18984
        Reduce output records=18984
        Spilled Records=37968
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=259
        CPU time spent (ms)=4020
        Physical memory (bytes) snapshot=311271424
        Virtual memory (bytes) snapshot=1683406848
        Total committed heap usage (bytes)=164630528
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=6143923
    File Output Format Counters
        Bytes Written=6090981

2) Output in the web UI (screenshot not reproduced here).
