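Running WordCount remotely from Eclipse on Windows 10

Connecting to Linux from Eclipse on Windows 10 to run the Hadoop WordCount program

I installed CentOS 7 in a virtual machine and set up a Hadoop environment on it. The bundled examples run correctly from the CentOS 7 command line, and my own WordCount program also runs successfully there.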

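Preparation:

  • The remote machine can run Hadoop programs normally (I set up a fully distributed cluster)
  • JDK and Eclipse are installed on the local Windows 10 system, and Java projects compile and run normally
  • The Hadoop plugin is installed in Eclipse (search online for an installation tutorial)
  • The NameNode and DataNode on the remote machine are running (on Linux, the jps command shows whether they are up)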
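
The last item can also be checked from the Windows side. Below is a minimal sketch, using only the JDK, that tests whether a TCP port on the remote machine accepts connections. The class name PortCheck, the method isReachable, and the 3-second timeout are my own choices for illustration; the address and port are placeholders. A successful connection only shows that something is listening on that port, not that HDFS is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port on a remote host accepts connections. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own NameNode address and port.
        String host = args.length > 0 ? args[0] : "111.112.113.114";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

If this prints false while jps shows the NameNode running on Linux, the firewall on the virtual machine or its network mode is worth a look before touching any Hadoop settings.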

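Problems encountered

Below are the problems I ran into while tinkering with this setup, along with how I solved each one. If it helps anyone even a little, all the better.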

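1. Missing jar files

Hadoop is not installed on Windows, so the WordCount program reports that the related classes cannot be found.

Solution:

Download Hadoop and unpack it, then add its jars through Eclipse's Build Path option (right-click the project name to find it): locate the relevant jars and import them.
The jars I imported are listed below (Hadoop version 2.5+).
First change to the Hadoop installation directory (in a standard binary distribution the directories below sit under share/hadoop):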

cd $HADOOP_HOME

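  • the three jars in the common directory
  • the three jars in the hdfs directory
  • the three jars in the mapreduce directory
  • all jars in the tools/lib directory (the program may not use some of them, but importing them all avoids missing-jar errors; if you know what each jar is for, import only the ones you need)

The path names give a rough idea of what the jars are for: the jars under mapreduce, for example, cover MapReduce computation. So what are the jars under tools/lib for? They are mostly utility classes. For example, reading arguments from the command line needs the class org.apache.commons.cli.Options, which lives in tools/lib/commons-cli-***.jar (where *** stands for the version number).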
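
To find out which jars are still missing without waiting for a compile error, a small check like the following can be run inside the same Eclipse project. This is only a sketch: ClasspathCheck and its methods are names I made up, and the list of class names is simply taken from the imports of the WordCount program plus the commons-cli class mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which of the given classes cannot be found on the classpath. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without running its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of classNames that could not be found, in the order given. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isOnClasspath(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(
                "org.apache.hadoop.conf.Configuration",
                "org.apache.hadoop.fs.Path",
                "org.apache.hadoop.mapreduce.Job",
                "org.apache.hadoop.util.GenericOptionsParser",
                "org.apache.commons.cli.Options");
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```

Each name it reports points to a jar that still needs to go onto the Build Path.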

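2. Missing binary command file

The following errors appear. The winutils error at the top is the one addressed here; the FileAlreadyExistsException at the end only means that /OutputData was left over from an earlier run.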

2017-03-10 17:33:22,491 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(397)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable I:\hadoop-common-2.7.1-bin-master\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:440)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:486)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at test.WordCount.main(WordCount.java:63)
2017-03-10 17:33:23,009 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-10 17:33:30,903 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-10 17:33:30,905 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.195.135:8020/OutputData already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at test.WordCount.main(WordCount.java:77)
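
Solution:

See the following blog post:
http://www.cnblogs.com/zq-inlook/p/4386216.html
However, I still got errors after following it. It turned out to be a version problem: my CentOS 7 machine runs 64-bit Hadoop 2.7.3, so I looked for a newer download, hadoop-common-2.7.1-bin-master.
Download link: https://github.com/JempChou/Jemp-/blob/master/tools/hadoop-common-2.7.1-bin-master.zip
Use this in place of the files provided by the blog post above, then set the environment variables, and it works.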
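
Before rerunning the job, the setup can be sanity-checked with a few lines of plain Java. The sketch below assumes the usual lookup, as far as I understand it: Hadoop first reads the Java system property hadoop.home.dir, falls back to the HADOOP_HOME environment variable, and expects bin\winutils.exe under that directory. WinutilsCheck and its method names are hypothetical.

```java
import java.io.File;

/** Hypothetical helper: checks whether winutils.exe is where Hadoop is expected to look for it. */
final class WinutilsCheck {

    /** Picks the Hadoop home directory: system property first, then environment variable. */
    static String resolveHadoopHome(String systemProperty, String envVariable) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty;
        }
        if (envVariable != null && !envVariable.isEmpty()) {
            return envVariable;
        }
        return null;
    }

    /** Returns true if <hadoopHome>/bin/winutils.exe exists as a regular file. */
    static boolean hasWinutils(String hadoopHome) {
        if (hadoopHome == null) {
            return false;
        }
        return new File(new File(hadoopHome, "bin"), "winutils.exe").isFile();
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
        } else {
            System.out.println(home + " contains bin/winutils.exe: " + hasWinutils(home));
        }
    }
}
```

Keep in mind that Eclipse only picks up a changed environment variable after it is restarted.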

3. Input file not found

This is the WordCount program I used for testing.

package test;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count"); // the Job(conf, name) constructor is deprecated
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
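
To make clear what the job above computes, here is a sketch that applies the same tokenize-and-sum logic to a few lines in memory, with no Hadoop involved. LocalWordCount and count are names I invented for illustration. The TreeMap only imitates the key-sorted output of a single reducer for plain ASCII words; it is not how Hadoop does the sorting.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical local stand-in for the mapper and reducer logic of WordCount. */
final class LocalWordCount {

    /** Splits each line on whitespace (the map step) and sums one per token (the reduce step). */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Equivalent to emitting (word, 1) and adding it to the running sum for that word.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello eclipse"));
        System.out.println(counts); // prints {eclipse=1, hadoop=1, hello=2}
    }
}
```

Running a small sample through this first gives a reference result to compare against the part-r-* files the real job writes to the output directory.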

When running on Linux, I had already put the local test data into the /InputData directory on HDFS, so I ran the command directly from the command line:

hadoop jar WordCount.jar wordcount /InputData /OutputData

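About the three trailing arguments (the program itself only needs the last two):
wordcount: the job name given on the command line. WordCount.java never reads it, but the command above cannot do without it. Since main accepts exactly two arguments, this token appears to be consumed by the hadoop jar launcher rather than passed to the program.
/InputData: the input directory on HDFS
/OutputData: the output directory on HDFS (do not create it in advance; Hadoop creates it, and the job fails if it already exists)

The arguments therefore also have to be configured when running in Eclipse: right-click > Run As > Run Configurations > Arguments > Program arguments. My first attempt used these arguments: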

/InputData /OutputData

However, the following error appeared:

2017-03-10 18:47:34,394 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-10 18:47:34,398 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-03-10 18:47:35,024 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-03-10 18:47:35,035 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(249)) - Cleaning up the staging area file:/tmp/hadoop-JempChou/mapred/staging/JempChou325643891/.staging/job_local325643891_0001
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/InputData
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at test.WordCount.main(WordCount.java:77)

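The key line in the output is the exception:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/InputData

It says the input path does not exist. Note the file: prefix: the path was looked up on the local Windows filesystem, not on HDFS.

Solution:

The cause lies in these two lines of code:

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

A path without a scheme is resolved against the default filesystem, which in this Eclipse run is evidently the local one. Because the job connects to a remote cluster, bare paths are not enough: the arguments must include the IP address and port of the machine hosting the NameNode. The corrected arguments look like this: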

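hdfs://<ip-address>:<port>/InputData hdfs://<ip-address>:<port>/OutputData

Mind the space between the two arguments. Fill in the IP address and port of your own setup. For example, with IP address 111.112.113.114 and port 8020, the arguments would be: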

hdfs://111.112.113.114:8020/InputData hdfs://111.112.113.114:8020/OutputData
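
The same qualification can be expressed in a few lines of plain Java, which is handy for building the argument strings without typos. This is a sketch under my own assumptions: HdfsArgs and qualify are hypothetical names, the host and port are the example values from above, and only absolute HDFS paths such as /InputData are handled.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: turns a bare HDFS path into a fully qualified hdfs:// URI string. */
final class HdfsArgs {

    /** Leaves paths that already carry a scheme untouched; prefixes hdfs://host:port otherwise. */
    static String qualify(String path, String nameNodeHost, int nameNodePort) {
        if (URI.create(path).getScheme() != null) {
            return path; // already qualified, for example hdfs://... or file:/...
        }
        try {
            return new URI("hdfs", null, nameNodeHost, nameNodePort, path, null, null).toString();
        } catch (URISyntaxException e) {
            // Thrown, for instance, when the path is not absolute.
            throw new IllegalArgumentException("Not an absolute HDFS path: " + path, e);
        }
    }

    public static void main(String[] args) {
        String in = qualify("/InputData", "111.112.113.114", 8020);
        String out = qualify("/OutputData", "111.112.113.114", 8020);
        // Paste the printed line into the Program arguments box in Eclipse.
        System.out.println(in + " " + out);
    }
}
```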

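8020 is Hadoop's default NameNode port. To check the port, open
http://<namenode-ip>:50070 in a browser; the NameNode must already be running. The page normally looks like this:
[Screenshot: NameNode web UI with the port number circled]

The circled number is your port.
That's it: mine now runs normally. How about yours? If you run into problems, feel free to get in touch ^^.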
