Running WordCount remotely from Eclipse on Windows 10

JempChou

Category: Big Data; Tags: eclipse, hadoop, windows, Linux

Article link: https://blog.csdn.net/baidu_25555389/article/details/61205426

Running the Hadoop program wordCount on a remote Linux machine from Eclipse on Windows 10

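I (Jemp) installed CentOS 7 in a virtual machine and set up a Hadoop environment on it. The bundled examples ran fine from the CentOS 7 terminal, and so did a wordCount program I wrote myself.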

Preparation:

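The remote machine can run Hadoop programs normally (mine is a fully distributed cluster)
JDK and Eclipse are installed on the local Windows 10 machine and can compile and run Java projects
The Hadoop plugin is installed in Eclipse (search online for an installation tutorial)
The NameNode and DataNode on the remote machine are running (on Linux, the jps command shows whether they are)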

Problems encountered

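Below are the problems I ran into while tinkering with this setup, along with how I solved them. If they help anyone even a little, that would be great.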

1. Missing jar files

Because Hadoop is not installed on Windows, Eclipse reports that it cannot find the relevant classes in the WordCount program.

Solution:

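Download Hadoop and unpack it, then add its jar files to the project through Eclipse's Build Path option (right-click the project name to see it): locate the relevant jars and import them.
The jars I imported are listed below (Hadoop 2.5+).
First go to the Hadoop installation directory: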

cd $HADOOP_HOME

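the three jars in share/hadoop/common
the three jars in share/hadoop/hdfs
the three jars in share/hadoop/mapreduce
all jars in share/hadoop/tools/lib (the program may not use some of them, but importing them all avoids missing-jar errors; if you know what each jar is for, you can import only the ones you need)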

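The directory names already hint at what the jars do; for example, the jars under mapreduce are for MapReduce computation. What about tools/lib? It mostly holds utility libraries. For instance, reading arguments from the command line requires the class org.apache.commons.cli.Options, which lives in tools/lib/commons-cli-***.jar (*** stands for the version number).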
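A quick way to tell whether the Build Path setup above worked is to ask the JVM to load a few representative classes from the same Eclipse project. The following is a minimal sketch, not part of the original post; the class names checked are an illustrative assumption (one per jar group listed above), not a complete inventory of what WordCount needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: report which of a few Hadoop-related classes cannot be loaded from the classpath. */
public class ClasspathCheck {

    // Hypothetical selection: one representative class per jar group mentioned above.
    static final String[] REQUIRED = {
        "org.apache.hadoop.conf.Configuration",         // share/hadoop/common
        "org.apache.hadoop.hdfs.DistributedFileSystem", // share/hadoop/hdfs
        "org.apache.hadoop.mapreduce.Job",              // share/hadoop/mapreduce
        "org.apache.commons.cli.Options"                // tools/lib (commons-cli), per the text above
    };

    /** Returns the names from classNames that cannot be loaded. */
    static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found, do not run static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class whose own dependencies (e.g. a superclass jar) are missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All checked classes are on the classpath.");
        } else {
            for (String name : missing) {
                System.out.println("Missing: " + name);
            }
        }
    }
}
```

Each "Missing:" line points at a jar group that still needs to be added through Build Path.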

2. Missing binary executable (winutils.exe)

The following error is reported:

2017-03-10 17:33:22,491 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(397)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable I:\hadoop-common-2.7.1-bin-master\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:440)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:486)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at test.WordCount.main(WordCount.java:63)
2017-03-10 17:33:23,009 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-10 17:33:30,903 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-10 17:33:30,905 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.195.135:8020/OutputData already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at test.WordCount.main(WordCount.java:77)

Solution:

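I followed this blog post:
http://www.cnblogs.com/zq-inlook/p/4386216.html
But after configuring it I still got errors. It turned out to be a version problem: my CentOS 7 machine runs 64-bit Hadoop 2.7.3, so I found a newer build instead: hadoop-common-2.7.1-bin-master
Download link: https://github.com/JempChou/Jemp-/blob/master/tools/hadoop-common-2.7.1-bin-master.zip
Use it in place of the files provided in that blog post, set the environment variable (HADOOP_HOME pointing to the unpacked directory, so that winutils.exe sits in its bin folder), and it works!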
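The error message above shows where Hadoop looks: <Hadoop home>\bin\winutils.exe. The sketch below, which is not from the original post, checks that location before running the job. It assumes Hadoop 2.x behavior, where the Hadoop home is taken from the hadoop.home.dir system property and otherwise from the HADOOP_HOME environment variable; passing the directory as a program argument is a hypothetical convenience of this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: find where winutils.exe is expected to be and whether it is actually there. */
public class WinutilsCheck {

    /** Hadoop home from the hadoop.home.dir property, else HADOOP_HOME; null if neither is set. */
    static String hadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null || home.isEmpty()) {
            home = System.getenv("HADOOP_HOME");
        }
        return (home == null || home.isEmpty()) ? null : home;
    }

    /** Expected location of winutils.exe under the given Hadoop home. */
    static Path winutilsPath(String home) {
        return Paths.get(home, "bin", "winutils.exe");
    }

    public static void main(String[] args) {
        // Assumption: an optional first argument overrides the Hadoop home. In a real job this
        // property would have to be set before the first Hadoop call in main().
        if (args.length > 0) {
            System.setProperty("hadoop.home.dir", args[0]);
        }
        String home = hadoopHome();
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set.");
            return;
        }
        Path exe = winutilsPath(home);
        System.out.println(exe + (Files.isRegularFile(exe) ? " found" : " NOT found"));
    }
}
```

If this reports "NOT found", the HADOOP_HOME value (or the unpacked directory layout) is the first thing to fix; remember that Eclipse must be restarted to see a newly set environment variable.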

3. Input file not found

This is the wordCount program I used for testing:

package test;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
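To know what to expect in the output directory, it can help to run the same tokenize-and-sum logic in plain Java on a small sample. This is a sketch for comparison only, not part of the original program. It assumes the default single reducer (so one sorted output file, e.g. part-r-00000), the default tab separator of TextOutputFormat, and ASCII words, for which String ordering matches Hadoop's Text ordering.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Sketch: the TokenizerMapper + IntSumReducer logic applied to a single string, without Hadoop. */
public class LocalWordCount {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted by key, like the reducer's input
        StringTokenizer itr = new StringTokenizer(text); // same default delimiters as the mapper
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum); // emit (word, 1) and sum per word
        }
        return counts;
    }

    /** Formats counts like the job's text output: key, tab, value, one pair per line. */
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "hadoop\t1", "hello\t2", "world\t1" on separate lines
        System.out.print(format(count("hello hadoop\nhello world")));
    }
}
```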

When running on Linux, I had already put the local test data into the /InputData directory on HDFS, so I simply ran this on the command line:

hadoop jar WordCount.jar wordcount /InputData /OutputData

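About the three trailing arguments (the program itself only uses the last two):
wordcount: the name of this job; WordCount.java does not reference it, but it cannot be omitted (how hadoop jar treats it depends on the jar: if the jar's manifest does not declare a Main-Class, hadoop jar takes the first argument after the jar file as the name of the class to run)
/InputData: the input directory on HDFS
/OutputData: the output directory on HDFS (do not create it in advance; Hadoop creates it automatically, and the job fails with FileAlreadyExistsException if it already exists, as seen at the end of the log under problem 2)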

So when running in Eclipse you also need to configure the arguments: right-click → Run As → Run Configurations → Arguments → Program arguments. At first I configured them as follows:

/InputData /OutputData

However, this produced the following error:

2017-03-10 18:47:34,394 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-10 18:47:34,398 INFO  [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-03-10 18:47:35,024 WARN  [main] mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-03-10 18:47:35,035 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(249)) - Cleaning up the staging area file:/tmp/hadoop-JempChou/mapred/staging/JempChou325643891/.staging/job_local325643891_0001
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/InputData
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at test.WordCount.main(WordCount.java:77)

The key line of the error output is:

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/InputData

That is, the input path does not exist.

Solution:

The problem lies in these two lines of code:

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

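Because the program runs remotely, the paths cannot be passed like this. Eclipse on Windows has no cluster configuration files (such as core-site.xml) on its classpath, so Hadoop's default file system stays file:/// and /InputData is looked up on the local disk, which is why the error says file:/InputData. The arguments need to include the IP address and port of the machine running the NameNode. The corrected arguments are: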

hdfs://<IP address>:<port>/InputData hdfs://<IP address>:<port>/OutputData

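Note the space between the two arguments. Fill in the IP address and port for your own setup. For example, if the IP address is 111.112.113.114 and the port is 8020, the arguments should be: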

hdfs://111.112.113.114:8020/InputData hdfs://111.112.113.114:8020/OutputData
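The rewrite from plain HDFS paths to fully qualified URIs can be sketched with java.net.URI. This helper is hypothetical (not from the original post) and only builds the argument strings; the address and port are the example values from the text and must be replaced with your NameNode's.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Sketch: turn /InputData-style paths into hdfs://host:port/InputData for the Eclipse arguments. */
public class HdfsPaths {

    /** Returns path unchanged if it already has a scheme; otherwise prefixes hdfs://host:port. */
    static String qualify(String path, String host, int port) {
        if (URI.create(path).getScheme() != null) {
            return path; // already fully qualified, e.g. hdfs://... or file:/...
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute HDFS path: " + path);
        }
        try {
            // URI(scheme, userInfo, host, port, path, query, fragment)
            return new URI("hdfs", null, host, port, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String host = "111.112.113.114"; // example address from the text
        int port = 8020;                 // example port from the text
        // Prints the two program arguments, separated by a space
        System.out.println(qualify("/InputData", host, port) + " " + qualify("/OutputData", host, port));
    }
}
```

An alternative the author did not use, mentioned here only as an assumption to try: calling conf.set("fs.defaultFS", "hdfs://111.112.113.114:8020") in main() before the Job is created, so that plain /InputData resolves against HDFS instead of file:///.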

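8020 is Hadoop's default NameNode port. To check the port, open http://<NameNode IP address>:50070 (the NameNode web UI) in a browser; the NameNode must already be running. The page should look like this: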
[Screenshot: NameNode web UI overview page, with the port number circled]

The number in the circle is your port.
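Before running the job from Eclipse, it can also be worth confirming that the Windows machine can open a TCP connection to that port at all (a firewall on CentOS 7 may block it). The sketch below is not from the original post; it only tests network reachability of host:port and does not speak the HDFS protocol. The default address is the example value from the text.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: check that a TCP connection to the NameNode port can be opened from this machine. */
public class PortCheck {

    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // something is listening and the connection was accepted
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "111.112.113.114"; // example address from the text
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + (isReachable(host, port, 3000) ? " reachable" : " NOT reachable"));
    }
}
```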
That's it: mine runs fine now. How about yours? If you run into problems, feel free to get in touch ^^.
