I ran a Hadoop program from the command line and hit quite a few errors along the way; I'm sharing them here.
Put WordCount.java under the Hadoop installation directory (mine is /home/administrator/hadoop-0.20.2/), and create an input directory named input there, containing the two input files file01.txt and file02.txt.
file01.txt contains:
hello hadoop1
hello hadoop2
hello hadoop3
hello hadoop2
hello hadoop1
hello hadoop5
hello hadoop5
hello hadoop5
hello world1
hello world1
hello world2
hell word
file02.txt contains:
hello world1
hello world2
hello world2
hello world2
hello world1
hello hadoop5
hello hadoop5
hell word
hell word
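For reference, WordCount.java is essentially the standard WordCount example that ships with Hadoop 0.20, with a package declaration v1 added to match my directory layout (the hard-coded-paths mistake I describe later was my own deviation from it):

package v1;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in a line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts for each word; also used as the combiner
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Input and output paths come from the command line (not hard-coded!)
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}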
Create the input directory in HDFS:
bin/hadoop fs -mkdir wordcount_input
Upload the files from the local input directory (I ran the same command for file01.txt as well):
bin/hadoop fs -put /home/administrator/input/file02.txt wordcount_input
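(As an aside, the same upload can also be done from Java through the HDFS FileSystem API. This is just a sketch of mine, with a class name I made up, and it assumes the cluster's core-site.xml is on the classpath so FileSystem.get finds the right namenode:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
  public static void main(String[] args) throws Exception {
    // Reads fs.default.name from the Hadoop config on the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Same effect as: bin/hadoop fs -put <local file> wordcount_input
    fs.copyFromLocalFile(new Path("/home/administrator/input/file01.txt"),
                         new Path("wordcount_input"));
    fs.copyFromLocalFile(new Path("/home/administrator/input/file02.txt"),
                         new Path("wordcount_input"));
    fs.close();
  }
}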
Now compile WordCount.java. At first I got a compilation error:
administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount
Error message:
class file for org.apache.commons.cli.Options not found
String[] otherArgs = new GenericOptionsParser(conf, arg).getRemainingArgs();
^
1 error
Compilation succeeded:
administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/administrator/hadoop-0.20.2/lib/commons-cli-1.2.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount
The fix is just to add one more jar, commons-cli, to the classpath (GenericOptionsParser depends on it). The same fix on Hadoop 1.0.4 looks like this:
ubuntu@ubuntu:~/dev/wordcount$ javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java
Packaging failed:
administrator@Master:/$ jar -cvf wordcount.jar -C /home/administrator/hadoop-0.20.2/WordCount/v1/*.class
Error message:
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:99)
at sun.tools.jar.Main.run(Main.java:187)
at sun.tools.jar.Main.main(Main.java:1167)
The stack trace shows jar failing inside FileOutputStream.open, i.e. while trying to create the output jar: note the prompt shows I ran the command from /, where an ordinary user cannot create wordcount.jar, and I also gave -C a glob instead of a directory. The jar tool's own usage examples show the right pattern:
Example 1: archive two class files into an archive called classes.jar:
jar cvf classes.jar Foo.class Bar.class
Example 2: use an existing manifest file 'mymanifest' and archive all the files in the foo/ directory into 'classes.jar':
jar cvfm classes.jar mymanifest -C foo/ .
Packaging succeeded:
administrator@Master:~/hadoop-0.20.2$ jar -cvf WordCount.jar -C WordCount .    (note: the trailing dot is required; my WordCount program declares a package)
added manifest
adding: v1/(in = 0) (out= 0)(stored 0%)
adding: v1/WordCount$TokenizerMapper.class(in = 1852) (out= 770)(deflated 58%)
adding: v1/WordCount$IntSumReducer.class(in = 1741) (out= 741)(deflated 57%)
adding: v1/WordCount.class(in = 1839) (out= 993)(deflated 46%)
The job ran but produced no output:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar wordcount.jar v1/WordCount wordcount_input wordcount_output
13/07/06 11:03:06 INFO input.FileInputFormat: Total input paths to process : 0
13/07/06 11:03:07 INFO mapred.JobClient: Running job: job_201307061012_0002
13/07/06 11:03:08 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 11:03:27 INFO mapred.JobClient: map 0% reduce 100%
13/07/06 11:03:29 INFO mapred.JobClient: Job complete: job_201307061012_0002
13/07/06 11:03:29 INFO mapred.JobClient: Counters: 8
13/07/06 11:03:29 INFO mapred.JobClient: Job Counters
13/07/06 11:03:29 INFO mapred.JobClient: Launched reduce tasks=1
13/07/06 11:03:29 INFO mapred.JobClient: Map-Reduce Framework
13/07/06 11:03:29 INFO mapred.JobClient: Reduce input groups=0
13/07/06 11:03:29 INFO mapred.JobClient: Combine output records=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce shuffle bytes=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce output records=0
13/07/06 11:03:29 INFO mapred.JobClient: Spilled Records=0
13/07/06 11:03:29 INFO mapred.JobClient: Combine input records=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce input records=0
Note: I later found the cause. My WordCount.java had hard-coded the paths as String[] arg = { "hdfs://localhost:9000/user/administrator/input", "hdfs://localhost:9000/user/administrator/output" }; and passed arg rather than args to GenericOptionsParser, so the directories given on the command line were ignored (hence "Total input paths to process : 0" above). The input and output directories are supposed to come from the command line, as in the sketch earlier.
Successful run:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar WordCount.jar v1/WordCount wordcount_input wordcount_output
v1/WordCount is the main class; don't forget to include the package name v1 (Hadoop's RunJar converts the slashes to dots before loading the class, so v1/WordCount works the same as v1.WordCount). I got this wrong at first, mainly because I didn't understand how to run a jar file; the basics really are worth mastering properly.
13/07/06 11:20:50 INFO mapred.JobClient: Running job: job_201307061012_0003
13/07/06 11:20:51 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 11:21:05 INFO mapred.JobClient: map 50% reduce 0%
13/07/06 11:21:08 INFO mapred.JobClient: map 100% reduce 0%
13/07/06 11:21:14 INFO mapred.JobClient: map 100% reduce 16%
13/07/06 11:21:20 INFO mapred.JobClient: map 100% reduce 100%
13/07/06 11:21:25 INFO mapred.JobClient: Job complete: job_201307061012_0003
13/07/06 11:21:25 INFO mapred.JobClient: Counters: 17
13/07/06 11:21:25 INFO mapred.JobClient: Job Counters
13/07/06 11:21:25 INFO mapred.JobClient: Launched reduce tasks=1
13/07/06 11:21:25 INFO mapred.JobClient: Launched map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient: Data-local map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient: FileSystemCounters
13/07/06 11:21:25 INFO mapred.JobClient: FILE_BYTES_READ=196
13/07/06 11:21:25 INFO mapred.JobClient: HDFS_BYTES_READ=276
13/07/06 11:21:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=462
13/07/06 11:21:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=81
13/07/06 11:21:25 INFO mapred.JobClient: Map-Reduce Framework
13/07/06 11:21:25 INFO mapred.JobClient: Reduce input groups=9
13/07/06 11:21:25 INFO mapred.JobClient: Combine output records=15
13/07/06 11:21:25 INFO mapred.JobClient: Map input records=23
13/07/06 11:21:25 INFO mapred.JobClient: Reduce shuffle bytes=80
13/07/06 11:21:25 INFO mapred.JobClient: Reduce output records=9
13/07/06 11:21:25 INFO mapred.JobClient: Spilled Records=30
13/07/06 11:21:25 INFO mapred.JobClient: Map output bytes=442
13/07/06 11:21:25 INFO mapred.JobClient: Combine input records=42
13/07/06 11:21:25 INFO mapred.JobClient: Map output records=42
13/07/06 11:21:25 INFO mapred.JobClient: Reduce input records=15
View the results (my first cat below fails because I typed the wrong output path):
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount/part-r-00000
cat: File does not exist: wordcount/part-r-00000
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount_output/part-r-00000
hadoop1 2
hadoop2 2
hadoop3 1
hadoop5 5
hell 3
hello 18
word 3
world1 4
world2 4
administrator@Master:~/hadoop-0.20.2$