I ran a Hadoop program from the command line and hit quite a few errors along the way; I'm sharing them here.
Put WordCount.java under the Hadoop installation directory (mine is /home/administrator/hadoop-0.20.2/), and create an input directory named input there, containing the two input files file01.txt and file02.txt.
file01.txt contains:
hello hadoop1
hello hadoop2
hello hadoop3
hello hadoop2
hello hadoop1
hello hadoop5
hello hadoop5
hello hadoop5
hello world1
hello world1
hello world2
hell word
file02.txt contains:
hello world1
hello world2
hello world2
hello world2
hello world1
hello hadoop5
hello hadoop5
hell word
hell word
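For reference, WordCount.java is essentially the standard WordCount example that ships with Hadoop 0.20, with a package declaration v1 added to match my directory layout (the hard-coded-paths mistake I describe later was my own deviation from it):

package v1;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in a line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the counts for each word; also used as the combiner
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Input and output paths come from the command line (not hard-coded!)
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}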
Create the input directory in HDFS:
bin/hadoop fs -mkdir wordcount_input
Upload the files from the local input directory (I ran the same command for file01.txt as well):
bin/hadoop fs -put /home/administrator/input/file02.txt wordcount_input
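(As an aside, the same upload can also be done from Java through the HDFS FileSystem API. This is just a sketch of mine, with a class name I made up, and it assumes the cluster's core-site.xml is on the classpath so FileSystem.get finds the right namenode:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
  public static void main(String[] args) throws Exception {
    // Reads fs.default.name from the Hadoop config on the classpath
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Same effect as: bin/hadoop fs -put <local file> wordcount_input
    fs.copyFromLocalFile(new Path("/home/administrator/input/file01.txt"),
                         new Path("wordcount_input"));
    fs.copyFromLocalFile(new Path("/home/administrator/input/file02.txt"),
                         new Path("wordcount_input"));
    fs.close();
  }
}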
Now compile WordCount.java. At first I got a compilation error:
administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount
Error message:
class file for org.apache.commons.cli.Options not found
String[] otherArgs = new GenericOptionsParser(conf, arg).getRemainingArgs();
^
1 error
Compilation succeeded:
administrator@Master:/$ javac -classpath /home/administrator/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/administrator/hadoop-0.20.2/lib/commons-cli-1.2.jar /home/administrator/hadoop-0.20.2/WordCount.java -d /home/administrator/hadoop-0.20.2/WordCount
The fix is just to add one more jar, commons-cli, to the classpath (GenericOptionsParser depends on it). The same fix on Hadoop 1.0.4 looks like this:
ubuntu@ubuntu:~/dev/wordcount$ javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java
Packaging failed:
administrator@Master:/$ jar -cvf wordcount.jar -C /home/administrator/hadoop-0.20.2/WordCount/v1/*.class
Error message:
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:99)
at sun.tools.jar.Main.run(Main.java:187)
at sun.tools.jar.Main.main(Main.java:1167)
The stack trace shows jar failing inside FileOutputStream.open, i.e. while trying to create the output jar: note the prompt shows I ran the command from /, where an ordinary user cannot create wordcount.jar, and I also gave -C a glob instead of a directory. The jar tool's own usage examples show the right pattern:
Example 1: archive two class files into an archive called classes.jar:
jar cvf classes.jar Foo.class Bar.class
Example 2: use an existing manifest file 'mymanifest' and archive all the files in the foo/ directory into 'classes.jar':
jar cvfm classes.jar mymanifest -C foo/ .
Packaging succeeded:
administrator@Master:~/hadoop-0.20.2$ jar -cvf WordCount.jar -C WordCount .    (note: the trailing dot is required; my WordCount program declares a package)
added manifest
adding: v1/(in = 0) (out= 0)(stored 0%)
adding: v1/WordCount$TokenizerMapper.class(in = 1852) (out= 770)(deflated 58%)
adding: v1/WordCount$IntSumReducer.class(in = 1741) (out= 741)(deflated 57%)
adding: v1/WordCount.class(in = 1839) (out= 993)(deflated 46%)
The job ran but produced no output:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar wordcount.jar v1/WordCount wordcount_input wordcount_output
13/07/06 11:03:06 INFO input.FileInputFormat: Total input paths to process : 0
13/07/06 11:03:07 INFO mapred.JobClient: Running job: job_201307061012_0002
13/07/06 11:03:08 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 11:03:27 INFO mapred.JobClient: map 0% reduce 100%
13/07/06 11:03:29 INFO mapred.JobClient: Job complete: job_201307061012_0002
13/07/06 11:03:29 INFO mapred.JobClient: Counters: 8
13/07/06 11:03:29 INFO mapred.JobClient: Job Counters
13/07/06 11:03:29 INFO mapred.JobClient: Launched reduce tasks=1
13/07/06 11:03:29 INFO mapred.JobClient: Map-Reduce Framework
13/07/06 11:03:29 INFO mapred.JobClient: Reduce input groups=0
13/07/06 11:03:29 INFO mapred.JobClient: Combine output records=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce shuffle bytes=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce output records=0
13/07/06 11:03:29 INFO mapred.JobClient: Spilled Records=0
13/07/06 11:03:29 INFO mapred.JobClient: Combine input records=0
13/07/06 11:03:29 INFO mapred.JobClient: Reduce input records=0
Note: I later found the cause. My WordCount.java had hard-coded the paths as String[] arg = { "hdfs://localhost:9000/user/administrator/input", "hdfs://localhost:9000/user/administrator/output" }; and passed arg rather than args to GenericOptionsParser, so the directories given on the command line were ignored (hence "Total input paths to process : 0" above). The input and output directories are supposed to come from the command line, as in the sketch earlier.
Successful run:
administrator@Master:~/hadoop-0.20.2$ bin/hadoop jar WordCount.jar v1/WordCount wordcount_input wordcount_output
v1/WordCount is the main class; don't forget to include the package name v1 (Hadoop's RunJar converts the slashes to dots before loading the class, so v1/WordCount works the same as v1.WordCount). I got this wrong at first, mainly because I didn't understand how to run a jar file; the basics really are worth mastering properly.
13/07/06 11:20:50 INFO mapred.JobClient: Running job: job_201307061012_0003
13/07/06 11:20:51 INFO mapred.JobClient: map 0% reduce 0%
13/07/06 11:21:05 INFO mapred.JobClient: map 50% reduce 0%
13/07/06 11:21:08 INFO mapred.JobClient: map 100% reduce 0%
13/07/06 11:21:14 INFO mapred.JobClient: map 100% reduce 16%
13/07/06 11:21:20 INFO mapred.JobClient: map 100% reduce 100%
13/07/06 11:21:25 INFO mapred.JobClient: Job complete: job_201307061012_0003
13/07/06 11:21:25 INFO mapred.JobClient: Counters: 17
13/07/06 11:21:25 INFO mapred.JobClient: Job Counters
13/07/06 11:21:25 INFO mapred.JobClient: Launched reduce tasks=1
13/07/06 11:21:25 INFO mapred.JobClient: Launched map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient: Data-local map tasks=2
13/07/06 11:21:25 INFO mapred.JobClient: FileSystemCounters
13/07/06 11:21:25 INFO mapred.JobClient: FILE_BYTES_READ=196
13/07/06 11:21:25 INFO mapred.JobClient: HDFS_BYTES_READ=276
13/07/06 11:21:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=462
13/07/06 11:21:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=81
13/07/06 11:21:25 INFO mapred.JobClient: Map-Reduce Framework
13/07/06 11:21:25 INFO mapred.JobClient: Reduce input groups=9
13/07/06 11:21:25 INFO mapred.JobClient: Combine output records=15
13/07/06 11:21:25 INFO mapred.JobClient: Map input records=23
13/07/06 11:21:25 INFO mapred.JobClient: Reduce shuffle bytes=80
13/07/06 11:21:25 INFO mapred.JobClient: Reduce output records=9
13/07/06 11:21:25 INFO mapred.JobClient: Spilled Records=30
13/07/06 11:21:25 INFO mapred.JobClient: Map output bytes=442
13/07/06 11:21:25 INFO mapred.JobClient: Combine input records=42
13/07/06 11:21:25 INFO mapred.JobClient: Map output records=42
13/07/06 11:21:25 INFO mapred.JobClient: Reduce input records=15
View the results (my first cat below fails because I typed the wrong output path):
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount/part-r-00000
cat: File does not exist: wordcount/part-r-00000
administrator@Master:~/hadoop-0.20.2$ bin/hadoop fs -cat wordcount_output/part-r-00000
hadoop1 2
hadoop2 2
hadoop3 1
hadoop5 5
hell 3
hello 18
word 3
world1 4
world2 4
administrator@Master:~/hadoop-0.20.2$