Prerequisites:
With Hadoop up and running, place the WordCount.java file in the Hadoop installation directory, and create an input directory named input there containing two input files, file1 and file2, where:
file1 contains:
hello world
file2 contains:
hello hadoop
hello mapreduce
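To follow along, the two input files can be recreated locally with a quick sketch; the checks at the end confirm they hold 3 lines and 6 words in total, the numbers the job counters report later.

```shell
# Recreate the sample input files (lowercase "hadoop", matching the job output).
mkdir -p input
printf 'hello world\n' > input/file1
printf 'hello hadoop\nhello mapreduce\n' > input/file2

# Sanity check: 3 lines and 6 words across both files.
cat input/file1 input/file2 | wc -l   # lines -> the job's "Map input records"
cat input/file1 input/file2 | wc -w   # words -> the job's "Map output records"
```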
Once everything is ready, run the following commands from the command line. Each command is explained in turn:
1) Create the input directory on the cluster:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -mkdir input3
2) Upload the file* files from the local input directory to the input3 directory on the cluster:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -put input/file* input3
3) Compile WordCount.java, placing the class files in the wordcount_classes directory under the current directory:
xiaoqian@ubuntu:~/opt/hadoop$ javac -classpath hadoop-0.20.1-core.jar:lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
4) Package the compiled classes into a jar:
xiaoqian@ubuntu:~/opt/hadoop$ jar -cvf wordcount.jar -C wordcount_classes/ .
5) Run the WordCount program on the cluster, with input3 as the input directory and output3 as the output directory:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input3 output3
14/04/21 17:56:52 INFO input.FileInputFormat: Total input paths to process : 2
14/04/21 17:56:52 INFO mapred.JobClient: Running job: job_201404211455_0013
14/04/21 17:56:53 INFO mapred.JobClient: map 0% reduce 0%
14/04/21 17:57:02 INFO mapred.JobClient: map 100% reduce 0%
14/04/21 17:57:14 INFO mapred.JobClient: map 100% reduce 100%
14/04/21 17:57:16 INFO mapred.JobClient: Job complete: job_201404211455_0013
14/04/21 17:57:16 INFO mapred.JobClient: Counters: 17
14/04/21 17:57:16 INFO mapred.JobClient: Job Counters
14/04/21 17:57:16 INFO mapred.JobClient: Launched reduce tasks=1
14/04/21 17:57:16 INFO mapred.JobClient: Launched map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient: Data-local map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient: FileSystemCounters
14/04/21 17:57:16 INFO mapred.JobClient: FILE_BYTES_READ=71
14/04/21 17:57:16 INFO mapred.JobClient: HDFS_BYTES_READ=41
14/04/21 17:57:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=212
14/04/21 17:57:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=37
14/04/21 17:57:16 INFO mapred.JobClient: Map-Reduce Framework
14/04/21 17:57:16 INFO mapred.JobClient: Reduce input groups=0
14/04/21 17:57:16 INFO mapred.JobClient: Combine output records=5
14/04/21 17:57:16 INFO mapred.JobClient: Map input records=3
14/04/21 17:57:16 INFO mapred.JobClient: Reduce shuffle bytes=47
14/04/21 17:57:16 INFO mapred.JobClient: Reduce output records=0
14/04/21 17:57:16 INFO mapred.JobClient: Spilled Records=10
14/04/21 17:57:16 INFO mapred.JobClient: Map output bytes=65
14/04/21 17:57:16 INFO mapred.JobClient: Combine input records=6
14/04/21 17:57:16 INFO mapred.JobClient: Map output records=6
14/04/21 17:57:16 INFO mapred.JobClient: Reduce input records=5
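The Map-Reduce Framework counters line up with the input data. Each map task processes one file, and the combiner runs per map task, so the 6 map output records (one per word) collapse to 5 combine output records: 2 unique words from file1 plus 3 from file2, which matches Reduce input records=5. That arithmetic can be sketched locally (recreating the input files from the setup step):

```shell
# Recreate the two input splits (lowercase "hadoop", matching the job output).
mkdir -p input
printf 'hello world\n' > input/file1
printf 'hello hadoop\nhello mapreduce\n' > input/file2

# "Combine input records" = total words seen by the combiners.
combine_in=$(cat input/file1 input/file2 | wc -w)

# "Combine output records" = unique words per file, summed,
# because each map task's combiner only sees its own split.
c1=$(tr -s ' ' '\n' < input/file1 | sort -u | wc -l)
c2=$(tr -s ' ' '\n' < input/file2 | sort -u | wc -l)

echo "combine input records:  $combine_in"   # 6, matching the counter
echo "combine output records: $((c1 + c2))"  # 5, matching Reduce input records
```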
6) View the output:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -cat output3/part-r-00000
hadoop 1
hello 3
mapreduce 1
world 1
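As a sanity check, the same counts can be reproduced locally with standard Unix tools (a sketch, assuming the input files from the setup step):

```shell
# Recreate the input (lowercase "hadoop", matching the job output above).
mkdir -p input
printf 'hello world\n' > input/file1
printf 'hello hadoop\nhello mapreduce\n' > input/file2

# One word per line, sort, count duplicates, then print "word<TAB>count",
# mirroring the layout of part-r-00000.
cat input/file1 input/file2 | tr -s ' ' '\n' | LC_ALL=C sort \
  | uniq -c | awk '{print $2 "\t" $1}'
```

This prints the same four lines as the job: hadoop 1, hello 3, mapreduce 1, world 1.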