1. Write the Map/Reduce/Driver classes
Map: hadoop.TokenizerMapper
Reduce: hadoop.IntSumReducer
Driver: hadoop.WordCount
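The classes above are the stock Hadoop WordCount example and their source is not reproduced here. As a rough, Hadoop-free sketch of the logic TokenizerMapper and IntSumReducer implement (whitespace tokenization in map, per-key summation in reduce; the class and method names below are illustrative, not the Hadoop API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hadoop-free sketch of the WordCount data flow.
public class WordCountSketch {

    // Like TokenizerMapper: emit a (word, 1) pair for every
    // whitespace-separated token of an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Like IntSumReducer: sum the values grouped under each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.addAll(map("hello world"));
        pairs.addAll(map("hello hadoop"));
        System.out.println(reduce(pairs)); // {hello=2, world=1, hadoop=1}
    }
}
```

In the real job, Hadoop performs the grouping between map and reduce (the shuffle), and IntSumReducer is also registered as the combiner, which is why the log later shows Combine input records=14.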
2. Export the JAR
Right-click the project > Export... > Java/JAR file, name it WordCount-0.1.jar > Next, select the resource files and the JAR file path > Next > set Main class: hadoop.WordCount > Finish
3. Prepare the input data
xcloud@xcloud:~/iworkspace/HelloHadoop$ sudo gedit input1.txt
[sudo] password for xcloud:
xcloud@xcloud:~/iworkspace/HelloHadoop$ sudo gedit input2.txt
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -mkdir /tmp/input
#xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -mkdir /tmp/output   # do not create the output directory; the job creates it itself and fails if it already exists
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -put input1.txt /tmp/input
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -put input2.txt /tmp/input
Contents of input1.txt:
Hello, i love china
are you ok?
Contents of input2.txt:
hello, i love word
You are ok
4. Run the job
hadoop jar WordCount-0.1.jar hadoop.WordCount /tmp/input /tmp/output
Syntax: hadoop jar <jar-file> <main-class> <input-path> <output-path>
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop jar WordCount-0.1.jar hadoop.WordCount /tmp/input /tmp/output
11/12/31 10:11:43 INFO input.FileInputFormat: Total input paths to process : 2
11/12/31 10:11:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/31 10:11:43 WARN snappy.LoadSnappy: Snappy native library not loaded
11/12/31 10:11:43 INFO mapred.JobClient: Running job: job_201112310845_0002
11/12/31 10:11:44 INFO mapred.JobClient: map 0% reduce 0%
11/12/31 10:11:49 INFO mapred.JobClient: map 100% reduce 0%
11/12/31 10:11:56 INFO mapred.JobClient: map 100% reduce 33%
11/12/31 10:11:57 INFO mapred.JobClient: map 100% reduce 100%
11/12/31 10:11:58 INFO mapred.JobClient: Job complete: job_201112310845_0002
11/12/31 10:11:58 INFO mapred.JobClient: Counters: 22
11/12/31 10:11:58 INFO mapred.JobClient: Job Counters
11/12/31 10:11:58 INFO mapred.JobClient: Launched reduce tasks=1
11/12/31 10:11:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6175
11/12/31 10:11:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/12/31 10:11:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/12/31 10:11:58 INFO mapred.JobClient: Launched map tasks=2
11/12/31 10:11:58 INFO mapred.JobClient: Data-local map tasks=2
11/12/31 10:11:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8146
11/12/31 10:11:58 INFO mapred.JobClient: FileSystemCounters
11/12/31 10:11:58 INFO mapred.JobClient: FILE_BYTES_READ=152
11/12/31 10:11:58 INFO mapred.JobClient: HDFS_BYTES_READ=270
11/12/31 10:11:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=172859
11/12/31 10:11:59 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=73
11/12/31 10:11:59 INFO mapred.JobClient: Map-Reduce Framework
11/12/31 10:11:59 INFO mapred.JobClient: Reduce input groups=11
11/12/31 10:11:59 INFO mapred.JobClient: Combine output records=14
11/12/31 10:11:59 INFO mapred.JobClient: Map input records=4
11/12/31 10:11:59 INFO mapred.JobClient: Reduce shuffle bytes=158
11/12/31 10:11:59 INFO mapred.JobClient: Reduce output records=11
11/12/31 10:11:59 INFO mapred.JobClient: Spilled Records=28
11/12/31 10:11:59 INFO mapred.JobClient: Map output bytes=118
11/12/31 10:11:59 INFO mapred.JobClient: Combine input records=14
11/12/31 10:11:59 INFO mapred.JobClient: Map output records=14
11/12/31 10:11:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=208
11/12/31 10:11:59 INFO mapred.JobClient: Reduce input records=14
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -ls /tmp/output/
Found 3 items
-rw-r--r-- 1 xcloud supergroup 0 2011-12-31 10:11 /tmp/output/_SUCCESS
drwxr-xr-x - xcloud supergroup 0 2011-12-31 10:11 /tmp/output/_logs
-rw-r--r-- 1 xcloud supergroup 73 2011-12-31 10:11 /tmp/output/part-r-00000
5. View the results
hadoop fs -cat /tmp/output/part-r-00000
xcloud@xcloud:~/iworkspace/HelloHadoop$ hadoop fs -cat /tmp/output/part-r-00000
Hello, 1
You 1
are 2
china 1
hello, 1
i 2
love 2
ok 1
ok? 1
word 1
you 1
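These counts follow from plain whitespace tokenization, which is case-sensitive and keeps punctuation attached to the token, so "You"/"you" and "ok"/"ok?" are counted as separate words. A quick Hadoop-free check over the same four input lines (assuming the stock StringTokenizer-based WordCount; class and method names here are illustrative):

```java
import java.util.StringTokenizer;
import java.util.TreeMap;

// Re-count the sample inputs the same way the job does:
// whitespace split, case-sensitive, punctuation kept.
public class CheckCounts {

    static TreeMap<String, Integer> count(String[] lines) {
        // TreeMap gives sorted keys, matching the order of the job output above
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello, i love china", "are you ok?",  // input1.txt
            "hello, i love word", "You are ok"     // input2.txt
        };
        count(lines).forEach((w, c) -> System.out.println(w + "\t" + c));
        // 11 distinct words; "are", "i", "love" occur twice, the rest once
    }
}
```

This also lines up with the job counters: Map input records=4 (four lines), Map output records=14 (fourteen tokens), Reduce output records=11 (eleven distinct words).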
Reference: http://trac.nchc.org.tw/cloud/wiki/waue/2009/0617