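A MapReduce Example
Download the following e-books. Pick the us-ascii encoded versions.
The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
The Notebooks of Leonardo Da Vinci
Ulysses by James Joyce
After downloading, put the files in a single directory, for example /data/gutenberg on my machine.
Start the Hadoop cluster: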
root@pxiaohai-laptop:/data/hadoop# bin/start-all.sh
Copy the files into HDFS:
root@pxiaohai-laptop:/data/hadoop# bin/hadoop dfs -copyFromLocal /data/gutenberg gutenberg
Now run the job. This is wordcount, one of the examples that ships with Hadoop:
root@pxiaohai-laptop:/data/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
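To get a feel for what the job computes, here is a rough local equivalent written as a shell pipeline. It is only a sketch, not Hadoop code: it assumes the example splits lines on whitespace and counts tokens case-sensitively, and the function name local_wordcount is made up for this illustration.

```shell
#!/bin/sh
# Rough local stand-in for the wordcount example (illustration only).
# local_wordcount is a hypothetical helper name, not part of Hadoop.
local_wordcount() {
    # "Map":     split the input into one token per line, on whitespace only.
    # "Shuffle": sort so that identical tokens end up next to each other.
    # "Reduce":  count each run of identical tokens, print "word<TAB>count".
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

With the files from above it could be called as local_wordcount /data/gutenberg/* to count words on a single machine, without HDFS or the cluster. With no arguments it reads standard input.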
The output looks like this:
root@pxiaohai-laptop:/data/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
10/06/27 12:25:51 INFO input.FileInputFormat: Total input paths to process : 3
10/06/27 12:25:52 INFO mapred.JobClient: Running job: job_201006271159_0001
10/06/27 12:25:53 INFO mapred.JobClient: map 0% reduce 0%
10/06/27 12:26:08 INFO mapred.JobClient: map 66% reduce 0%
10/06/27 12:26:14 INFO mapred.JobClient: map 100% reduce 0%
10/06/27 12:26:17 INFO mapred.JobClient: map 100% reduce 33%
10/06/27 12:26:23 INFO mapred.JobClient: map 100% reduce 100%
10/06/27 12:26:25 INFO mapred.JobClient: Job complete: job_201006271159_0001
10/06/27 12:26:25 INFO mapred.JobClient: Counters: 17
10/06/27 12:26:25 INFO mapred.JobClient: Job Counters
10/06/27 12:26:25 INFO mapred.JobClient: Launched reduce tasks=1
10/06/27 12:26:25 INFO mapred.JobClient: Launched map tasks=3
10/06/27 12:26:25 INFO mapred.JobClient: Data-local map tasks=3
10/06/27 12:26:25 INFO mapred.JobClient: FileSystemCounters
10/06/27 12:26:25 INFO mapred.JobClient: FILE_BYTES_READ=2063428
10/06/27 12:26:25 INFO mapred.JobClient: HDFS_BYTES_READ=3188585
10/06/27 12:26:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3386722
10/06/27 12:26:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=815480
10/06/27 12:26:25 INFO mapred.JobClient: Map-Reduce Framework
10/06/27 12:26:25 INFO mapred.JobClient: Reduce input groups=76948
10/06/27 12:26:25 INFO mapred.JobClient: Combine output records=92288
10/06/27 12:26:25 INFO mapred.JobClient: Map input records=69684
10/06/27 12:26:25 INFO mapred.JobClient: Reduce shuffle bytes=1323198
10/06/27 12:26:25 INFO mapred.JobClient: Reduce output records=76948
10/06/27 12:26:25 INFO mapred.JobClient: Spilled Records=235878
10/06/27 12:26:25 INFO mapred.JobClient: Map output bytes=5337259
10/06/27 12:26:25 INFO mapred.JobClient: Combine input records=554458
10/06/27 12:26:25 INFO mapred.JobClient: Map output records=554458
10/06/27 12:26:25 INFO mapred.JobClient: Reduce input records=92288
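The counters also show what the combiner did. All 554458 map output records went into the combiner, 92288 came out, and that is exactly what the reducer received. The reducer then wrote one record per distinct word (76948). The small sketch below just redoes that arithmetic with the values copied from the log. The variable names are mine.

```shell
#!/bin/sh
# Values copied from the job counters in the log above.
map_output_records=554458
combine_input_records=554458
combine_output_records=92288
reduce_input_records=92288
reduce_input_groups=76948
reduce_output_records=76948

# Records the combiner merged away before the shuffle.
saved=$((combine_input_records - combine_output_records))

# Share of the map output that still reached the reducer (integer division).
pct=$((combine_output_records * 100 / combine_input_records))

echo "combiner removed $saved records; about $pct% of the map output reached the reducer"
echo "distinct words written by the reducer: $reduce_output_records"
```

So the combiner cut the shuffled record count to roughly a sixth of the map output, which is why running it on the map side is worthwhile for a job like wordcount.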