Notes on Running WordCount

Preliminary setup:

With Hadoop up and running, place the WordCount.java file in the Hadoop installation directory (a sketch of the program is included after the file listings below), and create an input directory named input there containing two input files, file1 and file2, where:

file1 contains:

hello world

file2 contains:

hello Hadoop
hello mapreduce

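The WordCount.java file itself is not reproduced in these notes. Judging from the class name org.apache.hadoop.examples.WordCount passed to the hadoop jar command in step 5, it is presumably the stock Hadoop WordCount example (new mapreduce API, as shipped with Hadoop 0.20.x); a minimal sketch, assuming it matches that standard example, looks like this:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: split each input line into words and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");            // Job.getInstance() did not exist yet in 0.20.x
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: input directory (input3)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: output directory (output3)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}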
Once everything is ready, run the commands from the command line. The commands that are executed are described below:

1) Create the input directory on the cluster:

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -mkdir input3

2) Upload the file* files from the local input directory to the input3 directory on the cluster:

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -put input/file* input3
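To confirm that the files actually arrived, the cluster-side directory can be listed first (this check is not part of the original run):

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -ls input3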

3) Compile WordCount.java, placing the compiled classes in the wordcount_classes directory under the current directory:

xiaoqian@ubuntu:~/opt/hadoop$ javac -classpath hadoop-0.20.1-core.jar:lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
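Note that javac does not create the directory given to -d, so wordcount_classes should be created beforehand if it does not already exist:

xiaoqian@ubuntu:~/opt/hadoop$ mkdir wordcount_classes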

4) Package the compiled classes into a jar:

xiaoqian@ubuntu:~/opt/hadoop$ jar -cvf wordcount_classes.jar -C wordcount_classes/ .
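To verify the packaging, the jar's contents can be listed; the WordCount classes should show up under the package path used in step 5 (an optional check, not part of the original run):

xiaoqian@ubuntu:~/opt/hadoop$ jar -tf wordcount_classes.jar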

5) Run the WordCount program on the cluster, using input3 as the input directory and output3 as the output directory:

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop jar wordcount_classes.jar org.apache.hadoop.examples.WordCount input3 output3
14/04/21 17:56:52 INFO input.FileInputFormat: Total input paths to process : 2
14/04/21 17:56:52 INFO mapred.JobClient: Running job: job_201404211455_0013
14/04/21 17:56:53 INFO mapred.JobClient:  map 0% reduce 0%
14/04/21 17:57:02 INFO mapred.JobClient:  map 100% reduce 0%
14/04/21 17:57:14 INFO mapred.JobClient:  map 100% reduce 100%
14/04/21 17:57:16 INFO mapred.JobClient: Job complete: job_201404211455_0013
14/04/21 17:57:16 INFO mapred.JobClient: Counters: 17
14/04/21 17:57:16 INFO mapred.JobClient:   Job Counters 
14/04/21 17:57:16 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/21 17:57:16 INFO mapred.JobClient:     Launched map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient:     Data-local map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient:   FileSystemCounters
14/04/21 17:57:16 INFO mapred.JobClient:     FILE_BYTES_READ=71
14/04/21 17:57:16 INFO mapred.JobClient:     HDFS_BYTES_READ=41
14/04/21 17:57:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=212
14/04/21 17:57:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=37
14/04/21 17:57:16 INFO mapred.JobClient:   Map-Reduce Framework
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce input groups=0
14/04/21 17:57:16 INFO mapred.JobClient:     Combine output records=5
14/04/21 17:57:16 INFO mapred.JobClient:     Map input records=3
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce shuffle bytes=47
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce output records=0
14/04/21 17:57:16 INFO mapred.JobClient:     Spilled Records=10
14/04/21 17:57:16 INFO mapred.JobClient:     Map output bytes=65
14/04/21 17:57:16 INFO mapred.JobClient:     Combine input records=6
14/04/21 17:57:16 INFO mapred.JobClient:     Map output records=6
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce input records=5

6) View the output:

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -cat output3/part-r-00000

hadoop 1
hello 3
mapreduce 1
world 1
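If the result is also needed outside HDFS, it can be copied to a local directory (the local destination path here is just an example):

xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -get output3 ./output3_local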

