Compiling and Running WordCount.java Step by Step

WordCount is the classic introductory example for learning Hadoop. The steps below walk through compiling, packaging, and running the WordCount program.

1. In the extracted Hadoop 1.0.4 directory, the WordCount.java source file can be found at src/examples/org/apache/hadoop/examples/WordCount.java.
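For reference, the following is a condensed sketch of what that file contains (the standard Hadoop 1.x WordCount, written against the new org.apache.hadoop.mapreduce API). It is an outline rather than a verbatim copy, so compile the actual source from your distribution:

// package org.apache.hadoop.examples;   // present in the shipped source; removed in this
                                         // walkthrough so the class compiles into the top of bin

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each input line on whitespace and emit (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");          // Hadoop 1.x style Job construction
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /tmp/input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /tmp/output/result
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}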

2. Create a dev folder and copy WordCount.java into dev/wordcount:

ubuntu@ubuntu:~/dev/wordcount$ pwd

/home/ubuntu/dev/wordcount

ubuntu@ubuntu:~/dev/wordcount$ ls

bin compile.txt WordCount.java

3. Create a bin folder under dev/wordcount and compile WordCount.java so that the class files are written into bin:

javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java

(Note: the shipped source declares package org.apache.hadoop.examples; the commands in this walkthrough assume that declaration has been removed, so the class files land directly in bin and the class can later be referred to simply as WordCount.)

4. Package the generated class files into a jar (run from inside bin, where the class files were generated):

jar -cvf WordCount.jar *.class

5. Create an input folder under bin and put two input files in it:

ubuntu@ubuntu:~/dev/wordcount/bin/input$ ls

words-1.txt words-2.txt

ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-1.txt

i am a student!

how are you?

my name is lily.

ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-2.txt

i am a student!

how are you?

she is lily

he is my brother

ubuntu@ubuntu:~/dev/wordcount/bin/input$

6. Create input and output directories on HDFS, and upload the two input files to the input directory:

ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/input

ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/output

ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-1.txt /tmp/input

ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-2.txt /tmp/input
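As an aside, the same mkdir/put operations can also be done programmatically through the HDFS FileSystem API. The sketch below is illustrative only (UploadInput is a made-up class name for this snippet) and assumes the cluster's core-site.xml is on the classpath so FileSystem.get() can locate the namenode:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: create /tmp/input on HDFS and upload the two local files.
public class UploadInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/tmp/input"));
        fs.copyFromLocalFile(new Path("/home/ubuntu/dev/wordcount/bin/input/words-1.txt"),
                             new Path("/tmp/input/words-1.txt"));
        fs.copyFromLocalFile(new Path("/home/ubuntu/dev/wordcount/bin/input/words-2.txt"),
                             new Path("/tmp/input/words-2.txt"));
        fs.close();
    }
}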

7. Run the WordCount program:

ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop jar WordCount.jar WordCount /tmp/input /tmp/output/result

13/01/24 08:09:37 INFO input.FileInputFormat: Total input paths to process : 2

13/01/24 08:09:38 INFO util.NativeCodeLoader: Loaded the native-hadoop library

13/01/24 08:09:38 WARN snappy.LoadSnappy: Snappy native library not loaded

13/01/24 08:09:38 INFO mapred.JobClient: Running job: job_201301240711_0003

13/01/24 08:09:39 INFO mapred.JobClient: map 0% reduce 0%

13/01/24 08:10:13 INFO mapred.JobClient: map 100% reduce 0%

13/01/24 08:10:34 INFO mapred.JobClient: map 100% reduce 100%

13/01/24 08:10:39 INFO mapred.JobClient: Job complete: job_201301240711_0003

13/01/24 08:10:39 INFO mapred.JobClient: Counters: 29

13/01/24 08:10:39 INFO mapred.JobClient: Job Counters

13/01/24 08:10:39 INFO mapred.JobClient: Launched reduce tasks=1

13/01/24 08:10:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=56253

13/01/24 08:10:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0

13/01/24 08:10:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0

13/01/24 08:10:39 INFO mapred.JobClient: Launched map tasks=2

13/01/24 08:10:39 INFO mapred.JobClient: Data-local map tasks=2

13/01/24 08:10:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18108

13/01/24 08:10:39 INFO mapred.JobClient: File Output Format Counters

13/01/24 08:10:39 INFO mapred.JobClient: Bytes Written=96

13/01/24 08:10:39 INFO mapred.JobClient: FileSystemCounters

13/01/24 08:10:39 INFO mapred.JobClient: FILE_BYTES_READ=251

13/01/24 08:10:39 INFO mapred.JobClient: HDFS_BYTES_READ=320

13/01/24 08:10:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65235

13/01/24 08:10:39 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=96

13/01/24 08:10:39 INFO mapred.JobClient: File Input Format Counters

13/01/24 08:10:39 INFO mapred.JobClient: Bytes Read=104

13/01/24 08:10:39 INFO mapred.JobClient: Map-Reduce Framework

13/01/24 08:10:39 INFO mapred.JobClient: Map output materialized bytes=257

13/01/24 08:10:39 INFO mapred.JobClient: Map input records=7

13/01/24 08:10:39 INFO mapred.JobClient: Reduce shuffle bytes=257

13/01/24 08:10:39 INFO mapred.JobClient: Spilled Records=48

13/01/24 08:10:39 INFO mapred.JobClient: Map output bytes=204

13/01/24 08:10:39 INFO mapred.JobClient: CPU time spent (ms)=7650

13/01/24 08:10:39 INFO mapred.JobClient: Total committed heap usage (bytes)=247275520

13/01/24 08:10:39 INFO mapred.JobClient: Combine input records=25

13/01/24 08:10:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=216

13/01/24 08:10:39 INFO mapred.JobClient: Reduce input records=24

13/01/24 08:10:39 INFO mapred.JobClient: Reduce input groups=15

13/01/24 08:10:39 INFO mapred.JobClient: Combine output records=24

13/01/24 08:10:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=301699072

13/01/24 08:10:39 INFO mapred.JobClient: Reduce output records=15

13/01/24 08:10:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1129721856

13/01/24 08:10:39 INFO mapred.JobClient: Map output records=25

ubuntu@ubuntu:~/dev/wordcount/bin$

8. View the results:

ubuntu@ubuntu:~$ hadoop fs -ls /tmp/output/result

Found 3 items

-rw-r--r-- 1 ubuntu supergroup 0 2013-01-24 08:10 /tmp/output/result/_SUCCESS

drwxr-xr-x - ubuntu supergroup 0 2013-01-24 08:09 /tmp/output/result/_logs

-rw-r--r-- 1 ubuntu supergroup 96 2013-01-24 08:10 /tmp/output/result/part-r-00000

ubuntu@ubuntu:~$ hadoop fs -cat /tmp/output/result/part-r-00000

a 2

am 2

are 2

brother 1

he 1

how 2

i 2

is 3

lily 1

lily. 1

my 2

name 1

she 1

student! 2

you? 2

ubuntu@ubuntu:~$
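Note that "lily" and "lily." are counted as different words: the mapper tokenizes with java.util.StringTokenizer, which splits only on whitespace and leaves punctuation attached to the token. A quick illustration (TokenizeDemo is just a throwaway class name for this snippet):

import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // StringTokenizer splits on whitespace only, so punctuation stays attached
        StringTokenizer itr = new StringTokenizer("my name is lily.");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken());   // prints: my / name / is / lily.
        }
    }
}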
