Running Hadoop's Built-in wordcount Word Count Program on Linux

0. Preface

The previous article, "A First Taste of Hadoop: Quickly Setting Up a Pseudo-Distributed Hadoop Environment", built a working Hadoop environment. This article uses the wordcount program that ships with Hadoop to walk through a word-count example.

1. Word Count with the Bundled Example Program

(1) The wordcount program

The wordcount program lives in the examples jar under Hadoop's share directory:

[root@linuxidc mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@linuxidc mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar         hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar      hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar        hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar          lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar  lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar   sources

The jar we need is hadoop-mapreduce-examples-2.6.5.jar.
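
If you are not sure which examples the jar contains, running it without arguments makes the example driver print the list of bundled programs, wordcount among them (output abridged here):

[root@linuxidc mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.5.jar
An example program must be given as the first argument.
Valid program names are:
...
  wordcount: A map/reduce program that counts the words in the input files.
...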

(2) Create the HDFS data directories

Create a directory to hold the MapReduce job's input files:

[root@linuxidc ~]# hadoop fs -mkdir -p /data/wordcount

Create a directory to hold the MapReduce job's output:

[root@linuxidc ~]# hadoop fs -mkdir /output
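
Note that only the parent directory /output is created here. The job's actual output path, /output/wordcount, must not exist yet: MapReduce refuses to overwrite existing output, and the job would fail with an "output directory already exists" error. When re-running the example, remove the old output first:

[root@linuxidc ~]# hadoop fs -rm -r /output/wordcount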

List the two directories just created:

[root@linuxidc ~]# hadoop fs -ls /
drwxr-xr-x   - root supergroup          0 2017-09-01 20:34 /data
drwxr-xr-x   - root supergroup          0 2017-09-01 20:35 /output

(3) Create a word file and upload it to HDFS

The word file looks like this:

[root@linuxidc ~]# cat myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy

Upload the file to HDFS:

[root@linuxidc ~]# hadoop fs -put myword.txt /data/wordcount

Verify the uploaded file and its contents in HDFS:

[root@linuxidc ~]# hadoop fs -ls /data/wordcount
-rw-r--r--   1 root supergroup         57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@linuxidc ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
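
As a sanity check (an extra step, not in the original walkthrough), the expected result can be computed locally with standard shell tools; uniq -c prints each distinct word preceded by its count:

[root@linuxidc ~]# hadoop fs -cat /data/wordcount/myword.txt | tr ' ' '\n' | sort | uniq -c
      2 katy
      2 leaf
      1 ling
      2 xpleaf
      1 yeyonghao
      2 yyh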

(4) Run the wordcount program

Run the following command; the arguments after the jar path are the example name (wordcount), the HDFS input directory, and the HDFS output directory:

[root@linuxidc ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount

...
17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
        File System Counters
                FILE: Number of bytes read=585940
                FILE: Number of bytes written=1099502
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=114
                HDFS: Number of bytes written=48
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=5
                Map output records=10
                Map output bytes=97
                Map output materialized bytes=78
                Input split bytes=112
                Combine input records=10
                Combine output records=6
                Reduce input groups=6
                Reduce shuffle bytes=78
                Reduce input records=6
                Reduce output records=6
                Spilled Records=12
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=92
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=241049600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=57
        File Output Format Counters
                Bytes Written=48

(5) View the results

The counters above already hint at the result: Map input records=5 matches the five input lines, Map output records=10 the ten words in total, and Reduce output records=6 the six distinct words. The per-word counts:

[root@linuxidc ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy    2
leaf    2
ling    1
xpleaf  2
yeyonghao       1
yyh     2
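
The single part-r-00000 file exists because the job ran with one reducer; each reducer writes its own part-r-NNNNN file, and a successful job also leaves an empty _SUCCESS marker in the output directory. A listing should look roughly like this (timestamps illustrative; the 48-byte size matches the Bytes Written counter above):

[root@linuxidc ~]# hadoop fs -ls /output/wordcount
-rw-r--r--   1 root supergroup          0 2017-09-01 20:48 /output/wordcount/_SUCCESS
-rw-r--r--   1 root supergroup         48 2017-09-01 20:48 /output/wordcount/part-r-00000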
