Configuring run arguments for WordCount in Eclipse

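To run WordCount on Hadoop, the program must be given an input path and an output path. The input path is the text file whose word frequencies we want to count; here the file is named 20417.txt. The output path is where the word-frequency results will be stored. The arguments are configured as shown in the figure below: WordCount.java -> right-click -> Run As -> Run Configurations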


The paths above are paths in HDFS, not on the local file system. The figure below shows where they sit in HDFS.
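
To make the two program arguments concrete, here is a minimal sketch of how a driver might validate and read them before submitting a job. It is not the author's WordCount.java: the class name `WordCountArgs`, the usage message, and the example HDFS URIs (including the `namenode:9000` address and the directory layout) are assumptions for illustration. Only the file name 20417.txt comes from the text above.

```java
import java.net.URI;

// Hypothetical helper: holds the two arguments typed into the
// "Program arguments" box of the Eclipse run configuration.
public class WordCountArgs {
    public final URI input;   // file to count words in
    public final URI output;  // directory that will receive the results

    public WordCountArgs(URI input, URI output) {
        this.input = input;
        this.output = output;
    }

    // Expects exactly two arguments: <input path> <output path>.
    public static WordCountArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: WordCount <input path> <output path>");
        }
        return new WordCountArgs(URI.create(args[0]), URI.create(args[1]));
    }

    public static void main(String[] args) {
        // Assumed example values; replace host, port, and directories
        // with the ones used by your own cluster.
        String[] example = {
            "hdfs://namenode:9000/user/hadoop/input/20417.txt",
            "hdfs://namenode:9000/user/hadoop/output"
        };
        WordCountArgs parsed = parse(args.length == 2 ? args : example);
        System.out.println("input  = " + parsed.input);
        System.out.println("output = " + parsed.output);
    }
}
```

In a real driver the two values would typically be passed on to `FileInputFormat.addInputPath` and `FileOutputFormat.setOutputPath`. Note that Hadoop refuses to start a job whose output directory already exists, so the output path should point at a directory that has not been created yet.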


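After entering the input and output paths in the first figure, click Apply, but do not click Run at this point: Run here means running on the local machine, whereas we want the program to run on the Hadoop cluster. Instead, do the following: WordCount.java -> right-click -> Run As -> Run on Hadoop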

While the program runs, the console prints progress information such as the following:

11/10/09 14:07:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/10/09 14:07:50 INFO input.FileInputFormat: Total input paths to process : 1
11/10/09 14:07:50 INFO mapred.JobClient: Running job: job_201110091333_0001
11/10/09 14:07:51 INFO mapred.JobClient:  map 0% reduce 0%
11/10/09 14:07:59 INFO mapred.JobClient:  map 100% reduce 0%
11/10/09 14:08:12 INFO mapred.JobClient:  map 100% reduce 100%
11/10/09 14:08:14 INFO mapred.JobClient: Job complete: job_201110091333_0001
11/10/09 14:08:14 INFO mapred.JobClient: Counters: 17
11/10/09 14:08:14 INFO mapred.JobClient:   Job Counters 
11/10/09 14:08:14 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:     Launched map tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:     Data-local map tasks=1
11/10/09 14:08:14 INFO mapred.JobClient:   FileSystemCounters
11/10/09 14:08:14 INFO mapred.JobClient:     FILE_BYTES_READ=143076
11/10/09 14:08:14 INFO mapred.JobClient:     HDFS_BYTES_READ=674762
11/10/09 14:08:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=286184
11/10/09 14:08:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=205265
11/10/09 14:08:14 INFO mapred.JobClient:   Map-Reduce Framework
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce input groups=0
11/10/09 14:08:14 INFO mapred.JobClient:     Combine output records=10015
11/10/09 14:08:14 INFO mapred.JobClient:     Map input records=12761
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce output records=0
11/10/09 14:08:14 INFO mapred.JobClient:     Spilled Records=20030
11/10/09 14:08:14 INFO mapred.JobClient:     Map output bytes=1082004
11/10/09 14:08:14 INFO mapred.JobClient:     Combine input records=112607
11/10/09 14:08:14 INFO mapred.JobClient:     Map output records=112607
11/10/09 14:08:14 INFO mapred.JobClient:     Reduce input records=10015
11/10/09 14:08:14 INFO input.FileInputFormat: Total input paths to process : 1
11/10/09 14:08:14 INFO mapred.JobClient: Running job: job_201110091333_0002
11/10/09 14:08:15 INFO mapred.JobClient:  map 0% reduce 0%
11/10/09 14:08:24 INFO mapred.JobClient:  map 100% reduce 0%
11/10/09 14:08:36 INFO mapred.JobClient:  map 100% reduce 100%
11/10/09 14:08:38 INFO mapred.JobClient: Job complete: job_201110091333_0002
11/10/09 14:08:38 INFO mapred.JobClient: Counters: 17
11/10/09 14:08:38 INFO mapred.JobClient:   Job Counters 
11/10/09 14:08:38 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:     Launched map tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:     Data-local map tasks=1
11/10/09 14:08:38 INFO mapred.JobClient:   FileSystemCounters
11/10/09 14:08:38 INFO mapred.JobClient:     FILE_BYTES_READ=143076
11/10/09 14:08:38 INFO mapred.JobClient:     HDFS_BYTES_READ=205265
11/10/09 14:08:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=286184
11/10/09 14:08:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=104533
11/10/09 14:08:38 INFO mapred.JobClient:   Map-Reduce Framework
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce input groups=0
11/10/09 14:08:38 INFO mapred.JobClient:     Combine output records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Map input records=10015
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce output records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Spilled Records=20030
11/10/09 14:08:38 INFO mapred.JobClient:     Map output bytes=123040
11/10/09 14:08:38 INFO mapred.JobClient:     Combine input records=0
11/10/09 14:08:38 INFO mapred.JobClient:     Map output records=10015
11/10/09 14:08:38 INFO mapred.JobClient:     Reduce input records=10015

Once the run has finished, the word-frequency results appear in HDFS, as shown in the figure below:

The word-frequency results are stored in the file part-r-00000.
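
As a rough illustration of how the contents of part-r-00000 could be consumed afterwards, the sketch below parses result lines into a map. It assumes each line has the form word, TAB, count, which is the layout the standard WordCount example produces with Hadoop's default text output. The console log above shows two jobs running back to back, so the author's final file may be laid out differently. The class name `WordCountResultParser` and the sample lines are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns "word<TAB>count" lines into a map,
// keeping the order in which the lines appear in the file.
public class WordCountResultParser {

    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the last tab so a key containing tabs stays intact.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab in line: " + line);
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the real output file.
        List<String> sample = Arrays.asList("hadoop\t3", "the\t42");
        for (Map.Entry<String, Long> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

To try it on the real results, the file would first have to be copied out of HDFS (for example with `hadoop fs -get`) and its lines read with `java.nio.file.Files.readAllLines`.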
