My First Hadoop Program: WordCount

This post walks through writing and running a Hadoop program with Eclipse on Windows 8.1. The steps are as follows.

1. Create a Java project in Eclipse

Double-click the Eclipse shortcut on the desktop. Select the menu "File -> New -> Java Project", enter the project name "WordCount" in the "Project name:" text box, then click "Next".

2. Add the build dependency libraries

Click the "Libraries" tab, then click "Add External JARs…" and add all JAR files under the directories "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/mapreduce1", "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/mapreduce1/lib", and "D:/hadoop-2.6.0-cdh5.4.2/share/hadoop/common". Finally, click "Finish".

3. Add a class to the project

Right-click the WordCount project and select "New -> Class". Enter the package name "lab2.module11" in the "Package:" text box and the class name "WordCount" in the "Name:" text box, then click "Finish".
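Eclipse then generates an empty class in that package, which the next step fills in:

    package lab2.module11;

    public class WordCount {

    }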

4. Write the word-count code

package lab2.module11;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Generic parameters: input key type, input value type, output key type, output value type.
    // Here they are Object, Text (a string), Text, and IntWritable (an integer).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // map() receives an input key, an input value, and the task context.
        // By default the key is the byte offset of the line and the value is the line's text.
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());  // set the output key to the current token
                context.write(word, one);   // emit a (word, 1) key-value pair
            }
        }
    }

    // Generic parameters: input key type, input value type, output key type, output value type.
    // Here they are Text, IntWritable, Text, and IntWritable.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // reduce() receives an input key, the list of values grouped under that key, and the task context.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);  // emit a (word, total count) key-value pair
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // initialize the Hadoop configuration
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();  // parse command-line arguments
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");  // name the job "word count"
        job.setJarByClass(WordCount.class);             // set the main class
        job.setMapperClass(TokenizerMapper.class);      // set the mapper class
        job.setReducerClass(IntSumReducer.class);       // set the reducer class
        job.setOutputKeyClass(Text.class);              // set the output key class
        job.setOutputValueClass(IntWritable.class);     // set the output value class
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));  // every argument but the last is an input path
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));  // the last argument is the output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);  // wait for the job and exit with its status
    }
}
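One optional improvement worth noting, though it is not part of this walkthrough: the stock Apache WordCount example also registers the reducer as a combiner, which pre-aggregates map output locally before the shuffle; its absence here is why the job counters later report Combine input records=0. IntSumReducer is safe to reuse as a combiner because its input and output types match, so a single extra line in main() would enable it:

    job.setCombinerClass(IntSumReducer.class);  // optional: sum (word, 1) pairs locally before the shuffle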

5. Prepare the input data

Create an "input" folder on the D: drive and copy the file "D:/LICENSE.txt" into "D:/input/".
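Before running, it helps to know what to expect. As a minimal worked example (the sample line below is made up, not taken from LICENSE.txt), suppose an input file contained the single line

    the quick fox and the lazy dog and the cat

The mapper would emit (the,1), (quick,1), (fox,1), (and,1), (the,1), (lazy,1), (dog,1), (and,1), (the,1), (cat,1); the reducer then sums the values per word, so the output file would read:

    and	2
    cat	1
    dog	1
    fox	1
    lazy	1
    quick	1
    the	3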

6. Configure the run parameters

Select the menu "Run -> Run Configurations…". Select "Java Application" on the left and click the new-configuration button in the upper-left corner. On the "Arguments" tab, enter the job's input and output paths, D:\input D:\output, in the "Program arguments:" text box. Then open the "Environment" tab and click the "New" button on the right; in the dialog, enter HADOOP_HOME in the "Name:" text box and D:\hadoop-2.6.0 in the "Value:" text box. Click "New" once more and add a variable named PATH with the value %PATH%;D:\hadoop-2.6.0\bin.
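These arguments map straight onto main() above (this is just a trace of the existing code, not new behavior): GenericOptionsParser finds no generic Hadoop options here, so otherArgs keeps both paths unchanged.

    // Program arguments: D:\input D:\output
    // otherArgs == { "D:\\input", "D:\\output" }
    // loop body runs once: FileInputFormat.addInputPath(job, new Path("D:\\input"));
    // then:                FileOutputFormat.setOutputPath(job, new Path("D:\\output"));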



7. Test the program in Eclipse

Click the "Run" button to run the program inside Eclipse; the console output appears below.


Console output:
2017-03-30 17:27:52,485 INFO  jvm.JvmMetrics ( JvmMetrics.java:init(76) ) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-03-30 17:27:55,388 WARN  mapreduce.JobResourceUploader ( JobResourceUploader.java:uploadFiles(171) ) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2017-03-30 17:27:55,424 INFO  input.FileInputFormat ( FileInputFormat.java:listStatus(283) ) - Total input paths to process : 1
2017-03-30 17:27:55,606 INFO  mapreduce.JobSubmitter ( JobSubmitter.java:submitJobInternal(198) ) - number of splits:1
2017-03-30 17:27:55,987 INFO  mapreduce.JobSubmitter ( JobSubmitter.java:printTokens(287) ) - Submitting tokens for job: job_local1924825305_0001
2017-03-30 17:27:57,190 INFO  mapreduce.Job ( Job.java:submit(1294) ) - The url to track the job: http://localhost:8080/
2017-03-30 17:27:57,192 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1339) ) - Running job: job_local1924825305_0001
2017-03-30 17:27:57,197 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:createOutputCommitter(471) ) - OutputCommitter set in config null
2017-03-30 17:27:57,214 INFO  output.FileOutputCommitter ( FileOutputCommitter.java:<init>(100) ) - File Output Committer Algorithm version is 1
2017-03-30 17:27:57,221 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:createOutputCommitter(489) ) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2017-03-30 17:27:57,514 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:runTasks(448) ) - Waiting for map tasks
2017-03-30 17:27:57,518 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:run(224) ) - Starting task: attempt_local1924825305_0001_m_000000_0
2017-03-30 17:27:57,762 INFO  output.FileOutputCommitter ( FileOutputCommitter.java:<init>(100) ) - File Output Committer Algorithm version is 1
2017-03-30 17:27:57,813 INFO  util.ProcfsBasedProcessTree ( ProcfsBasedProcessTree.java:isAvailable(192) ) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-30 17:27:58,200 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1360) ) - Job job_local1924825305_0001 running in uber mode : false
2017-03-30 17:27:58,342 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1367) ) -  map 0% reduce 0%
2017-03-30 17:27:58,956 INFO  mapred.Task ( Task.java:initialize(612) ) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@4d9fcb2
2017-03-30 17:27:58,990 INFO  mapred.MapTask ( MapTask.java:runNewMapper(756) ) - Processing split: file:/D:/input.txt:0+17087
2017-03-30 17:27:59,350 INFO  mapred.MapTask ( MapTask.java:setEquator(1205) ) - (EQUATOR) 0 kvi 26214396(104857584)
2017-03-30 17:27:59,350 INFO  mapred.MapTask ( MapTask.java:init(998) ) - mapreduce.task.io.sort.mb: 100
2017-03-30 17:27:59,350 INFO  mapred.MapTask ( MapTask.java:init(999) ) - soft limit at 83886080
2017-03-30 17:27:59,351 INFO  mapred.MapTask ( MapTask.java:init(1000) ) - bufstart = 0; bufvoid = 104857600
2017-03-30 17:27:59,351 INFO  mapred.MapTask ( MapTask.java:init(1001) ) - kvstart = 26214396; length = 6553600
2017-03-30 17:27:59,378 INFO  mapred.MapTask ( MapTask.java:createSortingCollector(403) ) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-03-30 17:27:59,626 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) -
2017-03-30 17:27:59,628 INFO  mapred.MapTask ( MapTask.java:flush(1460) ) - Starting flush of map output
2017-03-30 17:27:59,628 INFO  mapred.MapTask ( MapTask.java:flush(1482) ) - Spilling map output
2017-03-30 17:27:59,629 INFO  mapred.MapTask ( MapTask.java:flush(1483) ) - bufstart = 0; bufend = 25275; bufvoid = 104857600
2017-03-30 17:27:59,629 INFO  mapred.MapTask ( MapTask.java:flush(1485) ) - kvstart = 26214396(104857584); kvend = 26204872(104819488); length = 9525/6553600
2017-03-30 17:27:59,843 INFO  mapred.MapTask ( MapTask.java:sortAndSpill(1667) ) - Finished spill 0
2017-03-30 17:27:59,928 INFO  mapred.Task ( Task.java:done(1038) ) - Task:attempt_local1924825305_0001_m_000000_0 is done. And is in the process of committing
2017-03-30 17:27:59,968 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) - map
2017-03-30 17:27:59,970 INFO  mapred.Task ( Task.java:sendDone(1158) ) - Task 'attempt_local1924825305_0001_m_000000_0' done.
2017-03-30 17:27:59,970 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:run(249) ) - Finishing task: attempt_local1924825305_0001_m_000000_0
2017-03-30 17:27:59,971 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:runTasks(456) ) - map task executor complete.
2017-03-30 17:27:59,978 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:runTasks(448) ) - Waiting for reduce tasks
2017-03-30 17:27:59,978 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:run(302) ) - Starting task: attempt_local1924825305_0001_r_000000_0
2017-03-30 17:28:00,008 INFO  output.FileOutputCommitter ( FileOutputCommitter.java:<init>(100) ) - File Output Committer Algorithm version is 1
2017-03-30 17:28:00,014 INFO  util.ProcfsBasedProcessTree ( ProcfsBasedProcessTree.java:isAvailable(192) ) - ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-30 17:28:00,419 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1367) ) -  map 100% reduce 0%
2017-03-30 17:28:00,521 INFO  mapred.Task ( Task.java:initialize(612) ) -  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@82ac87a
2017-03-30 17:28:00,568 INFO  mapred.ReduceTask ( ReduceTask.java:run(362) ) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@7bb8724c
2017-03-30 17:28:00,624 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:<init>(197) ) - MergerManager: memoryLimit=1325976704, maxSingleShuffleLimit=331494176, mergeThreshold=875144640, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2017-03-30 17:28:00,632 INFO  reduce.EventFetcher ( EventFetcher.java:run(61) ) - attempt_local1924825305_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2017-03-30 17:28:00,838 INFO  reduce.LocalFetcher ( LocalFetcher.java:copyMapOutput(144) ) - localfetcher#1 about to shuffle output of map attempt_local1924825305_0001_m_000000_0 decomp: 30041 len: 30045 to MEMORY
2017-03-30 17:28:00,889 INFO  reduce.InMemoryMapOutput ( InMemoryMapOutput.java:shuffle(100) ) - Read 30041 bytes from map-output for attempt_local1924825305_0001_m_000000_0
2017-03-30 17:28:00,894 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:closeInMemoryFile(315) ) - closeInMemoryFile -> map-output of size: 30041, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->30041
2017-03-30 17:28:00,899 INFO  reduce.EventFetcher ( EventFetcher.java:run(76) ) - EventFetcher is interrupted.. Returning
2017-03-30 17:28:00,901 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) - 1 / 1 copied.
2017-03-30 17:28:00,902 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:finalMerge(687) ) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2017-03-30 17:28:00,987 INFO  mapred.Merger ( Merger.java:merge(606) ) - Merging 1 sorted segments
2017-03-30 17:28:00,988 INFO  mapred.Merger ( Merger.java:merge(705) ) - Down to the last merge-pass, with 1 segments left of total size: 30035 bytes
2017-03-30 17:28:01,037 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:finalMerge(754) ) - Merged 1 segments, 30041 bytes to disk to satisfy reduce memory limit
2017-03-30 17:28:01,043 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:finalMerge(784) ) - Merging 1 files, 30045 bytes from disk
2017-03-30 17:28:01,045 INFO  reduce.MergeManagerImpl ( MergeManagerImpl.java:finalMerge(799) ) - Merging 0 segments, 0 bytes from memory into reduce
2017-03-30 17:28:01,046 INFO  mapred.Merger ( Merger.java:merge(606) ) - Merging 1 sorted segments
2017-03-30 17:28:01,057 INFO  mapred.Merger ( Merger.java:merge(705) ) - Down to the last merge-pass, with 1 segments left of total size: 30035 bytes
2017-03-30 17:28:01,059 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) - 1 / 1 copied.
2017-03-30 17:28:01,086 INFO  Configuration.deprecation ( Configuration.java:warnOnceIfDeprecated(1173) ) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-03-30 17:28:01,378 INFO  mapred.Task ( Task.java:done(1038) ) - Task:attempt_local1924825305_0001_r_000000_0 is done. And is in the process of committing
2017-03-30 17:28:01,389 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) - 1 / 1 copied.
2017-03-30 17:28:01,390 INFO  mapred.Task ( Task.java:commit(1199) ) - Task attempt_local1924825305_0001_r_000000_0 is allowed to commit now
2017-03-30 17:28:01,410 INFO  output.FileOutputCommitter ( FileOutputCommitter.java:commitTask(482) ) - Saved output of task 'attempt_local1924825305_0001_r_000000_0' to file:/D:/output/_temporary/0/task_local1924825305_0001_r_000000
2017-03-30 17:28:01,412 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:statusUpdate(591) ) - reduce > reduce
2017-03-30 17:28:01,412 INFO  mapred.Task ( Task.java:sendDone(1158) ) - Task 'attempt_local1924825305_0001_r_000000_0' done.
2017-03-30 17:28:01,413 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:run(325) ) - Finishing task: attempt_local1924825305_0001_r_000000_0
2017-03-30 17:28:01,414 INFO  mapred.LocalJobRunner ( LocalJobRunner.java:runTasks(456) ) - reduce task executor complete.
2017-03-30 17:28:01,421 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1367) ) -  map 100% reduce 100%
2017-03-30 17:28:02,423 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1378) ) - Job job_local1924825305_0001 completed successfully
2017-03-30 17:28:02,462 INFO  mapreduce.Job ( Job.java:monitorAndPrintJob(1385) ) - Counters: 30
      File System Counters
            FILE: Number of bytes read=94574
            FILE: Number of bytes written=542839
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
      Map-Reduce Framework
            Map input records=322
            Map output records=2382
            Map output bytes=25275
            Map output materialized bytes=30045
            Input split bytes=83
            Combine input records=0
            Combine output records=0
            Reduce input groups=760
            Reduce shuffle bytes=30045
            Reduce input records=2382
            Reduce output records=760
            Spilled Records=4764
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=0
            Total committed heap usage (bytes)=468713472
      Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
      File Input Format Counters
            Bytes Read=17087
      File Output Format Counters
            Bytes Written=8290


8. View the test output

In Windows Explorer, locate the file "D:/output/part-r-00000" and drag it into Eclipse to view the results: one tab-separated "word count" line per distinct word (760 lines here, matching Reduce output records=760 in the counters above).

9. Package the program as a JAR file

Right-click the WordCount project and select "Export". Select "Java -> JAR file" and click "Next". Click the "Browse…" button next to the "JAR file:" text box, choose the desktop as the destination, and name the file "wordcount.jar". Finally, click "Finish".

10. Upload the program's JAR file to the cluster

11. Submit the MapReduce program

Submit the MapReduce program with the yarn jar command. Note that the output path must not already exist, or the job will fail; you can remove a directory on HDFS with hdfs dfs -rm -r.
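For example, assuming the JAR sits in the current directory on a cluster node and the input data has been copied to /input on HDFS (both are assumptions; substitute your own paths):

    hdfs dfs -rm -r /output
    yarn jar wordcount.jar lab2.module11.WordCount /input /output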