Chapter 6  Introduction to MapReduce

6.2 Understanding WordCount

The WordCount program is the "Hello World" of MapReduce. By analyzing it, we can understand the basic structure and execution flow of a MapReduce program.

6.2.1 Design of WordCount

WordCount is a good illustration of the MapReduce programming model.
Generally, a text file serves as the MapReduce input. The framework splits the text into lines and hands each line to the map method as a key-value pair, with the position of the line within the file as the key and the line content as the value. The map method processes the line and emits intermediate results of the form <word,1>. By default, MapReduce groups these pairs by key and passes them to the reduce method, where the counting is completed and the final result is output as <word,count>.
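For example, suppose a (hypothetical) two-line input file contains "Hello Hadoop" and "Hello World". The map phase emits <Hello,1>, <Hadoop,1>, <Hello,1>, <World,1>; after the pairs are grouped by key, the reduce phase receives <Hello,[1,1]>, <Hadoop,[1]>, <World,[1]> and outputs <Hello,2>, <Hadoop,1>, <World,1>.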


6.2.2 Ways to Run a MapReduce Program

A MapReduce program can be run in two ways: locally or on the server side.
Local execution usually means a local Windows environment, which is convenient for development and debugging.
Server-side execution is what is normally used in a real production environment.

6.2.3 Writing the Code

(1) Create a Java project


(2) Modify the Hadoop source code
Note: when running a MapReduce program locally on Windows, the Hadoop source code needs to be modified. When running on a Linux server, no modification is required.

The modification is simply a small change to the source of Hadoop's NativeIO class (specifically, the access method of its Windows inner class). Returning true bypasses the call to the native access0 method, which is what fails on Windows when the native libraries are not available.

Download the matching Hadoop source package, hadoop-2.7.3-src.tar.gz, and unpack it. Copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding package of your Eclipse project.
Then modify the code as follows:


     
     
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    return true;
    //return access0(path, desiredAccess.accessRight());
}

If the NativeIO class is not modified, running the MapReduce program locally on Windows produces the following exception:


     
     
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at cn.hadron.mr.RunJob.main(RunJob.java:33)

(3) Define the Mapper class


     
     
package cn.hadron.mr;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;
//4 generic parameters: the first two are the key and value types of the map input, the last two are the key and value types of the map output
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
    //This method is called once per line read from the file split; the position of the line is the key and the line content is the value
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] words = StringUtils.split(value.toString(), ' ');
        for(String w : words){
            context.write(new Text(w), new IntWritable(1));
        }
    }
}

Code notes:

  • The Mapper class reads the input data and runs the map method. To write one, extend org.apache.hadoop.mapreduce.Mapper and implement the map method for the problem at hand.
  • The 4 generic parameters of Mapper: the first two specify the key and value types of the map input, and the last two specify the key and value types of the map output.
  • The MapReduce framework passes each input key-value pair to the map method. The method has 3 parameters: the first (usually LongWritable) is the position of the line in the file, the second (usually Text) is the content of the line, and the third is a Context object representing the task context.
  • The full name of the Context class is org.apache.hadoop.mapreduce.Mapper.Context; that is, Context is a nested class of Mapper, so it can be used directly inside a Mapper subclass.
  • In the map method, StringUtils.split breaks the input line into words on spaces, and each word is written out as an intermediate result through Context's write method, as illustrated by the sketch below.
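To make this concrete, here is a minimal plain-Java sketch (not part of the Hadoop job; it uses String.split instead of Hadoop's StringUtils, and the input line is made up) of the pairs the map method emits for one line:

public class MapSketch {
    public static void main(String[] args) {
        //One hypothetical input line; the mapper ignores the byte-offset key
        String line = "Hello Hadoop Hello World";
        //Split on spaces and print <word,1>, mirroring context.write(new Text(w), new IntWritable(1))
        for (String w : line.split(" ")) {
            System.out.println("<" + w + ", 1>");
        }
        //Prints: <Hello, 1> <Hadoop, 1> <Hello, 1> <World, 1> (one per line)
    }
}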

(4) Define the Reducer class


     
     
package cn.hadron.mr;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
    /**
     * In the map output <key,values>, key is a single word and values is the list of counts for that word.
     * The output of Map is the input of Reduce. The reduce method is called once per group; within a group the key is the same and there may be several values.
     * So reduce only needs to iterate over values and sum them to obtain the total count of a word.
     */
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for(IntWritable i : values){
            sum = sum + i.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Code notes:

  • The Reducer class takes the intermediate results emitted by the Mapper as its input and runs the reduce method.
  • The 4 generic parameters of Reducer: the first two are the key and value types of the reduce input (matching the map output types), and the last two are the key and value types of the reduce output.
  • reduce method parameters: key is a single word, values is the list of counts for that word, and Context (of type org.apache.hadoop.mapreduce.Reducer.Context) is the reducer's context. The sketch below traces the summation for one key.
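As a companion sketch, the following plain Java (again not part of the job; the key and counts are made up, and an int array stands in for Hadoop's Iterable<IntWritable>) traces the summation that reduce performs for a single key:

public class ReduceSketch {
    public static void main(String[] args) {
        String key = "Hello";      //one hypothetical key
        int[] values = {1, 1};     //the counts emitted by the mappers for this key
        int sum = 0;
        for (int v : values) {
            sum += v;              //same summation as in WordCountReducer.reduce
        }
        System.out.println("<" + key + ", " + sum + ">");  //prints <Hello, 2>
    }
}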

(5) Define the main method (driver class)


     
     
package cn.hadron.mr;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class RunJob {
    public static void main(String[] args) {
        //Set HADOOP_USER_NAME to root (as a JVM system property)
        System.setProperty("HADOOP_USER_NAME", "root");
        //The Configuration class holds the Hadoop configuration
        Configuration config = new Configuration();
        //Set fs.defaultFS
        config.set("fs.defaultFS", "hdfs://192.168.80.131:9000");
        //Set the yarn.resourcemanager node
        config.set("yarn.resourcemanager.hostname", "node1");
        try {
            FileSystem fs = FileSystem.get(config);
            Job job = Job.getInstance(config);
            job.setJarByClass(RunJob.class);
            job.setJobName("wc");
            //Set the Mapper class
            job.setMapperClass(WordCountMapper.class);
            //Set the Reducer class
            job.setReducerClass(WordCountReducer.class);
            //Set the key type of the reduce output
            job.setOutputKeyClass(Text.class);
            //Set the value type of the reduce output
            job.setOutputValueClass(IntWritable.class);
            //Specify the input path
            FileInputFormat.addInputPath(job, new Path("/user/root/input/"));
            //Specify the output path (created automatically)
            Path outpath = new Path("/user/root/output/");
            //The output path is created by MapReduce itself; if it already exists, it must be deleted first
            if(fs.exists(outpath)){
                fs.delete(outpath, true);
            }
            FileOutputFormat.setOutputPath(job, outpath);
            //Submit the job and wait for it to finish
            boolean f = job.waitForCompletion(true);
            if(f){
                System.out.println("Job completed successfully");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
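Before running the job, the input directory referenced above must already exist in HDFS and contain at least one text file. The original post does not show how the input was uploaded; the following is a hypothetical helper sketch that stages a local sample file with the same FileSystem API (the local path D:/data/words.txt is an assumption, as is reusing the cluster address from RunJob):

package cn.hadron.mr;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class UploadInput {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "root");
        Configuration config = new Configuration();
        config.set("fs.defaultFS", "hdfs://192.168.80.131:9000");
        FileSystem fs = FileSystem.get(config);
        //Create the input directory if it does not exist yet
        fs.mkdirs(new Path("/user/root/input/"));
        //Copy a local sample file (hypothetical path) into the input directory
        fs.copyFromLocalFile(new Path("D:/data/words.txt"), new Path("/user/root/input/"));
        fs.close();
    }
}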

(6) Run locally

Execution result:


     
     
[root@node1 ~]# hdfs dfs -ls /user/root/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2017-05-28 09:01 /user/root/output/_SUCCESS
-rw-r--r--   3 root supergroup         46 2017-05-28 09:01 /user/root/output/part-r-00000
[root@node1 ~]# hdfs dfs -cat /user/root/output/part-r-00000
Hadoop  2
Hello   2
Hi      1
Java    2
World   1
world   1
[root@node1 ~]#

6.2.4 Running on the Server Side

(1) Modify the source code

The main method above was written for local execution; for server-side execution it can be simplified.
See the official example:
http://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Here the Mapper and Reducer classes are written as static nested classes of the driver class:


     
     
package cn.hadron.mr;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
    //The 4 generic parameters specify the map input key type, input value type, output key type and output value type
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        //In the map method, value holds one line of the text file (terminated by a newline) and key is the offset of the first character of that line from the beginning of the file
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            //StringTokenizer splits the line into individual words; each <word,1> pair is emitted as the map output
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();
        //In the map output <key,values>, key is a single word and values is the list of counts for that word; the map output is the reduce input,
        //so reduce only needs to iterate over values and sum them to obtain the total count of a word.
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    //Run the MapReduce job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //The first command-line argument is the input path, the second is the output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

(2) Export the jar package

(3) Upload to the server and run
As before, use Xftp to upload the wordcount.jar that was just exported to the desktop to the node1 node.


     
     
[root@node1 ~]# hadoop jar wordcount.jar cn.hadron.mr.WordCount input output
17/05/28 10:41:41 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.80.131:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://node1:9000/user/root/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at cn.hadron.mr.WordCount.main(WordCount.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

This happens because the output directory already exists; simply delete it:


     
     
[root@node1 ~]# hdfs dfs -rmr /user/root/output
rmr: DEPRECATED: Please use 'rm -r' instead.
17/05/28 10:42:01 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/output

Run it again:


     
     
[root@node1 ~]# hadoop jar wordcount.jar cn.hadron.mr.WordCount input output
17/05/28 10:43:12 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.80.131:8032
17/05/28 10:43:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/05/28 10:43:15 INFO input.FileInputFormat: Total input paths to process : 2
17/05/28 10:43:15 INFO mapreduce.JobSubmitter: number of splits:2
17/05/28 10:43:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495804618534_0001
17/05/28 10:43:17 INFO impl.YarnClientImpl: Submitted application application_1495804618534_0001
17/05/28 10:43:17 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1495804618534_0001/
17/05/28 10:43:17 INFO mapreduce.Job: Running job: job_1495804618534_0001
17/05/28 10:43:43 INFO mapreduce.Job: Job job_1495804618534_0001 running in uber mode : false
17/05/28 10:43:43 INFO mapreduce.Job:  map 0% reduce 0%
17/05/28 10:44:19 INFO mapreduce.Job:  map 100% reduce 0%
17/05/28 10:44:33 INFO mapreduce.Job:  map 100% reduce 100%
17/05/28 10:44:35 INFO mapreduce.Job: Job job_1495804618534_0001 completed successfully
17/05/28 10:44:36 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=89
        FILE: Number of bytes written=355368
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=301
        HDFS: Number of bytes written=46
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Killed map tasks=1
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=62884
        Total time spent by all reduces in occupied slots (ms)=12445
        Total time spent by all map tasks (ms)=62884
        Total time spent by all reduce tasks (ms)=12445
        Total vcore-milliseconds taken by all map tasks=62884
        Total vcore-milliseconds taken by all reduce tasks=12445
        Total megabyte-milliseconds taken by all map tasks=64393216
        Total megabyte-milliseconds taken by all reduce tasks=12743680
    Map-Reduce Framework
        Map input records=6
        Map output records=14
        Map output bytes=140
        Map output materialized bytes=95
        Input split bytes=216
        Combine input records=14
        Combine output records=7
        Reduce input groups=6
        Reduce shuffle bytes=95
        Reduce input records=7
        Reduce output records=6
        Spilled Records=14
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=860
        CPU time spent (ms)=10230
        Physical memory (bytes) snapshot=503312384
        Virtual memory (bytes) snapshot=6236766208
        Total committed heap usage (bytes)=301146112
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=85
    File Output Format Counters
        Bytes Written=46
[root@node1 ~]#

Check the result:


     
     
[root@node1 ~]# hdfs dfs -ls /user/root/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2017-05-28 10:44 /user/root/output/_SUCCESS
-rw-r--r--   3 root supergroup         46 2017-05-28 10:44 /user/root/output/part-r-00000
[root@node1 ~]# hdfs dfs -cat /user/root/output/part-r-00000
Hadoop  2
Hello   2
Hi      1
Java    2
World   1
world   1

Supplementary note

2017-06-24
Running the MapReduce program written earlier again today produced the following error:

(null) entry in command string: null chmod 0700

Solution:
(1) Download hadoop-2.7.3.tar.gz and unpack it, for example to drive D, so that the Hadoop home directory is D:\hadoop-2.7.3.
(2) Copy the debug utility (winutils.exe) into HADOOP_HOME/bin.
(3) Set the HADOOP_HOME environment variable (see the sketch after this list for a programmatic alternative).
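If you would rather not change the system environment variables, a commonly used programmatic alternative (an assumption, not something from the original post) is to set the hadoop.home.dir system property at the very beginning of the driver's main method:

//Hypothetical alternative to setting the HADOOP_HOME environment variable:
//point Hadoop at the unpacked installation that contains bin\winutils.exe.
//This must run before any Hadoop classes are used.
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.3");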

Many thanks to the original author for their hard work. This post is reproduced only to guard against the original blog being deleted and for study purposes; it is not used for any commercial purpose. Source: https://blog.csdn.net/chengyuqiang/article/details/72794026