For setting up the Hadoop environment, see the first post in this Hadoop series (the Hadoop configuration directly affects whether this program runs).
In addition, running MapReduce programs on Windows requires hadoop.dll and winutils.exe: https://github.com/steveloughran/winutils
This example uses Hadoop 2.8.3. Copy the matching versions of hadoop.dll and winutils.exe into the bin directory of your local Hadoop folder, and also copy hadoop.dll to C:\Windows\System32. (The local Hadoop copy needs no changes under its etc directory, but you must set the Windows environment variables HADOOP_HOME and Path, and point Eclipse at this Hadoop installation.)
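As a quick sanity check of the Windows setup (a minimal sketch, not part of the original program; it assumes HADOOP_HOME is set as described above), you can confirm from Java that winutils.exe is where Hadoop will look for it:

// Sanity check for the Windows setup described above.
// Assumption: HADOOP_HOME points at the local hadoop-2.8.3 folder.
String home = System.getenv("HADOOP_HOME");
System.out.println("HADOOP_HOME = " + home);
if (home != null) {
    System.out.println("winutils.exe present: "
            + new java.io.File(home, "bin/winutils.exe").exists());
}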
Data preparation:
[hadoop@yourname ~]$ hadoop dfs -mkdir /wordcount
[hadoop@yourname ~]$ hadoop dfs -mkdir /wordcount/input
[hadoop@yourname ~]$ hadoop dfs -copyFromLocal test.txt /wordcount/input/
For yourname, see the first post in this Hadoop series; hadoop is the user name used to log in to the Linux system; ~ refers to the /home/hadoop directory; test.txt sits in /home/hadoop and is uploaded to the /wordcount/input/ directory in HDFS.
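To confirm the upload (optional; this uses the same hadoop dfs client as above):

[hadoop@yourname ~]$ hadoop dfs -ls /wordcount/input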
test.txt
test hadoop hello hadoop
package com.hadoop.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each line into tokens and emit (word, 1) for every token.
    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value,
                Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.101:9000/wordcount/input";
        String output = "hdfs://192.168.1.101:9000/wordcount/output";

        Configuration conf = new Configuration();
        // These configuration settings are required when submitting to the remote cluster from Windows.
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "192.168.1.101");
        conf.set("fs.defaultFS", "hdfs://192.168.1.101:9000/");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020");

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        job.setJar("E:/wordcount.jar"); // the exported jar must exist at this path
        job.setJobName("WordCount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));

        job.waitForCompletion(true);
        System.exit(0);
    }
}
If conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020"); is omitted, the job fails with:
java.io.IOException: java.net.ConnectException: Call From yourname/192.168.182.100 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused
And if the configuration settings below are not specified at all, the job does not run on the remote Linux cluster (no entry appears at http://192.168.1.101:8088/cluster); it runs in the local job runner instead:
conf.set("mapreduce.framework.name","yarn");
conf.set("yarn.resourcemanager.hostname","192.168.1.101");
conf.set("fs.defaultFS","hdfs://192.168.1.101:9000/");
conf.set("mapreduce.app-submission.cross-platform", "true");//指定远程跨平台运行
conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020");job.setJar("E:/wordcount.jar"); 设置jar包,运行程序之前需要先将应用打包并放在指定位置,否则报异常java.io.FileNotFoundException
Run the MapReduce job. On the Linux side, start the daemons from Hadoop's sbin directory:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Then, in Eclipse, right-click the class and choose Run on Hadoop.
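To double-check that the job really went to the cluster rather than the local runner, you can print the tracking URL (a minimal sketch; getTrackingURL is part of the org.apache.hadoop.mapreduce.Job API, and the exit-code handling is an optional refinement of the main() above):

// Replace the last two lines of main() with:
boolean ok = job.waitForCompletion(true);
// A YARN URL on 192.168.1.101 means the job ran on the cluster;
// the local runner reports a localhost address instead.
System.out.println("Tracking URL: " + job.getTrackingURL());
System.exit(ok ? 0 : 1);

A job submitted to YARN also shows up at http://192.168.1.101:8088/cluster, which the local runner never does.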
View the run result through DFS Locations:
In Eclipse, double-click part-r-00000 to open the result (command-line equivalent: hadoop dfs -cat /wordcount/output/part-r-00000):
hadoop 2
hello 1
test 1
If you need to run the job more than once, delete the output directory before each run (e.g. hadoop dfs -rm -r /wordcount/output); otherwise the job fails because the output directory already exists.
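To make reruns automatic, the output directory can also be removed programmatically before the job is submitted (a minimal sketch using the org.apache.hadoop.fs.FileSystem API; conf and output are the variables already defined in main() above):

// Add to main() before the job is submitted; requires import org.apache.hadoop.fs.FileSystem;
FileSystem fs = FileSystem.get(conf);
Path outPath = new Path(output);
if (fs.exists(outPath)) {
    fs.delete(outPath, true); // true = delete recursively
}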