WordCount with MapReduce (Local Mode)

孤陋寡闻的闻

Category: Hadoop. Tags: mapreduce, hadoop, big data, eclipse

Article link: https://blog.csdn.net/weixin_54132382/article/details/128519976

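This article walks through WordCount in MapReduce: the Mapper and Reducer classes, and how to run the word-counting program locally. The Mapper splits each line of text into words, the Reducer sums the counts for each word, and the run results are shown at the end.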

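Note: I am still a college student and this blog is just my study notes, so please be kind.

Preface

Note: the goal is to count how many times each word occurs.

Note: the main content of the article follows; the example below is for reference.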

I. What is WordCount?

WordCount is a classic MapReduce example program. It solves one problem: counting how many times each word appears in the input text.

II. The example

1. Input file

It contains the following words (sample):

hello tom hello allen hello
allen tom mac apple
hello allen apple
hello spark allen hadoop spark

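Copy these lines into a .txt file and save it in a folder of your choice. The Driver class below reads from D:\data\wordcount\input, so either save the file there or change the input path in the Driver to match.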
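Before running the job, it helps to know what answer to expect. Here is a minimal plain-Java sketch, with no Hadoop involved, that counts the sample lines above in memory. The class name ExpectedCounts and its count method are made up for illustration and are not part of the project.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the Hadoop project: counts words in memory
// so the job's output can be checked against a known answer.
class ExpectedCounts {

    // Returns word -> number of occurrences, with keys in sorted order.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {                  // a blank line yields one empty token
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "hello tom hello allen hello",
                "allen tom mac apple",
                "hello allen apple",
                "hello spark allen hadoop spark");
        // Print one "word<TAB>count" pair per line.
        count(sample).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the four sample lines this gives allen 4, apple 2, hadoop 1, hello 5, mac 1, spark 2, tom 2, which is what the MapReduce job should also report.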

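2. Project structure

(The original screenshot of the project structure is not available. Judging from the code below, the project has a single package, wordcount, containing WordCountMapper, WordCountReducer, and WordCountDriver.)

3. Code

Mapper class: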

package wordcount;


import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;


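public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused for every record so no new objects are allocated per word
    private final Text outK = new Text();
    private final IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // 1. Read one line of input and split it on whitespace
        String line = value.toString();
        String[] words = line.trim().split("\\s+");

        // 2. Emit (word, 1) for every word, skipping the empty token a blank line produces
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            outK.set(word);
            context.write(outK, outV);
        }
    }
}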
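What the map step emits can be checked without Hadoop. The sketch below mimics the body of map in plain Java: one line goes in, a list of (word, 1) pairs comes out. It also shows why the split pattern matters: splitting on a single space leaves an empty token wherever two spaces sit side by side, and one way around that is to trim the line and split on \s+. MapSketch and its map method are illustrative names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the map step; no Hadoop classes are used.
class MapSketch {

    // Emits one (word, 1) pair per whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {              // a blank line yields one empty token
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Splitting on a single space keeps the empty token between the two spaces:
        // the array holds "hello", "" and "tom".
        System.out.println("hello  tom".split(" ").length);
        // The whitespace-based version emits only the two real words.
        System.out.println(map("hello  tom"));
    }
}
```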

Reducer class:

package wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;


public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused for every key so no new object is allocated per word
    private final IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        // 1. Sum all the counts that arrived for this word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }

        // 2. Emit (word, total)
        outV.set(sum);
        context.write(key, outV);
    }
}
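Between map and reduce, the framework groups all values emitted under the same key, so reduce is called once per word with all of that word's 1s. Below is a rough plain-Java model of that grouping followed by the same summing loop as in the Reducer. The names ReduceSketch, shuffle, and reduce are hypothetical, and the real shuffle does far more (partitioning, sorting, moving data between tasks) than this in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of shuffle + reduce; no Hadoop classes are used.
class ReduceSketch {

    // Models the shuffle: collect the values of each key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Models reduce(): sum every value that arrived for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Pretend map output for the text "hello tom hello".
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hello", 1), Map.entry("tom", 1), Map.entry("hello", 1));
        shuffle(mapOutput).forEach((word, ones) ->
                System.out.println(word + "\t" + reduce(ones)));
    }
}
```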

Driver class:

package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        // 1. Get the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar by pointing at the class that contains main
        job.setJarByClass(WordCountDriver.class);

        // 3. Wire up the mapper and reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths (the output directory must not exist yet)
        FileInputFormat.setInputPaths(job, new Path("D:\\data\\wordcount\\input"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\data\\wordcount\\output"));

        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);

        // Exit with 0 on success, 1 on failure
        System.exit(result ? 0 : 1);
    }
}
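One practical point when rerunning the job: FileOutputFormat refuses to start if the output directory already exists, so a second run against the same path fails until that directory is removed. Since everything here runs on the local file system, one option is to delete the old directory with plain java.nio before submitting. The sketch below is my own assumption about how to do that and is not part of the project above; OutputCleaner is a made-up name. It deletes everything under the given path, so only point it at a disposable output folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for local runs only; not part of the original project.
class OutputCleaner {

    // Recursively deletes dir if it exists; does nothing otherwise.
    static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same output path as in the Driver above.
        deleteIfExists(Paths.get("D:\\data\\wordcount\\output"));
    }
}
```

Calling OutputCleaner.deleteIfExists with the output path at the top of the Driver's main, before the job is created, would make repeated local runs work without cleaning up by hand.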