Getting Started with MapReduce: The WordCount Example

L凝竹

Column: Introduction to Big Data Technology. Tags: idea, java

Article link: https://blog.csdn.net/xiaoliu_qq/article/details/78825770

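This article shows how to implement the WordCount example with MapReduce, covering environment setup, a walkthrough of the code, and the steps to run it. It creates a Java project in IntelliJ IDEA, adds the Hadoop jars and a log4j configuration file, and explains the design of the Mapper, Reducer, and Driver classes.

(Summary generated automatically by CSDN.)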

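0. This example runs MapReduce locally, without a Hadoop cluster.
1. What you need: IntelliJ IDEA as the development tool, plus the Hadoop jar files required to run the job.
2. Open IDEA, create a plain Java project, and add the jars to it. To make log messages easy to see, also add a log4j.properties configuration file.
3. You need to write three classes yourself: a Mapper, a Reducer, and a Driver (WordCountMapper, WordCountReducer, and WordCountDriver below, implemented as the classes WCMapper, WCReduce, and WCDriver).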
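Before the Hadoop classes, a plain-Java sketch of how the three pieces cooperate may help: map turns each line into (word, 1) pairs, the framework groups the pairs by key (the shuffle, which also sorts the keys), and reduce sums each group. This is an illustration under assumptions, not Hadoop code: `WordCountSimulation` and its methods are hypothetical names, the shuffle is modelled with an in-memory `TreeMap`, and no Hadoop API is used, so it runs without any jars.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSimulation {

    // "Map": turn one line into (word, 1) pairs, splitting on runs of whitespace.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return out; // blank lines produce no pairs
        }
        for (String word : trimmed.split("\\s+")) {
            out.add(new SimpleEntry<String, Integer>(word, 1));
        }
        return out;
    }

    // "Shuffle": group all values by key; TreeMap keeps the keys sorted,
    // mimicking the sort Hadoop performs on map output keys.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce": sum all values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, shuffles the pairs, then reduces each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello  hadoop");
        System.out.println(wordCount(lines)); // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, Hadoop performs the shuffle between the map and reduce tasks; you only supply the map and reduce logic.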
The code is as follows.
WordCountMapper (class WCMapper):

package com.liu;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

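public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Map phase: the input is divided into splits, and each map task reads its split line by line.
    //  1) Extend Mapper and declare the input key and value types
    //     (LongWritable = byte offset of the line, Text = the line itself).
    //  2) Declare the output key and value types (Text = word, IntWritable = count).
    //  3) Override map(): it receives the byte offset of the line within the file
    //     (not the line number), the contents of the line, and the Context used to write output.

    // Output objects reused across calls, so no new objects are allocated per word.
    private final Text k = new Text();
    private final IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return; // skip blank lines
        }
        // Split on runs of whitespace so repeated spaces or tabs do not produce empty "words".
        String[] words = line.split("\\s+");
        for (String word : words) {
            k.set(word);
            context.write(k, v); // emit (word, 1)
        }
    }
}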
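With the default `TextInputFormat`, the `LongWritable` key passed to `map` is the byte offset at which the line starts in the file, not a line number, and the value is the line without its line terminator. The sketch below uses hypothetical names (`MapperInputDemo`, `lineOffsets`, `tokenize`) and simplifying assumptions (the whole text is one split, encoded as UTF-8, with `\n` line endings) to compute those offsets for a small string. It also shows why `split(" ")` yields empty tokens when words are separated by more than one space.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapperInputDemo {

    // Byte offset at which each line starts: the key map() would receive.
    // Assumes a single split, UTF-8 encoding and '\n' line endings.
    static List<Long> lineOffsets(String text) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) {
                end = text.length(); // last line without a trailing newline
            }
            offsets.add(offset);
            // Advance by the line's UTF-8 byte length plus one byte for '\n'.
            offset += text.substring(start, end).getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return offsets;
    }

    // Splits a line on runs of whitespace; a blank line yields no tokens.
    static List<String> tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // The first line starts with two CJK characters, each 3 bytes in UTF-8,
        // so the second line starts at byte 13 even though it starts at char index 9.
        System.out.println(lineOffsets("\u4f60\u597d world\nhi there\n")); // [0, 13]
        System.out.println(Arrays.toString("a  b".split(" ")));            // [a, , b]
        System.out.println(tokenize("a  b"));                               // [a, b]
    }
}
```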

WordCountReducer (class WCReduce):

package com.liu;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

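public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Output value object, reused across calls to reduce()
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0; // counts how many times this word occurs
        // Iterate over all values emitted for this word and add them up
        for (IntWritable num : values) {
            sum += num.get();
        }
        result.set(sum);
        context.write(key, result); // emit (word, total count)
    }
}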
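Because WCReduce consumes and produces the same (Text, IntWritable) types and summation is associative and commutative, the same class could optionally be registered as a combiner with Hadoop's `job.setCombinerClass(WCReduce.class)`; the driver in this article does not do that. A combiner pre-sums counts on the map side, and Hadoop may run it zero, one, or several times, which is safe only because partial sums add up to the same total. The plain-Java sketch below (hypothetical names `CombinerDemo`, `mapAndCombine`, `reduce`; two hypothetical input splits; no Hadoop API) illustrates that the final counts do not change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerDemo {

    // Stands in for one map task followed by a combiner:
    // counts words in one split, pre-summing locally.
    static Map<String, Integer> mapAndCombine(List<String> lines) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            for (String word : trimmed.split("\\s+")) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // The reduce step: sums the partial counts per word across all map tasks.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(mapAndCombine(Arrays.asList("a b a"))); // {a=2, b=1}
        partials.add(mapAndCombine(Arrays.asList("b a c"))); // {a=1, b=1, c=1}
        System.out.println(reduce(partials));                // {a=3, b=2, c=1}
    }
}
```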

WordCountDriver (class WCDriver):

package com.liu;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

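public class WCDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Expect two program arguments: the input path and the output path
        if (args.length != 2) {
            System.err.println("Usage: WCDriver <input path> <output path>");
            System.exit(2);
        }
        // Create the Job
        Job job = Job.getInstance(new Configuration());
        // Set the driver class, which Hadoop uses to locate the jar containing the job classes
        job.setJarByClass(WCDriver.class);
        // Set the Mapper and Reducer classes
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReduce.class);
        // Set the key and value types output by the map phase
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key and value types output by the reduce phase (the job's final output)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input path and the output path; the output directory must not exist yet
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job, wait for it to finish (true = print progress), and exit with its status
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}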
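When run from IDEA, the two paths the driver reads from `args[0]` and `args[1]` go into the run configuration's Program arguments field. `FileOutputFormat` refuses to start a job whose output directory already exists (the job fails with a `FileAlreadyExistsException`), so rerunning the example locally usually means deleting the previous output first. Below is a minimal sketch, assuming the output lives on the local file system; `OutputDirCleaner` and `deleteIfExists` are hypothetical names, not part of Hadoop. For HDFS, Hadoop's own `FileSystem.delete(path, true)` performs a recursive delete instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputDirCleaner {

    // Hypothetical helper (not part of Hadoop): recursively deletes a local directory
    // so the job can be re-run. It deletes without asking, so only point it at job output.
    static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories,
            // so every directory is already empty when it is deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the output left behind by a previous run.
        Path out = Files.createTempDirectory("wc-output");
        Files.write(out.resolve("part-r-00000"), "hello\t2\n".getBytes(StandardCharsets.UTF_8));
        deleteIfExists(out);
        System.out.println(Files.exists(out)); // false
    }
}
```

One possible place to call it is at the start of `WCDriver.main`, e.g. `OutputDirCleaner.deleteIfExists(java.nio.file.Paths.get(args[1]));`. Keep in mind that this silently removes whatever is at that path.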