MapReduce: creating the mapper, reducer, and driver

lds_include

Article link: https://blog.csdn.net/lds_include/article/details/88636481

Writing a MapReduce program

Example: count how many times each word occurs in a file

Input file

1.txt
--------------------
java c c++ c# python 
hadoop hive scala spark
java c c++ c# python 
hadoop hive scala spark
java c c++ c# python 
hadoop hive scala spark
java c c++ c# python 
hadoop hive scala spark
---------------------

The code

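Note: a MapReduce program has three parts. The first is the mapper, which turns the input file into key-value pairs. The second is the reducer, which aggregates the key-value pairs emitted by the mapper and writes the computed results to the output files. The third is the entry point that configures and launches the whole job, which I call the driver.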
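To make the three stages concrete before looking at the Hadoop classes, here is a minimal in-memory sketch in plain Java with no Hadoop dependency. It is a simplification under stated assumptions: the class `LocalWordCount` and its methods are hypothetical, the shuffle is simulated with a `TreeMap` (Hadoop likewise sorts map output by key before it reaches the reducer), and everything runs in a single process rather than across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the map -> shuffle -> reduce pipeline.
public class LocalWordCount {

    // Map phase: one input line in, one (word, 1) pair per word out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all values by key; the TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: called once per key with all of that key's values.
    static int reduce(String key, Iterable<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count;
    }

    // Runs the three phases over all lines and returns word -> count in key order.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The same content as 1.txt: two alternating lines, four times each.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            lines.add("java c c++ c# python ");
            lines.add("hadoop hive scala spark");
        }
        // Prints one "word<TAB>count" line per word, in sorted key order.
        for (Map.Entry<String, Integer> e : run(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Run as is, it prints the nine words in sorted order (c, c#, c++, hadoop, hive, java, python, scala, spark), each with a count of 4.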
Mapper

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

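/**
 * Extends Hadoop's Mapper class.
 * LongWritable: the input key. With the default TextInputFormat it is the byte offset of the line within the file (not the line number), so it is a long; LongWritable is Hadoop's Writable wrapper for Java's long.
 * Text: the input value, i.e. the content of one line, which is text.
 * The second Text: the type of the key the mapper emits; each key is a single word, so it is also Text.
 * IntWritable: the type of the value the mapper emits; every word is paired with a 1, so it is an int, and IntWritable is Hadoop's wrapper for Java's int.
 */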
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String string = value.toString();// Convert the line to a Java String first
        String[] split = string.split(" ");// Words are separated by spaces, so split on " "
        for (String string1: split) {
            context.write(new Text(string1), new IntWritable(1));// Emit (word, 1) for every word so the reducer can sum them
        }
    }
}
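The mapper splits each line with `split(" ")`, which is enough for 1.txt. A quick plain-Java check (no Hadoop needed; the class name `SplitDemo` is hypothetical) shows why: `String.split` drops trailing empty strings, so the trailing space on the "java c c++ c# python " lines is harmless, but two consecutive spaces would produce an empty token that the mapper would count as a word. Calling `trim()` and splitting on the regex `\\s+` is one common way to guard against that.

```java
import java.util.Arrays;

// Hypothetical demo of String.split behaviour relevant to WordCountMapper.
public class SplitDemo {
    public static void main(String[] args) {
        String trailing = "java c c++ c# python ";
        String doubled = "hadoop  hive";

        // Trailing empty strings are removed by split(String): 5 tokens.
        System.out.println(Arrays.toString(trailing.split(" ")));
        // Two spaces in a row yield an empty token in the middle: 3 tokens.
        System.out.println(Arrays.toString(doubled.split(" ")));
        // trim() + split("\\s+") treats any run of whitespace as one separator: 2 tokens.
        System.out.println(Arrays.toString(doubled.trim().split("\\s+")));
    }
}
```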

Reducer

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

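/**
 * The reducer class extends Hadoop MapReduce's Reducer class.
 * Text: the type of the key in the key-value pairs coming from the Mapper; same reasoning as on the mapper side.
 * IntWritable: the type of the values in those pairs; same reasoning as on the mapper side.
 * The second Text: the type of the key the reducer emits, i.e. the word in the final result.
 * The second IntWritable: the type of the value the reducer emits, i.e. the count for that word.
 */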
public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    /*
    Override reduce with the business logic you need
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        /*
        Add up the 1s emitted for this word
         */
        for (IntWritable inc: values) {
            count += inc.get();
        }
        context.write(key, new IntWritable(count));// Write (word, total) to the output
    }
}

Driver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Assumed usage: WordCountDriver <input path> <output path>
        Configuration conf = new Configuration();
        // Create the Job object
        Job job = Job.getInstance(conf);
        // Give the job a name
        job.setJobName("WordCount");

        // Let Hadoop locate the jar to ship by the class that contains main
        job.setJarByClass(WordCountDriver.class);

        // Set the job's Mapper and the key/value types it emits
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the Reducer; without this Hadoop falls back to the identity Reducer and nothing is summed
        job.setReducerClass(WordCountReduce.class);
        // Set the job's final output key/value types, i.e. the Reducer's output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input file(s) on HDFS, taken from the first command-line argument
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Output directory for the results; it must not exist yet, or the job fails
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job, wait for it to finish, and exit with 0 on success or 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
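One thing worth checking in any driver: if `setReducerClass` is not called, Hadoop runs the base `org.apache.hadoop.mapreduce.Reducer`, whose `reduce` writes every incoming (key, value) pair through unchanged. For word count that would produce "java 1" four times instead of "java 4". The plain-Java sketch below (hypothetical class `ReducerChoiceDemo`, no Hadoop involved) contrasts the two behaviours on the values the reducer would receive for one key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of an identity reduce (Hadoop's default Reducer behaviour)
// with the summing reduce used in WordCountReduce.
public class ReducerChoiceDemo {

    // Identity: emit one output line per incoming value.
    static List<String> identityReduce(String key, List<Integer> values) {
        List<String> out = new ArrayList<>();
        for (int v : values) {
            out.add(key + "\t" + v);
        }
        return out;
    }

    // Summing: emit a single line with the total for the key.
    static List<String> sumReduce(String key, List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        List<String> out = new ArrayList<>();
        out.add(key + "\t" + count);
        return out;
    }

    public static void main(String[] args) {
        // The values grouped under "java" after the shuffle on 1.txt.
        List<Integer> javaValues = List.of(1, 1, 1, 1);
        // Prints four lines, each "java<TAB>1".
        identityReduce("java", javaValues).forEach(System.out::println);
        // Prints one line: "java<TAB>4".
        sumReduce("java", javaValues).forEach(System.out::println);
    }
}
```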

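Result (with the default single reducer, Hadoop actually writes this to a file named part-r-00000 in the output directory, one word and its count per line separated by a tab, with the words in sorted order)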

out.txt
-----------
java 4
c 4
c++ 4
c# 4
python 4
scala 4
spark 4
hadoop 4
hive 4
----------

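Note: the whole MapReduce process revolves around key-value pairs: the mapper turns each line into (word, 1) pairs, the framework groups those pairs by key, and the reducer sums the values for each key to produce the final result.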
