Nesting MapReduce Jobs

CodeGuN

Original article: https://blog.csdn.net/u010521842/article/details/75042771

Reference: https://blog.csdn.net/u010521842/article/details/75042771 (thanks to the original author)

Nesting multiple MapReduce jobs

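While coding, I found that I often need to run MapReduce jobs in a nested fashion, i.e. chained so that one job consumes the output of another.
It took me a long time of searching online to find a detailed, workable solution, so I am writing it down here.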

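Problem: given a log file, count how many distinct IP addresses it contains.
Log format
Fields are separated by tabs; the IP address is the fourth field (index 3).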
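A minimal plain-Java sketch of the field extraction that the MR1 mapper performs, with a length check so that a short or malformed line does not throw an exception. The class name LogLineParser and the sample line are hypothetical; only the position of the IP (fourth field, index 3) comes from the mapper code.

```java
import java.util.Optional;

public class LogLineParser {

    // Index of the IP field, matching splits[3] in IpFilterMapper
    static final int IP_FIELD = 3;

    // Returns the IP field of a tab-separated log line, or empty if the line is too short
    static Optional<String> extractIp(String line) {
        String[] fields = line.split("\t");
        return fields.length > IP_FIELD ? Optional.of(fields[IP_FIELD]) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical log line: the first three columns are placeholders
        String line = "2017-07-12\tGET\t/index.html\t192.168.1.10";
        System.out.println(extractIp(line).orElse("<malformed>"));        // prints 192.168.1.10
        System.out.println(extractIp("too\tshort").orElse("<malformed>")); // prints <malformed>
    }
}
```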

Approach

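The task is split into two MR jobs. The first (MR1) removes duplicate IP addresses and outputs each distinct IP once. The second (MR2) reads the IP files written by MR1, counts them, and outputs the total. MR2 can only start after MR1 has finished, because MR1's output directory is MR2's input.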
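To make the data flow between the two jobs concrete, here is an in-memory plain-Java model (not Hadoop code): stage 1's result is stage 2's input, just as MR1's output directory becomes MR2's input path. The class and method names are hypothetical, and a TreeSet stands in for the shuffle's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TwoStageIpCount {

    // Stage 1 (models MR1): drop duplicate IPs. The TreeSet plays the role of
    // the shuffle, which sorts keys and merges identical ones.
    static List<String> distinctIps(List<String> ips) {
        return new ArrayList<>(new TreeSet<>(ips));
    }

    // Stage 2 (models MR2): MR1's output has one IP per line, so the
    // distinct-IP count is simply the number of lines it produced.
    static int countLines(List<String> mr1Output) {
        int count = 0;
        for (String ignored : mr1Output) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical IPs already extracted from log lines
        List<String> ips = List.of("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3");
        List<String> mr1Output = distinctIps(ips); // [10.0.0.1, 10.0.0.2, 10.0.0.3]
        System.out.println(countLines(mr1Output)); // prints 3
    }
}
```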

MR1 stage

Map phase

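import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IpFilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Fields are tab-separated; the IP address is the 4th field
        String[] splits = value.toString().split("\t");
        if (splits.length > 3) { // skip malformed lines instead of throwing
            // Emit the IP as the key; the value carries no information
            context.write(new Text(splits[3]), NullWritable.get());
        }
    }
}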

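Input: the key is the byte offset of the line within the input file (with the default TextInputFormat it is not the line number), and the value is the content of the line.
Output: the key is the IP address, and the value is NullWritable, i.e. empty.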
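A plain-Java model of the (key, value) pairs the mapper receives, to show why the key is a byte offset rather than a line number. It assumes '\n' line endings and UTF-8 text; the class name and the two-line input are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class InputKeyDemo {

    // key = byte offset of the line's first byte, value = the line without its '\n'
    static Map<Long, String> keyedLines(String fileContent) {
        Map<Long, String> pairs = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            pairs.put(offset, line);
            // Advance past the line's bytes plus the 1-byte '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical two-line file
        String content = "a\tb\tc\t10.0.0.1\nd\te\tf\t10.0.0.2\n";
        // keys: 0 and 15 (byte offsets), not line numbers 1 and 2
        keyedLines(content).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```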

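Shuffle phase (performed by Hadoop)

The shuffle sorts the map output by key and groups it: all records with the same key form one group, and their values are collected into a single iterable. As a result, no IP address can appear in two different groups. Each group is then passed to one reduce() call.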
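A plain-Java model of the sort-and-group step described above (Hadoop's real shuffle also spills, merges and transfers data over the network, which this ignores). Each entry of the resulting TreeMap corresponds to one reduce() call; null stands in for NullWritable. The names and IPs are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleDemo {

    // Sort map-output keys and collect all values for the same key into one list
    static <V> TreeMap<String, List<V>> shuffle(List<Map.Entry<String, V>> mapOutput) {
        TreeMap<String, List<V>> groups = new TreeMap<>();
        for (Map.Entry<String, V> kv : mapOutput) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical MR1 map output with one duplicate IP
        List<Map.Entry<String, Object>> mapOutput = new ArrayList<>();
        for (String ip : List.of("10.0.0.2", "10.0.0.1", "10.0.0.2")) {
            mapOutput.add(new AbstractMap.SimpleEntry<String, Object>(ip, null));
        }
        TreeMap<String, List<Object>> groups = shuffle(mapOutput);
        // 10.0.0.1 -> [null], 10.0.0.2 -> [null, null]
        System.out.println(groups.size()); // prints 2
    }
}
```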

Reduce phase

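import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IpFilterReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        // Each distinct IP arrives as exactly one key, so write it once
        context.write(key, NullWritable.get());
    }
}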

After the shuffle, every key the reducer receives is distinct, i.e. each IP address arrives only once, so the reducer can write the key directly.
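This also holds when MR1 runs several reducers, because the partitioner routes every occurrence of a key to the same reducer, so no IP is written to two output files. A plain-Java model, assuming the formula used by Hadoop's default HashPartitioner; String.hashCode() is a stand-in for the hash of the Text key, so the actual partition numbers differ, but the property shown is the same. Names and IPs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PartitionDemo {

    // Same formula as HashPartitioner; String.hashCode() stands in for Text's hash
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Route each key to a reducer, then let every reducer deduplicate its own keys
    static List<TreeSet<String>> dedupAcrossReducers(List<String> ips, int numReduceTasks) {
        List<TreeSet<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeSet<>());
        }
        for (String ip : ips) {
            reducers.get(partition(ip, numReduceTasks)).add(ip);
        }
        return reducers;
    }

    public static void main(String[] args) {
        // Hypothetical IPs with duplicates
        List<String> ips = List.of("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2");
        int total = 0;
        for (TreeSet<String> out : dedupAcrossReducers(ips, 3)) {
            total += out.size(); // each reducer writes its own part-r-* file
        }
        System.out.println(total); // prints 3: no IP is written by two reducers
    }
}
```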

MR2 stage

Map phase

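import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IpCountMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    // The key can be any constant string, as long as every record gets the same key,
    // so that the shuffle puts all records into a single group
    private static final Text KEY = new Text("ip");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is one distinct IP written by MR1
        context.write(KEY, NullWritable.get());
    }
}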

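Shuffle phase

All records share the key "ip", so they form a single group that is handled by one reduce() call.

Reduce phase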

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IpCountReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        // Number of distinct IP addresses: one value per line of MR1's output
        long count = 0;
        for (NullWritable ignored : values) {
            count++;
        }
        context.write(new Text(String.valueOf(count)), NullWritable.get());
    }
}
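The post does not show the driver that runs the two jobs one after the other. Below is a minimal sketch of one common way to chain them with the org.apache.hadoop.mapreduce API, assuming the four classes above are in the same package; the class name IpCountDriver, the job names and the three-argument layout are hypothetical. It needs the Hadoop client libraries on the classpath and is submitted like any other MapReduce job.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IpCountDriver {

    public static void main(String[] args) throws Exception {
        // Hypothetical argument layout: <log input> <MR1 output> <MR2 output>
        Path input = new Path(args[0]);
        Path distinctIps = new Path(args[1]);
        Path result = new Path(args[2]);
        Configuration conf = new Configuration();

        // MR1: deduplicate IP addresses
        Job mr1 = Job.getInstance(conf, "ip-filter");
        mr1.setJarByClass(IpCountDriver.class);
        mr1.setMapperClass(IpFilterMapper.class);
        // Optional: dedup is safe to pre-apply on the map side, which cuts shuffle traffic
        mr1.setCombinerClass(IpFilterReducer.class);
        mr1.setReducerClass(IpFilterReducer.class);
        mr1.setOutputKeyClass(Text.class);
        mr1.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(mr1, input);
        FileOutputFormat.setOutputPath(mr1, distinctIps);

        // Block until MR1 finishes; MR2 must not start on partial output
        if (!mr1.waitForCompletion(true)) {
            System.exit(1);
        }

        // MR2: count the lines MR1 wrote (one distinct IP per line)
        Job mr2 = Job.getInstance(conf, "ip-count");
        mr2.setJarByClass(IpCountDriver.class);
        mr2.setMapperClass(IpCountMapper.class);
        mr2.setReducerClass(IpCountReducer.class);
        mr2.setOutputKeyClass(Text.class);
        mr2.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(mr2, distinctIps); // MR1's output dir is MR2's input
        FileOutputFormat.setOutputPath(mr2, result);

        System.exit(mr2.waitForCompletion(true) ? 0 : 1);
    }
}
```

Both output directories must not exist before the run, otherwise FileOutputFormat rejects the job. FileInputFormat skips files whose names start with "_" or ".", so MR1's _SUCCESS marker is not counted as an IP. For longer pipelines with dependencies between jobs, Hadoop also provides JobControl and ControlledJob in org.apache.hadoop.mapreduce.lib.jobcontrol as an alternative to calling waitForCompletion in sequence.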