MapReduce: Chained Operations

缘定三石

Article link: https://blog.csdn.net/tian_qing_lei/article/details/77389059

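This article walks through MapReduce chained operations with a worked example: Mapper1 splits lines into words, Mapper2 removes the word "of", Mapper2_2 filters out words containing "tom", the reducer counts occurrences, and Mapper3 filters the reducer's output, dropping words that occur fewer than 1 time.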

Prepare the data:

hello world of tom1
hello world of tom1
hello world of tom2
hello world of tom3
hello world of tom3
hello world of tom4
hello world of tom4

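Approach overview:

1. Mapper1 (split lines into words)

2. Mapper2 (filter out the word "of")

3. Mapper2_2 (filter out words containing "tom")

4. Mapper3 (the reducer's output is Mapper3's input; filter out words whose count is less than 1)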
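To make the data flow concrete, the four stages can be simulated in plain Java without any Hadoop classes. This is only a rough sketch under stated assumptions: the class `ChainSimulation`, its method names, and the `minCount` parameter are hypothetical and not part of the original code, and Mapper2_2 is assumed to drop any word that contains "tom".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the chain; it does not use Hadoop and is not the author's code.
class ChainSimulation {

    // Stage 1 (Mapper1): split a line into words on single spaces.
    static List<String> splitWords(String line) {
        return new ArrayList<>(Arrays.asList(line.split(" ")));
    }

    // Stage 2 (Mapper2): keep every word except "of".
    static boolean keepAfterMapper2(String word) {
        return !word.equals("of");
    }

    // Stage 3 (Mapper2_2): keep words that do not contain "tom" (assumed rule).
    static boolean keepAfterMapper2_2(String word) {
        return !word.contains("tom");
    }

    // Runs map stages, the reduce step, and the post-reduce filter (Mapper3).
    // minCount is a hypothetical threshold: words counted fewer times are dropped.
    static Map<String, Integer> run(List<String> lines, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : splitWords(line)) {
                if (keepAfterMapper2(w) && keepAfterMapper2_2(w)) {
                    counts.merge(w, 1, Integer::sum); // reduce: sum the 1s per word
                }
            }
        }
        // Stage 4 (Mapper3): filter the reducer's output by count.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "hello world of tom1", "hello world of tom1",
            "hello world of tom2", "hello world of tom3",
            "hello world of tom3", "hello world of tom4",
            "hello world of tom4");
        System.out.println(run(lines, 1)); // prints {hello=7, world=7}
    }
}
```

With the sample data, only "hello" and "world" survive the two filters, so a threshold of 1 removes nothing further; a larger `minCount` would be needed to see the last stage take effect.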

1. Mapper1 (split lines into words)

package hadoop.mr.chain;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

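/**
 * First mapper in the chain: splits each input line into words
 * and emits (word, 1) for every word.
 */
public class Mapper1 extends Mapper<LongWritable,Text,Text,IntWritable>{
	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// Debug output so this stage of the chain can be traced in the task logs
		System.out.println("map1 : " + value.toString());
		String line = value.toString();
		// Words in the sample data are separated by single spaces
		String[] arr =  line.split(" ");
		for(String w : arr){
			context.write(new Text(w),new IntWritable(1));
		}
	}
}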
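Mapper1 splits on a single space, which is enough for the sample data. As a side note, the small sketch below compares that with splitting on runs of whitespace, since `split(" ")` yields an empty token when two spaces appear in a row. The class `SplitDemo` and its method names are hypothetical and only illustrate the difference; the original mapper does not need the change for the data shown.

```java
import java.util.Arrays;

// Illustrative comparison of two tokenizing choices; not part of the original job.
class SplitDemo {

    // Same call Mapper1 uses: one space is one separator.
    static String[] splitSingleSpace(String line) {
        return line.split(" ");
    }

    // Alternative: treat any run of whitespace as one separator.
    static String[] splitWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Sample line from the article: both approaches give four words.
        System.out.println(Arrays.toString(splitSingleSpace("hello world of tom1")));
        // A line with a double space: the first approach yields an empty token.
        System.out.println(Arrays.toString(splitSingleSpace("hello  world")));
        System.out.println(Arrays.toString(splitWhitespace("hello  world")));
    }
}
```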

2. Mapper2 (filter out the word "of")