MapReduce: Custom Partitioner to Sum the Odd- and Even-Numbered Lines of a File


doegoo


Article link: https://blog.csdn.net/doegoo/article/details/50392109


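This post covers customizing the Partitioner. Partitioning happens at the end of the map phase: the map output is first partitioned by the class set with job.setPartitionerClass, and each partition maps to one reducer. Within each partition, the records are then sorted by the key comparator class set with job.setSortComparatorClass. The examples in the earlier posts all used a single reducer; this example uses two. Testing with more reducers is left to the total-order sort example.
The requirements for this post's example are as follows.
Test data: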
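The partition-then-sort order described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class and method names are hypothetical, String keys stand in for map output keys, natural ordering stands in for the sort comparator, and the partition function uses the same formula as Hadoop's default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; this is not Hadoop code and the names are hypothetical.
class PartitionThenSortSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assign every key to a partition first, then sort each partition on its own.
    static List<List<String>> partitionAndSort(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition); // natural order stands in for the sort comparator
        }
        return partitions;
    }

    public static void main(String[] args) {
        // prints [[b, d], [a, c]]
        System.out.println(partitionAndSort(Arrays.asList("d", "a", "c", "b"), 2));
    }
}
```

Each inner list plays the role of the input to one reducer: ordering is only guaranteed inside a partition, not across partitions.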
324
654
23
34
78
2
756
134
32
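Requirement: compute the sum of the odd-numbered lines and the sum of the even-numbered lines of the data.
The main tool here is a custom partitioner. To define your own partitioner class, extend Partitioner<KEY, VALUE>, where KEY and VALUE are the map output key and value types:
public class MyPartitioner extends Partitioner<KEY, VALUE>
Then implement its getPartition() method, which holds the partitioning logic and, in this example, also modifies the key of the key-value pair as the requirement needs.
The implementation follows: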

Custom partitioner:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<LongWritable, IntWritable> {
	@Override
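	public int getPartition(LongWritable key, IntWritable value, int numPartitions) {
		/**
		 * Partition by line number: even line numbers go to reducer 0
		 * and odd line numbers go to reducer 1. The key is also
		 * rewritten to 0 or 1 so that, on the reduce side, all odd
		 * lines and all even lines each arrive in a single iterator
		 * and can be summed.
		 */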
		if( key.get() % 2 == 0) {
			key.set(0);
			return 0;
		} else {
			key.set(1);
			return 1;
		}
	}
}
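The getPartition() above also rewrites the key, a side effect that appears to depend on the framework serializing the key only after the partitioner has run. One alternative, sketched here in plain Java rather than with the Hadoop API, is to derive the 0/1 key before partitioning (in a real job this would happen in the mapper) and keep the partition function free of side effects. The class and method names are hypothetical, and this is a different design from the one the post uses.

```java
// Plain-Java illustration only; the class and method names are hypothetical.
class ParityKeySketch {

    // What a mapper could emit as its output key: 0 for even lines, 1 for odd lines.
    static long parityKey(long lineNumber) {
        return lineNumber % 2;
    }

    // A pure partition function: it reads the key but never modifies it.
    static int partitionFor(long parityKey, int numPartitions) {
        return (int) (parityKey % numPartitions);
    }

    public static void main(String[] args) {
        for (long line = 1; line <= 4; line++) {
            long key = parityKey(line);
            System.out.println("line " + line + " -> key " + key
                    + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

With the key already reduced to 0 or 1, two reducers each receive exactly one key, which is the same grouping the post's partitioner produces.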

Map phase:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;


public class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntWritable> {


	private long lineNum = 0;
	private LongWritable okey = new LongWritable();
	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
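		lineNum++;
		okey.set(lineNum);
		/**
		 * Emit the line number as the key and the line's number as the value.
		 * This only demonstrates customizing the partitioner, so line-number
		 * handling across multiple mappers is deliberately ignored here.
		 */
		// trim() guards against stray whitespace around the number
		context.write(okey, new IntWritable(Integer.parseInt(value.toString().trim())));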
	}
}
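The comment in MyMapper sets the multi-mapper case aside. The plain-Java sketch below shows why it matters: each map task runs its own mapper instance, so lineNum restarts for every split and parity is counted per split rather than per file. The three-way split of the sample data is hypothetical (a file this small would normally be a single split), and the class and method names are made up for illustration.

```java
// Plain-Java illustration only; no Hadoop classes are involved and the names are hypothetical.
class SplitLineNumberSketch {

    // Sum values by line parity when every split restarts its counter,
    // as a separate MyMapper instance per split would.
    // Returns {evenSum, oddSum}.
    static int[] sumWithPerSplitCounters(int[][] splits) {
        int[] sums = new int[2];
        for (int[] split : splits) {
            long lineNum = 0; // fresh counter for each split
            for (int value : split) {
                lineNum++;
                sums[(int) (lineNum % 2)] += value;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        int[][] oneSplit = {{324, 654, 23, 34, 78, 2, 756, 134, 32}};
        int[][] threeSplits = {{324, 654, 23}, {34, 78, 2}, {756, 134, 32}};

        int[] expected = sumWithPerSplitCounters(oneSplit);
        int[] skewed = sumWithPerSplitCounters(threeSplits);

        // One split: even = 824, odd = 1213 (the intended answer).
        System.out.println("one split:    even=" + expected[0] + " odd=" + expected[1]);
        // Three splits: even = 866, odd = 1171 (parity counted per split).
        System.out.println("three splits: even=" + skewed[0] + " odd=" + skewed[1]);
    }
}
```

The totals agree (2037 in both cases), but the odd/even breakdown differs as soon as a split holds an odd number of lines.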

Reduce phase:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<LongWritable, IntWritable, Text, IntWritable> {

	@Override
	protected void reduce(LongWritable key, Iterable<IntWritable> value, Context context)
			throws IOException, InterruptedException {
		int sum = 0;
		for( IntWritable val : value) {
			sum += val.get();
		}
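		// Key 0 carries the even-numbered lines, key 1 the odd-numbered lines
		if( key.get() == 0 ) {
			context.write(new Text("Sum of even-numbered lines:"), new IntWritable(sum));
		} else if ( key.get() == 1) {
			context.write(new Text("Sum of odd-numbered lines:"), new IntWritable(sum));
		}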
	}
}
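As a sanity check on what the reducers should emit for the sample data, the grouping and summing can be imitated in plain Java. This is a sketch under the assumption that the whole file is read by a single mapper; the class and method names are hypothetical, and a TreeMap stands in for the framework's grouping of values by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; it imitates the grouping done before reduce() is called.
class ReduceGroupingSketch {

    // Group values under key 0 (even line) or key 1 (odd line), then sum each group.
    static Map<Long, Integer> sumByParity(int[] lines) {
        Map<Long, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            long lineNumber = i + 1;   // line numbers start at 1
            long key = lineNumber % 2; // the key MyPartitioner leaves behind
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(lines[i]);
        }
        Map<Long, Integer> sums = new TreeMap<>();
        for (Map.Entry<Long, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            sums.put(group.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] sample = {324, 654, 23, 34, 78, 2, 756, 134, 32};
        Map<Long, Integer> sums = sumByParity(sample);
        System.out.println("Sum of even-numbered lines: " + sums.get(0L)); // 824
        System.out.println("Sum of odd-numbered lines: " + sums.get(1L));  // 1213
    }
}
```

In the actual job the two sums would land in separate output files, one per reducer.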

Driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobMain {
	public static void main(String[] args) throws Exception {
		Configuration configuration = new Configuration();
		// Job.getInstance replaces the deprecated Job constructor
		Job job = Job.getInstance(configuration, "partitioner-job");
		job.setJarByClass(JobMain.class);
		job.setMapperClass(MyMapper.class);
		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(IntWritable.class);
		// Use the custom Partitioner to partition the map output
		job.setPartitionerClass(MyPartitioner.class);
		job.setReducerClass(MyReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		// Run the job with two reducers
		job.setNumReduceTasks(2);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		Path outputDir = new Path(args[1]);
		FileSystem fs = FileSystem.get(configuration);
		if( fs.exists(outputDir)) {
			fs.delete(outputDir ,true);
		}
		FileOutputFormat.setOutputPath(job, outputDir);
		System.exit(job.waitForCompletion(true) ? 0: 1);
	}
}

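Run result:

Conclusion:
To show what a single feature does, each post tries to touch only the point under discussion. Later posts will include more comprehensive examples that combine several of these points. The MapReduce framework is flexible and offers many customization points, which later posts will cover one by one, such as custom InputFormat, RecordReader, OutputFormat, and RecordWriter. Later posts will also work through harder examples, processing XML and JSON files with MapReduce, to illustrate these extension points. For the case where a large file is divided into multiple splits and several map tasks compute the odd- and even-line sums, see the post "MapReduce-定制Partitioner-使用NLineInputFormat处理大文件-求文件奇偶数行之和" (custom Partitioner with NLineInputFormat for large files).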