MapReduce sorting: secondary sort

Secondary sort with Partitioner, SortComparator, and GroupingComparator

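Partitioner: assigns each key to a partition (one partition per reduce task); override getPartition().
SortComparator vs. GroupingComparator:
In common: both extend the WritableComparator class, pass the bean class to the superclass constructor, and override compare().
Different: the SortComparator implements the secondary sort, so its compare() defines the order of the bean objects. The GroupingComparator implements grouping, so its compare() decides which adjacent bean objects in the sorted stream are handed to the same reduce() call.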


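Requirements:
1. Each record is a pair of integers (int1,int2); int1 ranges from 1 to 100000 and int2 from 1 to 100.
2. Sort by int2 first, then by int1.
3. Use at least five reduce tasks, and the reduce output as a whole must be totally ordered.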
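
To see why these requirements can be met together, here is a minimal plain-Java model of the data flow (no Hadoop classes). It assumes the same range split on int2 that the partitioner later in this article uses; the class name TotalOrderSketch and the {int1, int2} array representation are made up for illustration. Each partition is sorted on its own, and because the partitions cover increasing int2 ranges, reading them in partition order gives a total order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of partition + sort; not the actual Hadoop job.
class TotalOrderSketch {

    // Same arithmetic as the article's partitioner: int2 in 1..100 maps to 0..4.
    static int partitionOf(int int2) {
        return (int2 - 1) / 20;
    }

    // Records are {int1, int2} pairs. Returns them in the order the reducers'
    // output would list them: partition 0 first, then partition 1, and so on.
    static List<int[]> run(List<int[]> records, int numPartitions) {
        List<List<int[]>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int[] r : records) {
            parts.get(partitionOf(r[1])).add(r);
        }
        // Sort order required by the article: int2 first, then int1.
        Comparator<int[]> order = Comparator
                .<int[]>comparingInt(r -> r[1])
                .thenComparingInt(r -> r[0]);
        List<int[]> out = new ArrayList<>();
        for (List<int[]> part : parts) {
            part.sort(order);   // each "reducer" sorts only its own partition
            out.addAll(part);   // concatenating partitions keeps the global order
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> records = new ArrayList<>();
        records.add(new int[] {7, 95});
        records.add(new int[] {3, 1});
        records.add(new int[] {9, 21});
        records.add(new int[] {2, 21});
        records.add(new int[] {5, 20});
        for (int[] r : run(records, 5)) {
            System.out.println("(" + r[0] + "," + r[1] + ")");
        }
    }
}
```

With a hash-style partitioner instead, each output file would still be sorted, but the files could not simply be concatenated.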


Bean class: it must implement WritableComparable. Here compareTo() is overridden to define the sort order, which is the same job a SortComparator would do.

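import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyBean implements WritableComparable<MyBean> {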

	private int int1;
	private int int2;

	public MyBean() {
	}
	public MyBean(int int1, int int2) {
		this.int1 = int1;
		this.int2 = int2;
	}
	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(int1);
		out.writeInt(int2);
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		this.int1 = in.readInt();
		this.int2 = in.readInt();
	}
	@Override
	public int compareTo(MyBean o) { 	// secondary sort: int2 first, then int1
		if (this.int2 != o.getInt2()) {
			return Integer.compare(this.int2, o.getInt2());
		}
		return Integer.compare(this.int1, o.getInt1());
	}
	@Override
	public String toString() {
		return "(" + int1 + "," + int2 + ")";
	}
	public int getInt1() {
		return int1;
	}
	public void setInt1(int int1) {
		this.int1 = int1;
	}
	public int getInt2() {
		return int2;
	}
	public void setInt2(int int2) {
		this.int2 = int2;
	}
}
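
The write()/readFields() pair only works if both methods handle the fields in the same order. The following sketch checks that round trip with plain java.io streams instead of Hadoop's serialization machinery; PairRoundTrip and its methods are hypothetical names, not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Mirrors the bean's write()/readFields() using in-memory streams.
class PairRoundTrip {

    // Same field order as MyBean.write(): int1, then int2.
    static byte[] write(int int1, int int2) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(int1);
            out.writeInt(int2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Same field order as MyBean.readFields(); returns {int1, int2}.
    static int[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int int1 = in.readInt();
            int int2 = in.readInt();
            return new int[] {int1, int2};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(100000, 42);
        int[] pair = read(data);
        System.out.println(data.length + " bytes, (" + pair[0] + "," + pair[1] + ")");
    }
}
```

Each int takes four bytes, so one serialized key is eight bytes long. Swapping the two reads would silently exchange int1 and int2 rather than fail.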

map

	public static class MyMapper extends Mapper<LongWritable, Text, MyBean, NullWritable> {

		@Override
		protected void map(LongWritable key, Text value, Context context) 
		throws IOException, InterruptedException {

			String line = value.toString().trim(); // one record per line: (int1,int2)
			String[] fields = line.split(",");
			String num1 = fields[0].substring(1); // drop the leading "("
			String num2 = fields[1].substring(0, fields[1].length() - 1); // drop the trailing ")"

			MyBean bean = new MyBean(Integer.parseInt(num1), Integer.parseInt(num2));
			context.write(bean, NullWritable.get());
		}
	}
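
The mapper above trusts every input line to have the exact form (int1,int2). As a hedged variation, the parsing step could be pulled into a helper that rejects malformed lines instead of throwing; PairLineParser is a hypothetical name, and returning null for bad lines is an assumption of this sketch, not something the original job does.

```java
// Stand-alone parser for lines of the form "(int1,int2)".
class PairLineParser {

    // Returns {int1, int2}, or null when the line is not a well-formed pair.
    static int[] parse(String line) {
        String s = line.trim();
        // The shortest valid line is "(1,2)": five characters.
        if (s.length() < 5 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')') {
            return null;
        }
        String[] fields = s.substring(1, s.length() - 1).split(",");
        if (fields.length != 2) {
            return null;
        }
        try {
            return new int[] {
                Integer.parseInt(fields[0].trim()),
                Integer.parseInt(fields[1].trim())
            };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int[] pair = parse("(42,7)");
        System.out.println(pair[0] + " " + pair[1]);
        System.out.println(parse("not a pair") == null);
    }
}
```

In a mapper, a null result would typically mean skipping the line (and perhaps bumping a counter) rather than failing the task.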

partitioner

	public static class MyPartitioner extends Partitioner<MyBean, NullWritable> {

		@Override
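		public int getPartition(MyBean key, NullWritable value, int numPartitions) {
			// Range partition on int2 (1-100): 1-20 -> 0, 21-40 -> 1, ..., 81-100 -> 4.
			// Assumes the job runs with 5 reduce tasks, as set in the driver;
			// numPartitions is not consulted.
			int int2 = key.getInt2();
			return (int2 - 1) / 20;
		}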
	}
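
The hard-coded divisor 20 ties the partitioner to exactly five reduce tasks; a partitioner must return a value in the range 0 to numPartitions - 1. One possible generalization, sketched in plain Java under the assumption that int2 never exceeds 100, derives the range width from numPartitions. RangePartitionSketch and MAX_INT2 are names invented for this sketch.

```java
// Range partitioning on int2 that adapts to the number of partitions.
class RangePartitionSketch {

    static final int MAX_INT2 = 100; // assumption taken from the requirements

    // Maps int2 in 1..MAX_INT2 to a partition index in 0..numPartitions-1.
    // The mapping never decreases as int2 grows, which is what keeps the
    // concatenated reducer outputs globally ordered.
    static int partitionOf(int int2, int numPartitions) {
        int width = (MAX_INT2 + numPartitions - 1) / numPartitions; // ceiling division
        return (int2 - 1) / width;
    }

    public static void main(String[] args) {
        System.out.println(partitionOf(20, 5));
        System.out.println(partitionOf(21, 5));
        System.out.println(partitionOf(100, 3));
    }
}
```

With numPartitions = 5 the width is 20, so this reduces to the article's (int2 - 1) / 20.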

groupingComparator: groups by int2 only

public static class GroupingComparator extends WritableComparator {

		public GroupingComparator() {
			super(MyBean.class, true);
		}

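		@Override
		@SuppressWarnings("rawtypes")
		public int compare(WritableComparable a, WritableComparable b) {
			// Keys with the same int2 belong to the same group; int1 is ignored.
			MyBean beanA = (MyBean) a;
			MyBean beanB = (MyBean) b;
			return Integer.compare(beanA.getInt2(), beanB.getInt2());
		}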
	}
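
Grouping happens on the already sorted key stream: a new group starts whenever the grouping comparison of two neighbouring keys is non-zero. The sketch below models that in plain Java with {int1, int2} arrays; GroupingSketch is a hypothetical name and the code is a simplified model, not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Models how a grouping comparator splits a sorted key stream into groups.
class GroupingSketch {

    // sortedKeys must already be ordered by (int2, int1).
    // Returns, per group, the int1 values in the order they arrive.
    static List<List<Integer>> group(List<int[]> sortedKeys) {
        List<List<Integer>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] key : sortedKeys) {
            // Same test as the grouping comparator: compare int2 only.
            if (previous == null || Integer.compare(previous[1], key[1]) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key[0]);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>();
        keys.add(new int[] {3, 1});
        keys.add(new int[] {8, 1});
        keys.add(new int[] {8, 1});
        keys.add(new int[] {2, 5});
        System.out.println(group(keys));
    }
}
```

Because the sort already placed int1 in ascending order within each int2, every group arrives sorted by int1, duplicates included.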

reducer
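One point to note: keys that compare as equal are passed to the same reduce() call. Here the grouping comparator looks only at int2, so the keys (bean objects) within one group are not necessarily identical. Two cases occur within a group:
1. int2 is the same, int1 differs.
2. int2 is the same, and int1 is the same too.
The value type is NullWritable, so the values carry no data. Hadoop reuses the key object and refills it as the values iterator advances, so the individual keys must be read inside the loop over values. Reading the key outside the loop only ever returns the first key (bean object) of the group.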

public static class MyReducer extends Reducer<MyBean, NullWritable, Text, NullWritable> {

		@Override
		protected void reduce(MyBean key, Iterable<NullWritable> values, Context context)
				throws IOException, InterruptedException {
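			// Output format: "int2:int1,int1,..."
			StringBuilder sb = new StringBuilder();
			sb.append(key.getInt2()).append(':');
			for (NullWritable value : values) {
				// key is refilled on each iteration, so getInt1() changes here
				sb.append(key.getInt1()).append(',');
			}
			sb.setLength(sb.length() - 1); // drop the trailing comma
			context.write(new Text(sb.toString()), NullWritable.get());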
		}
	}
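
The key-reuse behaviour described above can be imitated without Hadoop. In this simplified model, a single mutable key object is overwritten every time the values iterator advances, which is why the loop body sees a different int1 on each pass. KeyReuseSketch, MutableKey, and the use of null as a stand-in for NullWritable are all assumptions of the sketch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified model of a reduce() call whose key object is reused.
class KeyReuseSketch {

    static final class MutableKey {
        int int1;
        int int2;
    }

    // Builds the "values" of one group. Each next() first overwrites the
    // shared key with the key that belongs to that value.
    static Iterable<Object> values(MutableKey shared, int[][] group) {
        return () -> new Iterator<Object>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < group.length;
            }

            @Override
            public Object next() {
                if (i >= group.length) {
                    throw new NoSuchElementException();
                }
                shared.int1 = group[i][0];
                shared.int2 = group[i][1];
                i++;
                return null; // stands in for NullWritable
            }
        };
    }

    // Same shape as the reducer in this article: read the key inside the loop.
    static String reduce(MutableKey key, Iterable<Object> values) {
        StringBuilder sb = new StringBuilder();
        sb.append(key.int2).append(':');
        String separator = "";
        for (Object ignored : values) {
            sb.append(separator).append(key.int1);
            separator = ",";
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] group = { {3, 1}, {8, 1}, {9, 1} };
        MutableKey key = new MutableKey();
        key.int1 = group[0][0]; // on entry the key holds the first key of the group
        key.int2 = group[0][1];
        System.out.println(reduce(key, values(key, group)));
    }
}
```

Reading key.int1 once before the loop would capture only the first key, 3 in this example.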

driver

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

		Configuration configuration = new Configuration();
		Job job = Job.getInstance(configuration);

		job.setJarByClass(MyGroupingComparator.class);

		job.setPartitionerClass(MyPartitioner.class);
		job.setGroupingComparatorClass(GroupingComparator.class);
		job.setNumReduceTasks(5);

		job.setMapperClass(MyMapper.class);
		job.setReducerClass(MyReducer.class);

		job.setMapOutputKeyClass(MyBean.class);
		job.setMapOutputValueClass(NullWritable.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(NullWritable.class);

		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		boolean result = job.waitForCompletion(true);
		System.exit(result ? 0 : 1);
	}

The mapper, reducer, and driver (along with the other static nested classes above) all live in the same class:

public class MyGroupingComparator {}

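The code above covers partitioning, sorting, and grouping. The sort itself can be implemented in two ways:

1. As above, through the compareTo() method of the bean, which implements WritableComparable.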

@Override
	public int compareTo(MyBean o) { 	// secondary sort: int2 first, then int1
		if (this.int2 != o.getInt2()) {
			return Integer.compare(this.int2, o.getInt2());
		}
		return Integer.compare(this.int1, o.getInt1());
	}
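
A note on comparing by subtraction: for the ranges in this article (int1 up to 100000, int2 up to 100) the difference of two values always fits in an int, so a - b gives the right sign. For arbitrary ints it can overflow and flip the sign, which is why Integer.compare is the safer habit. The sketch below contrasts the two; CompareOverflowSketch is a hypothetical name.

```java
// Contrasts subtraction-based comparison with Integer.compare.
class CompareOverflowSketch {

    // Correct only while a - b cannot overflow.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Correct for every pair of ints.
    static int byCompare(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int a = Integer.MIN_VALUE;
        int b = 1;
        // a is smaller than b, so a correct comparison must be negative.
        System.out.println(bySubtraction(a, b) > 0); // true: overflow flipped the sign
        System.out.println(byCompare(a, b) < 0);     // true
    }
}
```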

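2. Alternatively, through a separate comparator class that extends WritableComparator and is registered with job.setSortComparatorClass().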

job.setSortComparatorClass(SortComparator.class); 	// this line goes in the driver
public class SortComparator extends WritableComparator {

		public SortComparator() {
			super(MyBean.class, true);
		}

		@Override
		@SuppressWarnings("rawtypes")
		public int compare(WritableComparable a, WritableComparable b) {
			MyBean beanA = (MyBean) a;
			MyBean beanB = (MyBean) b;
			// int2 first, then int1
			if (beanA.getInt2() != beanB.getInt2()) {
				return Integer.compare(beanA.getInt2(), beanB.getInt2());
			}
			return Integer.compare(beanA.getInt1(), beanB.getInt1());
		}
	}
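
A comparator like the one above compares deserialized bean objects. WritableComparator also has a byte-level compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) overload, and the same ordering can in principle be computed directly on the serialized form. The following plain-Java sketch shows only that byte-level logic, assuming the layout produced by the bean's write() (int1 in the first four bytes, int2 in the next four, big-endian). RawPairCompareSketch is a hypothetical name, and wiring this into a Hadoop comparator is left out.

```java
import java.nio.ByteBuffer;

// Compares two serialized (int1, int2) pairs without building objects.
class RawPairCompareSketch {

    // Layout assumed: 4 bytes int1, then 4 bytes int2, both big-endian,
    // which is what DataOutput.writeInt produces.
    static byte[] serialize(int int1, int int2) {
        return ByteBuffer.allocate(8).putInt(int1).putInt(int2).array();
    }

    // s1 and s2 are the offsets at which each record starts in its buffer.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        int aInt2 = ByteBuffer.wrap(b1, s1 + 4, 4).getInt();
        int bInt2 = ByteBuffer.wrap(b2, s2 + 4, 4).getInt();
        if (aInt2 != bInt2) {
            return Integer.compare(aInt2, bInt2); // int2 decides first
        }
        int aInt1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int bInt1 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(aInt1, bInt1);     // int1 breaks ties
    }

    public static void main(String[] args) {
        byte[] a = serialize(500, 3);
        byte[] b = serialize(2, 7);
        System.out.println(compare(a, 0, b, 0) < 0); // int2: 3 < 7
    }
}
```

Whether this is worth the extra code depends on the data volume; for a job of this size the object-based comparator is the simpler choice.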

Appendix: generator class for the (int1,int2) input

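/*
 * Uses random numbers to generate text files of (int1,int2) records,
 * with at least 100 files,
 * at least 100,000 records per file,
 * where int1 is a random number in 1-100000 and int2 a random number in 1-100.
 */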


import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Random;

public class InitRandom {

	public static void main(String[] args) throws IOException {
		
		int int1 = 100000;	// upper bound of int1 (inclusive)
		int int2 = 100;		// upper bound of int2 (inclusive)
		int numOfFiles = 100;
		int numOfRecords = 100000;
		
		String path = args[0];		// directory that will hold the generated input files
		Random random = new Random();
		
		for (int i = 1; i <= numOfFiles; i++) {
			System.out.println("writing file#" + i);
			// try-with-resources closes the stream even if a write fails
			try (PrintStream pStream = new PrintStream(
					new BufferedOutputStream(new FileOutputStream(new File(path, "file" + i))))) {
				for (int j = 0; j < numOfRecords; j++) {
					// one "(int1,int2)" record per line; nextInt(n) + 1 yields 1..n
					pStream.println("(" + (random.nextInt(int1) + 1) + "," + (random.nextInt(int2) + 1) + ")");
				}
			}
		}
	}
}
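
For trying the job on a small input first, the record-writing step can be separated from the file handling so that it is easy to check. This is a minimal sketch with assumed names (PairFileWriterSketch, writeRecords) and an arbitrary file name; it streams records to whatever PrintWriter it is given and takes the Random from the caller so a fixed seed gives reproducible data.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Writes "(int1,int2)" lines to any PrintWriter.
class PairFileWriterSketch {

    static final int MAX_INT1 = 100000; // ranges taken from the requirements
    static final int MAX_INT2 = 100;

    static void writeRecords(PrintWriter out, int records, Random random) {
        for (int j = 0; j < records; j++) {
            int int1 = random.nextInt(MAX_INT1) + 1; // 1..100000
            int int2 = random.nextInt(MAX_INT2) + 1; // 1..100
            out.println("(" + int1 + "," + int2 + ")");
        }
    }

    public static void main(String[] args) throws IOException {
        // A small sample in a temporary directory; the file name is arbitrary.
        Path dir = Files.createTempDirectory("pairs");
        Path file = dir.resolve("file1");
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            writeRecords(out, 1000, new Random(42L));
        }
        System.out.println("wrote " + Files.readAllLines(file).size() + " records to " + file);
    }
}
```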
