Grouping in MapReduce

If you would rather skip the details, jump straight to the summary at the end.

Part 1:
Grouping in MapReduce simply means merging the values of identical keys together.
For example, suppose the map phase outputs
hadoop 1
hadoop 1
hadoop 1
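After grouping this becomes hadoop <1,1,1>. That is why the second parameter of the Reducer class's reduce method is an iterable over the values emitted by map; in this case it yields <1,1,1>.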
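To make the idea concrete, here is a minimal plain-Java sketch of what grouping does to sorted map output. It does not use Hadoop at all: WordGroupingSketch, Group and group are hypothetical names chosen for illustration, and the real framework streams values through an iterator instead of building lists.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; WordGroupingSketch and Group are hypothetical names, not Hadoop classes.
class WordGroupingSketch {

    // One group: a key plus every value that arrived under that key.
    static final class Group {
        final String key;
        final List<Integer> values = new ArrayList<>();
        Group(String key) { this.key = key; }
        @Override public String toString() { return key + " " + values; }
    }

    // Expects keys that are already sorted, as they are when they reach the reduce side.
    // A new group starts whenever the key differs from the key of the current group.
    static List<Group> group(String[] sortedKeys, int[] values) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (int i = 0; i < sortedKeys.length; i++) {
            if (current == null || current.key.compareTo(sortedKeys[i]) != 0) {
                current = new Group(sortedKeys[i]);
                groups.add(current);
            }
            current.values.add(values[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "hadoop", "hadoop"};
        int[] ones = {1, 1, 1};
        System.out.println(group(keys, ones)); // prints [hadoop [1, 1, 1]]
    }
}
```

The three (hadoop, 1) records collapse into a single group whose value list plays the role of the iterator handed to reduce.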
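So the question is: how does the framework decide whether two keys are the same? At first I assumed it used the key's equals method. But when I happened to comment out equals in my custom Writable, grouping still worked. That made me think of compareTo, and that is indeed the answer: when no other comparator is configured, two keys are treated as the same and placed in one group whenever compareTo returns 0.
Also, the key of a group is one of the original keys; the values of the other keys are simply appended to its values. I say this because of a test I ran with the following custom Writable class: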

public static class GroupWritable implements WritableComparable<GroupWritable>{
		private String word;
		private int count ;
		public void write(DataOutput out) throws IOException {
			out.writeUTF(this.word);
			out.writeInt(this.count);
		}
		public void readFields(DataInput in) throws IOException {
			// Fields must be read in the same order that write() emits them.
			this.word = in.readUTF();
			this.count = in.readInt();
		}
		public int compareTo(GroupWritable o) {
			// Only count is compared, so keys with the same count but different words compare as equal.
			// Integer.compare avoids the overflow that subtraction can cause.
			return Integer.compare(this.count, o.count);
		}
//		@Override
//		public boolean equals(Object obj) {	
//			GroupWritable gw = (GroupWritable)obj;
//			return (gw.word == this.word) && (this.count == gw.count);
//		}
		public GroupWritable(String word, int count) {
			super();
			this.word = word;
			this.count = count;
		}
		public GroupWritable() {
			// Hadoop needs a no-argument constructor to create the key before calling readFields().
			super();
		}
		public String getWord() {
			return word;
		}
		public void setWord(String word) {
			this.word = word;
		}
		public int getCount() {
			return count;
		}
		public void setCount(int count) {
			this.count = count;
		}
		@Override
		public String toString() {
			return "word:"+word+" count:"+count;
		}
	}

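This class has two fields, a String and an int. In my compareTo I return 0 whenever the int fields of the two keys are equal, regardless of the String. This is where something interesting happens.
Below is the map output, written as (key) value, where each key is a GroupWritable:
(hello,11) 1
(world,11) 1
(haha,11) 1
After grouping, the result is (hello,11) <1,1,1>.
I know this because my reduce method sums the elements of the iterator, and the final output was (hello,11) 3.
Had the keys not been grouped together, the final output would have been
(hello,11) 1
(world,11) 1
(haha,11) 1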

public static class GroupReducer extends Reducer<GroupWritable, IntWritable, GroupWritable, IntWritable>{
		
		@Override
		protected void reduce(GroupWritable key, Iterable<IntWritable> value,Context context)
				throws IOException, InterruptedException {
			int sum = 0 ;
			
			for(IntWritable i : value) {
				sum += i.get();
			}
			context.write(key, new IntWritable(sum));
			

		}
		
		
	}

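Why do they end up together? Because my compareTo treats these three keys as equal, the reduce side puts all three into one group, presents that group under a single key (in my run, (hello,11)), and feeds the three values through the iterator. Note that Hadoop reuses the key object while the values are being iterated, so which of the three keys gets written depends on what that object holds when context.write is called.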
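The behaviour described above can be imitated in plain Java, without Hadoop. In this sketch, CompareToGroupingSketch, Key and reduce are hypothetical stand-ins for the job's classes, and reporting each group under its first key is a simplification chosen to match the output I observed; it is not a statement about how the framework picks the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; CompareToGroupingSketch and Key are hypothetical stand-ins.
class CompareToGroupingSketch {

    // Stand-in for GroupWritable: compareTo looks at count only, equals() is not overridden.
    static final class Key implements Comparable<Key> {
        final String word;
        final int count;
        Key(String word, int count) { this.word = word; this.count = count; }
        @Override public int compareTo(Key o) { return Integer.compare(count, o.count); }
        @Override public String toString() { return "(" + word + "," + count + ")"; }
    }

    // Sums the values of each run of adjacent keys whose compareTo() is 0
    // and reports each run under the first key of that run.
    static List<String> reduce(List<Key> sortedKeys, List<Integer> values) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.size()) {
            Key first = sortedKeys.get(i);
            int sum = 0;
            while (i < sortedKeys.size() && first.compareTo(sortedKeys.get(i)) == 0) {
                sum += values.get(i);
                i++;
            }
            out.add(first + " " + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(new Key("hello", 11), new Key("world", 11), new Key("haha", 11));
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(keys.get(0).equals(keys.get(1))); // prints false: equals() sees two different objects
        System.out.println(reduce(keys, ones));              // prints [(hello,11) 3]
    }
}
```

Even though equals() reports the keys as different, all three land in one group, because only the compareTo result is consulted.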

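Complete code. The final result is (hello,11) 3.
You can try it yourself with an input file that has one word and one number per line, separated by a single space; the mapper turns those lines into these key/value pairs:
(hello,11) 1
(world,11) 1
(haha,11) 1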

package GroupTest;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GroupTestDemo {
	public static class GroupWritable implements WritableComparable<GroupWritable>{
		private String word;
		private int count ;
		public void write(DataOutput out) throws IOException {
			out.writeUTF(this.word);
			out.writeInt(this.count);
			
		}
		public void readFields(DataInput in) throws IOException {
			// Fields must be read in the same order that write() emits them.
			this.word = in.readUTF();
			this.count = in.readInt();
		}
		public int compareTo(GroupWritable o) {
			// Only count is compared (descending here); the word is ignored.
			return Integer.compare(o.count, this.count);
		}
		// equals is commented out on purpose: grouping still works without it
//		@Override
//		public boolean equals(Object obj) {	
//			GroupWritable gw = (GroupWritable)obj;
//			return (gw.word == this.word) && (this.count == gw.count);
//		}
		public GroupWritable(String word, int count) {
			super();
			this.word = word;
			this.count = count;
		}
		public GroupWritable() {
			// Hadoop needs a no-argument constructor to create the key before calling readFields().
			super();
		}
		public String getWord() {
			return word;
		}
		public void setWord(String word) {
			this.word = word;
		}
		public int getCount() {
			return count;
		}
		public void setCount(int count) {
			this.count = count;
		}
		@Override
		public String toString() {
			return "word:"+word+" count:"+count;
		}
	}
	public static class GroupMapper extends Mapper<LongWritable, Text, GroupWritable, IntWritable>{
		@Override
		protected void map(LongWritable key, Text value,Context context)
				throws IOException, InterruptedException {
			
			String [] splited = value.toString().split(" ");
			GroupWritable gw = new GroupWritable(splited[0],Integer.parseInt(splited[1]));
			context.write(gw,new IntWritable(1));
			
		}
	
	}
	public static class GroupReducer extends Reducer<GroupWritable, IntWritable, GroupWritable, IntWritable>{
	
		@Override
		protected void reduce(GroupWritable key, Iterable<IntWritable> value,Context context)
				throws IOException, InterruptedException {
			int sum = 0 ;
			
			for(IntWritable i : value) {
				sum += i.get();
			}
			context.write(key, new IntWritable(sum));
		

		}
		
		
	}
	public static void main(String [] args) throws IOException, ClassNotFoundException, InterruptedException {
		Job job = Job.getInstance(new Configuration ());
		job.setJarByClass(GroupTestDemo.class);
		
		job.setMapperClass(GroupMapper.class);
		job.setMapOutputKeyClass(GroupWritable.class);
		job.setMapOutputValueClass(IntWritable.class);
		
	
		
		job.setReducerClass(GroupReducer.class);
		job.setOutputKeyClass(GroupWritable.class);
		job.setOutputValueClass(IntWritable.class);
		
		
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		job.waitForCompletion(true);
		
		
		
	}
	

}

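Part 2: OK, one more question: what is job.setGroupingComparatorClass for? Other blog posts all say it is used for grouping, so I wondered which takes precedence, it or compareTo.
Let's test it.
Method:
Define a grouping comparator class whose compare methods always return -1, and keep GroupWritable's compareTo as before, returning 0 whenever the int fields of the two keys are equal.
Then register the comparator in the main class with job.setGroupingComparatorClass.
The input data is the same as before:
(hello,11) 1
(world,11) 1
(haha,11) 1
Expected result:
If compareTo takes precedence over the grouping comparator, the three records above end up in one group; otherwise they are split into three groups.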

package GroupTest;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GroupTestDemo {
	public static class GroupWritable implements WritableComparable<GroupWritable>{
		private String word;
		private int count ;
		public void write(DataOutput out) throws IOException {
			out.writeUTF(this.word);
			out.writeInt(this.count);
		}
		public void readFields(DataInput in) throws IOException {
			// Fields must be read in the same order that write() emits them.
			this.word = in.readUTF();
			this.count = in.readInt();
		}
		public int compareTo(GroupWritable o) {
			// Same rule as before: keys with the same count compare as equal.
			return Integer.compare(this.count, o.count);
		}
//		@Override
//		public boolean equals(Object obj) {	
//			GroupWritable gw = (GroupWritable)obj;
//			return (gw.word == this.word) && (this.count == gw.count);
//		}
		public GroupWritable(String word, int count) {
			super();
			this.word = word;
			this.count = count;
		}
		public GroupWritable() {
			// Hadoop needs a no-argument constructor to create the key before calling readFields().
			super();
		}
		public String getWord() {
			return word;
		}
		public void setWord(String word) {
			this.word = word;
		}
		public int getCount() {
			return count;
		}
		public void setCount(int count) {
			this.count = count;
		}
		@Override
		public String toString() {
			return "word:"+word+" count:"+count;
		}
	}
	public static class GroupMapper extends Mapper<LongWritable, Text, GroupWritable, IntWritable>{
		@Override
		protected void map(LongWritable key, Text value,Context context)
				throws IOException, InterruptedException {
			
			String [] splited = value.toString().split(" ");
			GroupWritable gw = new GroupWritable(splited[0],Integer.parseInt(splited[1]));
			context.write(gw,new IntWritable(1));
			
			
		}
	
	}
	public static class GroupReducer extends Reducer<GroupWritable, IntWritable, GroupWritable, IntWritable>{
		
		@Override
		protected void reduce(GroupWritable key, Iterable<IntWritable> value,Context context)
				throws IOException, InterruptedException {
			int sum = 0 ;
			
			for(IntWritable i : value) {
				sum += i.get();
			}
			context.write(key, new IntWritable(sum));
			

		}
		
		
	}
	public static void main(String [] args) throws IOException, ClassNotFoundException, InterruptedException {
		Job job = Job.getInstance(new Configuration ());
		job.setJarByClass(GroupTestDemo.class);
		
		job.setMapperClass(GroupMapper.class);
		job.setMapOutputKeyClass(GroupWritable.class);
		job.setMapOutputValueClass(IntWritable.class);
		
		job.setGroupingComparatorClass(MyGroupingComparator.class);
		
		job.setReducerClass(GroupReducer.class);
		job.setOutputKeyClass(GroupWritable.class);
		job.setOutputValueClass(IntWritable.class);
		
		
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		
		job.waitForCompletion(true);
		
		
		
	}
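	public static class MyGroupingComparator implements RawComparator<GroupWritable>{
		
		// Raw-bytes overload. The reduce side works on serialized keys, so this is the overload that matters for grouping.
		public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
			// Always non-zero, so no two keys are ever treated as belonging to the same group.
			return -1;
		}
		public int compare(GroupWritable o1, GroupWritable o2) {
			// Object overload, kept consistent with the raw one: always non-zero.
			return -1;
		}
	}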

}

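The final result is three groups. In other words, for the purpose of grouping, the class passed to setGroupingComparatorClass overrides the key's compareTo method.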
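The experiment can be imitated in plain Java, without a cluster. This is only a sketch under the assumption that a new group starts wherever the grouping rule reports two neighbouring keys as different; GroupingComparatorSketch, Key and countGroups are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; none of these names are Hadoop APIs.
class GroupingComparatorSketch {

    // Stand-in for GroupWritable.
    static final class Key {
        final String word;
        final int count;
        Key(String word, int count) { this.word = word; this.count = count; }
    }

    // Counts the groups formed when a new group starts wherever the
    // grouping rule says two neighbouring keys differ (result != 0).
    static int countGroups(List<Key> sortedKeys, Comparator<Key> grouping) {
        int groups = 0;
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (i == 0 || grouping.compare(sortedKeys.get(i - 1), sortedKeys.get(i)) != 0) {
                groups++;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(new Key("hello", 11), new Key("world", 11), new Key("haha", 11));

        // Same rule as GroupWritable.compareTo: only count matters.
        Comparator<Key> byCount = Comparator.comparingInt(k -> k.count);
        // Same rule as MyGroupingComparator: never report two keys as equal.
        Comparator<Key> alwaysDifferent = (a, b) -> -1;

        System.out.println(countGroups(keys, byCount));         // prints 1
        System.out.println(countGroups(keys, alwaysDifferent)); // prints 3
    }
}
```

With the count-based rule the three keys form one group; with the always -1 rule each key becomes its own group, which is what the job produced once the grouping comparator was registered.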

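Summary:
Grouping collects the values of identical keys into one group (exposed as an iterator, conceptually <value1,value2,value3>). So how are keys compared for equality? There are two ways. The first is the key's compareTo method: two keys are considered the same when it returns 0. The second is to write a custom grouping comparator (search online for how to define one), implement its compare method, and register it with job.setGroupingComparatorClass; its compare method then replaces the key's compareTo as the test for whether two keys are the same.

You can also reach this conclusion by reasoning alone: the grouping comparator has to take precedence over the compareTo method of a custom key type. Every custom key type must implement compareTo anyway, so if the grouping comparator did not take precedence, it would serve no purpose at all.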
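One situation where a separate grouping rule is useful, offered here as a hedged illustration and not as something tested in this post, is when keys should be sorted on more fields than they are grouped on. The plain-Java sketch below sorts by word and then by count in descending order, but groups by word only. SortVersusGroupSketch, Key, SORT and GROUP are hypothetical names, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; none of these names are Hadoop APIs.
class SortVersusGroupSketch {

    static final class Key {
        final String word;
        final int count;
        Key(String word, int count) { this.word = word; this.count = count; }
        @Override public String toString() { return "(" + word + "," + count + ")"; }
    }

    // Sort order: by word, then by count descending (the role compareTo plays).
    static final Comparator<Key> SORT = Comparator.comparing((Key k) -> k.word)
            .thenComparing(Comparator.comparingInt((Key k) -> k.count).reversed());

    // Grouping rule: word only (the role a grouping comparator plays).
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.word);

    // Sorts with SORT, then starts a new group whenever GROUP reports a different key.
    static List<List<Key>> sortThenGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : sorted) {
            if (groups.isEmpty() || GROUP.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("hello", 3), new Key("world", 7), new Key("hello", 9), new Key("world", 2));
        System.out.println(sortThenGroup(keys));
        // prints [[(hello,9), (hello,3)], [(world,7), (world,2)]]
    }
}
```

Each group holds every key for one word, and inside the group the keys arrive in the order defined by the finer sort rule, which a single compareTo could not express on its own.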
