Execution order of map, partitioner, the Combokey comparator (composite-key sort comparator), reduce, and the grouping comparator (GroupComparator) in secondary sort

来一口98的香肠

Article link: https://blog.csdn.net/weixin_43548589/article/details/83963220

The input is a .txt file in which each line holds a year and a temperature, for example:
1997 50

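From the job's log output, the order of execution is as follows.
Map side:
map reads one line of text and splits it, then calls the custom partitioner class to assign the record to a partition; it then reads the next line, splits it, partitions it, and so on. The partitioner therefore runs once per output record, interleaved with map, not as a separate pass after map finishes.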
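The per-record map-side flow described above (read a line, split it, hand the record to the partitioner, move on) can be sketched in plain Java. This is a simulation, not Hadoop code: `partitionFor` is a hypothetical year-based partitioner standing in for the `getPartition` method of a class extending Hadoop's `Partitioner`, and the trace strings are only there to show the per-line order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the map side; no Hadoop classes are used
final class MapSideSketch {

    // Hypothetical year-based partitioner: the same year always gives the same partition
    static int partitionFor(int year, int numPartitions) {
        // Mask the sign bit so a negative value still yields a valid index
        return (Integer.hashCode(year) & Integer.MAX_VALUE) % numPartitions;
    }

    // Handles one line at a time: split, build (year, temp), partition, then the next line
    static List<String> runMap(List<String> lines, int numPartitions) {
        List<String> trace = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+"); // "1997 50" -> ["1997", "50"]
            int year = Integer.parseInt(fields[0]);
            int temp = Integer.parseInt(fields[1]);
            int partition = partitionFor(year, numPartitions);
            trace.add(year + " : " + temp + " -> partition " + partition);
        }
        return trace;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1997 50", "2046 400", "2046 -7", "2067 -17");
        runMap(lines, 2).forEach(System.out::println);
    }
}
```

With two partitions, both 2046 records map to the same partition because only the year feeds the partitioner.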

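Reduce side:
After the map side has completely finished, the data coming from map is first sorted. A normal job sorts by key; secondary sort sorts by the composite key (Combokey). Strictly speaking, each map task already sorts its output with this comparator when spilling it to disk, and the reduce side merge-sorts the fetched pieces, so the reducer always sees fully sorted input. Once sorting is complete, reduce reads the records from top to bottom, for example the last four records in the log output: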
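To make the sort step concrete, here is a plain-Java sketch of ordering records by the composite key rather than by the year alone. The ordering (year ascending, temperature descending) is an assumption inferred from the log, where the 2046 temperatures appear as 400, 11, -7; in Hadoop this ordering would live in `Combokey.compareTo` or in a comparator passed to `Job.setSortComparatorClass`. The nested `Key` class is a hypothetical stand-in for `Combokey`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of the shuffle sort on a composite (year, temperature) key
final class ComboSortSketch {

    // Hypothetical stand-in for Combokey (no Hadoop Writable machinery)
    static final class Key {
        final int year;
        final int temp;

        Key(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }

        @Override
        public String toString() {
            return year + " : " + temp;
        }
    }

    // Assumed ordering: year ascending, then temperature descending
    static final Comparator<Key> COMBO_ORDER =
        Comparator.comparingInt((Key k) -> k.year)
                  .thenComparing(Comparator.comparingInt((Key k) -> k.temp).reversed());

    // Returns a sorted copy, i.e. the order in which reduce will read the records
    static List<Key> sortForReduce(List<Key> mapOutput) {
        List<Key> sorted = new ArrayList<>(mapOutput);
        sorted.sort(COMBO_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Key> mapOutput = List.of(
            new Key(2067, -17), new Key(2046, -7), new Key(2046, 400), new Key(2046, 11));
        System.out.println(sortForReduce(mapOutput));
    }
}
```

Running `main` prints `[2046 : 400, 2046 : 11, 2046 : -7, 2067 : -17]`, the same order as the reduce input in the log.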

```text
============ reduce
2046 : 400
GroupComparetor 2046 : 400 2046 : 11
2046 : 11
GroupComparetor 2046 : 400 2046 : -7
2046 : -7
GroupComparetor 2046 : 400 2046 : -17

====================reduce
2067 : -17
```

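Reduce reads the already-sorted data and forms groups as it reads: each time it reads a record, it checks whether that record belongs to the same group as the previous one; if not, the record starts a new group. Each group is handed to one `reduce()` call.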
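The read-and-group loop described above can be simulated in plain Java: walk the sorted keys once and start a new group whenever the grouping comparator says the current key differs from the previous one. This is a sketch, not Hadoop's internal code; keys are hypothetical `int[]{year, temperature}` pairs, and the grouping comparator compares only the year, mirroring the `YearGroupComparator` in this article.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of reduce-side grouping over already-sorted keys
final class GroupingSketch {

    // Grouping comparator: keys are int[]{year, temperature}; only the year is compared
    static final Comparator<int[]> BY_YEAR = Comparator.comparingInt(k -> k[0]);

    // Reads sorted keys top to bottom; a new group starts whenever the current
    // key is not "equal" (compare != 0) to the previous key
    static List<List<int[]>> group(List<int[]> sortedKeys, Comparator<int[]> grouping) {
        List<List<int[]>> groups = new ArrayList<>();
        int[] previous = null;
        for (int[] key : sortedKeys) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups.add(new ArrayList<>()); // new group, i.e. a new reduce() call
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> sorted = List.of(
            new int[]{2046, 400}, new int[]{2046, 11}, new int[]{2046, -7}, new int[]{2067, -17});
        for (List<int[]> g : group(sorted, BY_YEAR)) {
            StringBuilder sb = new StringBuilder("reduce:");
            for (int[] k : g) {
                sb.append(' ').append(k[0]).append(" : ").append(k[1]).append(';');
            }
            System.out.println(sb);
        }
    }
}
```

The four records from the log form two groups: the three 2046 records, then 2067 on its own.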

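```java
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Grouping comparator: keys with the same year go to the same reduce() call
public class YearGroupComparator extends WritableComparator {
    protected YearGroupComparator() {
        // true: create Combokey instances so compare() receives deserialized keys
        super(Combokey.class, true);
    }

    @Override
    public int compare(WritableComparable o1, WritableComparable o2) {
        Combokey k1 = (Combokey) o1;
        Combokey k2 = (Combokey) o2;
        // 0 means "same group"; only the year is compared, the temperature is ignored
        return Integer.compare(k1.getYear(), k2.getYear());
    }
}
```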

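This is the grouping comparator used by reduce in the process above. When `compare` returns 0, the two keys satisfy the grouping condition you defined and are treated as the same group; otherwise the new key is treated as a different group and starts a new one. Because only the year is compared, all records of one year reach the same `reduce()` call regardless of temperature. The comparator is registered on the job with `job.setGroupingComparatorClass(YearGroupComparator.class)`.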

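Therefore, in secondary sort you must also override the partitioner on the map side; otherwise records that should be in the same group can be sent to different reducers and end up in different groups. The default `HashPartitioner` hashes the whole composite key, so two records with the same year but different temperatures may land on different reducers, and the grouping comparator, which only sees the records inside one reducer, cannot bring them back together.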
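The effect of the partitioner can be shown with a small plain-Java comparison. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch applies that formula to a hypothetical `Combokey.hashCode()` built with `Objects.hash(year, temp)` (an assumption, the article does not show its `hashCode`) and compares it with a year-only partitioner.

```java
import java.util.Objects;

// Plain-Java comparison of a whole-key hash partitioner and a year-only partitioner
final class PartitionerSketch {

    // Same formula as Hadoop's HashPartitioner, applied to a hypothetical
    // Combokey.hashCode() that mixes year and temperature via Objects.hash
    static int wholeKeyPartition(int year, int temp, int numReducers) {
        return (Objects.hash(year, temp) & Integer.MAX_VALUE) % numReducers;
    }

    // Custom partitioner: only the year decides the reducer
    static int yearPartition(int year, int numReducers) {
        return (Integer.hashCode(year) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int[][] records = {{2046, 400}, {2046, 11}, {2046, -7}};
        for (int[] r : records) {
            System.out.println(r[0] + " : " + r[1]
                + "  whole-key -> " + wholeKeyPartition(r[0], r[1], 3)
                + ", year-only -> " + yearPartition(r[0], 3));
        }
    }
}
```

With three reducers, the whole-key version sends `2046 : 400` to partition 2 but `2046 : 11` and `2046 : -7` to partition 0, splitting one year across reducers; the year-only version puts all three in partition 0.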