First, I had already set a grouping comparator on the job, and the comparator class extended WritableComparator:
job.setGroupingComparatorClass(MyGroupingComparator.class);
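Later I found that although MyGroupingComparator extended WritableComparator, I had overridden the wrong method.
I should not have overridden the overload that takes Object parameters: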
public int compare(Object o11, Object o22) {...}
The reason is that, in WritableComparator, compare(Object a, Object b) simply delegates to compare(WritableComparable a, WritableComparable b), as shown here:
/** Compare two WritableComparables.
 *
 * <p> The default implementation uses the natural ordering, calling {@link
 * Comparable#compareTo(Object)}. */
@SuppressWarnings("unchecked")
public int compare(WritableComparable a, WritableComparable b) {
  return a.compareTo(b);
}

@Override
public int compare(Object a, Object b) {
  return compare((WritableComparable)a, (WritableComparable)b);
}
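So if compare(WritableComparable a, WritableComparable b) is not overridden, the default implementation runs and calls a.compareTo(b). The framework reaches the comparator through WritableComparator's raw-bytes compare(byte[], int, int, byte[], int, int), which deserializes the two keys and calls the WritableComparable overload directly, so an override of the Object overload is never invoked on that path.
When the reduce side runs the GroupingComparator, a and b are both my custom MapKey type, so the comparison ends up in the compareTo of the key class I had configured, not in my grouping logic.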
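To make the dispatch concrete, here is a minimal sketch that runs without Hadoop. Everything in it is a stand-in I made up for illustration: Key plays the role of WritableComparable, BaseComparator plays the role of WritableComparator, compareFromFramework imitates the raw-bytes entry point, and the group/value fields of MapKey are assumptions, since the real MapKey is not shown in this post.

```java
// Stand-in for WritableComparable (hypothetical, not a Hadoop type).
interface Key extends Comparable<Key> {}

// Stand-in for WritableComparator (hypothetical, mirrors only the call pattern).
class BaseComparator {
    // Imitates the raw-bytes compare: the arguments have static type Key,
    // so the compiler binds this call to compare(Key, Key).
    int compareFromFramework(Key a, Key b) {
        return compare(a, b);
    }

    // Default: natural ordering of the key.
    int compare(Key a, Key b) {
        return a.compareTo(b);
    }

    // Object overload only delegates to the Key overload.
    int compare(Object a, Object b) {
        return compare((Key) a, (Key) b);
    }
}

// Hypothetical composite key: sorted by group, then by value.
class MapKey implements Key {
    final String group;
    final int value;

    MapKey(String group, int value) {
        this.group = group;
        this.value = value;
    }

    @Override
    public int compareTo(Key other) {
        MapKey o = (MapKey) other;
        int c = group.compareTo(o.group);
        return c != 0 ? c : Integer.compare(value, o.value);
    }
}

// The mistake: overriding the Object overload, which the framework path never calls.
class WrongGroupingComparator extends BaseComparator {
    @Override
    int compare(Object a, Object b) {
        return ((MapKey) a).group.compareTo(((MapKey) b).group);
    }
}

// The fix: overriding the overload that the framework path actually calls.
class RightGroupingComparator extends BaseComparator {
    @Override
    int compare(Key a, Key b) {
        return ((MapKey) a).group.compareTo(((MapKey) b).group);
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        MapKey k1 = new MapKey("a", 1);
        MapKey k2 = new MapKey("a", 2);
        // Non-zero: falls back to MapKey.compareTo, so the keys are not grouped.
        System.out.println(new WrongGroupingComparator().compareFromFramework(k1, k2));
        // Zero: the grouping logic runs, so both keys land in the same group.
        System.out.println(new RightGroupingComparator().compareFromFramework(k1, k2));
    }
}
```

The point of the sketch is that Java picks the overload at compile time from the static argument types, so an override of the Object version only helps callers that pass plain Object references.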
So!
The overload to override is the one that takes WritableComparable parameters:
public int compare(WritableComparable a, WritableComparable b){...}
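Sigh. After checking for ages, it turned out I had simply overridden the wrong overload (same method name, wrong parameter types). Still, it helped me get familiar with this part of the MR process, since I ended up digging into the source code...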