How Hadoop's GroupComparator Works (Source Code Analysis)

Goal: understand how the GroupComparator we configure influences the key / Iterable<value> pair that is handed to the reduce function.
Below is the reduce function of a job that has a GroupComparator configured. The practical effect is that our custom GroupComparator decides which values are grouped together and delivered to a single reduce() call (a sketch of such a comparator follows the code below).


    public static class DividendGrowthReducer extends Reducer<Stock, DoubleWritable, NullWritable, DividendChange> {
        private NullWritable outputKey = NullWritable.get();
        private DividendChange outputValue = new DividendChange();

        @Override
        protected void reduce(Stock key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double previousDividend = 0.0;
            for (DoubleWritable dividend : values) {
                double currentDividend = dividend.get();
                double growth = currentDividend - previousDividend;
                if (Math.abs(growth) > 0.000001) {
                    outputValue.setSymbol(key.getSymbol());
                    outputValue.setDate(key.getDate());
                    outputValue.setChange(growth);
                    context.write(outputKey, outputValue);
                    previousDividend = currentDividend;
                }
            }
        }
    }
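For reference, a matching grouping comparator might look like the sketch below. This is only an illustration: the Stock key and its getSymbol()/getDate() accessors come from the reducer above, while the class name StockSymbolGroupingComparator and the exact comparison logic are assumptions, not the original job's code.

    // A minimal sketch of a grouping comparator, assuming Stock is a WritableComparable
    // whose natural (sort) order is (symbol, date). Grouping by symbol only means every
    // dividend of one symbol arrives in a single reduce() call, while the secondary sort
    // on date still controls the order of the values.
    public static class StockSymbolGroupingComparator extends WritableComparator {

        protected StockSymbolGroupingComparator() {
            super(Stock.class, true);   // true: let the comparator create Stock instances
        }

        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            Stock s1 = (Stock) a;
            Stock s2 = (Stock) b;
            // Keys belong to the same group as long as the symbol matches;
            // the date component is deliberately ignored here.
            return s1.getSymbol().compareTo(s2.getSymbol());
        }
    }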
First, let's trace upward to find out who calls the reduce function we wrote: the run method of the Reducer class. From the code below we can see that run invokes reduce once per key.
Note that everything passed into reduce here is an object reference.


    /**
     * Advanced application writers can use the
     * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
     * control how the reduce task works.
     */
    public void run(Context context) throws IOException, InterruptedException {
        // ...
        while (context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
            // ...
        }
        // ...
    }
Looking back at the reduce function we wrote, this means the key changes accordingly while we iterate over the values (illustrated in the sketch below).
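To make that concrete, here is a minimal sketch, assuming the Stock key from the example above and a grouping comparator that compares the symbol only; the logging is purely illustrative and not part of the original job:

    @Override
    protected void reduce(Stock key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        for (DoubleWritable dividend : values) {
            // Each iteration deserializes the next record's key into the SAME key object,
            // so within one group key.getDate() advances from record to record while
            // key.getSymbol() stays fixed -- "fixed" meaning equal according to the
            // grouping comparator.
            System.out.println(key.getSymbol() + " " + key.getDate() + " " + dividend.get());
        }
    }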
So let's keep tracing into the next() method of the iterator obtained from context.getValues(). Here context is the ReduceContext interface (ReduceContext.java), whose implementation class is ReduceContextImpl.java.


    protected class ValueIterable implements Iterable<VALUEIN> {
        private ValueIterator iterator = new ValueIterator();
        @Override
        public Iterator<VALUEIN> iterator() {
            return iterator;
        }
    }

    /**
     * Iterate through the values for the current key, reusing the same value
     * object, which is stored in the context.
     * @return the series of values associated with the current key. All of the
     * objects returned directly and indirectly from this method are reused.
     */
    public Iterable<VALUEIN> getValues() throws IOException, InterruptedException {
        return iterable;
    }
getValues() simply returns the iterable, a ValueIterable. Following that type, it becomes clear that iterating over the Iterable inside the reduce function actually calls ValueIterator's next() method. Let's look at the implementation of next().


    @Override
    public VALUEIN next() {
        // ...
        nextKeyValue();
        return value;
        // ...
    }
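To connect this back to the reducer: the enhanced for loop over values is just syntactic sugar, so every pass through the loop body triggers one call to this next(). A rough desugaring, for illustration only:

    // What "for (DoubleWritable dividend : values)" boils down to, conceptually:
    Iterator<DoubleWritable> it = values.iterator();   // returns the shared ValueIterator
    while (it.hasNext()) {
        DoubleWritable dividend = it.next();            // internally calls nextKeyValue()
        // ... loop body ...
    }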
Continue into nextKeyValue(). Here we finally find a comparator, and this is exactly the GroupingComparator we configured.


    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        // ... (reads and deserializes the current key/value pair, then peeks at the next serialized key)
        if (hasMore) {
            nextKey = input.getKey();
            // nextKeyIsSame: does the upcoming key belong to the same group as the
            // current one, according to the configured comparator?
            nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
                                               currentRawKey.getLength(),
                                               nextKey.getData(),
                                               nextKey.getPosition(),
                                               nextKey.getLength() - nextKey.getPosition()
                                                   ) == 0;
        } else {
            nextKeyIsSame = false;
        }
        inputValueCounter.increment(1);
        return true;
    }
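This nextKeyIsSame flag is what actually draws the group boundaries. Simplified (this is a paraphrase of ReduceContextImpl, not a verbatim quote), the two places that consume it look roughly like this:

    // ValueIterator.hasNext(): the current group has another value as long as we are
    // still on its first value, or the upcoming key compared equal (== 0) under the
    // grouping comparator.
    public boolean hasNext() {
        return firstValue || nextKeyIsSame;
    }

    // ReduceContextImpl.nextKey(): called from Reducer.run() to advance to the next
    // group; it skips whatever is left of the current group and then starts a new one,
    // so reduce() is invoked exactly once per group.
    public boolean nextKey() throws IOException, InterruptedException {
        while (hasMore && nextKeyIsSame) {
            nextKeyValue();
        }
        if (hasMore) {
            return nextKeyValue();
        }
        return false;
    }

So the key/Iterable pair handed to reduce() is simply a run of consecutive records in the sorted input whose keys the grouping comparator considers equal.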
To prove that this comparator really is the GroupingComparator we configured, trace who constructs ReduceContextImpl: the run method of ReduceTask.


    @Override
    @SuppressWarnings("unchecked")
    public void run(JobConf job, final TaskUmbilicalProtocol umbilical) {
        // ...
        RawComparator comparator = job.getOutputValueGroupingComparator();
        runNewReducer(job, umbilical, reporter, rIter, comparator,
                      keyClass, valueClass);
    }
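On the driver side, this is the comparator registered via Job.setGroupingComparatorClass(); if none is set, JobConf.getOutputValueGroupingComparator() falls back to the key's sort comparator, so grouping then follows the sort order's notion of equality. A minimal driver sketch, where StockSymbolGroupingComparator comes from the sketch above and StockSymbolPartitioner / StockComparator are hypothetical placeholder classes:

    Job job = Job.getInstance(new Configuration(), "dividend-growth");
    job.setMapOutputKeyClass(Stock.class);
    job.setMapOutputValueClass(DoubleWritable.class);

    // All records of one symbol must reach the same reduce task ...
    job.setPartitionerClass(StockSymbolPartitioner.class);
    // ... and are then grouped by symbol only; this class is what ends up as
    // `comparator` in ReduceTask.run() above.
    job.setGroupingComparatorClass(StockSymbolGroupingComparator.class);
    // The sort comparator (symbol, then date) still orders the values inside each group.
    job.setSortComparatorClass(StockComparator.class);

    job.setReducerClass(DividendGrowthReducer.class);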
The code of runNewReducer is shown below as well.


    void runNewReducer(JobConf job,
                       final TaskUmbilicalProtocol umbilical,
                       final TaskReporter reporter,
                       RawKeyValueIterator rIter,
                       RawComparator<INKEY> comparator,
                       Class<INKEY> keyClass,
                       Class<INVALUE> valueClass
                       ) {
        // ...
        org.apache.hadoop.mapreduce.Reducer.Context
             reducerContext = createReduceContext(reducer, job, getTaskID(),
                                                  rIter, reduceInputKeyCounter,
                                                  reduceInputValueCounter,
                                                  trackedRW,
                                                  committer,
                                                  reporter, comparator, keyClass,
                                                  valueClass);
        // ...
    }
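createReduceContext (defined in Task.java) simply forwards that comparator into the ReduceContextImpl constructor, where it becomes the `comparator` field used by nextKeyValue() above. Roughly (paraphrased, with the unrelated parameters abbreviated):

    // Inside Task.createReduceContext(...) -- paraphrased:
    reduceContext = new ReduceContextImpl<INKEY, INVALUE, OUTKEY, OUTVALUE>(
            job, taskId, rIter,
            inputKeyCounter, inputValueCounter,
            output, committer, reporter,
            comparator,          // <-- job.getOutputValueGroupingComparator()
            keyClass, valueClass);

This closes the chain: setGroupingComparatorClass() -> getOutputValueGroupingComparator() -> ReduceContextImpl's comparator -> nextKeyValue() -> nextKeyIsSame -> the value groups seen by reduce().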
Well, that wraps up the source code analysis of how a custom GroupingComparator takes effect.

Originally published at the ITPUB blog: http://blog.itpub.net/30066956/viewspace-2095520/
