Definition of the modes
apache-hive-1.2.1-src\apache-hive-1.2.1-src\ql\src\java\org\apache\hadoop\hive\ql\udf\generic\GenericUDAFEvaluator.java
The source code is as follows:
public static enum Mode {
  /**
   * PARTIAL1: from original data to partial aggregation data: iterate() and
   * terminatePartial() will be called.
   */
  PARTIAL1,
  /**
   * PARTIAL2: from partial aggregation data to partial aggregation data:
   * merge() and terminatePartial() will be called.
   */
  PARTIAL2,
  /**
   * FINAL: from partial aggregation to full aggregation: merge() and
   * terminate() will be called.
   */
  FINAL,
  /**
   * COMPLETE: from original data directly to full aggregation: iterate() and
   * terminate() will be called.
   */
  COMPLETE
};
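Data-processing functions a UDAF must implement
iterate(): no return value
terminatePartial(): returns a value
merge(): no return value
terminate(): returns a value
The functions with no return value are the "aggregate" functions; they are dispatched by aggregate().
The functions that return a value are the "evaluate" functions; they are dispatched by evaluate().
This is reflected in the code as follows: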
/**
 * This function will be called by GroupByOperator when it sees a new input
 * row.
 *
 * @param agg
 *          The object to store the aggregation result.
 * @param parameters
 *          The row, can be inspected by the OIs passed in init().
 */
public void aggregate(AggregationBuffer agg, Object[] parameters) throws HiveException {
  if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
    iterate(agg, parameters);
  } else {
    assert (parameters.length == 1);
    merge(agg, parameters[0]);
  }
}
/**
 * This function will be called by GroupByOperator when it sees a new input
 * row.
 *
 * @param agg
 *          The object to store the aggregation result.
 */
public Object evaluate(AggregationBuffer agg) throws HiveException {
  if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
    return terminatePartial(agg);
  } else {
    return terminate(agg);
  }
}
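To see the dispatch above in action without a Hive installation, here is a minimal standalone sketch. It is not Hive's API: the class ToyAvgEvaluator, its Buffer, the {sum, count} encoding of a partial result, and the run() helper are all hypothetical names invented for illustration. Only the branching in aggregate() and evaluate() mirrors the quoted source.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the mode dispatch. Hypothetical; it does not use any Hive class. */
final class ToyAvgEvaluator {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    /** Aggregation buffer for an average: running sum and row count. */
    static final class Buffer {
        double sum;
        long count;
    }

    private final Mode mode;

    ToyAvgEvaluator(Mode mode) {
        this.mode = mode;
    }

    /** Consumes one raw value. */
    void iterate(Buffer agg, double value) {
        agg.sum += value;
        agg.count++;
    }

    /** Consumes one partial result, encoded here (by assumption) as {sum, count}. */
    void merge(Buffer agg, double[] partial) {
        agg.sum += partial[0];
        agg.count += (long) partial[1];
    }

    /** Emits a partial result in the same {sum, count} encoding that merge() reads. */
    double[] terminatePartial(Buffer agg) {
        return new double[] { agg.sum, agg.count };
    }

    /** Emits the final average, or null when no rows were seen. */
    Double terminate(Buffer agg) {
        if (agg.count == 0) {
            return null;
        }
        return agg.sum / agg.count;
    }

    /** Same branching as the quoted aggregate(): raw input goes to iterate(), otherwise merge(). */
    void aggregate(Buffer agg, Object input) {
        if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
            iterate(agg, (Double) input);
        } else {
            merge(agg, (double[]) input);
        }
    }

    /** Same branching as the quoted evaluate(): partial output or final output. */
    Object evaluate(Buffer agg) {
        if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
            return terminatePartial(agg);
        } else {
            return terminate(agg);
        }
    }

    /** Feeds every input to one evaluator in the given mode and returns its output. */
    static Object run(Mode mode, List<?> inputs) {
        ToyAvgEvaluator eval = new ToyAvgEvaluator(mode);
        Buffer agg = new Buffer();
        for (Object in : inputs) {
            eval.aggregate(agg, in);
        }
        return eval.evaluate(agg);
    }

    public static void main(String[] args) {
        // Two evaluators in PARTIAL1 each see part of the rows; one FINAL evaluator merges them.
        Object p1 = run(Mode.PARTIAL1, Arrays.asList(1.0, 2.0));
        Object p2 = run(Mode.PARTIAL1, Arrays.asList(3.0, 6.0));
        Object staged = run(Mode.FINAL, Arrays.asList(p1, p2));
        // One COMPLETE evaluator sees all rows directly.
        Object direct = run(Mode.COMPLETE, Arrays.asList(1.0, 2.0, 3.0, 6.0));
        System.out.println(staged + " " + direct);
    }
}
```

In this sketch the staged path (PARTIAL1 followed by FINAL) and the direct path (COMPLETE) yield the same average, which is the property the four modes are designed to preserve. It also shows why terminatePartial() must emit exactly the shape that merge() expects.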
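The figure below shows how the input data in each of the four modes relates to the data-processing functions that may be called.
Within a given mode the type of the input data never changes, and two data-processing functions are called: one that consumes the input and one that produces the output.
PARTIAL1: the input can only be original (raw) data;
PARTIAL2: the input can only be partial aggregation results;
FINAL: the input is partial aggregation data;
COMPLETE: the input is original (raw) data;
For terminatePartial() and terminate() there are two possibilities for the input (a buffer filled from raw data by iterate(), or one filled from partial aggregation results by merge()), so they must be handled according to the mode.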
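The relationship described above can also be written down as a small lookup. The following is a hedged sketch derived only from the Javadoc of the Mode enum; the class ModeTable and its methods are hypothetical helpers and are not part of Hive.

```java
/** Hypothetical helper that spells out, per mode, the input kind and the functions called. */
final class ModeTable {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    /** PARTIAL1 and COMPLETE read original rows; PARTIAL2 and FINAL read partial aggregations. */
    static boolean inputIsRaw(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.COMPLETE;
    }

    /** PARTIAL1 and PARTIAL2 emit partial aggregations; FINAL and COMPLETE emit the full result. */
    static boolean outputIsPartial(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.PARTIAL2;
    }

    /** Builds one summary line: mode, input kind, consuming function, producing function. */
    static String describe(Mode m) {
        String input = inputIsRaw(m) ? "raw rows" : "partial aggregations";
        String consume = inputIsRaw(m) ? "iterate()" : "merge()";
        String produce = outputIsPartial(m) ? "terminatePartial()" : "terminate()";
        return m + ": " + input + " -> " + consume + " -> " + produce;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(describe(m));
        }
    }
}
```

Reading the four lines side by side makes the two independent choices visible: the mode fixes what kind of data comes in, and separately whether a partial or a full result goes out.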