 * Size of the merged-value buffer. It stores a string length (an int), so it is set to 4 bytes.
 *
 * @return the estimated buffer size in bytes
*/
@Override
public int estimate() {
return JavaDataModel.PRIMITIVES1;
}
}
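- Create FieldLengthUDAFEvaluator.java, which holds the entire UDAF logic. The key code is commented; read it alongside the earlier diagram. The core idea: iterate processes the field values of the current group, merge combines the scattered partial data, and terminate produces the final result for the current group: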
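As a rough, dependency-free sketch of that iterate / merge / terminate flow: the class `LengthSumSketch` below is hypothetical and not part of the original project, it uses no Hive classes, and it assumes the aggregate is the total string length of the field within a group. A single int plays the role of the 4-byte aggregation buffer.

```java
import java.util.Arrays;

// Hypothetical stand-in for a UDAF evaluator; illustration only, no Hive API involved.
public class LengthSumSketch {
    // Plays the role of the aggregation buffer: one int holding the running total.
    private int total = 0;

    // iterate: called once per row of the group seen by this task.
    public void iterate(String field) {
        if (field != null) {
            total += field.length();
        }
    }

    // terminatePartial: hand this task's partial total to the next stage.
    public int terminatePartial() {
        return total;
    }

    // merge: fold another task's partial total into this one.
    public void merge(int partial) {
        total += partial;
    }

    // terminate: the final result for the group.
    public int terminate() {
        return total;
    }

    public static void main(String[] args) {
        // Two simulated tasks each see part of the same group.
        LengthSumSketch task1 = new LengthSumSketch();
        for (String s : Arrays.asList("ab", "cde")) {
            task1.iterate(s);
        }
        LengthSumSketch task2 = new LengthSumSketch();
        task2.iterate("fghi");

        // A simulated reducer merges the partial totals and emits the result.
        LengthSumSketch reducer = new LengthSumSketch();
        reducer.merge(task1.terminatePartial());
        reducer.merge(task2.terminatePartial());
        System.out.println(reducer.terminate()); // 2 + 3 + 4 = 9
    }
}
```

The real evaluator follows the same shape, but works through Hive's ObjectInspector types instead of plain Java values. Its source follows: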
package com.bolingcavalry.hiveudf.udaf;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
/**
 * @Description: The class that does the actual UDAF processing
 * @author: willzhao E-mail: zq2599@gmail.com
 * @date: 2020/11/4 9:57
*/
public class FieldLengthUDAFEvaluator extends GenericUDAFEvaluator {
PrimitiveObjectInspector inputOI;
ObjectInspector outputOI;
PrimitiveObjectInspector integerOI;
/**
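 * Called at every stage.
 * Its main job is to prepare the input and output inspectors each stage needs, so the other methods can use them directly when they are called.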
<