An Analysis of the Four Modes in Hive GenericUDAF

Latest recommended article published on 2021-01-01 21:04:58

conggova

Category: Big Data Technology. Tags: apache hive udaf udf

Article link: https://blog.csdn.net/conggova/article/details/77799145


Definition of the modes

apache-hive-1.2.1-src\apache-hive-1.2.1-src\ql\src\java\org\apache\hadoop\hive\ql\udf\generic\GenericUDAFEvaluator.java
The source code is as follows:

  public static enum Mode {
    /**
     * PARTIAL1: from original data to partial aggregation data: iterate() and
     * terminatePartial() will be called.
     */
    PARTIAL1,
    /**
     * PARTIAL2: from partial aggregation data to partial aggregation data:
     * merge() and terminatePartial() will be called.
     */
    PARTIAL2,
    /**
     * FINAL: from partial aggregation to full aggregation: merge() and
     * terminate() will be called.
     */
    FINAL,
    /**
     * COMPLETE: from original data directly to full aggregation: iterate() and
     * terminate() will be called.
     */
    COMPLETE
  };
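
The Javadoc comments above boil down to a small lookup: the mode decides whether the input is original data or partial results, and whether the output is a partial or a full aggregation. The following is a minimal stand-alone sketch of that lookup. It does not use any Hive classes; `ModeMethodsDemo` and `methodsFor` are hypothetical names introduced only for illustration, and the local `Mode` enum is just a copy of the four names.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; this is NOT Hive code and needs no Hive jars.
final class ModeMethodsDemo {
    // Local copy of the four mode names from GenericUDAFEvaluator.Mode.
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Hypothetical helper: the two callbacks each mode triggers,
    // as stated in the Javadoc of the enum.
    static List<String> methodsFor(Mode mode) {
        switch (mode) {
            case PARTIAL1: return Arrays.asList("iterate", "terminatePartial"); // original -> partial
            case PARTIAL2: return Arrays.asList("merge", "terminatePartial");   // partial  -> partial
            case FINAL:    return Arrays.asList("merge", "terminate");          // partial  -> full
            case COMPLETE: return Arrays.asList("iterate", "terminate");        // original -> full
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + methodsFor(m));
        }
    }
}
```

Read this way, the first element depends only on the kind of input (original rows use iterate(), partial results use merge()), and the second only on the kind of output (partial uses terminatePartial(), full uses terminate()).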

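Data-processing functions that a UDAF must implement

iterate(): no return value
terminatePartial(): returns a value
merge(): no return value
terminate(): returns a value
The functions without a return value are invoked through aggregate().
The functions with a return value are invoked through evaluate().
In the code this looks as follows: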

  /**
   * This function will be called by GroupByOperator when it sees a new input
   * row.
   * 
   * @param agg
   *          The object to store the aggregation result.
   * @param parameters
   *          The row, can be inspected by the OIs passed in init().
   */
  public void aggregate(AggregationBuffer agg, Object[] parameters) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
      iterate(agg, parameters);
    } else {
      assert (parameters.length == 1);
      merge(agg, parameters[0]);
    }
  }


  /**
   * This function will be called by GroupByOperator when it sees a new input
   * row.
   * 
   * @param agg
   *          The object to store the aggregation result.
   */
  public Object evaluate(AggregationBuffer agg) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
      return terminatePartial(agg);
    } else {
      return terminate(agg);
    }
  }
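
To see how the two dispatch methods interact with the four modes, here is a minimal stand-alone simulation of an average aggregation. It imitates the branching of aggregate() and evaluate() shown above but does not use Hive classes: `AvgModeSimulation`, `AvgBuffer`, `AvgEvaluator`, `run`, and the `{sum, count}` array used as the partial result are all assumptions made for this sketch, not Hive's API.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch that imitates the dispatch above; it does NOT use Hive classes.
final class AvgModeSimulation {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Hypothetical aggregation buffer: running sum and row count.
    static final class AvgBuffer {
        long sum;
        long count;
    }

    // Hypothetical evaluator computing an average over long values.
    static final class AvgEvaluator {
        private final Mode mode;

        AvgEvaluator(Mode mode) { this.mode = mode; }

        // Consumes one original value.
        void iterate(AvgBuffer agg, long value) { agg.sum += value; agg.count++; }

        // Consumes one partial result, encoded as {sum, count}.
        void merge(AvgBuffer agg, long[] partial) { agg.sum += partial[0]; agg.count += partial[1]; }

        // Emits a partial result as {sum, count}.
        long[] terminatePartial(AvgBuffer agg) { return new long[] {agg.sum, agg.count}; }

        // Emits the final average, or null when no rows were seen.
        Double terminate(AvgBuffer agg) { return agg.count == 0 ? null : (double) agg.sum / agg.count; }

        // Same branching as aggregate() in the quoted source.
        void aggregate(AvgBuffer agg, Object input) {
            if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
                iterate(agg, (Long) input);
            } else {
                merge(agg, (long[]) input);
            }
        }

        // Same branching as evaluate() in the quoted source.
        Object evaluate(AvgBuffer agg) {
            if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
                return terminatePartial(agg);
            } else {
                return terminate(agg);
            }
        }
    }

    // Runs one evaluator in the given mode over all inputs and returns its output.
    static Object run(Mode mode, List<?> inputs) {
        AvgEvaluator evaluator = new AvgEvaluator(mode);
        AvgBuffer agg = new AvgBuffer();
        for (Object input : inputs) {
            evaluator.aggregate(agg, input);
        }
        return evaluator.evaluate(agg);
    }

    public static void main(String[] args) {
        // Two PARTIAL1 stages over disjoint slices of the original data.
        Object p1 = run(Mode.PARTIAL1, Arrays.asList(1L, 2L, 3L));
        Object p2 = run(Mode.PARTIAL1, Arrays.asList(10L));
        // PARTIAL2 combines partial results into another partial result.
        Object combined = run(Mode.PARTIAL2, Arrays.asList(p1, p2));
        // FINAL turns partial results into the full aggregation.
        Object staged = run(Mode.FINAL, Arrays.asList(combined));
        // COMPLETE goes from original data straight to the full aggregation.
        Object direct = run(Mode.COMPLETE, Arrays.asList(1L, 2L, 3L, 10L));
        System.out.println("staged=" + staged + ", direct=" + direct);
    }
}
```

In this sketch the staged path (PARTIAL1, then PARTIAL2, then FINAL) and the COMPLETE path yield the same average. That is the property a UDAF has to preserve: terminatePartial() must emit exactly what merge() expects to consume.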