Flink Aggregate Functions

User-defined aggregate functions (UDAGGs) aggregate a table (one or more rows with one or more attributes) into a scalar value.


[Figure: example of a max() aggregation over a beverage table]

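The figure above shows an example of an aggregation. Assume you have a table that contains data about beverages. The table consists of three columns (id, name, and price) and 5 rows. Imagine you need to find the highest price of all beverages in the table, i.e., perform a max() aggregation. You would need to check each of the 5 rows, and the result would be a single numeric value.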


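User-defined aggregate functions are implemented by extending the AggregateFunction class. An AggregateFunction works as follows. First, it needs an accumulator, which is the data structure that holds the intermediate result of the aggregation. An empty accumulator is created by calling the createAccumulator() method of the AggregateFunction. Subsequently, the accumulate() method of the function is called for each input row to update the accumulator. Once all rows have been processed, the getValue() method of the function is called to compute and return the final result.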


The following methods are mandatory for every AggregateFunction:


  • createAccumulator()

  • accumulate()

  • getValue()
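The lifecycle and the three mandatory methods above can be sketched in plain Java. This is a minimal illustration that deliberately does not depend on Flink: the classes MaxPriceSketch, Beverage, MaxAccumulator, and MaxPrice are hypothetical names, and the sample rows are made up for illustration. In a real job, MaxPrice would extend org.apache.flink.table.functions.AggregateFunction and the framework, not a hand-written loop, would call the methods.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch of the accumulator lifecycle. It does not depend on Flink. */
final class MaxPriceSketch {

    /** Hypothetical row type mirroring the beverage table (id, name, price). */
    static final class Beverage {
        final int id;
        final String name;
        final double price;

        Beverage(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    /** Hypothetical accumulator: holds the intermediate result of the aggregation. */
    static final class MaxAccumulator {
        boolean hasValue = false;
        double max = 0.0;
    }

    /** Plain class using the three mandatory method names (assumption: no Flink base class here). */
    static final class MaxPrice {
        public MaxAccumulator createAccumulator() {
            return new MaxAccumulator();
        }

        public void accumulate(MaxAccumulator acc, double price) {
            if (!acc.hasValue || price > acc.max) {
                acc.max = price;
                acc.hasValue = true;
            }
        }

        public Double getValue(MaxAccumulator acc) {
            // No rows seen yet: there is no maximum to report.
            return acc.hasValue ? Double.valueOf(acc.max) : null;
        }
    }

    /** Drives the lifecycle by hand: create, accumulate once per row, then get the value. */
    static Double maxPrice(List<Beverage> rows) {
        MaxPrice fn = new MaxPrice();
        MaxAccumulator acc = fn.createAccumulator();
        for (Beverage row : rows) {
            fn.accumulate(acc, row.price);
        }
        return fn.getValue(acc);
    }

    public static void main(String[] args) {
        // Illustrative sample rows, not taken from the original figure.
        List<Beverage> beverages = Arrays.asList(
                new Beverage(1, "Latte", 6.0),
                new Beverage(2, "Milk", 3.0),
                new Beverage(3, "Breve", 5.0),
                new Beverage(4, "Mocha", 8.0),
                new Beverage(5, "Tea", 4.0));
        System.out.println(maxPrice(beverages));
    }
}
```

Keeping all state in the accumulator, rather than in fields of the function object, is what allows the same function instance to serve many groups or windows at once.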


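Flink's type extraction facilities can fail to identify complex data types, e.g., if they are not basic types or simple POJOs. Similar to ScalarFunction and TableFunction, AggregateFunction provides methods to specify the TypeInformation of the result type (through AggregateFunction#getResultType()) and the type of the accumulator (through AggregateFunction#getAccumulatorType()).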


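Besides the methods above, there are a few contract methods that can be optionally implemented. While some of these methods allow the system to execute queries more efficiently, others are mandatory for certain use cases. For instance, the merge() method is mandatory if the aggregate function should be applied in the context of a session group window (the accumulators of two session windows need to be joined when a row is observed that "connects" them).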
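To make the session-window case concrete, here is a minimal sketch of a merge() method, again in plain Java without Flink on the classpath. The names MergeSketch, WeightedAvgAccum, and WeightedAvg, as well as the input values, are assumptions chosen for illustration. The point it tries to show is that merge() folds the other accumulators into the target one instead of replacing it.

```java
import java.util.Collections;

/** Stand-alone sketch of merging two session accumulators. It does not depend on Flink. */
final class MergeSketch {

    /** Hypothetical accumulator for a weighted average. */
    static final class WeightedAvgAccum {
        long sum = 0;
        int count = 0;
    }

    /** Plain class following the contract method names (assumption: no Flink base class here). */
    static final class WeightedAvg {
        public WeightedAvgAccum createAccumulator() {
            return new WeightedAvgAccum();
        }

        public void accumulate(WeightedAvgAccum acc, long value, int weight) {
            acc.sum += value * weight;
            acc.count += weight;
        }

        /** Adds every partial accumulator to acc; acc keeps what it already held. */
        public void merge(WeightedAvgAccum acc, Iterable<WeightedAvgAccum> its) {
            for (WeightedAvgAccum other : its) {
                acc.sum += other.sum;
                acc.count += other.count;
            }
        }

        public Long getValue(WeightedAvgAccum acc) {
            return acc.count == 0 ? null : Long.valueOf(acc.sum / acc.count);
        }
    }

    /** Simulates two sessions that turn out to belong to one window. */
    static Long mergedAverage() {
        WeightedAvg fn = new WeightedAvg();

        WeightedAvgAccum sessionA = fn.createAccumulator();
        fn.accumulate(sessionA, 10, 1);
        fn.accumulate(sessionA, 20, 1);

        WeightedAvgAccum sessionB = fn.createAccumulator();
        fn.accumulate(sessionB, 40, 2);

        // A row arrives that bridges the gap, so both sessions become one window.
        fn.merge(sessionA, Collections.singletonList(sessionB));
        return fn.getValue(sessionA);
    }

    public static void main(String[] args) {
        System.out.println(mergedAverage());
    }
}
```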


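All methods of an AggregateFunction must be declared public, must not be static, and must be named exactly as mentioned above. The methods createAccumulator, getValue, getResultType, and getAccumulatorType are defined in the AggregateFunction abstract class, while the others are contract methods. To define an aggregate function, you extend the base class org.apache.flink.table.functions.AggregateFunction and implement one (or more) accumulate methods. The accumulate method can be overloaded with different parameter types and supports variable arguments.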
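As an illustration of the naming rules, the sketch below declares three accumulate overloads, one of them with variable arguments, and then uses Java reflection to count the public, non-static methods named accumulate. The reflection lookup is only a simplified stand-in to show why the exact name and modifiers matter for contract methods; it is not a description of how Flink resolves them internally. OverloadSketch, SumAccumulator, and SumFunction are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Stand-alone sketch of overloaded accumulate methods. It does not depend on Flink. */
final class OverloadSketch {

    /** Hypothetical accumulator holding a running sum. */
    static final class SumAccumulator {
        long sum = 0;
    }

    /** Plain class with overloaded accumulate methods (assumption: no Flink base class here). */
    static final class SumFunction {
        public SumAccumulator createAccumulator() {
            return new SumAccumulator();
        }

        public void accumulate(SumAccumulator acc, long value) {
            acc.sum += value;
        }

        public void accumulate(SumAccumulator acc, String value) {
            acc.sum += Long.parseLong(value);
        }

        public void accumulate(SumAccumulator acc, int... values) {
            for (int v : values) {
                acc.sum += v;
            }
        }

        public Long getValue(SumAccumulator acc) {
            return Long.valueOf(acc.sum);
        }
    }

    /** Counts public, non-static methods named "accumulate" on the given class. */
    static int countAccumulateMethods(Class<?> clazz) {
        int found = 0;
        for (Method m : clazz.getMethods()) {
            int mod = m.getModifiers();
            if (m.getName().equals("accumulate") && Modifier.isPublic(mod) && !Modifier.isStatic(mod)) {
                found++;
            }
        }
        return found;
    }

    /** Calls each overload once and returns the resulting sum. */
    static Long total() {
        SumFunction fn = new SumFunction();
        SumAccumulator acc = fn.createAccumulator();
        fn.accumulate(acc, 5L);      // long overload
        fn.accumulate(acc, "7");     // String overload
        fn.accumulate(acc, 1, 2, 3); // varargs overload
        return fn.getValue(acc);
    }

    public static void main(String[] args) {
        System.out.println(countAccumulateMethods(SumFunction.class));
        System.out.println(total());
    }
}
```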

 
 
/**	
  * Base class for aggregation functions. 	
  *	
  * @param <T>   the type of the aggregation result	
  * @param <ACC> the type of the aggregation accumulator. The accumulator is used to keep the	
  *             aggregated values which are needed to compute an aggregation result.	
  *             AggregateFunction represents its state using accumulator, thereby the state of the	
  *             AggregateFunction must be put into the accumulator.	
  */	
public abstract class AggregateFunction<T, ACC> extends UserDefinedFunction {	
	
  /**	
    * Creates and initializes the accumulator for this {@link AggregateFunction}.
    *	
    * @return the accumulator with the initial value	
    */	
  public ACC createAccumulator(); // MANDATORY	
	
  /** Processes the input values and updates the provided accumulator instance. The method
    * accumulate can be overloaded with different custom types and arguments. An AggregateFunction	
    * requires at least one accumulate() method.	
    *	
    * @param accumulator           the accumulator which contains the current aggregated results	
    * @param [user defined inputs] the input value (usually obtained from a new arrived data).	
    */	
  public void accumulate(ACC accumulator, [user defined inputs]); // MANDATORY	
	
  /**	
    * Retracts the input values from the accumulator instance. The current design assumes the	
    * inputs are the values that have been previously accumulated. The method retract can be	
    * overloaded with different custom types and arguments. This function must be implemented for	
    * datastream bounded over aggregate.	
    *	
    * @param accumulator           the accumulator which contains the current aggregated results	
    * @param [user defined inputs] the input value (usually obtained from a new arrived data).	
    */	
  public void retract(ACC accumulator, [user defined inputs]); // OPTIONAL	
	
  /**	
    * Merges a group of accumulator instances into one accumulator instance. This function must be	
    * implemented for datastream session window grouping aggregate and dataset grouping aggregate.	
    *	
    * @param accumulator  the accumulator which will keep the merged aggregate results. It should	
    *                     be noted that the accumulator may contain the previous aggregated	
    *                     results. Therefore user should not replace or clean this instance in the	
    *                     custom merge method.	
    * @param its          a {@link java.lang.Iterable} over the group of accumulators that will be
    *                     merged.	
    */	
  public void merge(ACC accumulator, java.lang.Iterable<ACC> its); // OPTIONAL	
	
  /**	
    * Called every time when an aggregation result should be materialized.	
    * The returned value could be either an early and incomplete result	
    * (periodically emitted as data arrive) or the final result of the	
    * aggregation.	
    *	
    * @param accumulator the accumulator which contains the current	
    *                    aggregated results	
    * @return the aggregation result	
    */	
  public T getValue(ACC accumulator); // MANDATORY	
	
  /**	
    * Resets the accumulator for this {@link AggregateFunction}. This function must be implemented for
    * dataset grouping aggregate.	
    *	
    * @param accumulator  the accumulator which needs to be reset	
    */	
  public void resetAccumulator(ACC accumulator); // OPTIONAL	
	
  /**	
    * Returns true if this AggregateFunction can only be applied in an OVER window.	
    *	
    * @return true if the AggregateFunction requires an OVER window, false otherwise.	
    */	
  public boolean requiresOver() { return false; } // PRE-DEFINED
	
  /**	
    * Returns the TypeInformation of the AggregateFunction's result.	
    *	
    * @return The TypeInformation of the AggregateFunction's result or null if the result type	
    *         should be automatically inferred.	
    */	
  public TypeInformation<T> getResultType() { return null; } // PRE-DEFINED
	
  /**	
    * Returns the TypeInformation of the AggregateFunction's accumulator.	
    *	
    * @return The TypeInformation of the AggregateFunction's accumulator or null if the	
    *         accumulator type should be automatically inferred.	
    */	
  public TypeInformation<ACC> getAccumulatorType() { return null; } // PRE-DEFINED
}
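The optional retract() and resetAccumulator() methods from the listing can be sketched the same way. The example below is a plain Java illustration under stated assumptions: RetractSketch, SumCount, Avg, and slidingAverage are hypothetical names, and the hand-written loop only imitates a bounded OVER window of a fixed number of rows by accumulating each new row and retracting the row that leaves the window.

```java
/** Stand-alone sketch of retract() and resetAccumulator(). It does not depend on Flink. */
final class RetractSketch {

    /** Hypothetical accumulator for an average. */
    static final class SumCount {
        long sum = 0;
        long count = 0;
    }

    /** Plain class following the contract method names (assumption: no Flink base class here). */
    static final class Avg {
        public SumCount createAccumulator() {
            return new SumCount();
        }

        public void accumulate(SumCount acc, long value) {
            acc.sum += value;
            acc.count++;
        }

        /** Undoes a previous accumulate call for the same value. */
        public void retract(SumCount acc, long value) {
            acc.sum -= value;
            acc.count--;
        }

        /** Clears the accumulator so that it can be reused. */
        public void resetAccumulator(SumCount acc) {
            acc.sum = 0;
            acc.count = 0;
        }

        public Double getValue(SumCount acc) {
            return acc.count == 0 ? null : Double.valueOf((double) acc.sum / acc.count);
        }
    }

    /** Average over the last windowSize rows, emitted once per input row. */
    static double[] slidingAverage(long[] values, int windowSize) {
        Avg fn = new Avg();
        SumCount acc = fn.createAccumulator();
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            fn.accumulate(acc, values[i]);
            if (i >= windowSize) {
                // The row that just fell out of the window is retracted.
                fn.retract(acc, values[i - windowSize]);
            }
            result[i] = fn.getValue(acc);
        }
        return result;
    }

    public static void main(String[] args) {
        for (double avg : slidingAverage(new long[] {2, 4, 6, 8}, 2)) {
            System.out.println(avg);
        }
    }
}
```

Retraction works here because sum and count are both invertible; the accumulator never needs to remember individual rows.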
