A Look at Distinct Aggregation in the Flink Table API

 

This article looks at Distinct Aggregation in the Flink Table API.

Examples

 
    // Distinct can be applied to GroupBy Aggregation, GroupBy Window Aggregation and Over Window Aggregation.
    Table orders = tableEnv.scan("Orders");

    // Distinct aggregation on group by
    Table groupByDistinctResult = orders
        .groupBy("a")
        .select("a, b.sum.distinct as d");

    // Distinct aggregation on time window group by
    Table groupByWindowDistinctResult = orders
        .window(Tumble.over("5.minutes").on("rowtime").as("w")).groupBy("a, w")
        .select("a, b.sum.distinct as d");

    // Distinct aggregation on over window
    Table result = orders
        .window(Over
            .partitionBy("a")
            .orderBy("rowtime")
            .preceding("UNBOUNDED_RANGE")
            .as("w"))
        .select("a, b.avg.distinct over w, b.max over w, b.min over w");

    // User-defined aggregation function can also be used with DISTINCT modifiers
    Table orders = tEnv.scan("Orders");

    // Use distinct aggregation for user-defined aggregate functions
    tEnv.registerFunction("myUdagg", new MyUdagg());
    orders.groupBy("users").select("users, myUdagg.distinct(points) as myDistinctResult");

  • Distinct Aggregation can be applied to both built-in and user-defined aggregation functions. It is supported in GroupBy Aggregation, GroupBy Window Aggregation, and Over Window Aggregation.
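
To make the meaning of `b.sum.distinct` after `groupBy("a")` concrete, here is a minimal plain-Java sketch with no Flink dependency. It only mimics the result of a distinct sum per group key. The class name `DistinctSumSketch`, the helper `distinctSumByKey`, and the sample rows are assumptions made for illustration; this is not how Flink evaluates the query internally.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DistinctSumSketch {

    /**
     * Sums the values of column b per key of column a,
     * counting each distinct b value only once within its group.
     */
    static Map<String, Long> distinctSumByKey(String[] keys, long[] values) {
        Map<String, Set<Long>> seen = new LinkedHashMap<>();
        Map<String, Long> sums = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            Set<Long> seenForKey = seen.computeIfAbsent(keys[i], k -> new HashSet<>());
            // Set.add returns false when this value was already seen for the key,
            // in which case the duplicate is skipped.
            if (seenForKey.add(values[i])) {
                sums.merge(keys[i], values[i], Long::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical rows of the Orders table: columns a and b.
        String[] a = {"x", "x", "x", "y", "y"};
        long[] b = {1L, 1L, 2L, 5L, 5L};
        // A plain sum would give x=4, y=10; the distinct sum drops the repeats.
        System.out.println(distinctSumByKey(a, b)); // prints {x=3, y=5}
    }
}
```

The same idea carries over to the window variants: only the scope of "already seen" changes, from the whole group to the rows of the current window.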

AggregateFunction

flink-table_2.11-1.7.0-sources.jar!/org/apache/flink/table/functions/AggregateFunction.scala

 
    /**
      * Base class for User-Defined Aggregates.
      *
      * The behavior of an [[AggregateFunction]] can be defined by implementing a series of custom
      * methods. An [[AggregateFunction]] needs at least three methods:
      *  - createAccumulator,
      *  - accumulate, and
      *  - getValue.
      *
      * There are a few other methods that can be optional to have:
      *  - retract,
      *  - merge, and
      *  - resetAccumulator
      *
      * All these methods must be declared publicly, not static and named exactly as the names
      * mentioned above. The methods createAccumulator and getValue are defined in the
      * [[AggregateFunction]] functions, while other methods are explained below.
      *
      *
      * {{{
      * Processes the input values and update the provided accumulator instance. The method
      * accumulate can be overloaded with different custom types and arguments. An AggregateFunction
      * requires at least one accumulate() method.
      *
      * @param accumulator           the accumulator which contains the current aggregated results
      * @param [user defined inputs] the input value (usually obtained from a new arrived data).
      *
      * def accumulate(accumulator: ACC, [user defined inputs]): Unit
      * }}}
      *
      *
      * {{{
      * Retracts the input values from the accumulator instance. The current design assumes the
      * inputs are the values that have been previously accumulated. The method retract can be
      * overloaded with different custom types and arguments. This function must be implemented for
      * datastream bounded over aggregate.
      *
      * @param accumulator           the accumulator which contains the current aggregated results
      * @param [user defined inputs] the input value (usually obtained from a new arrived data).
      *
      * def retract(accumulator: ACC, [user defined inputs]): Unit
      * }}}
      *
      *
      * {{{
      * Merges a group of accumulator instances into one accumulator instance. This function must be
      * implemented for datastream session window grouping aggregate and dataset grouping aggregate.
      *
      * @param accumulator  the accumulator which will keep the merged aggregate results. It should
      *                     be noted that the accumulator may contain the previous aggregated
      *                     results. Therefore user should not replace or clean this instance in the
      *                     custom merge method.
      * @param its          an [[java.lang.Iterable]] pointed to a group of accumulators that will be
      *                     merged.
      *
      * def merge(accumulator: ACC, its: java.lang.Iterable[ACC]): Unit
      * }}}
      *
      *
      * {{{
      * Resets the accumulator for this [[AggregateFunction]]. This function must be implemented for
      * dataset grouping aggregate.
      *
      * @param accumulator  the accumulator which needs to be reset
      *
      * def resetAccumulator(accumulator: ACC): Unit
      * }}}
      *
      *
      * @tparam T   the type of the aggregation result
      * @tparam ACC the type of the aggregation accumulator. The accumulator is used to keep the
      *             aggregated values which are needed to compute an aggregation result.
      *             AggregateFunction represents its state using accumulator, thereby the state of the
      *             AggregateFunction must be put into the accumulator.
      */
    abstract class AggregateFunction[T, ACC] extends UserDefinedFunction {
      /**
        * Creates and init the Accumulator for this [[AggregateFunction]].
        *
        * @return the accumulator with the initial value
        */
      def createAccumulator(): ACC

      /**
        * Called every time when an aggregation result should be materialized.
        * The returned value could be either an early and incomplete result
        * (periodically emitted as data arrive) or the final result of the
        * aggregation.
        *
        * @param accumulator the accumulator which contains the current
        *                    aggregated results
        * @return the aggregation result
        */
      def getValue(accumulator: ACC): T

      /**
        * Returns true if this AggregateFunction can only be applied in an OVER window.
        *
        * @return true if the AggregateFunction requires an OVER window, false otherwise.
        */
      def requiresOver: Boolean = false

      /**
        * Returns the TypeInformation of the AggregateFunction's result.
        *
        * @return The TypeInformation of the AggregateFunction's result or null if the result type
        *         should be automatically inferred.
        */
      def getResultType: TypeInformation[T] = null

      /**
        * Returns the TypeInformation of the AggregateFunction's accumulator.
        *
        * @return The TypeInformation of the AggregateFunction's accumulator or null if the
        *         accumulator type should be automatically inferred.
        */
      def getAccumulatorType: TypeInformation[ACC] = null
    }

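  • AggregateFunction extends UserDefinedFunction. It has two type parameters: T is the type of the aggregation result and ACC is the type of the accumulator. It defines the createAccumulator, getValue, requiresOver, getResultType, and getAccumulatorType methods; of these, subclasses must implement createAccumulator and getValue.
  • The accumulate method is not declared in AggregateFunction, but subclasses must define and implement it. It takes the ACC accumulator followed by the user-defined input values and returns Unit (void in Java). The retract, merge, and resetAccumulator methods are optional; subclasses define and implement them as the use case requires.
  • A datastream bounded over aggregate requires the retract method, which takes the ACC accumulator followed by the user-defined input values and returns void. A datastream session window grouping aggregate and a dataset grouping aggregate require the merge method, which takes an ACC and a java.lang.Iterable<ACC> and returns void. A dataset grouping aggregate also requires the resetAccumulator method, which takes an ACC and returns void.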
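
The method contract described above can be sketched in plain Java. The class below is a hypothetical average aggregate that uses the same method names and shapes as the scaladoc. So that it runs without Flink on the classpath, it does not extend `org.apache.flink.table.functions.AggregateFunction`; in a real job the class would be declared as extending `AggregateFunction<Double, AvgAcc>`. The names `AvgAggSketch` and `AvgAcc` are assumptions made for illustration.

```java
import java.util.Arrays;

public class AvgAggSketch {

    /** Accumulator: all state of the aggregate lives here (the ACC type parameter). */
    public static class AvgAcc {
        long sum = 0L;
        long count = 0L;
    }

    /** Required: creates the initial accumulator. */
    public AvgAcc createAccumulator() {
        return new AvgAcc();
    }

    /** Required: folds one input value into the accumulator. */
    public void accumulate(AvgAcc acc, long value) {
        acc.sum += value;
        acc.count++;
    }

    /** Optional: undoes a previously accumulated value (bounded over aggregates). */
    public void retract(AvgAcc acc, long value) {
        acc.sum -= value;
        acc.count--;
    }

    /** Optional: merges other accumulators into acc without replacing acc itself. */
    public void merge(AvgAcc acc, Iterable<AvgAcc> others) {
        for (AvgAcc other : others) {
            acc.sum += other.sum;
            acc.count += other.count;
        }
    }

    /** Optional: resets the accumulator to its initial state (dataset grouping aggregates). */
    public void resetAccumulator(AvgAcc acc) {
        acc.sum = 0L;
        acc.count = 0L;
    }

    /** Required: materializes the result (the T type parameter); null when nothing was accumulated. */
    public Double getValue(AvgAcc acc) {
        if (acc.count == 0) {
            return null;
        }
        return (double) acc.sum / acc.count;
    }

    public static void main(String[] args) {
        AvgAggSketch agg = new AvgAggSketch();
        AvgAcc acc = agg.createAccumulator();
        agg.accumulate(acc, 10L);
        agg.accumulate(acc, 20L);
        agg.accumulate(acc, 30L);
        System.out.println(agg.getValue(acc)); // prints 20.0

        agg.retract(acc, 10L);                 // the value 10 leaves the window
        System.out.println(agg.getValue(acc)); // prints 25.0

        AvgAcc other = agg.createAccumulator();
        agg.accumulate(other, 40L);
        agg.merge(acc, Arrays.asList(other));
        System.out.println(agg.getValue(acc)); // prints 30.0

        agg.resetAccumulator(acc);
        System.out.println(agg.getValue(acc)); // prints null
    }
}
```

Note that merge takes an Iterable of accumulators rather than of input or result values, and that it adds to the target accumulator instead of overwriting it, as the scaladoc asks.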

Summary

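  • In the Table API, Distinct Aggregation can be applied to both built-in and user-defined aggregation functions. It is supported in GroupBy Aggregation, GroupBy Window Aggregation, and Over Window Aggregation.
  • AggregateFunction extends UserDefinedFunction. It has two type parameters: T is the type of the aggregation result and ACC is the type of the accumulator. It defines the createAccumulator, getValue, requiresOver, getResultType, and getAccumulatorType methods; of these, subclasses must implement createAccumulator and getValue.
  • The accumulate method is not declared in AggregateFunction, but subclasses must define and implement it. It takes the ACC accumulator followed by the user-defined input values and returns void. The retract, merge, and resetAccumulator methods are optional and are implemented as needed: a datastream bounded over aggregate requires retract (ACC plus the user-defined input values, returns void); a datastream session window grouping aggregate and a dataset grouping aggregate require merge (ACC and java.lang.Iterable<ACC>, returns void); a dataset grouping aggregate also requires resetAccumulator (ACC, returns void).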
