Functions
Original:
https://prestodb.io/docs/current/develop/functions.html
Plugin Implementation
To implement new functions, you need to write a plugin that returns one or more function classes from getFunctions():
import com.facebook.presto.spi.Plugin;
import com.google.common.collect.ImmutableSet;

import java.util.Set;

public class ExampleFunctionsPlugin
        implements Plugin
{
    @Override
    public Set<Class<?>> getFunctions()
    {
        return ImmutableSet.<Class<?>>builder()
                .add(ExampleNullFunction.class)
                .add(IsNullFunction.class)
                .add(IsEqualOrNullFunction.class)
                .add(ExampleStringFunction.class)
                .add(AverageAggregation.class)
                .build();
    }
}
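Note that the ImmutableSet class is a utility class from Guava. The getFunctions() method lists all of the function classes implemented in the rest of this page. For complete examples, see the presto-ml module (machine learning functions) and the presto-teradata-functions module (Teradata-compatible functions) in the Presto source tree.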
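To make the plugin contract concrete, here is a minimal plain-Java sketch of how an engine might collect function classes from several plugins. SketchPlugin, SketchFunctionsPlugin, SketchEngine, FirstFunctions and SecondFunctions are hypothetical stand-ins, not Presto's SPI; the real engine loads plugins and registers their functions through its own machinery.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the SPI Plugin interface: by default a plugin contributes no functions.
interface SketchPlugin
{
    default Set<Class<?>> getFunctions()
    {
        return Set.of();
    }
}

// Hypothetical function holder classes; in a real plugin they would carry @ScalarFunction etc.
final class FirstFunctions {}

final class SecondFunctions {}

final class SketchFunctionsPlugin
        implements SketchPlugin
{
    @Override
    public Set<Class<?>> getFunctions()
    {
        return Set.of(FirstFunctions.class, SecondFunctions.class);
    }
}

final class SketchEngine
{
    // Gathers the function classes contributed by every installed plugin, in plugin order.
    static Set<Class<?>> collectFunctionClasses(Iterable<? extends SketchPlugin> plugins)
    {
        Set<Class<?>> classes = new LinkedHashSet<>();
        for (SketchPlugin plugin : plugins) {
            classes.addAll(plugin.getFunctions());
        }
        return classes;
    }
}
```

A plugin that does not override getFunctions() simply contributes nothing, so the engine can treat every plugin uniformly.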
Scalar Function Implementation
The function framework uses annotations to indicate relevant information about functions, including name, description, return type and parameter types. Below is a sample implementation of is_null:
public class ExampleNullFunction
{
    @ScalarFunction("is_null")
    @Description("Returns TRUE if the argument is NULL")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNull(@SqlNullable @SqlType(StandardTypes.VARCHAR) Slice string)
    {
        return (string == null);
    }
}
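The is_null function takes a single VARCHAR argument and returns a BOOLEAN. Note that the Java type of the argument is Slice: VARCHAR uses Slice, which is essentially a wrapper around a byte[], rather than String as its native container type.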
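The framework discovers functions by reading these annotations. The sketch below imitates that with reflection, assuming hypothetical runtime-retained annotations SketchScalarFunction and SketchDescription and using byte[] as a stand-in for Slice. It illustrates annotation-driven discovery in general; it is not Presto's actual registration code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the SPI annotations; RUNTIME retention makes them visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchScalarFunction
{
    String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchDescription
{
    String value();
}

final class SketchNullFunction
{
    @SketchScalarFunction("is_null")
    @SketchDescription("Returns TRUE if the argument is NULL")
    public static boolean isNull(byte[] string) // byte[] stands in for Slice
    {
        return string == null;
    }
}

final class SketchFunctionRegistry
{
    // Maps each SQL function name to the public static method that implements it.
    static Map<String, Method> scan(Class<?> functionClass)
    {
        Map<String, Method> functions = new HashMap<>();
        for (Method method : functionClass.getMethods()) {
            SketchScalarFunction annotation = method.getAnnotation(SketchScalarFunction.class);
            if (annotation != null && Modifier.isStatic(method.getModifiers())) {
                functions.put(annotation.value(), method);
            }
        }
        return functions;
    }
}
```

Keeping the SQL-facing metadata in annotations means the Java method name (isNull) and the SQL name (is_null) can differ freely.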
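- @SqlType: The @SqlType annotation is used to declare the return type and the argument types. The Java return type and argument types must match the native container types of the declared SQL types.
- @SqlNullable: The @SqlNullable annotation indicates that the argument may be NULL. Without this annotation, the framework assumes that the function returns NULL whenever any of its arguments is NULL. When working with a type that has a primitive native container type, such as BigintType, use the object wrapper of the native container type (for example Long instead of long) when using @SqlNullable. The method must also be annotated with @SqlNullable if it can return NULL when its arguments are non-null.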
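The NULL rules above can be modeled in plain Java. In this sketch, which uses only hypothetical names, invokeWithNullPropagation plays the role of the engine skipping the call when an argument is NULL, isNullBigint shows why a nullable BIGINT argument needs Long rather than long, and safeDivide is a method that returns NULL for non-null inputs, which is the case that requires @SqlNullable on the method itself.

```java
import java.util.function.LongBinaryOperator;

final class NullHandlingSketch
{
    // Default behaviour modeled here: if any argument is NULL the function is not called
    // and the result is NULL. A primitive long parameter could not receive null anyway.
    static Long invokeWithNullPropagation(LongBinaryOperator function, Long left, Long right)
    {
        if (left == null || right == null) {
            return null;
        }
        return function.applyAsLong(left, right);
    }

    // A nullable BIGINT argument must use the wrapper Long so that null can reach the method.
    static boolean isNullBigint(Long value)
    {
        return value == null;
    }

    // Non-null inputs but a possibly NULL result: the return type must be the wrapper as well.
    static Long safeDivide(long dividend, long divisor)
    {
        return divisor == 0 ? null : dividend / divisor;
    }
}
```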
Parametric Scalar Functions
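Scalar functions that have type parameters are somewhat more complex to implement. To make the previous example work with an argument of any type, we need the following: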
@ScalarFunction(name = "is_null")
@Description("Returns TRUE if the argument is NULL")
public final class IsNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullSlice(@SqlNullable @SqlType("T") Slice value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullLong(@SqlNullable @SqlType("T") Long value)
    {
        return (value == null);
    }

    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isNullDouble(@SqlNullable @SqlType("T") Double value)
    {
        return (value == null);
    }

    // ...and so on for each native container type
}
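At runtime the engine binds T to a concrete type and uses the implementation whose Java parameter type matches that type's native container. The following is a simplified model of that idea, not the engine's real resolution logic; the CONTAINER_TYPES map, the class names, and byte[] standing in for Slice are all assumptions for illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;

final class IsNullOverloads
{
    // One implementation per native container type, mirroring IsNullFunction (byte[] stands in for Slice).
    public static boolean isNullSlice(byte[] value) { return value == null; }

    public static boolean isNullLong(Long value) { return value == null; }

    public static boolean isNullDouble(Double value) { return value == null; }

    public static boolean isNullBoolean(Boolean value) { return value == null; }
}

final class TypeParameterBinder
{
    // Hypothetical mapping from a SQL type to the Java class used for its nullable container.
    private static final Map<String, Class<?>> CONTAINER_TYPES = Map.of(
            "varchar", byte[].class,
            "bigint", Long.class,
            "double", Double.class,
            "boolean", Boolean.class);

    // Binds T to a concrete SQL type by choosing the overload whose parameter matches its container type.
    static Method bind(String sqlType)
    {
        Class<?> container = CONTAINER_TYPES.get(sqlType);
        if (container == null) {
            throw new IllegalArgumentException("No native container type for " + sqlType);
        }
        for (Method method : IsNullOverloads.class.getMethods()) {
            Class<?>[] parameters = method.getParameterTypes();
            if (parameters.length == 1 && parameters[0] == container) {
                return method;
            }
        }
        throw new IllegalStateException("No is_null implementation for " + sqlType);
    }
}
```

This is why IsNullFunction needs one method per native container type: a single SQL signature, is_null(T), fans out to several Java methods.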
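- @TypeParameter: The @TypeParameter annotation declares a type parameter that can be used in the argument types of the @SqlType annotation or in the return type of the function. It can also be used to annotate a parameter of type Type. At runtime, the engine binds the concrete type to this parameter.
- @OperatorDependency: The @OperatorDependency annotation declares that an additional function for operating on the given type parameter is needed. For example, the following function only binds to types that have an equals function defined: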
@ScalarFunction(name = "is_equal_or_null")
@Description("Returns TRUE if arguments are equal or both NULL")
public final class IsEqualOrNullFunction
{
    @TypeParameter("T")
    @SqlType(StandardTypes.BOOLEAN)
    public static boolean isEqualOrNullSlice(
            @OperatorDependency(operator = OperatorType.EQUAL, returnType = StandardTypes.BOOLEAN, argumentTypes = {"T", "T"}) MethodHandle equals,
            @SqlNullable @SqlType("T") Slice value1,
            @SqlNullable @SqlType("T") Slice value2)
    {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        try {
            return (boolean) equals.invokeExact(value1, value2);
        }
        catch (RuntimeException | Error e) {
            throw e;
        }
        catch (Throwable t) {
            // invokeExact is declared to throw Throwable, so wrap checked exceptions
            throw new RuntimeException(t);
        }
    }

    // ...and so on for each native container type
}
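@OperatorDependency hands the function a MethodHandle for the required operator. The JDK-only sketch below resolves a hypothetical equalStrings method with MethodHandles.lookup() and passes it into the same null-handling logic, with String standing in for Slice. It also shows two details the annotation does not hide: invokeExact is declared to throw Throwable, and the call-site types must match the handle's type exactly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class OperatorDependencySketch
{
    // Hypothetical stand-in for the EQUAL operator the engine would resolve for T = VARCHAR.
    static boolean equalStrings(String left, String right)
    {
        return left.equals(right);
    }

    // Resolves the operator as a MethodHandle of type (String, String)boolean.
    static MethodHandle resolveEquals()
            throws NoSuchMethodException, IllegalAccessException
    {
        return MethodHandles.lookup().findStatic(
                OperatorDependencySketch.class,
                "equalStrings",
                MethodType.methodType(boolean.class, String.class, String.class));
    }

    // Same logic as isEqualOrNullSlice, with the equals operator supplied by the caller.
    static boolean isEqualOrNull(MethodHandle equals, String value1, String value2)
    {
        if (value1 == null && value2 == null) {
            return true;
        }
        if (value1 == null || value2 == null) {
            return false;
        }
        try {
            // The (boolean) cast and String arguments make the call-site type match the handle exactly.
            return (boolean) equals.invokeExact(value1, value2);
        }
        catch (RuntimeException | Error e) {
            throw e;
        }
        catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Passing the operator in as a handle keeps the function generic: the same body works for any T whose EQUAL operator the engine can supply.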
Another Scalar Function Example
The lowercaser function takes a single VARCHAR argument and returns a VARCHAR, which is the argument converted to lower case:
public class ExampleStringFunction
{
    @ScalarFunction("lowercaser")
    @Description("Converts the string to lower case")
    @SqlType(StandardTypes.VARCHAR)
    public static Slice lowercaser(@SqlType(StandardTypes.VARCHAR) Slice slice)
    {
        String argument = slice.toStringUtf8();
        // Locale.ROOT keeps the result independent of the JVM's default locale
        return Slices.utf8Slice(argument.toLowerCase(java.util.Locale.ROOT));
    }
}
Note that for most common string functions, including converting a string to lower case, the Slice library also provides implementations that work directly on the underlying byte[], which have much better performance. This function has no @SqlNullable annotations, meaning that if the argument is NULL, the result will automatically be NULL (the function will not be called).
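To illustrate why working on the underlying bytes can be faster, here is a simplified sketch that lower-cases UTF-8 bytes one byte at a time without decoding them into a String, and falls back to a String round trip as soon as it meets a non-ASCII byte. It is not the Slice library's implementation; Utf8LowerSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class Utf8LowerSketch
{
    // ASCII fast path: map 'A'..'Z' to 'a'..'z' directly on the bytes.
    static byte[] toLowerCase(byte[] utf8)
    {
        byte[] result = new byte[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            byte b = utf8[i];
            if (b < 0) {
                // High bit set, so this is part of a multi-byte UTF-8 sequence: use the slow path.
                String decoded = new String(utf8, StandardCharsets.UTF_8);
                return decoded.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
            }
            result[i] = (b >= 'A' && b <= 'Z') ? (byte) (b + ('a' - 'A')) : b;
        }
        return result;
    }
}
```

For pure-ASCII input this avoids allocating an intermediate String and re-encoding it, which is the kind of saving that byte-level implementations aim for.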
Aggregation Function Implementation
Aggregation functions are somewhat more complex.
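- AccumulatorState: All aggregation functions accumulate input rows into a state object, which must implement AccumulatorState. For simple aggregations, just extend AccumulatorState with a new interface containing the getters and setters you want, and the framework will generate the implementation and serialization for you. If you need a more complex state object, implement AccumulatorStateFactory and AccumulatorStateSerializer and provide them via the AccumulatorStateMetadata annotation.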
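For a simple state interface, the framework generates the implementation for you. The sketch below shows the general idea with java.lang.reflect.Proxy: it backs every getX/setX pair with a map entry and returns a zero value (or false) for primitive getters that were never set. StateGeneratorSketch and CountAndSumState are hypothetical, and Proxy is used only to keep the example short; the real framework generates its own classes and serializers.

```java
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical state interface with the getter/setter shape described above.
interface CountAndSumState
{
    long getCount();
    void setCount(long value);
    double getSum();
    void setSum(double value);
}

final class StateGeneratorSketch
{
    // Creates an implementation of a getX/setX interface at runtime, backed by a map.
    static <T> T newState(Class<T> stateInterface)
    {
        Map<String, Object> fields = new HashMap<>();
        Object proxy = Proxy.newProxyInstance(
                stateInterface.getClassLoader(),
                new Class<?>[] {stateInterface},
                (instance, method, args) -> {
                    String name = method.getName();
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode use identity; toString dumps the stored fields
                        switch (name) {
                            case "equals":
                                return instance == args[0];
                            case "hashCode":
                                return System.identityHashCode(instance);
                            default:
                                return stateInterface.getSimpleName() + fields;
                        }
                    }
                    if (name.startsWith("set") && args != null && args.length == 1) {
                        fields.put(name.substring(3), args[0]);
                        return null;
                    }
                    if (name.startsWith("get") && args == null) {
                        Object value = fields.get(name.substring(3));
                        return value != null ? value : defaultValue(method.getReturnType());
                    }
                    throw new UnsupportedOperationException(method.toString());
                });
        return stateInterface.cast(proxy);
    }

    // Unset primitive fields read as zero (or false), like ordinary Java fields.
    private static Object defaultValue(Class<?> type)
    {
        if (type == long.class) {
            return 0L;
        }
        if (type == double.class) {
            return 0.0;
        }
        if (type == int.class) {
            return 0;
        }
        if (type == boolean.class) {
            return false;
        }
        return null;
    }
}
```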
The following code implements the aggregation function avg_double, which computes the average of a DOUBLE column:
@AggregationFunction("avg_double")
public class AverageAggregation
{
    @InputFunction
    public static void input(LongAndDoubleState state, @SqlType(StandardTypes.DOUBLE) double value)
    {
        state.setLong(state.getLong() + 1);
        state.setDouble(state.getDouble() + value);
    }

    @CombineFunction
    public static void combine(LongAndDoubleState state, LongAndDoubleState otherState)
    {
        state.setLong(state.getLong() + otherState.getLong());
        state.setDouble(state.getDouble() + otherState.getDouble());
    }

    @OutputFunction(StandardTypes.DOUBLE)
    public static void output(LongAndDoubleState state, BlockBuilder out)
    {
        long count = state.getLong();
        if (count == 0) {
            // No rows were seen, so the average is NULL
            out.appendNull();
        }
        else {
            double value = state.getDouble();
            // DOUBLE is the statically imported DoubleType.DOUBLE
            DOUBLE.writeDouble(out, value / count);
        }
    }
}
The average has two parts: the sum of the DOUBLE in each row of the column and the LONG count of the number of rows seen. LongAndDoubleState is an interface which extends AccumulatorState:
public interface LongAndDoubleState
        extends AccumulatorState
{
    long getLong();

    void setLong(long value);

    double getDouble();

    void setDouble(double value);
}
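To see how the input, combine and output methods of AverageAggregation cooperate, the following plain-Java sketch reuses their bodies on a hand-written state class and simulates partial aggregation on several workers followed by a merge and the final output. SimpleLongAndDoubleState, SimpleLongAndDoubleStateImpl and AverageLifecycleSketch are hypothetical names, and output returns a Double (null when no rows were seen) instead of writing to a BlockBuilder.

```java
import java.util.List;

// Hypothetical plain-Java mirror of LongAndDoubleState, so the sketch runs without Presto.
interface SimpleLongAndDoubleState
{
    long getLong();
    void setLong(long value);
    double getDouble();
    void setDouble(double value);
}

final class SimpleLongAndDoubleStateImpl
        implements SimpleLongAndDoubleState
{
    private long longValue;
    private double doubleValue;

    @Override public long getLong() { return longValue; }
    @Override public void setLong(long value) { longValue = value; }
    @Override public double getDouble() { return doubleValue; }
    @Override public void setDouble(double value) { doubleValue = value; }
}

final class AverageLifecycleSketch
{
    // Same body as the @InputFunction: count the row and add its value.
    static void input(SimpleLongAndDoubleState state, double value)
    {
        state.setLong(state.getLong() + 1);
        state.setDouble(state.getDouble() + value);
    }

    // Same body as the @CombineFunction: merge otherState into state.
    static void combine(SimpleLongAndDoubleState state, SimpleLongAndDoubleState otherState)
    {
        state.setLong(state.getLong() + otherState.getLong());
        state.setDouble(state.getDouble() + otherState.getDouble());
    }

    // Mirrors the @OutputFunction, returning null where the original appends NULL.
    static Double output(SimpleLongAndDoubleState state)
    {
        long count = state.getLong();
        return count == 0 ? null : state.getDouble() / count;
    }

    // Each inner list is the rows one worker sees; partial states are combined, then output.
    static Double average(List<List<Double>> rowsPerWorker)
    {
        SimpleLongAndDoubleState finalState = new SimpleLongAndDoubleStateImpl();
        for (List<Double> rows : rowsPerWorker) {
            SimpleLongAndDoubleState partial = new SimpleLongAndDoubleStateImpl();
            for (double value : rows) {
                input(partial, value);
            }
            combine(finalState, partial);
        }
        return output(finalState);
    }
}
```

Keeping the count and the sum separately, rather than a running average, is what makes combine a simple addition: partial averages could not be merged without their counts.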
Here are the annotations used above in more detail:
- @InputFunction: The @InputFunction annotation declares the function which accepts input rows and stores them in the AccumulatorState. Similar to scalar functions, you must annotate the arguments with @SqlType. Note that, unlike in the above scalar example where Slice is used to hold VARCHAR, the primitive double type is used for the argument to input. In this example, the input function simply keeps track of the running count of rows (via setLong()) and the running sum (via setDouble()).
- @CombineFunction: The @CombineFunction annotation declares the function used to combine two state objects. This function is used to merge all the partial aggregation states. It takes two state objects and merges the results into the first one (in the above example, just by adding them together).
- @OutputFunction: The @OutputFunction is the last function called when computing an aggregation. It takes the final state object (the result of merging all partial states) and writes the result to a BlockBuilder.
- Where does serialization happen, and what is GroupedAccumulatorState? The @InputFunction is usually run on a different worker from the @CombineFunction, so the state objects are serialized and transported between these workers by the aggregation framework. GroupedAccumulatorState is used when performing a GROUP BY aggregation, and an implementation will be automatically generated for you if you don't specify an AccumulatorStateFactory.
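The serialization step can be pictured with a small sketch: worker A serializes its partial (count, sum) state into a fixed 16-byte layout with java.nio.ByteBuffer, and worker B deserializes it and combines it with its own partial state before producing the output. The layout and every name here are assumptions for illustration; the format Presto actually uses is produced by its generated or user-supplied AccumulatorStateSerializer.

```java
import java.nio.ByteBuffer;

final class SerializedAverageSketch
{
    // Hypothetical partial state: number of rows seen and their running sum.
    static final class State
    {
        long count;
        double sum;
    }

    // Fixed layout: 8-byte count followed by 8-byte sum (ByteBuffer defaults to big-endian).
    static final int SIZE = Long.BYTES + Double.BYTES;

    static byte[] serialize(State state)
    {
        return ByteBuffer.allocate(SIZE).putLong(state.count).putDouble(state.sum).array();
    }

    // Reads back the layout written by serialize into a fresh state object.
    static State deserialize(byte[] bytes)
    {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        State state = new State();
        state.count = buffer.getLong();
        state.sum = buffer.getDouble();
        return state;
    }

    // Input step on one worker: accumulate its local rows into a fresh state.
    static State accumulate(double[] rows)
    {
        State state = new State();
        for (double value : rows) {
            state.count++;
            state.sum += value;
        }
        return state;
    }

    // Worker A ships its partial state as bytes; worker B deserializes, combines, and outputs.
    static Double averageAcrossWorkers(double[] rowsOnA, double[] rowsOnB)
    {
        byte[] wire = serialize(accumulate(rowsOnA));
        State merged = accumulate(rowsOnB);
        State received = deserialize(wire);
        merged.count += received.count;
        merged.sum += received.sum;
        return merged.count == 0 ? null : merged.sum / merged.count;
    }
}
```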