UDFs:
User-defined functions (UDFs) were introduced in Phoenix 4.4.0.
UDF types:
temporary UDFs : scoped to the current session/connection; not visible to other connections
permanent UDFs : meta information is stored in a Phoenix system table
Currently Phoenix supports domain-specific scalar UDFs.
Configuration:
Add the following properties to hbase-site.xml on the Phoenix client:
<property>
  <name>phoenix.functions.allowUserDefinedFunctions</name>
  <value>true</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>${hbase.tmp.dir}/hbase</value>
  <description>The directory shared by region servers and into
  which HBase persists. The URL should be 'fully-qualified'
  to include the filesystem scheme. For example, to specify the
  HDFS directory '/hbase' where the HDFS instance's namenode is
  running at namenode.example.org on port 9000, set this value to:
  hdfs://namenode.example.org:9000/hbase. By default, we write
  to whatever ${hbase.tmp.dir} is set to -- usually /tmp --
  so change this configuration or else all data will be lost on
  machine restart.</description>
</property>
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>${hbase.rootdir}/lib</value>
  <description>
    The directory from which the custom udf jars can be loaded
    dynamically by the phoenix client/region server without the need to restart. However,
    an already loaded udf class would not be un-loaded. See
    HBASE-1936 for more details.
  </description>
</property>
The last two properties must match the configuration on the HBase server side.
The phoenix.functions.allowUserDefinedFunctions property can alternatively be supplied when the JDBC connection is created:
Properties props = new Properties();
props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
The following optional property controls the local directory into which jars are copied from HDFS for class loading:
<property>
  <name>hbase.local.dir</name>
  <value>${hbase.tmp.dir}/local/</value>
  <description>Directory on the local filesystem to be used
  as a local storage.</description>
</property>
Creating a UDF
1. Implement the UDF:
a) Extend org.apache.phoenix.expression.function.ScalarFunction.
b) Implement getDataType, which determines the return type of the function.
c) Implement evaluate, which is called to compute the result for each row.
   1) Argument 1, org.apache.phoenix.schema.tuple.Tuple: the state of the current row.
   2) Argument 2, org.apache.hadoop.hbase.io.ImmutableBytesWritable: must be filled in to point at the result of the function's execution.
   3) Return false if there is not enough information available to compute the result; otherwise return true.
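Tying these steps together: for a hypothetical MyReverseFunction (the class name is only illustrative, matching the CREATE FUNCTION examples below), getDataType would return the varchar type and evaluate would decode the child expression's bytes, reverse them, and point the ImmutableBytesWritable at the result. The per-row core of that evaluate call, stripped of the Phoenix types so it runs standalone, can be sketched as:

```java
import java.nio.charset.StandardCharsets;

public class ReverseSketch {
    // Per-row core of a hypothetical my_reverse UDF: Phoenix hands the
    // function the serialized bytes of its argument; the UDF decodes them,
    // computes the result, and re-serializes it so the output pointer
    // (the ImmutableBytesWritable) can be set to the new bytes.
    static byte[] evaluateRow(byte[] argBytes) {
        String value = new String(argBytes, StandardCharsets.UTF_8);
        String reversed = new StringBuilder(value).reverse().toString();
        return reversed.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] out = evaluateRow("phoenix".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8)); // xineohp
    }
}
```

In the real Phoenix class, a false return from evaluate signals that the child expression could not produce a value for this row.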
The following additional steps are optional optimizations:
1. To allow the function to be used to form the start/stop key of a scan, implement the following two methods:
/**
 * Determines whether or not a function may be used to form
 * the start/stop key of a scan
 * @return the zero-based position of the argument to traverse
 * into to look for a primary key column reference, or
 * {@value #NO_TRAVERSAL} if the function cannot be used to
 * form the scan key.
 */
public int getKeyFormationTraversalIndex() {
    return NO_TRAVERSAL;
}

/**
 * Manufactures a KeyPart used to construct the KeyRange given
 * a constant and a comparison operator.
 * @param childPart the KeyPart formulated for the child expression
 * at the {@link #getKeyFormationTraversalIndex()} position.
 * @return the KeyPart for constructing the KeyRange for this
 * function.
 */
public KeyPart newKeyPart(KeyPart childPart) {
    return null;
}
2. To enable the ORDER BY optimization, or GROUP BY optimizations when the function is applied to the leading PK column(s), override preservesOrder:
/**
 * Determines whether or not the result of the function invocation
 * will be ordered in the same way as the input to the function.
 * Returning YES enables an optimization to occur when a
 * GROUP BY contains function invocations using the leading PK
 * column(s).
 * @return YES if the function invocation will always preserve order for
 * the inputs versus the outputs and false otherwise, YES_IF_LAST if the
 * function preserves order, but any further column reference would not
 * continue to preserve order, and NO if the function does not preserve
 * order.
 */
public OrderPreserving preservesOrder() {
    return OrderPreserving.NO;
}
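To see what "preserving order" means here: a function preserves order when sorting the inputs and then applying the function yields the same sequence as applying the function and then sorting. A leading-prefix function (like SUBSTR on the row key) preserves order; string reversal does not. A small standalone check (plain Java, not the Phoenix API; the helper names are illustrative):

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class OrderPreservingDemo {
    // A prefix function is order-preserving: if a <= b then prefix(a) <= prefix(b).
    static String prefix2(String s) { return s.substring(0, Math.min(2, s.length())); }

    // Reversal is not order-preserving: sorted inputs do not stay sorted.
    static String reverse(String s) { return new StringBuilder(s).reverse().toString(); }

    // Checks whether applying f to an already-sorted array leaves it sorted.
    static boolean isSortedAfter(String[] sortedInput, UnaryOperator<String> f) {
        String prev = null;
        for (String s : sortedInput) {
            String mapped = f.apply(s);
            if (prev != null && prev.compareTo(mapped) > 0) return false;
            prev = mapped;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "apricot", "banana", "cherry"};
        Arrays.sort(keys);
        System.out.println(isSortedAfter(keys, OrderPreservingDemo::prefix2));  // true
        System.out.println(isSortedAfter(keys, OrderPreservingDemo::reverse));  // false
    }
}
```

This is why a reverse-style function should return OrderPreserving.NO, while a leading-prefix function could return YES and let Phoenix keep the ORDER BY / GROUP BY optimization.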
2. Compile the class into a jar and upload the jar to HDFS, into the directory configured as hbase.dynamic.jars.dir.
3. Register the function with CREATE FUNCTION:
CREATE FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction' using jar 'hdfs:/localhost:8080/hbase/lib/myjar.jar'
CREATE FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction'
CREATE FUNCTION my_increment(integer, integer constant defaultvalue='10') returns integer as 'com.mypackage.MyIncrementFunction' using jar '/hbase/lib/myincrement.jar'
CREATE TEMPORARY FUNCTION my_reverse(varchar) returns varchar as 'com.mypackage.MyReverseFunction' using jar 'hdfs:/localhost:8080/hbase/lib/myjar.jar'
Dropping UDFs:
DROP FUNCTION IF EXISTS my_reverse
DROP FUNCTION my_reverse