https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
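User-Defined Functions: when to use them
1. The built-in functions do not cover our needs, so we extend Hive with functions of our own.
2. Migrating from an RDBMS to Hive (RDBMS ==> Hive)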
UDF/UDAF/UDTF
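UDF: User-Defined Function, one-to-one: one row in, one row out, e.g. substr()
UDAF: User-Defined Aggregation Function, many-to-one: multiple rows in, one row out, e.g. sum(), count()
UDTF: User-Defined Table-Generating Function, one-to-many: one row in, multiple rows out
Steps to define a UDF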
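To make the three cardinalities concrete, here is a small plain-Java analogy. It is only a sketch: the class and method names are hypothetical, none of it is Hive API, and the mapping to substr(), sum() and Hive's built-in explode() is loose (explode() works on arrays and maps, not on comma-separated strings).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy for the three function kinds; none of this is Hive API.
class FunctionKinds {
    // UDF-like: one value in, one value out (compare substr()).
    static String oneToOne(String value) {
        return value.substring(0, 1);
    }

    // UDAF-like: many values in, one value out (compare sum()).
    static long manyToOne(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // UDTF-like: one value in, many rows out (compare explode()).
    static List<String> oneToMany(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(oneToOne("hive"));
        System.out.println(manyToOne(Arrays.asList(1L, 2L, 3L)));
        System.out.println(oneToMany("a,b,c"));
    }
}
```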
Creating Custom UDFs
- First, you need to create a new class that extends UDF, with one or more methods named evaluate.
- After compiling your code to a jar, you need to add this to the Hive classpath. See the section below on deploying jars.
- In order to start using your UDF, you first need to add the code to the classpath.
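"One or more methods named evaluate" means a UDF class can overload evaluate, and the overload is chosen from the argument types used in the query. The sketch below imitates that idea with plain Java reflection. It is an assumption-laden simplification: GreetingFunctions and EvaluateDispatch are hypothetical names, the class deliberately does not extend Hive's UDF so it compiles without hive-exec, and the exact-type lookup shown here ignores the type conversions a real resolver would have to handle.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a UDF class: plain Java, no Hive dependency.
class GreetingFunctions {
    public String evaluate(String name) {
        return name + ",hello!";
    }

    public String evaluate(String name, Integer times) {
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < times; i++) {
            sb.append(",hello!");
        }
        return sb.toString();
    }
}

class EvaluateDispatch {
    // Look up the evaluate overload whose parameter types equal the
    // runtime types of the arguments, then invoke it.
    static Object call(Object target, Object... args) throws Exception {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        Method m = target.getClass().getMethod("evaluate", types);
        return m.invoke(target, args);
    }

    public static void main(String[] args) throws Exception {
        GreetingFunctions f = new GreetingFunctions();
        System.out.println(call(f, "wzj"));
        System.out.println(call(f, "wzj", 2));
    }
}
```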
A simple example:
package com.ruozedata.bigdata.HiveUDF;

import org.apache.hadoop.hive.ql.exec.UDF;

// Minimal UDF: appends ",hello!" to the input string.
public class FirstUDF extends UDF {
    // Hive calls evaluate once per input row.
    public String evaluate(String name) {
        return name + ",hello!";
    }

    // Local smoke test; not used by Hive.
    public static void main(String[] args) {
        FirstUDF firstUDF = new FirstUDF();
        System.out.println(firstUDF.evaluate("wzj"));
    }
}
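One thing to watch with FirstUDF: a SQL NULL reaches evaluate as a Java null, and string concatenation turns it into the text "null,hello!". If NULL in, NULL out is preferred (an assumption about the desired behavior, not something the original example does), the greeting logic could be guarded as in this sketch. GreetingLogic is a hypothetical helper kept free of Hive classes so it can be compiled and tested on its own.

```java
// Hypothetical helper holding the greeting logic, kept free of Hive classes
// so it can be compiled and tested without hive-exec on the classpath.
class GreetingLogic {
    static String greet(String name) {
        if (name == null) {
            return null; // assumption: NULL in, NULL out
        }
        return name + ",hello!";
    }

    public static void main(String[] args) {
        System.out.println(greet("wzj"));
        System.out.println(greet(null));
    }
}
```

Inside a UDF, evaluate could then simply delegate to GreetingLogic.greet(name).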
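Package the jar, upload it, and add it to the classpath (*** if you would rather not run add jar xx, you can copy the jar into $HIVE_HOME/auxlib/ and create the function directly)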
hive (default)> add jar /home/wzj/lib/firstudf-hive-1.0.jar
> ;
Added [/home/wzj/lib/firstudf-hive-1.0.jar] to class path
Added resources: [/home/wzj/lib/firstudf-hive-1.0.jar]
hive (default)> list jars
> ;
/home/wzj/lib/firstudf-hive-1.0.jar
Create a temporary function (*** valid only for the current session)
hive (default)> create temporary function firstudf as 'com.ruozedata.bigdata.HiveUDF.FirstUDF';
OK
Time taken: 0.024 seconds
Making the function available globally
- Upload the jar to HDFS
In Hive 0.13 or later, functions can be registered to the metastore, so they can be referenced in a query without having to create a temporary function each session.
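From 0.13 on, the function's information can be stored in the metastore. Before registration, the funcs table in the metastore database is empty: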
mysql> select * from funcs;
Empty set (0.00 sec)
[wzj@hadoop001 lib]$ hadoop fs -mkdir -p /hiveudf/
20/04/02 00:42:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[wzj@hadoop001 lib]$ hadoop fs -put /home/wzj/lib/firstudf-hive-1.0.jar /hiveudf/
20/04/02 00:42:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hive (wzj)> CREATE FUNCTION firstudf2 AS 'com.ruozedata.bigdata.HiveUDF.FirstUDF' USING JAR 'hdfs://hadoop001:9000/hiveudf/firstudf-hive-1.0.jar';
converting to local hdfs://hadoop001:9000/hiveudf/firstudf-hive-1.0.jar
Added [/tmp/9cd160b9-d384-41d6-8740-595e73ff7bce_resources/firstudf-hive-1.0.jar] to class path
Added resources: [hdfs://hadoop001:9000/hiveudf/firstudf-hive-1.0.jar]
OK
Time taken: 0.581 seconds
After registering the function:
mysql> select * from funcs;
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
| 1 | com.ruozedata.bigdata.HiveUDF.FirstUDF | 1585801503 | 6 | firstudf2 | 1 | NULL | USER |
+---------+----------------------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.00 sec)
hive (wzj)> select firstudf2("wzj") from test;
converting to local hdfs://hadoop001:9000/hiveudf/firstudf-hive-1.0.jar
Added [/tmp/cecb2563-d4e6-4f58-bb5f-cddc773fa921_resources/firstudf-hive-1.0.jar] to class path
Added resources: [hdfs://hadoop001:9000/hiveudf/firstudf-hive-1.0.jar]
OK
_c0
wzj,hello!
wzj,hello!
wzj,hello!
wzj,hello!
Time taken: 1.495 seconds, Fetched: 4 row(s)