1 User Defined Functions
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
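- UDF: one row in, one row out
- UDAF (User-Defined Aggregation Function): an aggregate function, many rows in, one row out, similar to count / max / min
- UDTF (User-Defined Table-Generating Function): one row in, many rows out, e.g. lateral view explode()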
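As a rough analogy only, the three function kinds line up with map, reduce and flatMap over a collection. The sketch below uses plain Java streams and no Hive classes; the class name FunctionKinds is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;

// Plain-Java analogy only: no Hive classes are involved.
class FunctionKinds {

    // UDF-like: each input value maps to exactly one output value.
    static List<String> udfLike(List<String> rows) {
        return rows.stream()
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // UDAF-like: many input values collapse into one result, like max().
    // Returns null when there is no input, similar to SQL max() over no rows.
    static Integer udafLike(List<Integer> rows) {
        OptionalInt max = rows.stream().mapToInt(Integer::intValue).max();
        return max.isPresent() ? max.getAsInt() : null;
    }

    // UDTF-like: one input value expands into several output rows,
    // comparable to explode(split(s, ',')) in HiveQL.
    static List<String> udtfLike(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(udfLike(Arrays.asList("HIVE", "Spark"))); // [hive, spark]
        System.out.println(udafLike(Arrays.asList(3, 9, 4)));        // 9
        System.out.println(udtfLike("a,b,c"));                       // [a, b, c]
    }
}
```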
2 Hive UDF Programming Steps
- Extend org.apache.hadoop.hive.ql.exec.UDF
- Implement an evaluate method; evaluate can be overloaded.
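To illustrate overloading, here is a sketch of a class with two evaluate overloads plus a small reflection helper that picks the overload matching the argument types, loosely mimicking how Hive chooses an evaluate method for a call. The class ConcatUDFSketch and its call helper are hypothetical; the class deliberately does not extend UDF and uses String instead of Text so it runs without hive-exec.

```java
import java.lang.reflect.Method;

// Hypothetical class for illustration: it does NOT extend
// org.apache.hadoop.hive.ql.exec.UDF and uses String instead of Text,
// only so that it compiles and runs without hive-exec on the classpath.
class ConcatUDFSketch {

    // Overload 1: two arguments
    public String evaluate(String a, String b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    // Overload 2: three arguments
    public String evaluate(String a, String b, String c) {
        if (a == null || b == null || c == null) {
            return null;
        }
        return a + b + c;
    }

    // Picks the evaluate overload whose parameter types exactly match the
    // runtime classes of the (non-null) arguments. This only approximates
    // Hive; Hive's own method resolver also handles type conversion.
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no matching evaluate overload", e);
        }
    }

    public static void main(String[] args) {
        ConcatUDFSketch udf = new ConcatUDFSketch();
        System.out.println(call(udf, "hive", "-udf")); // hive-udf
        System.out.println(call(udf, "a", "b", "c"));  // abc
    }
}
```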
2.1 Notes
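- A UDF's evaluate method must have a return type: it may return null, but the return type cannot be void.
- UDFs commonly use Hadoop Writable types such as Text and LongWritable; plain Java types are not recommended.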
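The void rule can be checked before packaging the jar. The sketch below is a hypothetical helper (not a Hive API) that uses plain Java reflection to list any public evaluate method declared with a void return type; GoodUdf and BadUdf are made-up example classes.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-deployment check, not part of Hive: reports every public
// evaluate method whose return type is void.
class EvaluateChecker {

    static List<String> voidEvaluates(Class<?> udfClass) {
        List<String> problems = new ArrayList<>();
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate") && m.getReturnType() == void.class) {
                problems.add(m.toString());
            }
        }
        return problems;
    }

    // Made-up example: returns a value (null for null input), so it passes.
    static class GoodUdf {
        public String evaluate(String s) {
            return s == null ? null : s.trim();
        }
    }

    // Made-up example: void return type, so it is reported.
    static class BadUdf {
        public void evaluate(String s) {
            System.out.println(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(voidEvaluates(GoodUdf.class).isEmpty()); // true
        System.out.println(voidEvaluates(BadUdf.class).size());     // 1
    }
}
```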
3 Testing the UDF
- Add the dependencies (versions below match CDH 5.7.0)
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0-cdh5.7.0</version>
</dependency>
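package hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Lower-cases a string column; registered in Hive as hive.LowerUDF
public class LowerUDF extends UDF {

    // Called once per row; a null input yields a null output
    public Text evaluate(Text str) {
        // Check the Text object itself: calling str.toString() on null would throw
        if (str == null) {
            return null;
        }
        return new Text(str.toString().toLowerCase());
    }

    // Quick local check without a Hive session
    public static void main(String[] args) {
        System.out.println(new LowerUDF().evaluate(new Text("HIVE"))); // hive
    }
}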
- Package the project into a jar with Maven (e.g. mvn clean package)
- In Hive, add the jar, register a temporary function, and call it
hive (default)> add jar /home/hadoop/testhadoop-1.0.jar;
Added [/home/hadoop/testhadoop-1.0.jar] to class path
Added resources: [/home/hadoop/testhadoop-1.0.jar]
CREATE TEMPORARY FUNCTION my_lower AS 'hive.LowerUDF';
SELECT ename, my_lower(ename) AS lowername FROM emp LIMIT 5;
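The hive-jdbc dependency listed earlier allows the same steps to be run from Java against HiveServer2. This is a sketch under assumptions: the HiveServer2 address (localhost:10000), the user name "hadoop", and the class name HiveUdfJdbcDemo are all hypothetical, and a local path given to ADD JAR through HiveServer2 refers to the server's file system. The statement builder is kept separate so it can be checked without a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Sketch: register and call the UDF over JDBC. Host, port, user, jar path and
// function name are assumptions for illustration.
class HiveUdfJdbcDemo {

    // Builds the setup statements (no trailing semicolons over JDBC).
    static List<String> setupStatements(String jarPath, String functionName, String className) {
        return Arrays.asList(
                "ADD JAR " + jarPath,
                "CREATE TEMPORARY FUNCTION " + functionName + " AS '" + className + "'");
    }

    public static void main(String[] args) throws Exception {
        // Requires hive-jdbc and its dependencies on the classpath at runtime.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default"; // assumed address
        try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
             Statement stmt = conn.createStatement()) {
            for (String sql : setupStatements("/home/hadoop/testhadoop-1.0.jar", "my_lower", "hive.LowerUDF")) {
                stmt.execute(sql);
            }
            // Temporary functions live in this session, so query on the same connection.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT ename, my_lower(ename) AS lowername FROM emp LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
    }
}
```

Because a temporary function only exists for the session that created it, the query has to run on the same connection that executed CREATE TEMPORARY FUNCTION.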
3.2 Adding a UDF jar from HDFS
- Since Hive 0.13, a permanent function can be created directly from a jar on HDFS:
CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';