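Hive user-defined functions fall into three categories:
- UDF (User-Defined Function): an ordinary function, one row in, one value out
- UDAF (User-Defined Aggregation Function): an aggregate function, many rows in, one value out
- UDTF (User-Defined Table-Generating Function): a table-generating function, one row in, many rows out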
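The three input/output shapes can be illustrated with ordinary Java methods. This is only an analogy in plain Java: the class `UdfShapes` and its methods are hypothetical names chosen for this sketch and are not part of the Hive API.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only: these are NOT Hive APIs, just the input/output shapes.
final class UdfShapes {
    // UDF shape: one input value -> one output value.
    static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // UDAF shape: many input values -> one aggregated value.
    static long sum(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // UDTF shape: one input value -> many output rows.
    static List<String> split(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(lower("HIVE"));               // prints "hive"
        System.out.println(sum(Arrays.asList(1, 2, 3))); // prints "6"
        System.out.println(split("a,b,c"));              // prints "[a, b, c]"
    }
}
```

In Hive itself each shape corresponds to a different base class or interface; the rest of this section covers only the first one, the plain UDF.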
UDF example
- Create a new submodule named hive_demo
- Configure the pom.xml file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.hive</groupId>
    <artifactId>hive-udf</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.4</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
- Write a Java class that extends UDF and implements an evaluate method.
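import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple UDF that converts its input string to lower case.
public class Lower extends UDF {
    // Hive locates evaluate() by reflection; one input value in, one value out.
    public Text evaluate(Text s) {
        // Propagate SQL NULL unchanged.
        if (s == null) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}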
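The null check and lower-casing inside evaluate can be tried out without a Hive installation. The sketch below moves that logic into a hypothetical helper, `LowerLogic.lowerOrNull`, which works on a plain `String` rather than `Text`; that substitution is an assumption made so the example runs without the Hive and Hadoop jars. It also passes `Locale.ROOT`, a variation that is not in the original class, because `String.toLowerCase()` without an argument depends on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical helper mirroring the body of evaluate(); uses String instead of
// org.apache.hadoop.io.Text so that it runs without Hive or Hadoop on the classpath.
final class LowerLogic {
    static String lowerOrNull(String s) {
        if (s == null) {
            return null; // NULL in, NULL out
        }
        // Locale.ROOT gives locale-independent results (not in the original class).
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowerOrNull("Hello HIVE")); // prints "hello hive"
        System.out.println(lowerOrNull(null));         // prints "null"
    }
}
```

Keeping the transformation in a small helper like this is one way to unit-test a UDF's behavior before packaging it; whether to use `Locale.ROOT` depends on the data being processed.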
- Package the project as a JAR and upload it to the server
- Add the JAR to Hive's classpath
hive> add JAR /home/hadoop/udf.jar;
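- Create a temporary function and bind it to the compiled Java class. The quoted value must be the fully qualified name of the UDF class inside the JAR. The statement below uses cn.itcast.cloud.hive.demo.udf.CustomUDF, so substitute your own class name (the Lower class above, which declares no package, would be registered as 'Lower').
create temporary function tolowercase as 'cn.itcast.cloud.hive.demo.udf.CustomUDF';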
- The custom function tolowercase can now be used in HQL
select tolowercase(name), age from t_test;
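Registering a function by class name and then calling it by function name implies two steps on Hive's side: loading the class by its name, and finding a matching evaluate method by reflection. The following is a heavily simplified plain-Java analogy of that idea, not Hive's actual implementation; `MiniRegistry`, its methods, and the nested stand-in class `Lower` are all hypothetical names introduced for this sketch.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Simplified analogy of a function registry; NOT Hive's actual implementation.
final class MiniRegistry {
    // Stand-in for a UDF class: a public no-arg constructor and an evaluate method.
    public static final class Lower {
        public String evaluate(String s) {
            return s == null ? null : s.toLowerCase();
        }
    }

    private final Map<String, Class<?>> functions = new HashMap<>();

    // Analogous to: create temporary function <name> as '<className>'
    void register(String name, String className) throws ClassNotFoundException {
        functions.put(name, Class.forName(className));
    }

    // Analogous to calling <name>(arg) in a query: look up evaluate(String) reflectively.
    Object call(String name, String arg) throws ReflectiveOperationException {
        Class<?> cls = functions.get(name);
        if (cls == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        Object instance = cls.getDeclaredConstructor().newInstance();
        Method evaluate = cls.getMethod("evaluate", String.class);
        return evaluate.invoke(instance, arg);
    }

    public static void main(String[] args) throws Exception {
        MiniRegistry registry = new MiniRegistry();
        registry.register("tolowercase", Lower.class.getName());
        System.out.println(registry.call("tolowercase", "ABC")); // prints "abc"
    }
}
```

The analogy also shows why a wrong class name in the create function statement fails at registration time: the class simply cannot be loaded.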