1. UDF (one-to-one: one output value per input row)
Temporary functions:
- Create a Maven project in IDEA and add the dependency:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>
- Create a class that extends UDF and defines an evaluate method:
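package test;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * @author :xiaotao
 * @date :2021/3/10 14:27
 * @description: Hive custom UDF
 * Extends UDF and defines an evaluate method
 * (UDF declares no abstract evaluate; Hive finds it by reflection).
 * Takes one input value, converts it to lower case, and returns it.
 */
public class Lower extends UDF {
    public String evaluate(final String a) {
        // SQL NULL arrives as a Java null; return NULL for it
        if (a == null) {
            return null;
        }
        return a.toLowerCase();
    }
}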
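evaluate above calls toLowerCase() with no argument, which uses the JVM's default locale. The plain-Java sketch below has no Hive dependency (the class and method names are hypothetical) and shows why that matters: under a Turkish locale an uppercase "I" becomes a dotless "ı". Passing Locale.ROOT is one way to make the result independent of the cluster's locale settings.

```java
import java.util.Locale;

// Hypothetical helper: same logic as Lower.evaluate, with an explicit locale.
class LowerLocaleCheck {
    static String lower(String a, Locale locale) {
        if (a == null) {
            return null; // mirrors the NULL handling in evaluate
        }
        return a.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lower("HIVE", turkish));     // "hıve" (dotless i)
        System.out.println(lower("HIVE", Locale.ROOT)); // "hive"
        System.out.println(lower(null, Locale.ROOT));   // null
    }
}
```

If the UDF must behave the same on every node, a.toLowerCase(Locale.ROOT) inside evaluate is the safer choice.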
- Package the project and upload the jar to any directory on the node where Hive runs, for example:
/root/hive-test-1.0-SNAPSHOT.jar
- Add the jar to Hive's classpath for the current session with: add jar /root/hive-test-1.0-SNAPSHOT.jar;
0: jdbc:hive2://wxt01:10000> add jar /root/hive-test-1.0-SNAPSHOT.jar;
Added [/root/Lower.jar] to class path
Added resources: [/root/Lower.jar]
No rows affected (0.142 seconds)
- Create a temporary function bound to the compiled Java class (use the fully qualified class name):
0: jdbc:hive2://wxt01:10000> create temporary function mylower as "test.Lower";
OK
No rows affected (0.184 seconds)
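For classes that extend UDF, Hive instantiates the class named in the AS clause and looks up a matching evaluate method by reflection, which is why the name must be fully qualified (package plus class). The sketch below shows only that core idea with plain java.lang.reflect; LowerUdf and ResolveByName are hypothetical stand-ins, and this is not Hive's actual method resolver.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for test.Lower, without the Hive dependency.
class LowerUdf {
    public String evaluate(final String a) {
        return a == null ? null : a.toLowerCase();
    }
}

class ResolveByName {
    // Load a class by fully qualified name, then call its evaluate(String).
    static Object callEvaluate(String className, String arg) throws Exception {
        Class<?> cls = Class.forName(className);              // fails if the name is wrong
        Object udf = cls.getDeclaredConstructor().newInstance();
        Method evaluate = cls.getMethod("evaluate", String.class);
        return evaluate.invoke(udf, arg);
    }

    public static void main(String[] args) throws Exception {
        // In the article the name is "test.Lower"; here it is this file's class.
        String fqcn = LowerUdf.class.getName();
        System.out.println(callEvaluate(fqcn, "HBASE")); // hbase
    }
}
```

A misspelled or unqualified class name makes the lookup fail, which is the usual cause of a class-not-found error when creating the function.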
Table test contains the following data:
0: jdbc:hive2://wxt01:10000> select * from test;
OK
+------------+--+
| test.a |
+------------+--+
| HBASE |
| ZOOKEEPER |
| JAVA |
| HADOOP |
| SPARK |
| FILNK |
| MYSQL |
| SCALA |
| HDFS |
| YARN |
| FLUME |
| SQOOP |
| KAFKA |
+------------+--+
13 rows selected (1.209 seconds)
- Call the custom function mylower():
0: jdbc:hive2://wxt01:10000> select mylower(a) from test;
OK
+------------+--+
| _c0 |
+------------+--+
| hbase |
| zookeeper |
| java |
| hadoop |
| spark |
| filnk |
| mysql |
| scala |
| hdfs |
| yarn |
| flume |
| sqoop |
| kafka |
+------------+--+
13 rows selected (2.874 seconds)
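The query returns 13 rows for 13 input rows because a UDF is one-to-one: evaluate is called once per row and yields exactly one value, with NULL in giving NULL out. A plain-Java sketch of that contract, where the hypothetical applyPerRow helper stands in for Hive's per-row invocation (Hive's real execution path is more involved):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class OneToOneSketch {
    // Hypothetical stand-in for Hive calling evaluate once per input row.
    static <I, O> List<O> applyPerRow(List<I> rows, Function<I, O> udf) {
        List<O> out = new ArrayList<>(rows.size());
        for (I row : rows) {
            out.add(udf.apply(row)); // exactly one output per input row
        }
        return out;
    }

    // Same body as Lower.evaluate.
    static String lower(String a) {
        return a == null ? null : a.toLowerCase();
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("HBASE", "ZOOKEEPER", null, "KAFKA");
        System.out.println(applyPerRow(column, OneToOneSketch::lower));
        // [hbase, zookeeper, null, kafka]
    }
}
```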
Permanent functions:
- Upload the jar to HDFS:
dfs -put /root/hive-test-1.0-SNAPSHOT.jar /UDF/;
0: jdbc:hive2://wxt01:10000> dfs -put /root/hive-test-1.0-SNAPSHOT.jar /UDF/;
+-------------+--+
| DFS Output |
+-------------+--+
+-------------+--+
No rows selected (6.961 seconds)
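- Create the permanent function. Unlike a temporary function, it is stored in the metastore and registered under the current database:
create function mylowers as 'test.Lower' using jar 'hdfs://wxt01:9000/UDF/hive-test-1.0-SNAPSHOT.jar';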
0: jdbc:hive2://wxt01:10000> create function mylowers as 'test.Lower' using jar 'hdfs://wxt01:9000/UDF/hive-test-1.0-SNAPSHOT.jar';
Added [/tmp/88d7049c-7e87-49eb-8672-a4a1936e45e5_resources/hive-test-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs://wxt01:9000/UDF/hive-test-1.0-SNAPSHOT.jar]
OK
No rows affected (8.876 seconds)
- Verify:
0: jdbc:hive2://wxt01:10000> select mylowers(a) from test;
OK
+------------+--+
| _c0 |
+------------+--+
| hbase |
| zookeeper |
| java |
| hadoop |
| spark |
| filnk |
| mysql |
| scala |
| hdfs |
| yarn |
| flume |
| sqoop |
| kafka |
+------------+--+
13 rows selected (4.951 seconds)
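The create-and-verify steps above can also be scripted from Java through HiveServer2's JDBC interface using only the java.sql API. A minimal sketch, assuming the hive-jdbc driver is on the classpath and that HiveServer2 at wxt01:10000 accepts user root with an empty password (both are assumptions, not taken from the article); the class and helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class PermanentUdfJdbc {
    // Endpoint taken from the beeline prompt; "default" database assumed.
    static final String URL = "jdbc:hive2://wxt01:10000/default";

    // Builds the CREATE FUNCTION statement (no trailing semicolon for JDBC).
    static String createFunction(String name, String className, String jarUri) {
        return "create function " + name + " as '" + className
                + "' using jar '" + jarUri + "'";
    }

    public static void main(String[] args) throws SQLException {
        String create = createFunction("mylowers", "test.Lower",
                "hdfs://wxt01:9000/UDF/hive-test-1.0-SNAPSHOT.jar");
        // Assumed credentials: user "root", empty password.
        try (Connection conn = DriverManager.getConnection(URL, "root", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute(create);
            try (ResultSet rs = stmt.executeQuery("select mylowers(a) from test")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

Running main a second time fails because mylowers already exists; drop it first with drop function mylowers.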
- Drop the permanent function:
drop function mylowers;
0: jdbc:hive2://wxt01:10000> drop function mylowers;
OK
No rows affected (0.299 seconds)
UDAF and UDTF will be covered later.