-
Hive内置函数
Hive为我们提供了很多内置函数
查看方式:
- 官方地址
- 启动hive后通过命令查看函数
SHOW FUNCTIONS; DESCRIBE FUNCTION <function_name>; DESCRIBE FUNCTION EXTENDED <function_name>;
hive> show functions; ... Time taken: 1.933 seconds, Fetched: 220 row(s) hive> DESCRIBE FUNCTION case; OK CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f Time taken: 0.009 seconds, Fetched: 1 row(s) hive> DESCRIBE FUNCTION EXTENDED case; OK CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f Example: SELECT CASE deptno WHEN 1 THEN Engineering WHEN 2 THEN Finance ELSE admin END, CASE zone WHEN 7 THEN Americas ELSE Asia-Pac END FROM emp_details Time taken: 0.005 seconds, Fetched: 13 row(s)
-
自定义函数
临时函数
-
自定义UDF需要继承org.apache.hadoop.hive.ql.UDF
-
需要实现evaluate函数,evaluate函数支持重载。(注意:在一个库中创建的UDF函数,不能在另一个库中使用 )
package com.cnyimi.udf; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; @Description(name = "myadd", value = "myadd(int a , int b) ==> return a + b ", extended = "Example:\n" + " myadd(1,1) ==> 2 \n" + " myadd(1,2,3) ==> 6;") public class AddUDF extends UDF { public int evaluate(int a ,int b) { return a + b ; } public int evaluate(int a ,int b , int c) { return a + b + c; } }
-
将程序打成jar包放到目标机器上去
cmd>cd {classes所在目录} cmd>jar cvf HiveDemo.jar -C x/x/x/x/classes/ .
或 使用maven插件打包
-
进入hive客户端,添加jar包
hive> add jar /root/test/HiveDemo.jar; 或 (使用hdfs) hive> add jar hdfs://nameservice1/user/hive/udf/HiveDemo.jar;
-
创建临时函数
hive> CREATE TEMPORARY FUNCTION add_example AS 'com.cnyimi.udf.AddUDF';
-
查询HQL语句
SELECT add_example(8, 9) FROM scores; SELECT add_example(scores.math, scores.art) FROM scores; SELECT add_example(6, 7, 8, 6.8) FROM scores;
-
销毁临时函数
hive> DROP TEMPORARY FUNCTION add_example;
注意:
UDF只能实现一进一出
的操作,如果需要实现多进一出,则需要实现UDAF
;
一进多出,则需要实现UDTF
。永久函数
- 自定义UDF需要继承org.apache.hadoop.hive.ql.UDF。(注意该类的包名必须是org.apache.hadoop.hive.ql.udf)
- 需要实现evaluate函数,evaluate函数支持重载。
- 修改FunctionRegistry这个类,注册定义的udf
- 把udf函数编译成class放到hive-exec-0.12.0-cdh5.0.0.jar中org\apache\hadoop\hive\ql\udf 路径下面
- 将新的FunctionRegistry编译后的class文件替换hive-exec-0.12.0-cdh5.0.0.jar中的org.apache.hadoop.hive.ql.exec
或
-
将jar包上传到hdfs上
[root@node00 ~]# hdfs dfs -put /root/test/substring_index_UDF.jar /user/hive/udf [root@node00 ~]# hdfs dfs -ls /user/hive/udf Found 1 items -rw-r--r-- 3 yarn hive 24859 2018-12-21 20:48 /user/hive/udf/substring_index_UDF.jar
-
创建永久函数
hive> CREATE FUNCTION substring_index AS 'com.hnb.data.hive.SubStringIndexUDF' using jar 'hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar';
注意:执行这条语句创建永久函数,show functiuons 会加上默认的数据库名在函数名前。(default.substring_index)
-
在MySQL中查询创建的自定义函数
mysql> use hive; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select * from funcs; ERROR 1146 (42S02): Table 'hive.funcs' doesn't exist mysql> SELECT * FROM FUNCS; +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+ | FUNC_ID | CLASS_NAME | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE | +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+ | 1 | com.hnb.data.hive.SubStringIndexUDF | 1545016176 | 1 | substring_index | 1 | NULL | USER | +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM FUNC_RU; +---------+---------------+-----------------------------------------------------------+-------------+ | FUNC_ID | RESOURCE_TYPE | RESOURCE_URI | INTEGER_IDX | +---------+---------------+-----------------------------------------------------------+-------------+ | 1 | 1 | hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar | 0 | +---------+---------------+-----------------------------------------------------------+-------------+ 1 row in set (0.00 sec) mysql>
-
Hive内置函数和自定义函数UDF
最新推荐文章于 2023-05-10 19:56:02 发布