Hive Notes: Custom UDFs


1. Define your own UDF

package com.hihi.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple UDF that prefixes its input with "HelloWord:".
// Hive resolves the evaluate() method on a UDF subclass by reflection.
public class HelloWord extends UDF {
    public Text evaluate(final Text s) {
        // Pass NULL through, as Hive's built-in functions do.
        if (s == null) { return null; }
        return new Text("HelloWord:" + s.toString());
    }
}
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>study-hadoop</groupId>
  <artifactId>hive</artifactId>
  <version>1.0</version>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
    <hive.version>1.1.0-cdh5.7.0</hive.version>
  </properties>

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>${hadoop.version}</version>
      </dependency>

    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>${hive.version}</version>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
    </dependency>
  </dependencies>
</project>
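
With junit already in the pom, the evaluate() logic can be checked locally before packaging. A minimal sketch; the test class name is illustrative:

package com.hihi.hive;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;

import org.apache.hadoop.io.Text;
import org.junit.Test;

public class HelloWordTest {
    @Test
    public void evaluateAddsPrefix() {
        HelloWord udf = new HelloWord();
        assertEquals("HelloWord:SMITH", udf.evaluate(new Text("SMITH")).toString());
    }

    @Test
    public void evaluateReturnsNullOnNull() {
        HelloWord udf = new HelloWord();
        assertNull(udf.evaluate(null));
    }
}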

2. Package the code and upload the jar to the server
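
The jar itself comes from a standard Maven build (a sketch, assuming the default project layout; the artifact name follows the pom above):

mvn clean package
# produces target/hive-1.0.jar

The jar is then uploaded to the server with rz: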

[root@hadoop001 jar]# rz
rz waiting to receive.
Starting zmodem transfer.  Press Ctrl+C to cancel.
Transferring hive-1.0.jar...
  100%       2 KB       2 KB/sec    00:00:01       0 Errors  

[root@hadoop001 jar]# pwd
/home/hadoop/jar

3. In the Hive CLI, register a temporary function; it is visible only in the current session

hive> add jar /home/hadoop/jar/hive-1.0.jar;
Added [/home/hadoop/jar/hive-1.0.jar] to class path
Added resources: [/home/hadoop/jar/hive-1.0.jar]
hive> create temporary function my_hello as 'com.hihi.hive.HelloWord';
OK
Time taken: 0.016 seconds

hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
OK
SMITH   HelloWord:SMITH
JONES   HelloWord:JONES
SCOTT   HelloWord:SCOTT
Time taken: 0.124 seconds, Fetched: 3 row(s)
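
The function can also be sanity-checked against a string literal, with no table involved (a sketch; the expected output follows from the evaluate() code above):

hive> select my_hello('smith');   -- expected: HelloWord:smith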

hive> list jars;
/home/hadoop/jar/hive-1.0.jar

4. Query the metastore: there is no record of the function. Temporary functions live only in the session and are never written to the metastore.

mysql> select * from funcs;
Empty set (0.00 sec)

5. Start a new session. Because the function just created was only temporary, the call now fails

hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
FAILED: SemanticException [Error 10011]: Line 1:14 Invalid function 'my_hello'
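
A temporary function therefore has to be re-registered in every new session by repeating step 3:

hive> add jar /home/hadoop/jar/hive-1.0.jar;
hive> create temporary function my_hello as 'com.hihi.hive.HelloWord';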

6. Try creating a permanent function

hive> add jar /home/hadoop/jar/hive-1.0.jar;
Added [/home/hadoop/jar/hive-1.0.jar] to class path
Added resources: [/home/hadoop/jar/hive-1.0.jar]
hive> create function my_hello as 'com.hihi.hive.HelloWord';
OK
Time taken: 0.016 seconds

7. Check the metastore again: the funcs table now records the function, but func_ru, which stores a function's resource URIs, is still empty, because the jar was only added to the current session's classpath.

mysql> select * from funcs;
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME              | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
|       6 | com.hihi.hive.HelloWord |  1515675864 |     1 | my_hello  |         1 | NULL       | USER       |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.00 sec)
mysql> select * from func_ru;
Empty set (0.00 sec)

8. Start another new session: the call still fails. The metastore knows the class name, but with func_ru empty a fresh session has no way to locate and load the jar.

hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
FAILED: SemanticException [Error 10011]: Line 1:14 Invalid function 'my_hello'

9. Try again, this time referencing a jar on HDFS. The earlier registration evidently needs to be dropped first (drop function my_hello;); the FUNC_ID changing from 6 to 11 below shows the function was recreated.

CREATE FUNCTION my_hello AS 'com.hihi.hive.HelloWord' USING JAR 'hdfs://hadoop001:9000/jar/hive-1.0.jar';
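
For this to work, the jar must already be at that HDFS location. A minimal sketch of the upload, assuming the /jar directory from the URI above:

[root@hadoop001 jar]# hadoop fs -mkdir -p hdfs://hadoop001:9000/jar
[root@hadoop001 jar]# hadoop fs -put /home/hadoop/jar/hive-1.0.jar hdfs://hadoop001:9000/jar/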

10. Check the metastore: func_ru now holds a row for my_hello. So does every new session read the metastore, reload the jar, and recreate the function on first use?

mysql> select * from func_ru;
+---------+---------------+----------------------------------------+-------------+
| FUNC_ID | RESOURCE_TYPE | RESOURCE_URI                           | INTEGER_IDX |
+---------+---------------+----------------------------------------+-------------+
|      11 |             1 | hdfs://hadoop001:9000/jar/hive-1.0.jar |           0 |
+---------+---------------+----------------------------------------+-------------+
1 row in set (0.00 sec)

mysql> select * from funcs;
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME              | CREATE_TIME | DB_ID | FUNC_NAME | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
|      11 | com.hihi.hive.HelloWord |  1515676179 |     1 | my_hello  |         1 | NULL       | USER       |
+---------+-------------------------+-------------+-------+-----------+-----------+------------+------------+
1 row in set (0.00 sec)

11. Log in to a new session, first confirm that no jar has been loaded, then call the function: the jar is fetched and re-added automatically, following the resource URI recorded in the metastore's func_ru table

hive> list jar;
hive> select ename, my_hello(ename) from emp_dept_partition limit 3;
converting to local hdfs://hadoop001:9000/jar/hive-1.0.jar
Added [/tmp/9da42cea-1284-46f1-9969-74dc80ed05fe_resources/hive-1.0.jar] to class path
Added resources: [hdfs://hadoop001:9000/jar/hive-1.0.jar]
OK
SMITH   HelloWord:SMITH
JONES   HelloWord:JONES
SCOTT   HelloWord:SCOTT
Time taken: 1.252 seconds, Fetched: 3 row(s)

hive> list jar;
/tmp/9da42cea-1284-46f1-9969-74dc80ed05fe_resources/hive-1.0.jar

Drawbacks: the show functions command does not list this function, and reloading the jar on every use is still fairly cumbersome, so a follow-up will keep looking for a better approach.

[From @若泽大数据]