I made a few small changes of my own and added the steps for creating a permanent function; shared here for reference.
1. Sample data format:
-------------------------------------
1,zhangsan:18:beijing|nan|it,2000
2,lisi:28:nanjing|nan|it,4000
3,xiaowu:38:qingdao|nan|it,1000
-------------------------------------
Create the table, then load the data into Hive:
create table userinfo (id int,info string,salary int)
row format delimited
fields terminated by ",";
Load the data:
load data local inpath "/home/darren/hive/qipa.data" into table userinfo;
select id, name, age, sex from userinfo where salary > 1000;  -- a normal SQL query would look like this, but the info column is irregular, so this does not work
select id, fun1(info,1) as name, fun1(info,2) as age, fun1(info,3) as city from userinfo where salary > 1000;  -- instead, define a custom function to pull fields out of the irregular data
2. Write a Hive UDF in Java: extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method
package com.darren.wang;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Personinfo extends UDF {
    public String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":"); // replace every '|' in the incoming info string with ':'
        String[] guize = rep.split(":");          // split the normalized string on ':'
        return guize[index - 1];                  // SQL passes a 1-based index, Java arrays are 0-based, so subtract 1
    }
}
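The evaluate logic can be checked outside Hive with a plain Java snippet (a minimal sketch that copies the replace/split logic from the class above; no Hive runtime is needed, and PersoninfoCheck is just a throwaway test class, not part of the UDF):

```java
public class PersoninfoCheck {
    // Same logic as Personinfo.evaluate: normalize '|' to ':', split, use a 1-based index
    static String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":");
        String[] parts = rep.split(":");
        return parts[index - 1];
    }

    public static void main(String[] args) {
        String info = "zhangsan:18:beijing|nan|it"; // the info column from sample row 1
        System.out.println(evaluate(info, 1)); // zhangsan
        System.out.println(evaluate(info, 2)); // 18
        System.out.println(evaluate(info, 3)); // beijing
    }
}
```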
3. Put the built jar in a directory, then add it to Hive:
hive>add jar /home/darren/test.jar;
4. Create a function that points to the class. As a temporary function (valid only for the current session):
hive>create temporary function fun1 as 'com.darren.wang.Personinfo';
To create a permanent function instead, upload the jar to HDFS first:
hdfs dfs -put /xx/xx/personInfo.jar /xx/hive/udf/
then create the function, referencing the jar in the same statement:
hive>create function fun1 as 'com.darren.wang.Personinfo' using jar 'hdfs:///xx/hive/udf/personInfo.jar';
5. List the functions; after a successful create you should see fun1:
hive>show functions;
6. Run the query to view the extracted columns:
hive>select id,fun1(info,1) as name,fun1(info,2) as age,fun1(info,3) as city from userinfo;
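End to end, the query above amounts to: split each CSV row on ',', apply the UDF logic to the info column, and (as in the salary filter shown earlier) keep rows with salary > 1000. A plain-Java sketch of that pipeline, using the three sample rows from step 1 (this only simulates what Hive does; it is not how Hive executes the query):

```java
import java.util.Arrays;
import java.util.List;

public class QuerySketch {
    // Same extraction logic as the fun1 UDF
    static String field(String info, int index) {
        return info.replaceAll("\\|", ":").split(":")[index - 1];
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1,zhangsan:18:beijing|nan|it,2000",
                "2,lisi:28:nanjing|nan|it,4000",
                "3,xiaowu:38:qingdao|nan|it,1000");
        for (String row : rows) {
            String[] cols = row.split(",");        // id, info, salary
            int salary = Integer.parseInt(cols[2]);
            if (salary > 1000) {                   // same filter as "where salary > 1000"
                System.out.println(cols[0] + " " + field(cols[1], 1)
                        + " " + field(cols[1], 2) + " " + field(cols[1], 3));
            }
        }
    }
}
```

Only rows 1 and 2 pass the filter; row 3 (salary 1000) is excluded because the comparison is strict.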