This is a repost for my own future reference; I recommend reading the original author's post, which is clearly organized and easy to follow:
https://blog.csdn.net/wangjiadongge/article/details/82424870
1. Sample data format:
-------------------------------------
1,zhangsan:18:beijing|nan|it,2000
2,lisi:28:nanjing|nan|it,4000
3,xiaowu:38:qingdao|nan|it,1000
-------------------------------------
Create the table, then load the data into Hive:
create table userinfo (id int, info string, salary int)
row format delimited
fields terminated by ",";
Load the data:
load data local inpath "/home/darren/hive/qipa.data" into table userinfo;
select id, name, age, sex from userinfo where salary > 1000; -- a typical SQL query looks like this, but it fails here because the data is irregular
select id, fun1(info,1) as name, fun1(info,2) as age, fun1(info,3) as city from userinfo where salary > 1000; -- instead, define a custom function to handle the irregular data
2. Write the UDF as a Java program against the Hive API:
package com.darren.wang;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Personinfo extends UDF {
    public String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":"); // replace every | in the incoming info string with :
        String[] guize = rep.split(":");          // split the result on :
        return guize[index - 1]; // SQL callers pass 1-based indexes but Java arrays are 0-based, so subtract 1; returns the field at that position
    }
}
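The string-handling core of the UDF can be exercised outside Hive before packaging the jar. Below is a minimal standalone sketch: the class name PersonInfoDemo and its main method are my own additions for testing, but the evaluate body mirrors the UDF above.

```java
// Standalone check of the UDF's parsing logic; runs with plain Java, no Hive needed.
public class PersonInfoDemo {

    // Same logic as Personinfo.evaluate: normalize | to :, split, return the 1-based field.
    static String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":");
        String[] fields = rep.split(":");
        return fields[index - 1]; // SQL passes 1-based indexes
    }

    public static void main(String[] args) {
        String info = "zhangsan:18:beijing|nan|it"; // the info column from row 1 of the sample data
        System.out.println(evaluate(info, 1)); // zhangsan
        System.out.println(evaluate(info, 2)); // 18
        System.out.println(evaluate(info, 3)); // beijing
    }
}
```

Running it prints the name, age, and city fields, matching what the Hive query returns for the first row.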
3. Put the built jar in a directory, then add it to Hive:
hive> add jar /home/darren/test.jar;
4. Create a function, specifying the fully qualified class name:
hive>create temporary function fun1 as 'com.darren.wang.Personinfo';
5. List the available functions; if creation succeeded, fun1 will appear:
hive>show functions;
6. Run the query to view the table contents:
hive>select id,fun1(info,1) as name,fun1(info,2) as age,fun1(info,3) as city from userinfo;
The result is:
1 zhangsan 18 beijing
2 lisi 28 nanjing
3 xiaowu 38 qingdao
7. You can create a table from the query results; otherwise, when you exit Hive, the temporary function is lost and the UDF has to be recreated each time:
create table testuserinfo as
select id,fun1(info,1) as name,fun1(info,2) as age,fun1(info,3) as city from userinfo;