I made a few small changes of my own and added the steps for creating a permanent function; shared here for reference.
1. Sample data format:
-------------------------------------
1,zhangsan:18:beijing|nan|it,2000
2,lisi:28:nanjing|nan|it,4000
3,xiaowu:38:qingdao|nan|it,1000
-------------------------------------
Create the table, then load the data into Hive:
create table userinfo (id int,info string,salary int)
row format delimited
fields terminated by ",";
Load the data:
load data local inpath "/home/darren/hive/qipa.data" into table userinfo;
select id, name, age, sex from userinfo where salary > 1000;  -- a normal SQL query would look like this, but the info column is irregular, so this does not work
select id, fun1(info,1) as name, fun1(info,2) as age, fun1(info,3) as city from userinfo where salary > 1000;  -- instead, define a custom function to pull fields out of the irregular data
2. Write a Hive UDF in Java: extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method
package com.darren.wang;

import org.apache.hadoop.hive.ql.exec.UDF;

public class Personinfo extends UDF {
    public String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":"); // replace every '|' in the incoming info string with ':'
        String[] guize = rep.split(":");          // split the normalized string on ':'
        return guize[index - 1];                  // SQL passes a 1-based index, Java arrays are 0-based, so subtract 1
    }
}
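The evaluate logic can be checked outside Hive with a plain Java snippet (a minimal sketch that copies the replace/split logic from the class above; no Hive runtime is needed, and PersoninfoCheck is just a throwaway test class, not part of the UDF):

```java
public class PersoninfoCheck {
    // Same logic as Personinfo.evaluate: normalize '|' to ':', split, use a 1-based index
    static String evaluate(String info, int index) {
        String rep = info.replaceAll("\\|", ":");
        String[] parts = rep.split(":");
        return parts[index - 1];
    }

    public static void main(String[] args) {
        String info = "zhangsan:18:beijing|nan|it"; // the info column from sample row 1
        System.out.println(evaluate(info, 1)); // zhangsan
        System.out.println(evaluate(info, 2)); // 18
        System.out.println(evaluate(info, 3)); // beijing
    }
}
```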
3. Put the built jar in a directory, then add it to Hive:
hive>add jar /home/darren/test.jar;
4. Create a function that points to the class. As a temporary function (valid only for the current session):
hive>create temporary function fun1 as 'com.darren.wang.Personinfo';
To create a permanent function instead, upload the jar to HDFS first:
hdfs dfs -put /xx/xx/personInfo.jar /xx/hive/udf/
then create the function, referencing the jar in the same statement:
hive>create function fun1 as 'com.darren.wang.Personinfo' using jar 'hdfs:///xx/hive/udf/personInfo.jar';
5. List the functions; after a successful create you should see fun1:
hive>show functions;
6. Run the query to view the extracted columns:
hive>select id,fun1(info,1) as name,fun1(info,2) as age,fun1(info,3) as city from userinfo;
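End to end, the query above amounts to: split each CSV row on ',', apply the UDF logic to the info column, and (as in the salary filter shown earlier) keep rows with salary > 1000. A plain-Java sketch of that pipeline, using the three sample rows from step 1 (this only simulates what Hive does; it is not how Hive executes the query):

```java
import java.util.Arrays;
import java.util.List;

public class QuerySketch {
    // Same extraction logic as the fun1 UDF
    static String field(String info, int index) {
        return info.replaceAll("\\|", ":").split(":")[index - 1];
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1,zhangsan:18:beijing|nan|it,2000",
                "2,lisi:28:nanjing|nan|it,4000",
                "3,xiaowu:38:qingdao|nan|it,1000");
        for (String row : rows) {
            String[] cols = row.split(",");        // id, info, salary
            int salary = Integer.parseInt(cols[2]);
            if (salary > 1000) {                   // same filter as "where salary > 1000"
                System.out.println(cols[0] + " " + field(cols[1], 1)
                        + " " + field(cols[1], 2) + " " + field(cols[1], 3));
            }
        }
    }
}
```

Only rows 1 and 2 pass the filter; row 3 (salary 1000) is excluded because the comparison is strict.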