Environment:
Hadoop 2.7 + CentOS 7 + Hive 1.2.1 + VirtualBox + Xshell + Eclipse + JDK 1.8
Data preparation:
- Start the Hadoop cluster and Hive:
# start-dfs.sh
# source /etc/profile  (note: my cluster setup probably has a problem; I have to run this command before starting Hive each time)
# hive
- Create the table:
create table littlebigdata(
  name string,
  email string,
  bday string,
  ip string,
  gender string,
  anum int)
row format delimited fields terminated by ',';
- Load the data:
load data local inpath '/root/data/data6' into table littlebigdata;
Replace '/root/data/data6' with the path to your own data file.
The data is now ready; the next step is to write the UDF itself.
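As an illustration, a comma-delimited data file matching this schema might look like the following (these rows are made-up examples, not the author's data; note the bday column uses the MM-dd-yyyy format the UDF below expects):

```
edward,edward@example.com,01-15-1990,192.168.1.10,male,3
mary,mary@example.com,02-22-1992,192.168.1.11,female,5
```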
Writing the UDF:
Open Eclipse and create the following class:
package hivejar;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "zodiac",
    value = "_FUNC_(date) - from the input date string "
        + "or separate month and day arguments, returns the sign of the Zodiac",
    extended = "Example:\n"
        + "  > SELECT _FUNC_(date_string) FROM src;\n"
        + "  > SELECT _FUNC_(month, day) FROM src;")
public class Zodiac extends UDF {
    private SimpleDateFormat df;

    public Zodiac() {
        df = new SimpleDateFormat("MM-dd-yyyy");
    }

    public String evaluate(Date bday) {
        // getMonth() is 0-based, so add 1; getDate() returns the day of
        // the month (getDay() would return the day of the week)
        return this.evaluate(bday.getMonth() + 1, bday.getDate());
    }

    public String evaluate(String bday) {
        Date date = null;
        try {
            date = df.parse(bday);
        } catch (Exception ex) {
            return null;
        }
        return this.evaluate(date.getMonth() + 1, date.getDate());
    }

    // Only January and February are handled here
    public String evaluate(Integer month, Integer day) {
        if (month == 1) {
            if (day < 20) {
                return "Capricorn";
            } else {
                return "Aquarius";
            }
        }
        if (month == 2) {
            if (day < 19) {
                return "Aquarius";
            } else {
                return "Pisces";
            }
        }
        return null;
    }
}
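Before packaging, you can sanity-check the month/day branch with a plain Java sketch that mirrors the evaluate(Integer, Integer) logic without the Hive dependency (the ZodiacCheck class name here is just for illustration):

```java
// Standalone check of the zodiac branch logic; mirrors
// Zodiac.evaluate(Integer, Integer) without extending Hive's UDF class.
public class ZodiacCheck {
    static String zodiac(int month, int day) {
        if (month == 1) {
            return day < 20 ? "Capricorn" : "Aquarius";
        }
        if (month == 2) {
            return day < 19 ? "Aquarius" : "Pisces";
        }
        return null; // other months are not covered in this example
    }

    public static void main(String[] args) {
        System.out.println(zodiac(1, 15));  // Capricorn
        System.out.println(zodiac(1, 25));  // Aquarius
        System.out.println(zodiac(2, 20));  // Pisces
    }
}
```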
Then package the class into a jar and upload it to the virtual machine;
here zodiac.jar is the packaged jar.
In the Hive session, add the jar file to the classpath:
hive> add jar /root/data/zodiac.jar;
hive> create temporary function zodiac as 'hivejar.Zodiac';
Adjust the path to match your setup. You can view the function's documentation with:
hive> describe function extended zodiac;
At this point the function is ready to use:
hive> select name ,bday,zodiac(bday) from littlebigdata;