Project
MyEclipse > New > Java Project > Project name: trimCharacter
New package: com.tenly.trim
New class: trimCharacter
Goal: strip the characters at both ends of a value (here, the single quotes around each note)
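Import the JARs
Copy:
all JARs directly under /hive/lib (copying a few JARs that are not strictly needed does no harm)
/hadoop/share/hadoop/common/hadoop-common-*.jar
Add the JARs to the project's build path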
UDF code
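package com.tenly.trim;

import org.apache.hadoop.hive.ql.exec.UDF;

// Extend UDF; Hive calls evaluate() once per row
public class trimCharacter extends UDF {
    // evaluate() can take several arguments
    public String evaluate(String value, String leftCharacter, String rightCharacter) {
        // Return "" if any argument is null or empty.
        // (Grouping every test with || avoids the && / || precedence trap
        // that could call length() on a null argument.)
        if (value == null || value.isEmpty()
                || leftCharacter == null || leftCharacter.isEmpty()
                || rightCharacter == null || rightCharacter.isEmpty()) {
            return "";
        }
        String trimValue = value.trim();
        // Strip only when the value starts with leftCharacter, ends with rightCharacter,
        // and is long enough that the two markers do not overlap (e.g. a lone "'")
        if (trimValue.length() >= leftCharacter.length() + rightCharacter.length()
                && trimValue.startsWith(leftCharacter)
                && trimValue.endsWith(rightCharacter)) {
            return trimValue.substring(leftCharacter.length(),
                    trimValue.length() - rightCharacter.length());
        }
        // Values that are not wrapped in the markers return "", as before
        return "";
    }
}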
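Because the class above needs the Hive JARs on the classpath, it can be handy to check the stripping rule in plain Java first. The sketch below is a hypothetical standalone copy of that rule (the class name TrimCharacterCheck and its strip method are illustrative, not part of the project): it returns the trimmed value without its leading leftCharacter and trailing rightCharacter, and "" for null or empty arguments, for values that are not wrapped in the markers, and for values too short to hold both markers.

```java
// Hypothetical standalone copy of the stripping rule, runnable without Hive on the classpath.
public class TrimCharacterCheck {

    // Returns trimmed value minus leading leftCharacter and trailing rightCharacter,
    // or "" for null/empty arguments, non-wrapped values, or values too short for both markers.
    static String strip(String value, String leftCharacter, String rightCharacter) {
        if (value == null || value.isEmpty()
                || leftCharacter == null || leftCharacter.isEmpty()
                || rightCharacter == null || rightCharacter.isEmpty()) {
            return "";
        }
        String trimValue = value.trim();
        if (trimValue.length() >= leftCharacter.length() + rightCharacter.length()
                && trimValue.startsWith(leftCharacter)
                && trimValue.endsWith(rightCharacter)) {
            return trimValue.substring(leftCharacter.length(),
                    trimValue.length() - rightCharacter.length());
        }
        return "";
    }

    public static void main(String[] args) {
        // A few sample inputs, including edge cases: surrounding spaces, a lone quote, no quotes
        String[] notes = {"'a miserable programmer'", "  'writing a miserable note'  ", "'", "no quotes"};
        for (String note : notes) {
            System.out.println("[" + note + "] -> [" + strip(note, "'", "'") + "]");
        }
    }
}
```

Since strip takes the markers as strings and uses their lengths, multi-character markers such as "[[" and "]]" also work.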
Packaging
trimCharacter > Export > JAR file
Upload the JAR to the Hive node (a local path, not an HDFS path; any directory works): /usr/local/hive/udf
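Hive resolves the name given in CREATE TEMPORARY FUNCTION as a class on its classpath, so the exported JAR must contain com/tenly/trim/trimCharacter.class under the package path. A minimal sketch for checking this before uploading, using the JDK's java.util.jar.JarFile (the class name JarClassCheck and the default path trimCharacter.jar are assumptions):

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: checks that a JAR contains a given compiled class.
public class JarClassCheck {

    // Maps a class name such as com.tenly.trim.trimCharacter to its entry
    // path inside the JAR (com/tenly/trim/trimCharacter.class) and looks it up.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // JAR path from the first argument, else an assumed default in the current directory
        String jarPath = args.length > 0 ? args[0] : "trimCharacter.jar";
        String className = "com.tenly.trim.trimCharacter";
        boolean found = containsClass(jarPath, className);
        System.out.println(className + (found ? " found in " : " NOT found in ") + jarPath);
    }
}
```

For example: java JarClassCheck /usr/local/hive/udf/trimCharacter.jar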
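Run
Create the data, upload it to HDFS, and create an external table (if the HDFS directory /usr/local/hive/udf does not exist yet, create it first with hadoop fs -mkdir -p /usr/local/hive/udf, so that a.txt lands inside it and the table LOCATION points to a directory)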
[root@namenode hadoop]# cat /usr/local/hive/udf/a.txt
100001 'a miserable programmer'
100002 'writing a miserable note'
100003 'another miserable programmer'
100004 'reading this note'
[root@namenode hadoop]# hadoop fs -put /usr/local/hive/udf/a.txt /usr/local/hive/udf
hive> create external table MyTestUDF(id string , note string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/usr/local/hive/udf';
hive> select * from MyTestUDF;
100001 'a miserable programmer'
100002 'writing a miserable note'
100003 'another miserable programmer'
100004 'reading this note'
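The table splits fields on '\t', so each line of a.txt needs a tab (not spaces) between id and note; otherwise the whole line ends up in id and note is NULL. A minimal sketch that writes such a file in UTF-8 with \n line endings (the class name TsvWriter, its writeRows method and the output path a.txt are assumptions; the rows are the sample data):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes id/note rows as a tab-separated UTF-8 text file
// so that FIELDS TERMINATED BY '\t' splits each line into two columns.
public class TsvWriter {

    static void writeRows(Path file, String[][] rows) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join("\t", row)); // one tab between columns
            sb.append('\n');                   // explicit \n: no trailing \r in note if run on Windows
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String[][] rows = {
            {"100001", "'a miserable programmer'"},
            {"100002", "'writing a miserable note'"},
            {"100003", "'another miserable programmer'"},
            {"100004", "'reading this note'"},
        };
        writeRows(Paths.get("a.txt"), rows);
    }
}
```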
Running the UDF takes three steps: 1. add the JAR, 2. create the UDF function, 3. call the function.
hive> add jar /usr/local/hive/udf/trimCharacter.jar;
#Added [/usr/local/hive/udf/trimCharacter.jar] to class path
#Added resources: [/usr/local/hive/udf/trimCharacter.jar]
hive> CREATE TEMPORARY FUNCTION trimCharacter AS 'com.tenly.trim.trimCharacter';
hive> select id,trimCharacter(note,"'","'") from MyTestUDF;