User-Defined Functions
- UDF (User-Defined Function): one row in, one row out
- UDAF (User-Defined Aggregation Function): many rows in, one row out; aggregate functions such as count and max
- UDTF (User-Defined Table-Generating Function): one row in, many rows out; used with LATERAL VIEW and functions such as explode
UDFs let users define their own functions and extend what HiveQL can express.
UDF development steps:
1. Extend org.apache.hadoop.hive.ql.exec.UDF.
2. Implement one or more evaluate methods; evaluate supports overloading.
Notes:
- A UDF must declare a return type; it may return null, but the return type cannot be void.
- Prefer Hadoop writable types such as Text and LongWritable over plain Java types.
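The rules above (overloadable evaluate, null allowed, non-void return type) can be illustrated without Hive on the classpath. The class below is a hypothetical stand-in that mirrors the evaluate contract with plain JDK types; it is not a real Hive UDF.

```java
// Hypothetical stand-in for a Hive UDF: not a real Hive class, just the
// evaluate contract demonstrated with plain JDK types.
public class TrimUdfSketch {
    // evaluate may be overloaded; each overload declares a concrete
    // (non-void) return type and may return null for null input.
    public String evaluate(String s) {
        if (s == null) { return null; }  // null in, null out
        return s.trim();
    }

    // Overload: trim and optionally lower-case.
    public String evaluate(String s, boolean lowerCase) {
        String t = evaluate(s);
        if (t == null) { return null; }
        return lowerCase ? t.toLowerCase() : t;
    }

    public static void main(String[] args) {
        TrimUdfSketch u = new TrimUdfSketch();
        System.out.println(u.evaluate("  Hive  "));       // Hive
        System.out.println(u.evaluate("  Hive  ", true)); // hive
    }
}
```

In a real UDF the method bodies stay the same; only the base class (org.apache.hadoop.hive.ql.exec.UDF) and the writable parameter types (Text instead of String) change.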
Creating a custom UDF
First, you need to create a new class that extends UDF, with one or more methods named evaluate.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
    }
}
Example: a custom UDF that cleans up the date string and reformats it when loading log data into a table.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Converts log timestamps such as "30/May/2013:17:38:20 +0800"
 * into the compact form "20130530173820".
 * Created by huangxgc on 2017/3/26.
 */
public class DateFormate extends UDF {
    // HH (0-23), not hh (1-12): the log uses 24-hour timestamps.
    private final SimpleDateFormat inputFormat =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
    private final SimpleDateFormat outputFormat =
            new SimpleDateFormat("yyyyMMddHHmmss");

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String inputDate = input.toString().trim();
        Date parsedDate;
        try {
            // parse(String) matches a prefix of the input, so a trailing
            // zone offset such as " +0800" is ignored.
            parsedDate = inputFormat.parse(inputDate);
        } catch (ParseException e) {
            // Malformed input: return null instead of throwing an NPE
            // later when formatting a null Date.
            return null;
        }
        Text output = new Text();
        output.set(outputFormat.format(parsedDate));
        return output;
    }

    // Quick local test of the UDF.
    public static void main(String[] args) {
        System.out.println(new DateFormate().evaluate(new Text("30/May/2013:17:38:20 +0800")));
    }
}
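The parse-and-reformat step itself does not depend on Hive, so it can be sanity-checked with plain JDK classes. A minimal sketch of the same conversion (the class name is hypothetical):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateConvertSketch {
    // Same patterns as the UDF above; HH is the 24-hour field.
    public static String convert(String raw) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss");
        // parse(String) matches a prefix, so the trailing " +0800" is ignored.
        Date d = in.parse(raw.trim());
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(convert("30/May/2013:17:38:20 +0800")); // 20130530173820
    }
}
```

Both formats use the JVM's default time zone, so parsing and reformatting round-trip consistently on any machine.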
UDTF
A UDTF extends the GenericUDTF abstract class and overrides initialize, process, and, when rows must be emitted at the end, close.
package org.apache.hadoop.hive.contrib.udtf.example;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * GenericUDTFCount2 outputs the number of rows seen, twice. The row is
 * emitted twice to test outputting rows on close with LATERAL VIEW.
 */
public class GenericUDTFCount2 extends GenericUDTF {
    Integer count = Integer.valueOf(0);
    Object[] forwardObj = new Object[1];

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        count = Integer.valueOf(count.intValue() + 1);
    }

    @Override
    public void close() throws HiveException {
        forwardObj[0] = count;
        forward(forwardObj);
        forward(forwardObj);
    }
}
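The initialize/process/close lifecycle can be exercised without a Hive runtime by replacing forward with a stand-in that collects emitted rows. A hypothetical Hive-free sketch of the same counting logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Hive-free stand-in for the UDTF above: process() is called
// once per input row, close() emits the final rows.
public class Count2Sketch {
    private int count = 0;
    private final List<Object[]> emitted = new ArrayList<Object[]>();

    // Stand-in for GenericUDTF.forward(): collect instead of sending to Hive.
    private void forward(Object[] row) {
        emitted.add(row.clone());
    }

    public void process(Object[] args) {
        count++;
    }

    public void close() {
        Object[] row = new Object[] { count };
        forward(row);
        forward(row); // emitted twice, as in GenericUDTFCount2
    }

    public List<Object[]> rows() {
        return emitted;
    }

    public static void main(String[] args) {
        Count2Sketch udtf = new Count2Sketch();
        for (int i = 0; i < 3; i++) {
            udtf.process(new Object[] { "row" + i });
        }
        udtf.close();
        System.out.println(Arrays.deepToString(udtf.rows().toArray())); // [[3], [3]]
    }
}
```

This mirrors how Hive drives a UDTF: no output during process, then two rows forwarded when the input is exhausted.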
Using the jar in Hive:
hive> add jar my_jar.jar;
Added my_jar.jar to class path
Add a local jar by path:
hive> add jar /tmp/my_jar.jar;
Added /tmp/my_jar.jar to class path
List all registered jars:
hive> list jars;
my_jar.jar
As of Hive 0.13, a UDF can also specify its required jars directly in the CREATE FUNCTION statement:
CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';