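Compared with a plain UDF, a generic UDF (GenericUDF) supports complex types (such as list and struct) as both input and output.
Here is a small example.
The Hive table whereme holds the itineraries of several people, as follows: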
A 2013-10-10 8:00:00 home
A 2013-10-10 10:00:00 Super Market
A 2013-10-10 12:00:00 KFC
A 2013-10-10 15:00:00 school
A 2013-10-10 20:00:00 home
A 2013-10-15 8:00:00 home
A 2013-10-15 10:00:00 park
A 2013-10-15 12:00:00 home
A 2013-10-15 15:30:00 bank
A 2013-10-15 19:00:00 home
We want a query that pairs each stop with the next stop on the same day, producing the following result:
A 2013-10-10 08:00:00 home 10:00:00 Super Market
A 2013-10-10 10:00:00 Super Market 12:00:00 KFC
A 2013-10-10 12:00:00 KFC 15:00:00 school
A 2013-10-10 15:00:00 school 20:00:00 home
A 2013-10-15 08:00:00 home 10:00:00 park
A 2013-10-15 10:00:00 park 12:00:00 home
A 2013-10-15 12:00:00 home 15:30:00 bank
A 2013-10-15 15:30:00 bank 19:00:00 home
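Before turning to Hive, the transformation above can be sketched in plain Java. This is only an illustration of the pairing logic, not the GenericUDF itself: the class ItineraryPairs, the Stop type and the helper names are hypothetical. It assumes the rows are already sorted by person, date and time, and that the hour is zero-padded in the output (inferred from the expected result, where 8:00:00 appears as 08:00:00).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ItineraryPairs {

    /** One input row; the field names are assumptions made for this sketch. */
    public static final class Stop {
        public final String person, date, time, place;

        public Stop(String person, String date, String time, String place) {
            this.person = person;
            this.date = date;
            this.time = time;
            this.place = place;
        }
    }

    /** Left-pads a time such as "8:00:00" to "08:00:00"; other values are returned unchanged. */
    public static String padHour(String time) {
        return time.length() == 7 ? "0" + time : time;
    }

    /**
     * Joins each stop with the following stop when both belong to the same
     * person and the same date. Assumes the input is sorted by person, date, time.
     */
    public static List<String> pairConsecutive(List<Stop> stops) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < stops.size(); i++) {
            Stop from = stops.get(i);
            Stop to = stops.get(i + 1);
            // The last stop of a day has no successor on that day, so it starts no pair.
            if (from.person.equals(to.person) && from.date.equals(to.date)) {
                out.add(String.join(" ", from.person, from.date,
                        padHour(from.time), from.place,
                        padHour(to.time), to.place));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Stop> stops = Arrays.asList(
                new Stop("A", "2013-10-10", "8:00:00", "home"),
                new Stop("A", "2013-10-10", "10:00:00", "Super Market"),
                new Stop("A", "2013-10-10", "12:00:00", "KFC"),
                new Stop("A", "2013-10-15", "8:00:00", "home"),
                new Stop("A", "2013-10-15", "10:00:00", "park"));
        for (String line : pairConsecutive(stops)) {
            System.out.println(line);
        }
    }
}
```

Note that the last stop of 2013-10-10 is not paired with the first stop of 2013-10-15, which matches the expected result: five stops per day yield four rows per day.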
1. Write the GenericUDF.
package com.wz.udf;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.lazy.LazyString;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StandardListObje