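Compared with a plain UDF, a generic UDF (GenericUDF) supports complex types (such as List and struct) as both input and output.
Let's look at a small example.
The Hive table whereme holds the itineraries of several people, as follows: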
A 2013-10-10 8:00:00 home
A 2013-10-10 10:00:00 Super Market
A 2013-10-10 12:00:00 KFC
A 2013-10-10 15:00:00 school
A 2013-10-10 20:00:00 home
A 2013-10-15 8:00:00 home
A 2013-10-15 10:00:00 park
A 2013-10-15 12:00:00 home
A 2013-10-15 15:30:00 bank
A 2013-10-15 19:00:00 home
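With a query, we want to obtain the following result, where each row pairs a visit with the next visit by the same person on the same day: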
A 2013-10-10 08:00:00 home 10:00:00 Super Market
A 2013-10-10 10:00:00 Super Market 12:00:00 KFC
A 2013-10-10 12:00:00 KFC 15:00:00 school
A 2013-10-10 15:00:00 school 20:00:00 home
A 2013-10-15 08:00:00 home 10:00:00 park
A 2013-10-15 10:00:00 park 12:00:00 home
A 2013-10-15 12:00:00 home 15:30:00 bank
A 2013-10-15 15:30:00 bank 19:00:00 home
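Before turning to Hive, the pairing rule behind the expected result can be sketched in plain Java. This is only an illustration of the logic, not the GenericUDF from this article: the class TripPairs, the Visit type, and the helpers padTime and pairConsecutive are hypothetical names introduced here, and the sketch assumes the rows are already sorted by person, date, and time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Hive or of the original article.
final class TripPairs {

    /** One row of the whereme table: person, date, time, place. */
    static final class Visit {
        final String person;
        final String date;
        final String time;
        final String place;

        Visit(String person, String date, String time, String place) {
            this.person = person;
            this.date = date;
            this.time = time;
            this.place = place;
        }
    }

    /** Pads "8:00:00" to "08:00:00" so times match the expected output. */
    static String padTime(String time) {
        return time.length() == 7 ? "0" + time : time;
    }

    /**
     * Pairs each visit with the next one when both belong to the same
     * person and the same date. Assumes the input is already sorted by
     * person, date, and time.
     */
    static List<String> pairConsecutive(List<Visit> visits) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < visits.size(); i++) {
            Visit cur = visits.get(i);
            Visit next = visits.get(i + 1);
            boolean sameDay = cur.person.equals(next.person)
                    && cur.date.equals(next.date);
            if (sameDay) {
                out.add(String.join(" ",
                        cur.person, cur.date,
                        padTime(cur.time), cur.place,
                        padTime(next.time), next.place));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("A", "2013-10-10", "8:00:00", "home"),
                new Visit("A", "2013-10-10", "10:00:00", "Super Market"),
                new Visit("A", "2013-10-15", "8:00:00", "home"),
                new Visit("A", "2013-10-15", "10:00:00", "park"));
        // Prints one line per pair of consecutive same-day visits.
        for (String line : pairConsecutive(visits)) {
            System.out.println(line);
        }
    }
}
```

Note that the last visit of each day produces no row of its own, which is why ten input rows yield eight output rows in the example above.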
1. Write the GenericUDF.
package com.wz.udf;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.lazy.LazyString;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
impo

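This article shows how to use Hive's GenericUDF to process complex-typed data, including List and struct. Using one example, it walks through querying and processing the data in the whereme table: writing the GenericUDF, creating a table with a struct column to store the query result, and finally retrieving the processed result.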



