Using Hive's Built-in Functions and UDF Programming

User-Defined Functions

  • UDF (User-Defined Function): one row in, one row out
  • UDAF (User-Defined Aggregation Function): many rows in, one row out; aggregate functions similar to count and max
  • UDTF (User-Defined Table-Generating Function): one row in, many rows out; e.g. explode, typically used with LATERAL VIEW

UDFs allow users to define their own functions and thereby extend the capabilities of HiveQL.

UDF programming steps:

1. Extend org.apache.hadoop.hive.ql.exec.UDF.

2. Implement one or more evaluate methods; evaluate supports overloading.

Notes:
  • A UDF must have a return type. It may return null, but the return type cannot be void.
  • Prefer Hadoop Writable types such as Text and LongWritable in UDFs; plain Java types are not recommended.
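The two steps above can be sketched in plain Java. A real UDF would extend org.apache.hadoop.hive.ql.exec.UDF and take Text arguments; here the Hive/Hadoop types are swapped for String so the sketch compiles and runs standalone, and the class and method behavior are hypothetical:

```java
// Sketch of a UDF class with an overloaded evaluate(): Hive dispatches to the
// overload whose signature matches the argument types used in the query.
public class LowerUdfSketch {

    // One-argument form: lowercase the whole string (null in, null out)
    public String evaluate(String s) {
        return s == null ? null : s.toLowerCase();
    }

    // Overloaded two-argument form: lowercase, then keep the first n characters
    public String evaluate(String s, int n) {
        if (s == null) return null;
        String lower = s.toLowerCase();
        return lower.substring(0, Math.min(n, lower.length()));
    }

    public static void main(String[] args) {
        LowerUdfSketch udf = new LowerUdfSketch();
        System.out.println(udf.evaluate("HiveQL"));    // hiveql
        System.out.println(udf.evaluate("HiveQL", 4)); // hive
    }
}
```

Note the null-in/null-out convention in both overloads: returning null is allowed, which is exactly why the return type must not be void.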

Creating a custom UDF

First, you need to create a new class that extends UDF, with one or more methods named evaluate.
     
     
     package com.example.hive.udf;

     import org.apache.hadoop.hive.ql.exec.UDF;
     import org.apache.hadoop.io.Text;

     public final class Lower extends UDF {
         public Text evaluate(final Text s) {
             if (s == null) { return null; }
             return new Text(s.toString().toLowerCase());
         }
     }
Example: a custom UDF that, when importing log data into a table, cleans the quoted date strings and converts them to a normalized format:
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Created by huangxgc on 2017/3/26 0026.
 */
public class DateFormate extends UDF {

    // HH (24-hour clock, 00-23) matches the log's hour field; SimpleDateFormat
    // is not thread-safe, so each UDF instance keeps its own formatters.
    private final SimpleDateFormat inputFormat =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
    private final SimpleDateFormat outputFormat = new SimpleDateFormat("yyyyMMddHHmmss");

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String inputDate = input.toString().trim();
        try {
            Date parseDate = inputFormat.parse(inputDate);
            Text output = new Text();
            output.set(outputFormat.format(parseDate));
            return output;
        } catch (ParseException e) {
            // Malformed dates yield null instead of a NullPointerException
            // from calling format(null)
            return null;
        }
    }

    // Quick local test for the UDF
    public static void main(String[] args) {
        System.out.println(new DateFormate().evaluate(new Text("30/May/2013:17:38:20 +0800")));
    }
}
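The conversion logic can be exercised without any Hive or Hadoop dependency. A plain-Java sketch (Text replaced by String; the class and method names are hypothetical) that also shows the null-on-failure convention:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Standalone model of the DateFormate UDF's conversion logic.
public class DateFormatSketch {

    public static String convert(String raw) {
        if (raw == null) return null;
        // HH is the 24-hour clock (00-23); hh (1-12) would only parse an hour
        // like 17 by accident of SimpleDateFormat's lenient mode.
        SimpleDateFormat in = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss");
        try {
            Date d = in.parse(raw.trim()); // trailing text like " +0800" is ignored
            return out.format(d);
        } catch (ParseException e) {
            return null; // malformed input yields null, mirroring the UDF
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("30/May/2013:17:38:20 +0800")); // 20130530173820
        System.out.println(convert("not a date"));                 // null
    }
}
```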



UDAF  
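A UDAF runs in phases that mirror MapReduce: initialize an aggregation buffer, iterate over input rows, emit a partial aggregate from each map task, merge the partials, and terminate with the final value. The class below is only a plain-Java model of those phases for a COUNT; a real implementation extends org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator, whose method names the sketch mirrors, but this is not the actual Hive interface:

```java
import java.util.Arrays;

// Plain-Java model of the UDAF lifecycle for a COUNT aggregation.
public class CountUdafSketch {

    private long count;

    public void init() { count = 0; }      // reset the aggregation buffer
    public void iterate(Object row) {      // consume one input row
        if (row != null) count++;          // COUNT(col) skips NULLs
    }
    public long terminatePartial() {       // map-side partial result
        return count;
    }
    public void merge(long partial) {      // combine a partial result
        count += partial;
    }
    public long terminate() {              // final aggregate value
        return count;
    }

    public static void main(String[] args) {
        // Two "map tasks" each aggregate a slice of the rows...
        CountUdafSketch mapA = new CountUdafSketch();
        mapA.init();
        for (Object r : Arrays.asList("x", "y", null)) mapA.iterate(r);

        CountUdafSketch mapB = new CountUdafSketch();
        mapB.init();
        for (Object r : Arrays.asList("z", "w")) mapB.iterate(r);

        // ...then the "reducer" merges the partials and terminates.
        CountUdafSketch reducer = new CountUdafSketch();
        reducer.init();
        reducer.merge(mapA.terminatePartial());
        reducer.merge(mapB.terminatePartial());
        System.out.println(reducer.terminate()); // 4 (the null row is not counted)
    }
}
```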
UDTF: extend the GenericUDTF abstract class and override initialize and process, plus close when needed:

    
    
     package org.apache.hadoop.hive.contrib.udtf.example;

     import java.util.ArrayList;

     import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
     import org.apache.hadoop.hive.ql.metadata.HiveException;
     import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
     import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
     import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
     import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
     import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

     /**
      * GenericUDTFCount2 outputs the number of rows seen, twice. The output is
      * emitted twice to test outputting of rows on close with lateral view.
      */
     public class GenericUDTFCount2 extends GenericUDTF {

         Integer count = Integer.valueOf(0);
         Object[] forwardObj = new Object[1];

         @Override
         public void close() throws HiveException {
             forwardObj[0] = count;
             forward(forwardObj);
             forward(forwardObj);
         }

         @Override
         public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
             ArrayList<String> fieldNames = new ArrayList<String>();
             ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
             fieldNames.add("col1");
             fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
             return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
         }

         @Override
         public void process(Object[] args) throws HiveException {
             count = Integer.valueOf(count.intValue() + 1);
         }
     }


Using the JAR in Hive:

     hive> add jar my_jar.jar;
     Added my_jar.jar to class path

Add a local JAR by path:

     hive> add jar /tmp/my_jar.jar;
     Added /tmp/my_jar.jar to class path

List all registered JARs:

     hive> list jars;
     my_jar.jar
As of Hive 0.13, UDFs also have the option of being able to specify required jars in the CREATE FUNCTION statement:

     CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';
