User-Defined Functions
- UDF (User-Defined Function): one row in, one row out
- UDAF (User-Defined Aggregation Function): many rows in, one row out; aggregate functions such as count and max
- UDTF (User-Defined Table-Generating Function): one row in, many rows out; used with LATERAL VIEW and functions such as explode
UDFs let users define their own functions and extend what HiveQL can express.
UDF development steps:
1. Extend org.apache.hadoop.hive.ql.exec.UDF.
2. Implement one or more evaluate methods; evaluate supports overloading.
Notes:
- A UDF must declare a return type; it may return null, but the return type cannot be void.
- Prefer Hadoop writable types such as Text and LongWritable over plain Java types.
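The rules above (overloadable evaluate, null allowed, non-void return type) can be illustrated without Hive on the classpath. The class below is a hypothetical stand-in that mirrors the evaluate contract with plain JDK types; it is not a real Hive UDF.

```java
// Hypothetical stand-in for a Hive UDF: not a real Hive class, just the
// evaluate contract demonstrated with plain JDK types.
public class TrimUdfSketch {
    // evaluate may be overloaded; each overload declares a concrete
    // (non-void) return type and may return null for null input.
    public String evaluate(String s) {
        if (s == null) { return null; }  // null in, null out
        return s.trim();
    }

    // Overload: trim and optionally lower-case.
    public String evaluate(String s, boolean lowerCase) {
        String t = evaluate(s);
        if (t == null) { return null; }
        return lowerCase ? t.toLowerCase() : t;
    }

    public static void main(String[] args) {
        TrimUdfSketch u = new TrimUdfSketch();
        System.out.println(u.evaluate("  Hive  "));       // Hive
        System.out.println(u.evaluate("  Hive  ", true)); // hive
    }
}
```

In a real UDF the method bodies stay the same; only the base class (org.apache.hadoop.hive.ql.exec.UDF) and the writable parameter types (Text instead of String) change.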
Creating a custom UDF
First, you need to create a new class that extends UDF, with one or more methods named evaluate.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
    }
}
Example: a custom UDF that cleans up the date string and reformats it when loading log data into a table.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Converts log timestamps such as "30/May/2013:17:38:20 +0800"
 * into the compact form "20130530173820".
 * Created by huangxgc on 2017/3/26.
 */
public class DateFormate extends UDF {
    // HH (0-23), not hh (1-12): the log uses 24-hour timestamps.
    private final SimpleDateFormat inputFormat =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
    private final SimpleDateFormat outputFormat =
            new SimpleDateFormat("yyyyMMddHHmmss");

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String inputDate = input.toString().trim();
        Date parsedDate;
        try {
            // parse(String) matches a prefix of the input, so a trailing
            // zone offset such as " +0800" is ignored.
            parsedDate = inputFormat.parse(inputDate);
        } catch (ParseException e) {
            // Malformed input: return null instead of throwing an NPE
            // later when formatting a null Date.
            return null;
        }
        Text output = new Text();
        output.set(outputFormat.format(parsedDate));
        return output;
    }

    // Quick local test of the UDF.
    public static void main(String[] args) {
        System.out.println(new DateFormate().evaluate(new Text("30/May/2013:17:38:20 +0800")));
    }
}
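The parse-and-reformat step itself does not depend on Hive, so it can be sanity-checked with plain JDK classes. A minimal sketch of the same conversion (the class name is hypothetical):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateConvertSketch {
    // Same patterns as the UDF above; HH is the 24-hour field.
    public static String convert(String raw) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss");
        // parse(String) matches a prefix, so the trailing " +0800" is ignored.
        Date d = in.parse(raw.trim());
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(convert("30/May/2013:17:38:20 +0800")); // 20130530173820
    }
}
```

Both formats use the JVM's default time zone, so parsing and reformatting round-trip consistently on any machine.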
UDTF
A UDTF extends the GenericUDTF abstract class and overrides initialize, process, and, when rows must be emitted at the end, close.
package org.apache.hadoop.hive.contrib.udtf.example;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * GenericUDTFCount2 outputs the number of rows seen, twice. The row is
 * emitted twice to test outputting rows on close with LATERAL VIEW.
 */
public class GenericUDTFCount2 extends GenericUDTF {
    Integer count = Integer.valueOf(0);
    Object[] forwardObj = new Object[1];

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        count = Integer.valueOf(count.intValue() + 1);
    }

    @Override
    public void close() throws HiveException {
        forwardObj[0] = count;
        forward(forwardObj);
        forward(forwardObj);
    }
}
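The initialize/process/close lifecycle can be exercised without a Hive runtime by replacing forward with a stand-in that collects emitted rows. A hypothetical Hive-free sketch of the same counting logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Hive-free stand-in for the UDTF above: process() is called
// once per input row, close() emits the final rows.
public class Count2Sketch {
    private int count = 0;
    private final List<Object[]> emitted = new ArrayList<Object[]>();

    // Stand-in for GenericUDTF.forward(): collect instead of sending to Hive.
    private void forward(Object[] row) {
        emitted.add(row.clone());
    }

    public void process(Object[] args) {
        count++;
    }

    public void close() {
        Object[] row = new Object[] { count };
        forward(row);
        forward(row); // emitted twice, as in GenericUDTFCount2
    }

    public List<Object[]> rows() {
        return emitted;
    }

    public static void main(String[] args) {
        Count2Sketch udtf = new Count2Sketch();
        for (int i = 0; i < 3; i++) {
            udtf.process(new Object[] { "row" + i });
        }
        udtf.close();
        System.out.println(Arrays.deepToString(udtf.rows().toArray())); // [[3], [3]]
    }
}
```

This mirrors how Hive drives a UDTF: no output during process, then two rows forwarded when the input is exhausted.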
Using the jar in Hive:
hive> add jar my_jar.jar;
Added my_jar.jar to class path
Add a local jar by path:
hive> add jar /tmp/my_jar.jar;
Added /tmp/my_jar.jar to class path
List all registered jars:
hive> list jars;
my_jar.jar
As of Hive 0.13, a UDF can also specify its required jars directly in the CREATE FUNCTION statement:
CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar';