Creating a UDF in Hive

丶钥匙十元三把丶

Last modified 2024-05-10 10:08:00

Tags: hive, big data

First published 2024-05-09 19:29:11

Article link: https://blog.csdn.net/shiyuansanba/article/details/138627827

1. First, write a Java class that extends the UDF class and defines an evaluate method

// Package name
package com.aaa.bbb;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.json.JSONArray;
import org.json.JSONObject;
public class getcontent extends UDF {

    public static boolean isJson(String content) {
        try {
            // Try to parse the string as a JSONObject
            new JSONObject(content);
            return true;
        } catch (Exception e) {
            // Any exception during parsing means the string is not a JSON object
            return false;
        }
    }

    public String evaluate(String data) {
        // Fallback value ("format error") for non-JSON input or an unhandled log_id
        String result = "格式错误";

        // First check that the input is JSON, then branch on log_id (1002002 or 1002003)
        if (!isJson(data)) {
            return result;
        }
        JSONObject jsonObject = new JSONObject(data);

        // optString returns "" instead of throwing when log_id is missing
        String logId = jsonObject.optString("log_id");

        if ("1002002".equals(logId)) {
            return jsonObject.getJSONArray("module_section").getJSONObject(0).getString("request_content");
        } else if ("1002003".equals(logId)) {
            // content_before_rewriting is either plain text or a JSON string that wraps the text
            String before = jsonObject.getJSONArray("module_section").getJSONObject(0).getString("content_before_rewriting");
            if (isJson(before)) {
                return new JSONObject(before).getJSONObject("content")
                        .getJSONArray("parts").getJSONObject(0)
                        .getString("text");
            } else {
                return before;
            }
        } else {
            return result;
        }
    }
}
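
The class defines evaluate rather than overriding an abstract method because Hive looks the method up by name through reflection. The following is only a simplified sketch of that idea in plain Java, not Hive's resolver: UpperCaseUdf and EvaluateLookupSketch are hypothetical names, and the lookup assumes non-null arguments whose runtime types match the declared parameter types exactly.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical stand-in for a UDF: a plain class that exposes a public evaluate method.
class UpperCaseUdf {
    public String evaluate(String input) {
        return input == null ? null : input.toUpperCase(Locale.ROOT);
    }
}

class EvaluateLookupSketch {
    // Look up a public method named "evaluate" by the runtime types of the
    // arguments and invoke it. Assumes every argument is non-null.
    static Object callEvaluate(Object udf, Object... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        try {
            Method evaluate = udf.getClass().getMethod("evaluate", argTypes);
            return evaluate.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            // No matching evaluate method, or the method itself threw
            throw new IllegalStateException("evaluate lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callEvaluate(new UpperCaseUdf(), "hive"));
    }
}
```

One consequence of name-based lookup is that a class can declare several evaluate overloads with different parameter types.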

Note: the corresponding dependencies must be declared in the pom file

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.3</version>
        </dependency>

        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20231013</version> <!-- or another suitable version -->
        </dependency>

    </dependencies>

2. Register the UDF in Hive

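First package the Java program above as a jar and upload it to HDFS or a cloud server (the example below uses an OSS path).
Then register the function in Hive: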

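-- ods.udf_content is the function name; 'com.aaa.bbb.getcontent' is package name.class name
-- replace ··· with the actual path to the jar
create function ods.udf_content as 'com.aaa.bbb.getcontent'
	using jar 'oss://···/udf_content-1.0-SNAPSHOT.jar';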

3. The function can now be used in Hive or spark-sql

select ods.udf_content(data),data
from ods.ods_log
where stat_date = '20240505' and get_json_object(data,'$.log_id') = '1002003'
limit 2;
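
To try the function on known input, it helps to have rows in the two shapes that evaluate reads. The sketch below builds them in plain Java. The shapes are inferred from the field names used in the UDF code, SamplePayloads and its methods are hypothetical, and the escaping handles only quotes and backslashes, so treat it as a test aid rather than a general JSON writer.

```java
// Hypothetical helper that builds sample rows for the `data` column.
class SamplePayloads {
    // Wrap a value in double quotes, escaping backslashes first and then quotes,
    // so it can be embedded as a JSON string value.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // log_id 1002002: the text sits directly in request_content.
    static String requestLog(String text) {
        return "{\"log_id\":\"1002002\",\"module_section\":[{\"request_content\":"
                + quote(text) + "}]}";
    }

    // log_id 1002003: content_before_rewriting holds a JSON document stored as a
    // string, so the inner document is quoted a second time.
    static String rewriteLog(String text) {
        String inner = "{\"content\":{\"parts\":[{\"text\":" + quote(text) + "}]}}";
        return "{\"log_id\":\"1002003\",\"module_section\":[{\"content_before_rewriting\":"
                + quote(inner) + "}]}";
    }

    public static void main(String[] args) {
        System.out.println(requestLog("hello"));
        System.out.println(rewriteLog("hello"));
    }
}
```

The second shape is the reason the UDF calls isJson on content_before_rewriting: the field may hold plain text or an escaped JSON document, and only the latter needs a second parse.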