Hive SQL实现数组中map的拆分以及计算

一、需求背景

  1. 本次任务主要是完成对表中的每一行数据进行求和计算。
  2. 涉及的表数据如下所示:
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:2","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:0","MoSig:26","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:2","MoSig:15","MoData:8","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:4","MoData:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:2","MoData:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:9","MoData:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:0","MoData:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:13","MoSig:2","MoData:27","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:0","MoData:2","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:4","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]

二、实现方法

(一) 通过对本次的需求任务的解读,将实现方法大概分为以下几个步骤:

  1. 完成数组中括号[]的去除工作。
    ·主要使用的是hive中的正则表达式替换函数:regexp_replace来完成本次的替换;
    ·语法: regexp_replace(string A, string B, string C)
    ·返回值: string
select regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}','') from 5gr1126;

效果图展示:
在这里插入图片描述

  1. 我们将每一行的字符串转换成map形式
    ·我们通过使用hive中的str_to_map函数来完成本次的转换操作;
    ·语法:str_to_map(字符串参数, 分隔符1, 分隔符2)
    ·返回值:使用两个分隔符将文本拆分为键值对
select str_to_map(regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}','')) from 5gr1126;

效果图展示:
在这里插入图片描述

  1. 获取map key-value对中的所有value值
    ·我们通过hive中的map_values函数来完成本次的获取值操作;
    ·语法:map_values(map)
    ·返回值:包含输入映射值的无序数组
select map_values(str_to_map(regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}',''))) from 5gr1126;

效果图展示:
在这里插入图片描述

  1. 每一行数值累加求和
    ·我们通过hive UDF自定义函数来完成本次的数值累加求和
    ·输入参数:数组列表,开始位置,结束位置
    ·输出结果:最终的和
    ·逻辑主要包含对科学技术法数值的转换以及是否为数值的过滤
public class AddNumUDF extends UDF {
    public static void main(String[] args) {//测试用例
        ArrayList<String> strArray = new ArrayList<String>();
        strArray.add("1E2");
        strArray.add("a");
        System.out.println(new AddNumUDF().evaluate(strArray, 0, 2));
    }
    public double evaluate(ArrayList<String> list,
                           int from, int to) {
        double result = 0;
        if (list == null || list.size() < 1) {
            return result;
        }
        List<String> subList;
        if (to == -1) {
            subList = list.subList(from, list.size());
        } else {
            subList = list.subList(from, to);
        }
        for (String i : subList) {//循环遍历
            if (i.contains("e") || i.contains("E") || isInt(i)) {
                BigDecimal bd = new BigDecimal(i);
                String str = bd.toPlainString();
                Double a = Double.parseDouble(str);
                result += a;
            } else {
                i = "0";
                BigDecimal bd = new BigDecimal(i);
                String str = bd.toPlainString();
                Double a = Double.parseDouble(str);
                result += a;
            }
        }
        return result;
    }
    /*该函数主要判断输入的参数是否为数值*/
   	public static boolean isInt(String v) {
        if (v == null)
            return false;
        String regEx1 = "[\\-|\\+]?\\d+";
        Pattern p;
        Matcher m;
        p = Pattern.compile(regEx1);
        m = p.matcher(v);
        if (m.matches()) return true;else return false;
    }
}
  1. 程序打成jar包,上传HS2服务器本地或者HDFS;
  2. 客户端命令行添加jar包到hive的classpath:hive>add jar /xxx/udf.jar;
  3. 注册成为临时函数(给udf命名),create temporary function 函数名 as ‘UDF类全名称’;
  4. 测试
    效果图如下所示:
    在这里插入图片描述
  • 1
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值