Parsing a nested JSON string in Hive and deduplicating its key-value pairs

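This article shows how to handle a JSON string with nested structure in Hive and deduplicate its contents. Using functions such as explode, regexp_replace, and concat_ws, it flattens the JSON field, parses out the key-value pairs, removes the duplicates, and returns the result joined in a specific format.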

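Requirement

A table has an abtest field that holds a JSON string with nested structure. The key:value pairs inside it may repeat, so every key:value pair in abtest has to be flattened and deduplicated, stripped of its double quotes, and then returned joined by commas.

Example

The input abtest looks like this: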

{"trip_ab_deal_packagerankst":"B",

"trip_ab_poivideo":"C",

"trip_ab_group_fengchao_zhinan":"A",

"trip_ab_BookingProduct":"A",

"trip_ab_OptimalGoods_2018":"B",

"trip_ab_poitoubuyouhua":"B",

"trip_ab_group_fengchao":{"trip_ab_group_fengchao_zhinan":"A"},

"trip_ab_group_BookingProduct":{"trip_ab_BookingProduct":"A"},

"trip_ab_group_common":{"trip_ab_poivideo":"C"},

"trip_ab_group_common":{"trip_ab_poitoubuyouhua":"B"},

"trip_ab_group_common":{"trip_ab_deal_packagerankst":"B"},

"trip_ab_group_OptimalGoods_2018":{"trip_ab_OptimalGoods_2018":"B"}}

The required output is:

trip_ab_deal_packagerankst:B,trip_ab_poivideo:C,trip_ab_group_fengchao_zhinan:A,trip_ab_BookingProduct:A,trip_ab_OptimalGoods_2018:B,trip_ab_poitoubuyouhua:B

Solution

Flatten the abtest JSON string by splitting it on commas

lateral view explode(split(abtest,',')) b as f2
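
To make the effect of this step visible, here is a minimal sketch run against an inline sample instead of a real table. The keys k1 and g1 are made-up placeholders; the aliases a, b, and f2 follow the article.

```sql
-- Hypothetical two-pair sample: one top-level pair and one nested group.
select f2
from (
  select '{"k1":"A","g1":{"k1":"A"}}' as abtest
) a lateral view explode(split(abtest, ',')) b as f2;
-- Each returned row is one comma-separated fragment of the original string:
--   {"k1":"A"
--   "g1":{"k1":"A"}}
```

Note that the fragments still carry braces and group names. The split is purely textual, so this approach assumes that no key or value contains a comma.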

From each flattened fragment, use a regular expression to extract the data in "":"" form, then remove the double quotes

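regexp_replace(

-- How the regular expression was built:

-- To extract data in "":"" form: (".*":".*")

-- A line may contain several key-value pairs, so the match must be non-greedy: (".*":".*?")

-- Data such as "trip_ab_group_OptimalGoods_2018":{"trip_ab_OptimalGoods_2018":"B"}} also occurs;

-- what we actually want from it is "trip_ab_OptimalGoods_2018":"B", so the key part must not span a quote: ("[^"]*"\:".*?")

regexp_extract(f2,'("[^"]*"\:".*?")',0),

'"','')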
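
The role of the [^"]* part is easiest to see by trying the pattern on a single nested fragment. The following is only a sketch: the literal fragment with the keys g1 and k1 is a made-up sample, not data from the table.

```sql
-- The group name "g1" is followed by :{ rather than :", so the match cannot
-- start there; the first position where the whole pattern fits is "k1":"A".
select regexp_replace(
         regexp_extract('"g1":{"k1":"A"}}', '("[^"]*"\:".*?")', 0),
         '"', '') as kv;
-- kv: k1:A
```

With the earlier draft (".*":".*?"), the greedy .* in the key part could start at "g1" and run across the quotes, which would pull the group name into the result.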

Then use the aggregate function collect_set to gather these values into one deduplicated set, and join them with commas

concat_ws(',',collect_set(regexp_replace(regexp_extract(f2,'("[^"]*"\:".*?")',0),'"','')))

Test case

hive> select concat_ws(',',collect_set(regexp_replace(regexp_extract(f2,'("[^"]*"\:".*?")',0),'"','')))

> from (

> select 1 as rk,

> '{"trip_ab_deal_packagerankst":"B","trip_ab_poivideo":"C","trip_ab_group_fengchao_zhinan":"A","trip_ab_BookingProduct":"A","trip_ab_OptimalGoods_2018":"B","trip_ab_poitoubuyouhua":"B","trip_ab_group_fengchao":{"trip_ab_group_fengchao_zhinan":"A"},"trip_ab_group_BookingProduct":{"trip_ab_BookingProduct":"A"},"trip_ab_group_common":{"trip_ab_poivideo":"C"},"trip_ab_group_common":{"trip_ab_poitoubuyouhua":"B"},"trip_ab_group_common":{"trip_ab_deal_packagerankst":"B"},"trip_ab_group_OptimalGoods_2018":{"trip_ab_OptimalGoods_2018":"B"}}' as abtest

> ) a lateral view explode(split(abtest,',')) b as f2

> GROUP BY a.rk;

Query ID = gaowenfeng_20180927192959_991bcfd6-ead7-4ca6-a7b0-c542828f2d0e

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1537411935549_0008, Tracking URL = http://gaowenfengdeMacBook-Pro.local:8088/proxy/application_1537411935549_0008/

Kill Command = /Users/gaowenfeng/software/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1537411935549_0008

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-09-27 19:30:04,677 Stage-1 map = 0%, reduce = 0%

2018-09-27 19:30:08,794 Stage-1 map = 100%, reduce = 0%

2018-09-27 19:30:14,951 Stage-1 map = 100%, reduce = 100%

Ended Job = job_1537411935549_0008

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 12371 HDFS Write: 158 SUCCESS

Total MapReduce CPU Time Spent: 0 msec

OK

trip_ab_deal_packagerankst:B,trip_ab_poivideo:C,trip_ab_group_fengchao_zhinan:A,trip_ab_BookingProduct:A,trip_ab_OptimalGoods_2018:B,trip_ab_poitoubuyouhua:B

Time taken: 16.953 seconds, Fetched: 1 row(s)

hive>

Final SQL

select a.event_identifier,

concat_ws(',',collect_set(regexp_replace(regexp_extract(f2,'("[^"]*"\:".*?")',0),'"',''))),

MAX(abtest)

from our_table a lateral view explode(split(abtest,',')) b as f2

WHERE a.datekey = 20180926

GROUP BY a.event_identifier

LIMIT 100
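
One point to keep in mind: collect_set removes duplicates but does not promise any particular element order. If a reproducible order matters more than the order of first appearance, one possible variation is to wrap the set in sort_array. The sketch below uses an inline sample with made-up keys (k1, k2, g1) rather than our_table, and it sorts the pairs alphabetically, which is not the order shown in the required output above.

```sql
-- Same pipeline as the final query, with sort_array added for a stable order.
select concat_ws(',', sort_array(collect_set(
         regexp_replace(regexp_extract(f2, '("[^"]*"\:".*?")', 0), '"', '')
       ))) as kv_list
from (
  select 1 as rk, '{"k2":"B","k1":"A","g1":{"k1":"A"}}' as abtest
) a lateral view explode(split(abtest, ',')) b as f2
group by a.rk;
-- kv_list: k1:A,k2:B
```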
