0-问题描述
如下json_str
json_str |
---|
[{“website”:“baidu.com”,“name”:“百度”},{“website”:“google.com”,“name”:“谷歌”}] |
平时在工作中我们经常会处理json格式的数据串,通常是取json的value值,但是我想获取该json中的每一个key值,如何做,筒子们有遇到过这种需求吗?
1-问题解决
(1)先将json_str中的[]及{}去掉
此处利用translate()函数处理,(当然也可以使用regexp_repalce()处理。)
select translate('[{"website":"baidu.com","name":"百度"} ,{"website":"google.com","name":"谷歌"}]'
,'[]{}""','')
结果如下:
OK
website:baidu.com,name:百度,website:google.com,name:谷歌
Time taken: 1.335 seconds, Fetched: 1 row(s)
(2)将逗号替换成冒号:
select regexp_replace(translate('[{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]'
,'[]{}""','') ,'\,','\:')
计算结果如下:
OK
website:baidu.com:name:百度:website:google.com:name:谷歌
Time taken: 0.165 seconds, Fetched: 1 row(s)
(3)将步骤2计算的结果用posexplode()函数展开,获取索引值及具体值
select pos+1 as rn
,val
from(
select regexp_replace(translate('[{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]'
,'[]{}""','') ,'\,','\:') as str
) t1 lateral view posexplode(split(str,':')) t2 as pos,val
计算结果如下:
OK
1 website
2 baidu.com
3 name
4 百度
5 website
6 google.com
7 name
8 谷歌
Time taken: 0.224 seconds, Fetched: 8 row(s)
(4)由于K-V是一组对偶元组,因此K为奇数行,我们只需要取出奇数行的记录即可
select rn
,val as key
from(
select pos+1 as rn
,val
from(
select regexp_replace(translate('[{"website":"baidu.com","name":"百度"},{"website":"google.com","name":"谷歌"}]'
,'[]{}""','') ,'\,','\:') as str
) t1 lateral view posexplode(split(str,':')) t2 as pos,val
) m
where rn%2=1
计算结果如下:
OK
1 website
3 name
5 website
7 name
Time taken: 0.294 seconds, Fetched: 4 row(s)
2-总结
本文给出了一种通过HQL提取JSON中 key值的方法和技巧,主要使用的知识点如下:
- (1)字符替换函数:translate()函数
- (2)字符串替换函数:regexp_replace()函数
- (3) 列转行:lateral view posexplode()函数
- (4)获取奇数行记录:mod(rn,2)=1