hive sql解析Json
简单json
使用hive函数的话,就是get_json_object()
,这里是标准的jsonpath语法,从头开始,就是$
开头,逐层嵌套
WITH json as (
select '{' ||
'"name":"天台",' ||
'"tall":100,' ||
'"model":"M779011",' ||
'"res":"",' ||
'"list":"[]",' ||
'"dict":"{}"'
'}' AS j)
SELECT get_json_object(j, '$.name') as name
, get_json_object(j, '$.tall') as tall
, get_json_object(j, '$.model') as model
, get_json_object(j, '$.res') as res
, get_json_object(j, '$.list') as list
, get_json_object(j, '$.dict') as dict
from json
还有json_tuple()
函数,不过这种就不是jsonpath语法了,是一个一个的取
SELECT tmp.*
from json
lateral view json_tuple(j, 'name', 'tall', 'model', 'res') tmp as name, tall, model, res
列表嵌套json
在这种情况下,有两种取法
第一种就是使用get_json_object()
函数一级一级的取值,但这种取值适用于量级小的情况,当量级大的时候,代码就会十分冗余
第二种就是使用炸裂函数进行拆解,使列表中的json分开
select case when substring(js, length(js)) = '}' then js else concat(js,'}') end singlej
-- select get_json_object(j, '$.[0][0].name')
from (
select '[[{"name":"天台1","tall":100,"model":"M779011"},{"name":"楼顶1","tall":90,"model":"M669011"}]' ||
',[{"name":"天台2","tall":100,"model":"M779011"},{"name":"楼顶2","tall":90,"model":"M669011"}]]' AS j
) t
lateral view explode(split(regexp_replace(j,'\\[|\\]', ''), '},')) tmp as js
json嵌套json
简单列举了两种情况
第一种是直接嵌套;第二种是json嵌套列表,列表中含有json
第一种就一级一级递增就行了;第二种注意列表顺序,使用[index]
来定位下一个json对象位置
select get_json_object(j,'$.j.name') name
, get_json_object(j,'$.lis[0][0].name') name1
from (
select '{
"num":1,
"lis":
[[{"name":"天台1","tall":100,"model":"M779011"},{"name":"楼顶1","tall":90,"model":"M669011"}]
,[{"name":"天台2","tall":100,"model":"M779011"},{"name":"楼顶2","tall":90,"model":"M669011"}]],
"j":{
"name":"jim",
"age":18
}
}' as j
) t
列表json
遇到这种,可以直接转为列表类型进行研究,如果使用jsonpath语法,就是通过下标取值
select get_json_object(lis, '$[0]')
from (
select '["1","2","3"]' lis
) t