处理Json数据的函数
hive中转化 json 格式的函数
- string 类型 → json
- named_struct:生成key和value的struct结构,传递参数为named_struct(key1,value1,key2,value2,…,keyN,valueN))
- to_json:将各种复杂结构转换成json格式
- 若无,可自定义 udf 函数
to_json(collect_list(named_struct("eventType",predicate_label,"eventTime",create_time))) AS json
- map类型 → json
- str_to_map 可以先将string切割成map,再转json
- to_json:将各种复杂结构转换成json格式
- 若无,可自定义 udf 函数
to_json(map类型字段) as field_json
select to_json(str_to_map("aaaa_-100#bbbb_领券29减8#cccc_29分钟#cccc_50分钟",'#','_')) as json_test
-- {"aaaa":"-100","bbbb":"领券29减8","cccc":"50分钟"}
Hive中处理json数据的两种方式
第一种:将json数据作为字符串进行处理
-
get_json_object
- 功能:取单个json字段
tips:如果字段名称错误,不会报错,但结果会是 NULL
-
json_tuple:
- 功能:同时取多个字段
- UDTF函数
tips如果加$.的话会出现null哦~~~
-
案例一:
--创建数据:vim /export/datas/hivedata.json
{"id": 1701439105,"ids": [2154137571,3889177061],"total_number": 493}
{"id": 1701439106,"ids": [2154137571,3889177061],"total_number": 494}
--创建表:
create table tb_json_test1 (
json string
);
--加载数据
load data local inpath '/export/datas/hivedata.json' into table tb_json_test1;
--处理读取
select
get_json_object(t.json,'$.id'),
get_json_object(t.json,'$.total_number')
from
tb_json_test1 t ;
select
t2.*
from
tb_json_test1 t1
lateral view
json_tuple(t1.json, 'id', 'total_number') t2 as c1,c2;
select json_tuple(json, 'id', 'total_number') as (c1,c2) from tb_json_test1;
[{"key":"upload-screenshot","value":false},{"key":"choice","value":"已解决"}]
select reply
, key
, vaule
from (select reply
, reply_2
from (select reply
,regexp_replace(reply_1, '\\}\\,\\{', '\\}\\;\\{') as reply_1
from (select reply
,regexp_replace(reply, '\\[|\\]', '') as reply_1
from (select '[{"key":"upload-screenshot","value":false},{"key":"choice","value":"已解决"}]' as reply)
) c
) a
LATERAL VIEW explode(split(reply_1, '\\;')) b as reply_2
) a
lateral view json_tuple(reply_2, 'key', 'value') b as key,vaule
;
-- reply key vaule
-- [{"key":"upload-screenshot","value":false},{"key":"choice","value":"已解决"}] upload-screenshot false
-- [{"key":"upload-screenshot","value":false},{"key":"choice","value":"已解决"}] choice 已解决
- 案例3
- 不符合JSON规范
- 参考HIVE SQL之JSON字符串解析的坑
第二种:通过专门的解析类直接加载一个json格式的数据到Hive中
- 案例:
--创建文件:vim /export/datas/hivedata.json
{"id": 1701439105,"ids": [2154137571,3889177061],"total_number": 493}
{"id": 1701439106,"ids": [2154137571,3889177061],"total_number": 494}
--添加jar包
add jar /export/datas/json-serde-1.3.7-jar-with-dependencies.jar;
--创建表:
create table tb_json_test2 (
id string,
ids array<string>,
total_number int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
--加载数据
load data local inpath '/export/datas/hivedata.json' into table tb_json_test2;