问题:表中数据像如下这种情况(某个字段是JSON格式),想要单独取出JSON中的某个字段:id,ids,total_number 怎么办?
hive (first_test)> select * from user;
OK
user.id user.infos user.device_id
1 {"id": 1701439105,"ids": [2154137571,3889177061],"total_number": 493} 201
2 {"id": 1701439106,"ids": [2154137571,3889177061],"total_number": 494} 301
Time taken: 0.035 seconds, Fetched: 2 row(s)
1、建表
create table user(
id int,infos string,device_id int)
row format delimited
fields terminated by '\t'
lines terminated by '\n';
2、导入数据
user.txt
1 {"id": 1701439105,"ids": [2154137571,3889177061],"total_number": 493} 201
2 {"id": 1701439106,"ids": [2154137571,3889177061],"total_number": 494} 301
load data local inpath '/opt/soft/hive/testjob/user.txt' into table user;
get_json_object
使用 get_json_object 函数,对json字符串进行解析
使用方式:
select get_json_object(json所在的列名,’$.json中字段名1’,…)
例如:
hive (first_test)> select get_json_object(infos,'$.id') from user;
OK
_c0
1701439105
1701439106
Time taken: 0.071 seconds, Fetched: 2 row(s)
json_tuple (官方推荐:效率高)
使用json_tuple函数,对json字符串进行解析
使用方式:
select json_tuple(json所在的列名,“json中字段名1”,…) from 表名;
例如:
hive (first_test)> select json_tuple(infos,"total_number") from user_tmp;
OK
c0
493
494
Time taken: 0.033 seconds, Fetched: 2 row(s)