hive搞的时候要注意:对于json数组里面只有一个对象的要单独拿出来解析,这个在explode里面会失效
工作中会有这样的数据类型,需要规则化的解析出来方便下游使用
解析如下:
select crawler_data_time
,nav
,model
,author_id
,nick_name
,follower_num
,create_time
,get_json_object(concat('{',report_1,'}'),'$.distribution_key') as distribution_key
,get_json_object(concat('{',report_1,'}'),'$.distribution_value') as distribution_value
,description
,type_display
,dt
from (
select crawler_data_time
,nav
,model
,author_id
,nick_name
,follower_num
,create_time
,get_json_object(fan_distributions,'$.[0].description') as description
,get_json_object(fan_distributions,'$.[0].distribution_list') as distribution_list
,get_json_object(fan_distributions,'$.[0].type_display') as type_display
,dt
from bigdata2c.ods_biz_douyin_author_fans_report
where dt = ${bdp.system.bizdate}
) t1
LATERAL VIEW
explode(split(regexp_replace(regexp_replace(distribution_list,'\\[\\{',''),'}]','}'),'},\\{')) price_info_list as report_1
union all
select crawler_data_time
,nav
,model
,author_id
,nick_name
,follower_num
,create_time
,get_json_object(concat('{',report_1,'}'),'$.distribution_key') as distribution_key
,get_json_object(concat('{',report_1,'}'),'$.distribution_value') as distribution_value
,description
,type_display
,dt
from (
select crawler_data_time
,nav
,model
,author_id
,nick_name
,follower_num
,create_time
,get_json_object(fan_distributions,'$.[1].description') as description
,get_json_object(fan_distributions,'$.[1].distribution_list') as distribution_list
,get_json_object(fan_distributions,'$.[1].type_display') as type_display
,dt
from bigdata2c.ods_biz_douyin_author_fans_report
where dt = ${bdp.system.bizdate}
) t1
LATERAL VIEW
explode(split(regexp_replace(regexp_replace(distribution_list,'\\[\\{',''),'}]','}'),'},\\{')) price_info_list as report_1
当然这样也会有问题,里面每个都需要单独写,如果个数不固定的话解析数据会少,这时候需要个循环操作,有碰到了会用阿里云的循环节点或者pyodps处理下