Hive on MapReduce and Hive on Spark [Parsing JSON Strings]

qq_35218261

Modified on 2023-09-01 16:13:35


Article tags: hive, hadoop, spark, big data, data warehouse, hbase, database

First published on 2023-09-01 16:10:51

Article link: https://blog.csdn.net/qq_35218261/article/details/132625235


Parsing JSON strings in Hive

Scenario: extract the activity ID, activity name, activity start time, activity end time, and activity description from a JSON string.

1. Prepare the raw data

-- Hive table creation statement


create table scene_table as
 select 1 id,
    '[{"icon":"ID","name":"活动ID","type":"TEXT","value":"123131"},
    {"icon":"ACRIVITY_NAME","name":"活动名称","type":"TEXT","value":"超长活动"},
    {"icon":"ACTIVITY_START_TIME","name":"活动开始时间","type":"TEXT","value":"2021-08-20"},
    {"icon":"ACTIVITY_END_TIME","name":"活动结束时间","type":"TEXT","value":"2021-09-10"},
    {"icon":"ACRIVITY_DESC","name":"活动描述","type":"TEXT","value":"活动描述"}]' scene_detail_data;
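As a baseline for the queries below, note that the sample value is itself a valid JSON array, so a real JSON parser recovers every name/value pair directly. The following is a minimal Python sketch, assuming Python 3 and its standard `json` module. It is not part of the Hive workflow. It only shows the result the Hive queries should reproduce.

```python
import json

# The scene_detail_data literal from the CREATE TABLE statement above.
SCENE_DETAIL_DATA = '''[{"icon":"ID","name":"活动ID","type":"TEXT","value":"123131"},
    {"icon":"ACRIVITY_NAME","name":"活动名称","type":"TEXT","value":"超长活动"},
    {"icon":"ACTIVITY_START_TIME","name":"活动开始时间","type":"TEXT","value":"2021-08-20"},
    {"icon":"ACTIVITY_END_TIME","name":"活动结束时间","type":"TEXT","value":"2021-09-10"},
    {"icon":"ACRIVITY_DESC","name":"活动描述","type":"TEXT","value":"活动描述"}]'''

# Parse the whole array at once and map each "name" to its "value".
expected = {item["name"]: item["value"] for item in json.loads(SCENE_DETAIL_DATA)}
print(expected)
```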

2. Hive functions used

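LATERAL VIEW and explode() work the same way on both engines. explode() turns each element of an array (or each key/value pair of a map) into its own row, and LATERAL VIEW joins those rows back to the source row.

split() also behaves the same on both engines. It splits a string into an array, and it treats the delimiter argument as a regular expression.

concat() joins several strings into one.

regexp_replace() replaces every substring that matches a regular expression with a replacement string.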
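The functions above can be mimicked in a few lines of Python to show what each step does to a string. This is a sketch under stated assumptions. The helpers below are hypothetical stand-ins and not Hive's implementation. Hive uses Java regular expressions, and the assumption is that the simple patterns in this article behave the same under Python's `re` module. `re.split` keeps trailing empty strings, which matches the empty trailing element that the queries in section 3 filter out.

```python
import re

# Hypothetical Python stand-ins for the Hive built-ins, for illustration only.
# Assumption: the simple patterns used here behave the same under Python's re
# module as under the Java regex engine that Hive uses.

def regexp_replace(s, pattern, replacement):
    """Replace every match of pattern in s."""
    return re.sub(pattern, replacement, s)

def concat(*parts):
    """Join the strings end to end."""
    return "".join(parts)

def split(s, pattern):
    """Split s around a regex; re.split keeps trailing empty strings."""
    return re.split(pattern, s)

def lateral_view_explode(rows, array_col, alias):
    """Yield one row per array element, carrying along the parent row's columns."""
    for row in rows:
        for element in row[array_col]:
            yield {**row, alias: element}

# Usage: strip the brackets, append ';', split, then explode into rows.
rows = [{"id": 1, "arr": split(concat(regexp_replace("[a;b]", r"\[|\]", ""), ";"), ";")}]
exploded = [r["x"] for r in lateral_view_explode(rows, "arr", "x")]
print(exploded)  # ['a', 'b', '']
```

The trailing `;` added by `concat()` is what produces the final empty element, so any query that splits this way needs a filter for it.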

3. Caveats when Hive runs on MR vs. on Spark

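Regular expressions and string-literal escaping can behave differently in Spark SQL and in Hive, so check the regex arguments and adjust them for your engine if needed. In Spark SQL, avoid using a semicolon (;) as a delimiter inside string literals, because it can conflict with the semicolon that terminates SQL statements. Where possible, use another delimiter such as a comma (,) or a vertical bar (|). Because split() treats its delimiter as a regular expression, a vertical bar must be escaped as '\\|'.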
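To see why a semicolon inside a string literal is risky, consider a client that splits a script into statements by cutting at every `;`. The two splitters below are toy, hypothetical functions written in Python for illustration. They are not the actual logic of the spark-sql CLI, Beeline, or any other client, and real clients differ by version. They show how a naive cut breaks the `'};{'` literal used in the MR query, while a `'},{'` literal survives.

```python
def naive_split(script):
    """Cut the script at every ';', ignoring string literals (the risky behaviour)."""
    return [part.strip() for part in script.split(";") if part.strip()]

def quote_aware_split(script):
    """Cut only at ';' outside single-quoted literals (escaped quotes not handled)."""
    statements, current, in_quote = [], [], False
    for ch in script:
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]

mr_style = "select regexp_replace(s, 'x', '};{') from t;"
spark_style = "select regexp_replace(s, 'x', '},{') from t;"
print(len(naive_split(mr_style)))        # 2: the literal is cut in half
print(len(naive_split(spark_style)))     # 1
print(len(quote_aware_split(mr_style)))  # 1
```

Using a delimiter that never appears as a statement terminator avoids depending on how a particular client handles quotes.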

Examples:

Hive on MR



select
    t.id,
    max(case when name = '活动ID'       then value end) activity_id,
    max(case when name = '活动名称'     then value end) activity_name,
    max(case when name = '活动开始时间' then value end) activity_start_time,
    max(case when name = '活动结束时间' then value end) activity_end_time
from (
    select
        aa.id,
        get_json_object(json_str, '$.name')  name,
        get_json_object(json_str, '$.value') value
    from scene_table aa
    -- strip [ and ], rewrite each "},{" separator as "};{", then split on ';'
    -- (the trailing ';' added by concat() yields one extra empty element)
    LATERAL VIEW explode(split(concat(regexp_replace(regexp_replace(scene_detail_data, '\\[|\\]', ''), '},\\s*\\{', '};{'), ';'), ';')) kv AS json_str
    where json_str is not null
      and get_json_object(json_str, '$.name') in ('活动ID', '活动名称', '活动开始时间', '活动结束时间')
) t
group by
    t.id;
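The string handling in the MR query can be traced step by step outside Hive. The following Python sketch makes several assumptions. `get_json_object` here is a hypothetical, simplified helper that takes a key instead of a full `$.path` and returns `None` for invalid JSON. Python's `re` stands in for Java regexes. The `max(case ...)` pivot is reduced to a dict because each name occurs once in the sample.

```python
import json
import re

# Same value as scene_detail_data in section 1.
SCENE_DETAIL_DATA = '''[{"icon":"ID","name":"活动ID","type":"TEXT","value":"123131"},
    {"icon":"ACRIVITY_NAME","name":"活动名称","type":"TEXT","value":"超长活动"},
    {"icon":"ACTIVITY_START_TIME","name":"活动开始时间","type":"TEXT","value":"2021-08-20"},
    {"icon":"ACTIVITY_END_TIME","name":"活动结束时间","type":"TEXT","value":"2021-09-10"},
    {"icon":"ACRIVITY_DESC","name":"活动描述","type":"TEXT","value":"活动描述"}]'''

WANTED = ["活动ID", "活动名称", "活动开始时间", "活动结束时间"]

def get_json_object(json_str, key):
    """Hypothetical simplification of get_json_object(json_str, '$.key')."""
    try:
        return json.loads(json_str).get(key)
    except (ValueError, AttributeError):  # invalid JSON (e.g. '') or not an object
        return None

def mr_pieces(raw):
    """regexp_replace twice, concat(..., ';'), then split(..., ';')."""
    no_brackets = re.sub(r"\[|\]", "", raw)
    joined = re.sub(r"},\s*\{", "};{", no_brackets)
    return re.split(";", joined + ";")

def parse_mr_style(raw):
    row = {}
    for json_str in mr_pieces(raw):
        name = get_json_object(json_str, "name")
        if name in WANTED:  # the IN (...) filter; '' gives None and is dropped
            # Each name occurs once here, so max(case ...) just keeps this value.
            row[name] = get_json_object(json_str, "value")
    return row

print(parse_mr_style(SCENE_DETAIL_DATA))
```

Each piece still ends with its own `}` because the replacement `};{` keeps both braces, so every non-empty piece is valid JSON on its own.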

Hive on Spark


select
    t.id,
    max(case when name = '活动ID'       then value end) activity_id,
    max(case when name = '活动名称'     then value end) activity_name,
    max(case when name = '活动开始时间' then value end) activity_start_time,
    max(case when name = '活动结束时间' then value end) activity_end_time
from (
    select
        s.id,
        -- splitting on '},' drops each object's closing brace, so add it back
        get_json_object(concat(json_str, '}'), '$.name')  name,
        get_json_object(concat(json_str, '}'), '$.value') value
    from scene_table s
    -- strip [ and ], normalize each separator to "},{", append ',', then split on '},'
    LATERAL VIEW explode(split(concat(regexp_replace(regexp_replace(scene_detail_data, '\\[|\\]', ''), '},\\s*\\{', '},{'), ','), '},')) kv AS json_str
    where json_str != ''  -- the trailing '},' leaves one empty element
) t
group by t.id;
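The Spark variant can be traced in the same way. It shows why `concat(json_str, '}')` is needed. Splitting on `},` consumes the closing brace of every object, so each piece is invalid JSON until the brace is appended. This Python sketch uses the same assumptions as before: a hypothetical, simplified `get_json_object` helper and Python's `re` in place of Java regexes.

```python
import json
import re

# Same value as scene_detail_data in section 1.
SCENE_DETAIL_DATA = '''[{"icon":"ID","name":"活动ID","type":"TEXT","value":"123131"},
    {"icon":"ACRIVITY_NAME","name":"活动名称","type":"TEXT","value":"超长活动"},
    {"icon":"ACTIVITY_START_TIME","name":"活动开始时间","type":"TEXT","value":"2021-08-20"},
    {"icon":"ACTIVITY_END_TIME","name":"活动结束时间","type":"TEXT","value":"2021-09-10"},
    {"icon":"ACRIVITY_DESC","name":"活动描述","type":"TEXT","value":"活动描述"}]'''

WANTED = ["活动ID", "活动名称", "活动开始时间", "活动结束时间"]

def get_json_object(json_str, key):
    """Hypothetical simplification of get_json_object(json_str, '$.key')."""
    try:
        return json.loads(json_str).get(key)
    except (ValueError, AttributeError):  # invalid JSON or not an object
        return None

def spark_pieces(raw):
    """regexp_replace twice, concat(..., ','), then split(..., '},')."""
    no_brackets = re.sub(r"\[|\]", "", raw)
    joined = re.sub(r"},\s*\{", "},{", no_brackets)
    return re.split("},", joined + ",")

def parse_spark_style(raw):
    row = {}
    for json_str in spark_pieces(raw):
        if json_str == "":  # the WHERE json_str != '' filter
            continue
        fixed = json_str + "}"  # concat(json_str, '}') restores the brace
        name = get_json_object(fixed, "name")
        if name in WANTED:  # the outer max(case ...) ignores every other name
            row[name] = get_json_object(fixed, "value")
    return row

print(parse_spark_style(SCENE_DETAIL_DATA))
```

This approach assumes that no value contains the text `},`. A value that did would be cut in the wrong place.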