要处理的数据如下:
{
"IP": "192.168.1.1",
"appName": "sichuan_yunyingyong",
"customEvent": [
{
"eventName": "xx1",
"du": "xx",
"timestamp": "1480521763049",
"eventParams": {
"ContentID": "yixiuge",
"account": "13856976635",
"networkType": "WIFI",
"result": "0",
"type": "11"
}
},
{
"eventName": "xx2",
"du": "xx",
"timestamp": "1480521763049",
"eventParams": {
"ContentID": "yixiuge",
"account": "13856976636",
"networkType": "WIFI",
"result": "0",
"type": "11"
}
}
]
}
posexplode(array)把数组变成(pos, json) 的键值对
select
j1.j1_ip,
j1.j1_appName,
j2.j2_customEvent_json
FROM tab_json s
lateral view json_tuple(s.json, 'IP', 'appName', 'customEvent') j1 as j1_ip, j1_appName, j1_customEvent
lateral view posexplode(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')) j2 as j2_customEvents_pos, j2_customEvent_json
执行结果为:
192.168.1.1 sichuan_yunyingyong {"eventName":"xx1","du":"xx","timestamp":"1480521763049","eventParams":{"ContentID":"yixiuge","account":"13856976635","networkType":"WIFI","result":"0","type":"11"}}
192.168.1.1 sichuan_yunyingyong {"eventName":"xx2","du":"xx","timestamp":"1480521763049","eventParams":{"ContentID":"yixiuge","account":"13856976636","networkType":"WIFI","result":"0","type":"11"}}
这里把json array的格式通过替换变成了 {json1} || {json2} , 再去掉数组的括号,最后根据 || 来拆开,形成了一个有两个元素的数组,接着 posexplode 在把数组变成(pos, json) 的键值对,pos记录了元素的位置,json就是实际的json数据,这样一条数据就变成了两条了
那么现在要获取IP , appName , account 就很简单了:
selectj1.j1_ip,
j1.j1_appName,
j4.j4_account
FROM tab_json s
lateral view json_tuple(s.json, 'IP', 'appName', 'customEvent') j1 as j1_ip, j1_appName, j1_customEvent
lateral view posexplode(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')) j2 as j2_customEvents_pos, j2_customEvent_json
lateral view json_tuple(j2.j2_customEvent_json, 'eventParams') j3 as j3_eventParams
lateral view json_tuple(j3.j3_eventParams, 'account') j4 as j4_account
结果如下:
192.168.1.1 sichuan_yunyingyong 13856976635
192.168.1.1 sichuan_yunyingyong 13856976636
lateral view posexplode(array(split(regexp_replace(regexp_replace(j1.j1_customEvent,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')[1])) j2 as j2_customEvents_pos, j2_customEvent_json
split 跟上数组下标,就能取出某个元素,由于posexplode只接受 array类型的参数,可以使用array函数转换成对应的数组,这样就只有一条数据了,结果如下
192.168.1.1 sichuan_yunyingyong 13856976636
参考:https://blog.csdn.net/lfq1532632051/article/details/63262519