Hive解析Json数据有两种方式
-
使用 get_json_object 解析
优点:可以解析嵌套json
缺点:无法获取key为特俗字符开头得数据,如@开头;
每次只能解析出一个json数据
特殊字符缺点可解决示例:
原始数据样例(已脱敏)
查询sql
select * from ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 1
查询结果
{
"@timestampMillis":"1607183999645",
"@offset":"364045812",
"@timestamp":"2020-12-05T15:59:59.645Z",
"@version":"1",
"message":{
"duration":0,
"traceId":"f7f7e26468b4382c9fb10f00b0d32b6c",
"binaryAnnotations":[
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"901914522",
"key":"bc.pbusiId"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"-1",
"key":"bz.subBusiState"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"192.168.155.102,ORD_769",
"key":"gd.region_id"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"",
"key":"bc.busiId"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"",
"key":"bz.busiType"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"",
"key":"bz.subBusiType"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"-1",
"key":"bz.busiState"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"0",
"key":"bz.resultCode"
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"",
"key":"bz.error"
}
],
"@timestamp":"2020-12-05T15:59:59.645Z",
"@metadata":{
"beat":"filebeat",
"topic":"ETE_SMMDDATA_SMMD",
"type":"doc",
"version":"6.2.4"
},
"name":"mmdRecvTMD",
"annotations":[
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"sr",
"timestamp":1607183990738000
},
{
"endpoint":{
"port":0,
"ip":"192.168.155.102",
"serviceName":"1991.MMD"
},
"value":"ss",
"timestamp":1607183990938000
}
],
"id":"f7f7e26468b4382c9fb10f00b0d32b6c",
"fields":{
"log_topics":"ETE_SMMDDATA_SMMD"
},
"parentId":"",
"timestamp":1607183990738000
}
}
特殊字符key查询
select get_json_object(json,'$.@timestampMillis') as time1,get_json_object(json,'$.message.timestamp') as time,get_json_object(json,'$.message.annotations[0].endpoint.serviceName') as service from ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 10
查询结果
用正则替代优化特殊字符查询
regexp_replace 进行@替换,然后查询
select get_json_object(regexp_replace ( json, '@', 'abc' ), "$.abctimestampMillis" ) as time2,get_json_object(json,'$.message.timestamp') as time,get_json_object(json,'$.message.annotations[0].endpoint.serviceName') as service from ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 10
结果展示
2. 使用 json_tuple 解析
优点:可以直接获取特殊字符开头的数据,可以一次性获解析出多个数据
缺点:无法解析嵌套json
查询示例
select a.* from ete_bo.ete_bo_serv_all lateral view json_tuple(json,'@timestampMillis','@offset') a as f1,f2 where proid like'%8440000' and dt=20201206 limit 100
结果展示
lateral view 这个主要是和 上面几个函数配合使用的函数,相当于建一个临时的视图,这样就能把上面的查询结果当做查询 查询字段查出来
@Cool.R