Hive SQL 解析json获取特殊字符数据

Hive解析Json数据有两种方式

  1. 使用 get_json_object 解析

    优点:可以解析嵌套json

    缺点:无法获取key为特俗字符开头得数据,如@开头;
    每次只能解析出一个json数据
    特殊字符缺点可解决

    示例:

    原始数据样例(已脱敏)

    查询sql

select * from  ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 1

查询结果

{
    "@timestampMillis":"1607183999645",
    "@offset":"364045812",
    "@timestamp":"2020-12-05T15:59:59.645Z",
    "@version":"1",
    "message":{
        "duration":0,
        "traceId":"f7f7e26468b4382c9fb10f00b0d32b6c",
        "binaryAnnotations":[
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"901914522",
                "key":"bc.pbusiId"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"-1",
                "key":"bz.subBusiState"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"192.168.155.102,ORD_769",
                "key":"gd.region_id"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"",
                "key":"bc.busiId"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"",
                "key":"bz.busiType"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"",
                "key":"bz.subBusiType"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"-1",
                "key":"bz.busiState"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"0",
                "key":"bz.resultCode"
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"",
                "key":"bz.error"
            }
        ],
        "@timestamp":"2020-12-05T15:59:59.645Z",
        "@metadata":{
            "beat":"filebeat",
            "topic":"ETE_SMMDDATA_SMMD",
            "type":"doc",
            "version":"6.2.4"
        },
        "name":"mmdRecvTMD",
        "annotations":[
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"sr",
                "timestamp":1607183990738000
            },
            {
                "endpoint":{
                    "port":0,
                    "ip":"192.168.155.102",
                    "serviceName":"1991.MMD"
                },
                "value":"ss",
                "timestamp":1607183990938000
            }
        ],
        "id":"f7f7e26468b4382c9fb10f00b0d32b6c",
        "fields":{
            "log_topics":"ETE_SMMDDATA_SMMD"
        },
        "parentId":"",
        "timestamp":1607183990738000
    }
}

特殊字符key查询

select get_json_object(json,'$.@timestampMillis') as time1,get_json_object(json,'$.message.timestamp') as time,get_json_object(json,'$.message.annotations[0].endpoint.serviceName') as service from  ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 10

查询结果

无法获取@开头的@timestamp数据
用正则替代优化特殊字符查询

regexp_replace 进行@替换,然后查询

select get_json_object(regexp_replace ( json, '@', 'abc' ), "$.abctimestampMillis" ) as time2,get_json_object(json,'$.message.timestamp') as time,get_json_object(json,'$.message.annotations[0].endpoint.serviceName') as service from  ete_bo.ete_bo_serv_all where proid like'%8440000' and dt=20201206 limit 10

结果展示

正常获取数据
2. 使用 json_tuple 解析

优点:可以直接获取特殊字符开头的数据,可以一次性获解析出多个数据

缺点:无法解析嵌套json

查询示例

select a.* from  ete_bo.ete_bo_serv_all  lateral view json_tuple(json,'@timestampMillis','@offset') a as f1,f2 where proid like'%8440000' and dt=20201206 limit 100

结果展示

正常查询

lateral view 这个主要是和 上面几个函数配合使用的函数,相当于建一个临时的视图,这样就能把上面的查询结果当做查询 查询字段查出来

@Cool.R

  • 2
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 7
    评论
评论 7
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值