hive解析json格式数据

 get_json_object函数使用

1)数据xjson

[{"name":"大郎","sex":"","age":"25"},{"name":"西门庆","sex":"","age":"47"}]

2)取出第一个json对象

select get_json_object('[{"name":"大郎","sex":"","age":"25"},{"name":"西门庆","sex":"","age":"47"}]','$[0]');

结果是:{"name":"大郎","sex":"","age":"25"}

3)取出第一个jsonage字段的值

SELECT get_json_object('[{"name":"大郎","sex":"","age":"25"},{"name":"西门庆","sex":"","age":"47"}]',"$[0].age");

结果是:25

5.1.2 创建启动表

1)建表语句

hive (gmall)>

drop table if exists dwd_start_log;

CREATE EXTERNAL TABLE dwd_start_log(

`mid_id` string,

`user_id` string,

`version_code` string,

`version_name` string,

`lang` string,

`source` string,

`os` string,

`area` string,

`model` string,

`brand` string,

`sdk_version` string,

`gmail` string,

`height_width` string, 

`app_time` string,

`network` string,

`lng` string,

`lat` string,

`entry` string,

`open_ad_type` string,

`action` string,

`loading_time` string,

`detail` string,

`extend1` string

)

PARTITIONED BY (dt string)

stored as parquet

location '/warehouse/gmall/dwd/dwd_start_log/'

TBLPROPERTIES('parquet.compression'='lzo');

5.1.3 向启动表导入数据

hive (gmall)>

insert overwrite table dwd_start_log

PARTITION (dt='2020-03-10')

select

    get_json_object(line,'$.mid') mid_id,

    get_json_object(line,'$.uid') user_id,

    get_json_object(line,'$.vc') version_code,

    get_json_object(line,'$.vn') version_name,

    get_json_object(line,'$.l') lang,

    get_json_object(line,'$.sr') source,

    get_json_object(line,'$.os') os,

    get_json_object(line,'$.ar') area,

    get_json_object(line,'$.md') model,

    get_json_object(line,'$.ba') brand,

    get_json_object(line,'$.sv') sdk_version,

    get_json_object(line,'$.g') gmail,

    get_json_object(line,'$.hw') height_width,

    get_json_object(line,'$.t') app_time,

    get_json_object(line,'$.nw') network,

    get_json_object(line,'$.ln') lng,

    get_json_object(line,'$.la') lat,

    get_json_object(line,'$.entry') entry,

    get_json_object(line,'$.open_ad_type') open_ad_type,

    get_json_object(line,'$.action') action,

    get_json_object(line,'$.loading_time') loading_time,

    get_json_object(line,'$.detail') detail,

    get_json_object(line,'$.extend1') extend1

from ods_start_log

where dt='2020-03-10';

3)测试

hive (gmall)> select * from dwd_start_log limit 2;

 

  • 0
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值