Hive解析Json,URL操作

9 篇文章 0 订阅

在实际开发当中json 很常见的类型,hive处理json字符串成为目标的宽表。

从json ===》 hive table ===》sql

一,解析json
先介绍下用到的hive 内置函数 json_tuple
执行:desc function [extended] json_tuple;
显示:
json_tuple(jsonStr, p1, p2, ..., pn) like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string
说白了就是传入json的那一列,然后p*就是要查询的列名

源数据如下:

{"movie":"2791","rate":"4","time":"978302188","userid":"1"}
{"movie":"2687","rate":"3","time":"978824268","userid":"1"}
{"movie":"2018","rate":"4","time":"978301777","userid":"1"}
{"movie":"3105","rate":"5","time":"978301713","userid":"1"}
{"movie":"2797","rate":"4","time":"978302039","userid":"1"}

①建hive表,以及导入json数据

create table test_json(json string);

load data local inpath '/home/hadoop/data/jsontest.json' into test_json


select json_tuple(json,'movie','rate','time','userid') as (movie_id,rate,time,user_id) from test_json limit 10
在这里插入图片描述
解析ok,在对time字段进行进一步解析,现在是时间戳,需要解析出年月日,时分秒等。
③,用到的函数
int : year(string date)
string : from_unixtime(bigint unixtime[, string format])

select movie_id,rate,time,user_id,
year(from_unixtime(cast(time as bigint)))as year,
month(from_unixtime(cast(time as bigint)))as month,
day(from_unixtime(cast(time as bigint)))as day,
hour(from_unixtime(cast(time as bigint)))as hour,
minute(from_unixtime(cast(time as bigint)))as minute
from 
(
select json_tuple(json,'movie','rate','time','userid') as (movie_id,rate,time,user_id) 
from test_json 
)tmp limit 10;

经过验证数据解析无误
在这里插入图片描述
④,将结果写入宽表

create table rating_width as 
select movie_id,rate,time,user_id,
year(from_unixtime(cast(time as bigint)))as year,
month(from_unixtime(cast(time as bigint)))as month,
day(from_unixtime(cast(time as bigint)))as day,
hour(from_unixtime(cast(time as bigint)))as hour,
minute(from_unixtime(cast(time as bigint)))as minute
from 
(
select json_tuple(json,'movie','rate','time','userid') as (movie_id,rate,time,user_id) 
from rating_json 
)tmp;

⑤,之后所有的操作都是基于宽表进行SQL操作了。

二,解析URL
①,用到的hive内置函数 parse_url_tuple 解析url
执行:desc function [extended] parse_url_tuple;
显示示例:

SELECT parse_url_tuple(a.fullurl, 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'FILE',  'AUTHORITY', 'USERINFO', 'QUERY:k1') as (ho, pa, qu, re, pr, fi, au, us, qk1) from src a

②,将结果写入宽表即可,这里不再做示例。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

冬瓜螺旋雪碧

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值