Hive(四)函数(json_tuple和parse_url_tuple)/topN通用解法/Beeline连接

本文介绍了Hive中用于处理JSON和URL数据的函数json_tuple和parse_url_tuple,以及如何使用它们从数据中提取关键信息。同时,探讨了在Hive中实现TopN通用解法,利用Analytics函数ROW_NUMBER对数据进行分区排名。最后,讨论了如何通过Beeline连接到Hiveserver2进行查询操作。
摘要由CSDN通过智能技术生成

json_tuple

创建一个只有一个string类型的字段来存放json数据的表,将下列类型数据load进表中:

hive (d1_hive)> select * from rating_json limit 10;
OK
rating_json.json
{"movie":"1193","rate":"5","time":"978300760","userid":"1"}
{"movie":"661","rate":"3","time":"978302109","userid":"1"}
{"movie":"914","rate":"3","time":"978301968","userid":"1"}
{"movie":"3408","rate":"4","time":"978300275","userid":"1"}
{"movie":"2355","rate":"5","time":"978824291","userid":"1"}
{"movie":"1197","rate":"3","time":"978302268","userid":"1"}
{"movie":"1287","rate":"5","time":"978302039","userid":"1"}
{"movie":"2804","rate":"5","time":"978300719","userid":"1"}
{"movie":"594","rate":"4","time":"978302268","userid":"1"}
{"movie":"919","rate":"4","time":"978301368","userid":"1"}
Time taken: 0.311 seconds, Fetched: 10 row(s)
hive (d1_hive)>

如何将json数据拆分出来呢?使用json_tuple函数能够简单处理这种事。

hive (d1_hive)> select json_tuple(json,'movie','rate','time','userid') as (movie_id,rate,time,user_id)  from rating_json limit 10;
OK
movie_id	rate	time	user_id
1193	5	978300760	1
661	3	978302109	1
914	3	978301968	1
3408	4	978300275	1
2355	5	978824291	1
1197	3	978302268	1
1287	5	978302039	1
2804	5	978300719	1
594	4	978302268	1
919	4	978301368	1
Time taken: 0.121 seconds, Fetched: 10 row(s)
hive (d1_hive)>

针对上述例子,在生产上一般来说是要将time再次处理成时间戳/年/月/日等,新成一张大宽表,以便后续会用到。

hive (d1_hive)> select movie_id,rate,user_id,
              > from_unixtime(cast(time as BIGINT),'yyyy-MM-dd HH:mm:ss') as time,
              > year(from_
  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值