json_tuple
创建一个只有一个string类型的字段来存放json数据的表,将下列类型数据load进表中:
hive (d1_hive)> select * from rating_json limit 10;
OK
rating_json.json
{"movie":"1193","rate":"5","time":"978300760","userid":"1"}
{"movie":"661","rate":"3","time":"978302109","userid":"1"}
{"movie":"914","rate":"3","time":"978301968","userid":"1"}
{"movie":"3408","rate":"4","time":"978300275","userid":"1"}
{"movie":"2355","rate":"5","time":"978824291","userid":"1"}
{"movie":"1197","rate":"3","time":"978302268","userid":"1"}
{"movie":"1287","rate":"5","time":"978302039","userid":"1"}
{"movie":"2804","rate":"5","time":"978300719","userid":"1"}
{"movie":"594","rate":"4","time":"978302268","userid":"1"}
{"movie":"919","rate":"4","time":"978301368","userid":"1"}
Time taken: 0.311 seconds, Fetched: 10 row(s)
hive (d1_hive)>
如何将json数据拆分出来呢?使用json_tuple函数能够简单处理这种事。
hive (d1_hive)> select json_tuple(json,'movie','rate','time','userid') as (movie_id,rate,time,user_id) from rating_json limit 10;
OK
movie_id rate time user_id
1193 5 978300760 1
661 3 978302109 1
914 3 978301968 1
3408 4 978300275 1
2355 5 978824291 1
1197 3 978302268 1
1287 5 978302039 1
2804 5 978300719 1
594 4 978302268 1
919 4 978301368 1
Time taken: 0.121 seconds, Fetched: 10 row(s)
hive (d1_hive)>
针对上述例子,在生产上一般来说是要将time再次处理成时间戳/年/月/日等,新成一张大宽表,以便后续会用到。
hive (d1_hive)> select movie_id,rate,user_id,
> from_unixtime(cast(time as BIGINT),'yyyy-MM-dd HH:mm:ss') as time,
> year(from_