首先在Hive 中查看json_tuple 函数的详细信息
desc function extended json_tuple;
出现解释:
json_tuple(jsonStr, p1, p2, …, pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string. |
jsonStr,就是我们自己的json数据文件名称,p1,p2,p3…等是json 数据的参数
例题
: **json格式的电影评分数据## **
- 首先解析json
create table t_json(json string); - 加载数据到表中
select json_tuple(json,‘movie’,‘rate’,‘timeStamp’,‘uid’) as
(movie,rate,ts,uid) from tb_json limit 20;
查看建表数据
|+--------+-------+------------+------+
| movie | rate | ts | uid |
+--------+-------+------------+------+
| 1193 | 5 | 978300760 | 1 |
| 661 | 3 | 978302109 | 1 |
| 914 | 3 | 978301968 | 1 |
| 3408 | 4 | 978300275 | 1 |
| 2355 | 5 | 978824291 | 1 |
| 1197 | 3 | 978302268 | 1 |
| 1287 | 5 | 978302039 | 1 |
| 2804 | 5 | 978300719 | 1 |
| 594 | 4 | 978302268 | 1 |
| 919 | 4 | 978301368 | 1 |
+--------+-------+------------+------+| |
- 将表数据创建到新表
create table tb_movie_rate as select json_tuple
(json,‘movie’,‘rate’,‘timeStamp’,‘uid’) as (movie,rate,st,uid) from
tb_json;
以上是将建立新表到后面练习可以用
接下来就是练习题:
1、统计每部电影的平均得分
create table tb_avg_rate as select movie, avg(rate) from
tb_movie_rate group by movie;
2、统计数据中一共有多少部电影
create table tb_cout_movie as select count(distinct movie) from
tb_movie_rate;
+--------------------+
| tb_cout_movie._c0 |
+--------------------+
| 3706 |
+--------------------+
3、统计每部电影的被评分次数(热门度)
create table tb_sum_movie as select movie, count(distinct uid) from
tb_movie_rate group by movie;
+---------------------+-------------------+
| tb_sum_movie.movie | tb_sum_movie._c1 |
+---------------------+-------------------+
| 1 | 2077 |
| 10 | 888 |
| 100 | 128 |
| 1000 | 20 |
| 1002 | 8 |
| 1003 | 121 |
| 1004 | 101 |
| 1005 | 142 |