,通过(2)可以看到自定义的udf使用起来还是不是很方便的,Hive的 TRANSFORM 关键字提供了在SQL中调用自写脚本的功能
适合实现Hive中没有的功能又不想写UDF的情况。
一、文件的采集
json文件的上传
1、建表
hive> create table t_json(line string)
> row format delimited;
2、上传文件
load data local inpath '/root/rating.json' into table t_json;
3、json数据串对应的表
create table t_rate(movieid string,rate int,time string, uid string)
row format delimited
fields terminated by '\t';
4、json数据解析的导入
(复杂的json自己可以写udf解析(或者transform通过脚本))
(line对应的是原始json表的字段)
hive> insert overwrite table t_rate
> select get_json_object(line,'$.movie') moveid,get_json_object(line,'$.rate') rate,get_json_object(line,'$.timeStamp') time,
> get_json_object(line,'$.uid') uid from t_json;
5、查看数据:
hive> select * from t_rate limit 10;
OK
1193 5 978300760 1
661 3 978302109 1
914 3 978301968 1
3408 4 978300275 1
2355 5 978824291 1
1197 3 978302268 1
1287 5 978302039 1
2804 5 978300719 1
594 4 978302268 1
919 4 978301368 1
二、transform
1、表创建
CREATE TABLE u_data_new (
movieid INT,
rating INT,
weekday INT,
userid INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
2、t_rate表的记录条数
+----------+--+
| _c0 |
+----------+--+
| 1000209 |
+----------+--+
3、weeday_mapper_new.py脚本
#!/bin/python
import sys
import datetime
for line in sys.stdin:
line = line.strip()
movieid, rating, unixtime,userid = line.split('\t')
weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
print '\t'.join([movieid, rating, str(weekday),userid])
4、添加脚本到hive系统路径:
add file /root/weeday_mapper_new.py;
5、日期处理后的表
insert overwrite table u_data_new
select
TRANSFORM(movieid,rate,time,uid)
USING 'python weeday_mapper_new.py'
as(movieid,rate,weekday,uid)
from t_rate;
select * from u_data_new limit 20;
+---------------------+--------------------+---------------------+--------------------+--+
| u_data_new.movieid | u_data_new.rating | u_data_new.weekday | u_data_new.userid |
+---------------------+--------------------+---------------------+--------------------+--+
| 1193 | 5 | 1 | 1 |
| 661 | 3 | 1 | 1 |
| 914 | 3 | 1 | 1 |
| 3408 | 4 | 1 | 1 |
| 2355 | 5 | 7 | 1 |
| 1197 | 3 | 1 | 1 |
| 1287 | 5 | 1 | 1 |
| 2804 | 5 | 1 | 1 |
| 594 | 4 | 1 | 1 |
| 919 | 4 | 1 | 1 |
| 595 | 5 | 7 | 1 |
| 938 | 4 | 1 | 1 |
| 2398 | 4 | 1 | 1 |
| 2918 | 4 | 1 | 1 |
| 1035 | 5 | 1 | 1 |
| 2791 | 4 | 1 | 1 |
| 2687 | 3 | 7 | 1 |
| 2018 | 4 | 1 | 1 |
| 3105 | 5 | 1 | 1 |
| 2797 | 4 | 1 | 1 |
+---------------------+--------------------+---------------------+--------------------+--+