Java: Exporting Hive Data to HDFS, and Importing HDFS Data into Hive


weixin_35944650


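Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when reposting.

Article link: https://blog.csdn.net/weixin_35944650/article/details/114742467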


1. First, download the test data (you can also create your own)

http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

2. Data files and field names

movies.csv (movie metadata)

movieId,title,genres

ratings.csv (user rating data)

userId,movieId,rating,timestamp

3. Upload the data to HDFS first

hdfs dfs -mkdir /hive_operate
hdfs dfs -mkdir /hive_operate/movie_table
hdfs dfs -mkdir /hive_operate/rating_table
hdfs dfs -put movies.csv /hive_operate/movie_table
hdfs dfs -put ratings.csv /hive_operate/rating_table

4. Create movie_table and rating_table

$ cat create_movie_table.sql
create external table movie_table
(
    movieId STRING,
    title STRING,
    genres STRING
)
row format delimited fields terminated by ',' stored as textfile
location '/hive_operate/movie_table';

$ cat create_rating_table.sql
create external table rating_table
(
    userId STRING,
    movieId STRING,
    rating STRING,
    ts STRING
)
row format delimited fields terminated by ',' stored as textfile
location '/hive_operate/rating_table';

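Note that timestamp is a reserved word in Hive, so using it as a column name causes an error when the statement runs. Either quote it with backticks or rename the column; here I renamed it to ts.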
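One thing to watch with the `fields terminated by ','` clause used in step 4: as far as I know, Hive's plain delimited text format splits on every comma and does not interpret CSV quoting, while movies.csv quotes titles that contain commas. The sketch below is not Hive code. It uses Python to contrast a naive comma split with a real CSV parser on a few rows written in the movies.csv layout. The rows are illustrative sample data, and `naive_split` and `csv_split` are hypothetical helper names.

```python
import csv
import io

# Illustrative rows in the movies.csv layout (movieId,title,genres).
# Treat them as sample data, not as an excerpt of the real file.
SAMPLE = (
    'movieId,title,genres\n'
    '1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n'
    '11,"American President, The (1995)",Comedy|Drama|Romance\n'
)


def naive_split(text):
    """Split each line on every comma and ignore quotes.

    This stands in for a plain comma-delimited text format.
    """
    return [line.split(",") for line in text.splitlines()]


def csv_split(text):
    """Parse the same text with a quote-aware CSV reader."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = naive_split(SAMPLE)
csv_rows = csv_split(SAMPLE)

for naive, parsed in zip(naive_rows, csv_rows):
    # Print the field count from each approach, then the parsed title.
    print(len(naive), len(parsed), parsed[1])
```

The quoted title is cut into two fields by the naive split, so a three-column table would end up with a truncated title and the wrong genres value for that row. The first row is the header line, which would also be loaded as data if it is left in the file. If this matters for your data, Hive ships `org.apache.hadoop.hive.serde2.OpenCSVSerde` for quoted CSV, and the `skip.header.line.count` table property can skip a header line; check both against your Hive version.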

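5. Execute

You can paste the statements into the Hive terminal, or create the tables by running a script file with hive -f, e.g. hive -f movie_table_e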

6. Verify

hive> show tables;
OK
movie_table
rating_table

hive> select * from rating_table limit 10;
OK
1 31 2.5 1260759144
1 1029 3.0 1260759179
1 1061 3.0 1260759182
1 1129 2.0 1260759185
1 1172 4.0 1260759205
1 1263 2.0 1260759151
1 1287 2.0 1260759187
1 1293 2.0 1260759148
1 1339 3.5 1260759125
1 1343 2.0 1260759131
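The last column in the output above is the ts field, stored as a string. According to the MovieLens README, as far as I recall, it holds seconds since midnight UTC on January 1, 1970. A minimal Python sketch for reading such a value, assuming that interpretation (`epoch_string_to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def epoch_string_to_utc(ts):
    """Convert an epoch-seconds string, as stored in the ts column, to a UTC datetime."""
    return datetime.fromtimestamp(int(ts), tz=timezone.utc)


# First ts value from the query output in step 6.
first_rating_time = epoch_string_to_utc("1260759144")
print(first_rating_time.isoformat())  # 2009-12-14T02:52:24+00:00
```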

7. Create a new table (behavior table)

create table behavior_table as
select B.userid, A.movieid, B.rating, A.title
from movie_table A
join rating_table B
on A.movieid = B.movieid;
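The statement in step 7 is an inner join on the movie id: each rating row is paired with the movie row that has the same id, and rows without a partner on the other side are dropped. The sketch below imitates that logic in Python on a handful of in-memory rows. It is only meant to show the shape of behavior_table, not how Hive executes the join. The sample rows are illustrative, `inner_join` is a hypothetical helper, and the lookup assumes each movie id appears once in the movie list.

```python
# Illustrative rows; the id 9999 and 424242 entries are made up to show unmatched rows.
movies = [
    {"movieid": "31", "title": "Dangerous Minds (1995)"},
    {"movieid": "1029", "title": "Dumbo (1941)"},
    {"movieid": "9999", "title": "Movie Nobody Rated (hypothetical)"},
]
ratings = [
    {"userid": "1", "movieid": "31", "rating": "2.5"},
    {"userid": "1", "movieid": "1029", "rating": "3.0"},
    {"userid": "2", "movieid": "31", "rating": "4.0"},
    {"userid": "2", "movieid": "424242", "rating": "1.0"},  # no matching movie
]


def inner_join(movie_rows, rating_rows):
    """Pair each rating with its movie; drop rows that have no match."""
    # Assumes movie ids are unique, so a dict lookup is enough.
    movie_by_id = {m["movieid"]: m for m in movie_rows}
    joined = []
    for r in rating_rows:
        m = movie_by_id.get(r["movieid"])
        if m is not None:
            # Same column order as the select list: userid, movieid, rating, title.
            joined.append((r["userid"], m["movieid"], r["rating"], m["title"]))
    return joined


behavior = inner_join(movies, ratings)
for row in behavior:
    print(row)
```

Three rows come out: the rating for the unknown movie and the movie without ratings both disappear, which is what a plain `join` does in HiveQL as well.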

8. Export Hive table data to the local filesystem

table -> local file

insert overwrite local directory '/root/hive_test/1.txt' select * from behavior_table;

9. Export Hive table data to HDFS

table -> hdfs file

insert overwrite directory '/root/hive_test/1.txt' select * from behavior_table;
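Two details about the exports in steps 8 and 9 are easy to miss. Hive treats the target path as a directory even when it ends in .txt, and writes one or more data files into it (typically with names like 000000_0). And because no `row format` clause is given, the fields in those files are separated by Hive's default delimiter, Ctrl-A (`\x01`), not by commas or tabs. The Python sketch below shows how a line from such a file could be read back, assuming that default delimiter and the four columns of behavior_table. `parse_exported_line` is a hypothetical helper and the sample line is illustrative.

```python
# Hive's default field delimiter for text output is Ctrl-A (byte 0x01).
HIVE_DEFAULT_FIELD_DELIMITER = "\x01"

# Column order follows the select list used to build behavior_table.
COLUMNS = ("userid", "movieid", "rating", "title")


def parse_exported_line(line):
    """Split one exported line into a dict keyed by column name."""
    values = line.rstrip("\n").split(HIVE_DEFAULT_FIELD_DELIMITER)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


# Illustrative line in the shape the export would produce.
sample_line = HIVE_DEFAULT_FIELD_DELIMITER.join(
    ["1", "31", "2.5", "Dangerous Minds (1995)"]
) + "\n"

record = parse_exported_line(sample_line)
print(record)
```

If a human-readable file is preferred, `insert overwrite ... directory` also accepts a `row format delimited fields terminated by ...` clause in recent Hive versions; verify this against the version you run.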

10. Load local data into a Hive table

local file -> table

LOAD DATA LOCAL INPATH '/root/hive_test/a.txt' OVERWRITE INTO TABLE behavior_table;

11. Load data from HDFS into a Hive table

hdfs file -> table

LOAD DATA INPATH '/a.txt' OVERWRITE INTO TABLE behavior_table;
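`LOAD DATA` does not parse or convert the file. It places the file in the table's storage location, so the file has to match the table's format already. `OVERWRITE` replaces whatever the table held before, and without `LOCAL` the source file on HDFS is moved rather than copied. Since behavior_table was created with `create table ... as select` and no `row format` clause, it most likely uses the default text format with Ctrl-A (`\x01`) between fields, assuming the cluster's default file format has not been changed. The Python sketch below writes a file in that layout. `write_hive_text_file` is a hypothetical helper, the rows are illustrative, and the temporary path stands in for something like /root/hive_test/a.txt.

```python
import os
import tempfile


def write_hive_text_file(path, rows, delimiter="\x01"):
    """Write rows as delimiter-separated lines, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write(delimiter.join(str(value) for value in row) + "\n")


# Illustrative rows in behavior_table column order: userid, movieid, rating, title.
rows = [
    ("1", "31", "2.5", "Dangerous Minds (1995)"),
    ("1", "1029", "3.0", "Dumbo (1941)"),
]

# A temporary location is used here so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "a.txt")
write_hive_text_file(out_path, rows)

with open(out_path, encoding="utf-8") as f:
    written = f.read()

print(repr(written))
```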
