Loading and Exporting Hive Table Data

不会敲代码的小力

Article link: https://blog.csdn.net/weixin_45492179/article/details/108012856

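This article covers the ways to load data into Hive: direct insert, loading from files or query results, multi-insert mode, creating a table from a query, and pointing a table at a data path with location. It then covers exporting Hive data to the local file system and to HDFS, including Hadoop and Hive shell commands. It also covers the `truncate` statement for emptying a table, query predicates such as `like` and `rlike` with their wildcards, the priority order of Hive configuration settings, and the storage formats Hive supports. For project use it recommends orc or parquet with snappy compression.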


Loading Data into Hive Tables

Five cases

1. Insert rows directly into a partitioned table
insert into table score3 partition(month='201807') values ('001','002','100');
2. Load data from a file with load data
(local Linux file system) load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');
(HDFS; the source file is moved into the table's directory) load data inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');
3. Multi-insert mode (one scan of score feeds several inserts)
from score
insert overwrite table score_first partition(month='201806') select s_id,c_id
insert overwrite table score_second partition(month='201806') select c_id,s_score;
4. Create a table and load data from a query (as select)
create table score5 as select * from score;
5. Specify the data path with location when creating the table
create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by '\t' location '/myscore6';
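
The five cases above do not show a plain insert from a query into an existing partition. A minimal sketch, assuming a hypothetical table score4 (not in the original) that copies the definition of score, and assuming score has the columns s_id, c_id, and s_score used elsewhere in this article:

```sql
-- score4 is a hypothetical table; "like" copies the columns and partition keys of score
create table if not exists score4 like score;

-- write the rows of one source partition into the same partition of score4
insert overwrite table score4 partition(month='201806')
select s_id, c_id, s_score from score where month='201806';

-- list the partitions to confirm that month=201806 now exists
show partitions score4;
```

insert overwrite replaces whatever the target partition already holds; insert into appends instead.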

Exporting Hive Data

Seven methods

1. Export query results to the local file system (existing contents of the target directory are replaced)
insert overwrite local directory '/export/servers/exporthive/a' select * from score;
2. Export query results to the local file system with explicit formatting
insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t'
collection items terminated by '#' select * from student;
3. Export query results to HDFS (omit local)
insert overwrite directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from score;
4. Copy to the local file system with a Hadoop command (run inside the Hive shell)
dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;
5. Export with a Hive shell command (run from the operating-system shell)
bin/hive -e "select * from yhive.score;" > /export/servers/exporthive/score.txt
6. Export a whole table to HDFS with export
export table score to '/export/exporthive/score';
7. Export with Sqoop
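
export writes both the table's metadata and its data files to the target HDFS directory, so the result can be read back with import. A sketch of the round trip, assuming a hypothetical target table name score_imported that does not exist yet:

```sql
-- export the table (as in method 6), then recreate it under a new name
export table score to '/export/exporthive/score';
import table score_imported from '/export/exporthive/score';

-- the imported table should have the same row count as the original
select count(*) from score_imported;
```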

Truncating a Hive Table

truncate table score5;
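
truncate removes all rows but keeps the table definition. It is meant for managed (internal) tables; on an external table such as score6 Hive typically rejects it, although this depends on the Hive version and table properties. A sketch, assuming score is a managed table partitioned by month:

```sql
-- empty a single partition instead of the whole table
truncate table score partition(month='201806');

-- the truncated partition should now return 0 rows
select count(*) from score where month='201806';
```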

Hive Query Statements

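like
select * from tablename where name like '张%';
rlike (regular-expression match)
select * from tablename where name rlike '^张';
Wildcards for like
% matches multiple characters (zero or more), e.g. 100
_ matches exactly one character, e.g. 1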
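
The difference between the two operators is easiest to see on literal strings. like must match the whole string, while rlike is true if any substring matches the Java regular expression. A sketch using constant expressions (running select without a from clause assumes a reasonably recent Hive version):

```sql
select 'abc' like 'a%';            -- true: % covers "bc"
select 'abc' like 'a_';            -- false: _ covers exactly one character
select 'abc' like 'a__';           -- true: two underscores, two characters
select 'abc123' rlike '[0-9]+';    -- true: a substring of digits is enough
select 'abc123' rlike '^[0-9]+$';  -- false: the anchors require the whole string to be digits
```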

Priority of Hive Configuration Settings

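1. Settings in the configuration files
2. The -hiveconf option when starting the Hive shell (command-line parameter)
hive -hiveconf hive.root.logger=INFO
3. set inside the Hive shell (parameter declaration)
set mapred.reduce.tasks=100;
Priority: parameter declaration > command-line parameter > configuration file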
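
To check which value is in effect, set followed by just the property name prints its current value. A sketch inside the Hive shell (the printed values depend on the installation and on how the shell was started):

```sql
-- print the current value (from the configuration files or -hiveconf)
set mapred.reduce.tasks;

-- override it for this session; this takes priority over the other two levels
set mapred.reduce.tasks=100;

-- print it again to confirm the session-level value
set mapred.reduce.tasks;
```

A value set this way lasts only for the current session.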

Data Storage Formats Supported by Hive

Hive supports TextFile, SequenceFile, Parquet, ORC, RCFile, and other formats.

In real project development, Hive tables are usually stored as orc or parquet, and snappy is the usual choice of compression.
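
A sketch of what that recommendation can look like in DDL, assuming hypothetical table names score_orc and score_parquet and the score columns used earlier. The table properties shown are the usual ORC and Parquet compression keys:

```sql
-- ORC table compressed with snappy
create table score_orc (s_id string, c_id string, s_score int)
stored as orc
tblproperties ('orc.compress'='SNAPPY');

-- Parquet table compressed with snappy
create table score_parquet (s_id string, c_id string, s_score int)
stored as parquet
tblproperties ('parquet.compression'='SNAPPY');

-- fill the columnar table from a query; load data only moves files and does not convert text to ORC
insert into table score_orc select s_id, c_id, s_score from score;
```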
