08 Loading and Exporting Table Data in Hive

莹火虫的另一半

Category: hive

Article link: https://blog.csdn.net/woshilovetg/article/details/111876101

1. Loading table data in Hive

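There are several ways to load data into a Hive table. The four covered here are insert into, insert + select, load, and import.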

1.1 insert into (for awareness only; rarely used in practice)

create table score3 like score;

insert into table score3 partition(month ='202007') values ('001','002','100');

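Under the hood, this statement is converted into a MapReduce job, and every execution produces a small file. For that reason, data is normally inserted N rows at a time as a batch load rather than row by row.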
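As a minimal sketch of the batch approach, several rows can be passed to a single insert into ... values statement, so that only one job runs. The extra rows below are made-up sample values, not data from the original article.

```sql
-- Hypothetical sample rows; a single statement means a single job,
-- which typically writes one output file instead of one per row.
insert into table score3 partition(month = '202007')
values ('001', '002', '100'),
       ('001', '003', '95'),
       ('002', '002', '88');
```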

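Data can also be inserted into several tables with a single statement, in which case the source table is scanned only once (rarely used; for awareness only):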

from score
insert overwrite table score_first partition(month='202006') select s_id,c_id
insert overwrite table score_second partition(month = '202006') select c_id,s_score;

This is equivalent to:

insert overwrite table score_first partition(month='202006') select s_id,c_id from score;
insert overwrite table score_second partition(month = '202006') select c_id,s_score from score;

1.2 Loading data from a query (commonly used)

create table score4 like score;

insert overwrite table score4 partition(month = '202006') select s_id,c_id,s_score from score;

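Note: the columns returned by the select must match the structure of the table being inserted into. The number of columns, their types, and their order all have to agree.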
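The following sketch shows one way to satisfy this rule when the source does not already line up with the target. The table score_raw, its column order (c_id, s_score, s_id), and the assumption that s_score is an int in score4 are all hypothetical and only serve the illustration.

```sql
-- score_raw is a hypothetical source table stored as (c_id, s_score, s_id),
-- with s_score held as a string.
insert overwrite table score4 partition(month = '202006')
select s_id,                  -- 1st column of score4
       c_id,                  -- 2nd column of score4
       cast(s_score as int)   -- 3rd column, cast to the assumed target type
from score_raw;
```

Hive matches the select list to the target columns by position, not by name, which is why the order in the select list matters here.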

1.3 Loading data with load (commonly used)

load data local inpath '/export/server/hivedatas/score.csv' overwrite into table score partition(month='202006');

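Notes:

1. load can be used with any table except bucketed tables.

2. Only plain files can be loaded. Under the hood, load runs HDFS file operations and does not start a MapReduce job.

3. With local, the file is read from the local file system, which here means the local file system of the server running hiveserver2.

Without local, the file is read from HDFS.

Difference:

1. Without local: the file is moved inside HDFS, comparable to hdfs dfs -mv, so it no longer exists at its original path afterwards.

2. With local: the local file is copied up to HDFS, comparable to hdfs dfs -put, and the local copy stays in place.

In general:

Use load to bring data from the data source into the ODS layer of the data warehouse.

Use insert + select to move data from the ODS layer into the DW layer.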
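For comparison with the local example above, here is a sketch of the same load without local. The HDFS path /hivedatas/score.csv is an assumption made for this example; the file would have to be uploaded there first.

```sql
-- Assumes the file already exists in HDFS at this hypothetical path,
-- e.g. after: hdfs dfs -put score.csv /hivedatas/
-- Without local, the path is interpreted as an HDFS path.
-- Without overwrite, the rows are appended to the partition.
load data inpath '/hivedatas/score.csv' into table score partition(month='202006');

-- Quick check that the partition now returns rows.
select * from score where month = '202006' limit 10;
```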

1.4 Importing Hive table data with import (managed tables)

create table teacher2 like teacher;
-- this path cannot be a local path; it must be an HDFS path
export table teacher to '/export/teacher';
import table teacher2 from '/export/teacher';

Note: the path used here is an HDFS path.

2. Exporting data from Hive

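Data in a Hive table can be exported to other locations, for example the local Linux disk, HDFS, or MySQL.

With local, the data is exported to the local file system; without it, the data is exported to HDFS.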

-- export with insert
insert overwrite [local] directory '<path>' <select statement>;
insert overwrite local directory '/export/server/exporthive' select * from score;

-- export with insert and explicit formatting
insert overwrite [local] directory '<path>' [row format delimited fields terminated by '\t'] [collection items terminated by '#'] <select statement>;
insert overwrite local directory '/export/server/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

-- export to HDFS with insert
insert overwrite directory '/exporthive' row format delimited fields terminated by '\t' select * from score;

# export to a file with a shell command
hive -e "select * from myhive.score;" > /export/server/exporthive/score.csv

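Note: when insert overwrite directory is used without a row format clause, fields are separated by the default delimiter \001 (Ctrl-A). The output of hive -e, by contrast, is tab-separated, so the score.csv file above is not actually comma-separated despite its name.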
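If a real comma-separated file is needed rather than output with the default \001 delimiter, the delimiter can be set explicitly in the export statement. This is a sketch only; the target directory /export/server/exporthive_csv is an assumed name, and because overwrite replaces the directory's existing contents, it is safest to point it at a dedicated directory.

```sql
-- Hypothetical target directory; overwrite replaces whatever it contains.
insert overwrite local directory '/export/server/exporthive_csv'
row format delimited fields terminated by ','
select * from score;
```

Note that this does not quote or escape values, so fields that themselves contain commas would still need separate handling.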
