HiveQL的DML操作（一）——数据导入

最新推荐文章于 2023-03-14 16:45:40 发布

攻城狮Kevin

最新推荐文章于 2023-03-14 16:45:40 发布

阅读量443

点赞数 1

分类专栏： Hive 文章标签： Hive

本文链接：https://blog.csdn.net/wx1528159409/article/details/87990623

版权

Hive 专栏收录该内容

66 篇文章 17 订阅

订阅专栏

本篇整理向hive表中导入数据的各种操作，

数据操纵语言（Data Manipulation Language, DML）。

1. load

加载linux本地文件或HDFS文件到hive表中，

其中HDFS文件load方式是剪切模式，覆盖hive表已有数据关键词overwrite

（1）创建表
hive (hive_db1)> create table student(id int, name string) row format delimited fields terminated by '\t';

（2）加载本地文件到hive表
hive (hive_db1)> load data local inpath '/opt/module/datas/student1.txt' into table student;

（3）加载HDFS文件到hive表
首先上传本地文件到HDFS
hive (hive_db1)> dfs -put /opt/module/datas/student2.txt /user/kevin;
然后加载HDFS文件
hive (hive_db1)> load data inpath '/user/kevin/student2.txt' into table student;

ps：需要注意的是这种方式是剪切，load后HDFS上文件会消失。

（4）加载数据覆盖hive表已有数据
关键词overwrite
hive (hive_db1)> load data inpath '/user/kevin/student3.txt' overwrite into table student;

2. Insert

往hive表中插入数据，

或依据已有hive表，将查询结果插入目标hive表的分区中。

insert into：以追加数据的方式插入到hive表中，原有数据不会删除；

insert overwrite：会覆盖hive表中原有分区数据

插入多级分区，partition（month = '10' , day = '01'）

（1）创建一张分区表
hive (hive_db1)> create table student2(id int , name string) partitioned by (month string) row format delimited fields terminated by '\t';


（2）插入一个分区的数据
hive (hive_db1)> insert into table student2 partition(month = '10') values(1,'kevin'),(2,'jay'),(3,'daniel');

+--------------+----------------+-----------------+--+
| student2.id  | student2.name  | student2.month  |
+--------------+----------------+-----------------+--+
| 1            | kevin          | 10              |
| 2            | jay            | 10              |
| 3            | daniel         | 10              |
+--------------+----------------+-----------------+--+


（3）通过已有hive表分区数据，插入hive表新增分区中
hive (hive_db1)> insert overwrite table student2 partition(month = '11') select id,name from student2 where month = '10';

+--------------+----------------+-----------------+--+
| student2.id  | student2.name  | student2.month  |
+--------------+----------------+-----------------+--+
| 1            | kevin          | 10              |
| 2            | jay            | 10              |
| 3            | daniel         | 10              |
| 1            | kevin          | 11              |
| 2            | jay            | 11              |
| 3            | daniel         | 11              |
+--------------+----------------+-----------------+--+

（4）通过已有hive表分区数据，批量插入hive表新增分区汇中
hive (hive_db1)> from student2
               > insert overwrite table student2 partition(month = '1')
               > select id,name where month = '10'
               > insert into table student2 partition(month = '2')
               > select id,name where month = '10';

+--------------+----------------+-----------------+--+
| student2.id  | student2.name  | student2.month  |
+--------------+----------------+-----------------+--+
| 1            | kevin          | 1               |
| 2            | jay            | 1               |
| 3            | daniel         | 1               |
| 1            | kevin          | 10              |
| 2            | jay            | 10              |
| 3            | daniel         | 10              |
| 1            | kevin          | 11              |
| 2            | jay            | 11              |
| 3            | daniel         | 11              |
| 1            | kevin          | 2               |
| 2            | jay            | 2               |
| 3            | daniel         | 2               |
+--------------+----------------+-----------------+--+

3. As select

依据已有hive表的查询结果创建新表，将数据添加到新hive表中

0: jdbc:hive2://hadoop100:10000> create table if not exists student3 as select id,name from student2;

4. Location

在HiveQL的DDL操作（二）中有提过，若hdfs的某目录下本身有文件，

可以通过创建外部表external的方式，指定表生成路径location ‘HDFS~path’，

这样外部表为虚表，查询其中数据即为HDFS指定目录下文件数据。

hive (hive_db1)> create external table if not exists student5(id int, name string)
              row format delimited fields terminated by '\t'
              location '/student;

5. Import

（1）这种方式可以用sqoop将mysql等关系型数据库中表，导入hive表中；

RDBMS到Hive

（2）也可以从HDFS中将数据导入hive表。

hive (hive_db1)> import table student2 partition(month='201709') from
 '/user/hive/warehouse/export/student';

攻城狮Kevin

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
HiveQL的DML操作（一）——数据导入

本篇整理向hive表中导入数据的各种操作，数据操纵语言（Data Manipulation Language,DML）。目录1. load2. Insert3. As select4. Location5. Import1. load加载linux本地文件或HDFS文件到hive表中，其中HDFS文件load方式是剪切模式，覆盖hive表已有数据关键词ov...
复制链接

扫一扫

专栏目录