本篇整理向hive表中导入数据的各种操作,
数据操纵语言(Data Manipulation Language, DML)。
目录
1. load
加载linux本地文件或HDFS文件到hive表中,
其中HDFS文件load方式是剪切模式,覆盖hive表已有数据关键词overwrite
(1)创建表
hive (hive_db1)> create table student(id int, name string) row format delimited fields terminated by '\t';
(2)加载本地文件到hive表
hive (hive_db1)> load data local inpath '/opt/module/datas/student1.txt' into table student;
(3)加载HDFS文件到hive表
首先上传本地文件到HDFS
hive (hive_db1)> dfs -put /opt/module/datas/student2.txt /user/kevin;
然后加载HDFS文件
hive (hive_db1)> load data inpath '/user/kevin/student2.txt' into table student;
ps:需要注意的是这种方式是剪切,load后HDFS上文件会消失。
(4)加载数据覆盖hive表已有数据
关键词overwrite
hive (hive_db1)> load data inpath '/user/kevin/student3.txt' overwrite into table student;
2. Insert
往hive表中插入数据,
或依据已有hive表,将查询结果插入目标hive表的分区中。
insert into:以追加数据的方式插入到hive表中,原有数据不会删除;
insert overwrite:会覆盖hive表中原有分区数据
插入多级分区,partition(month = '10' , day = '01')
(1)创建一张分区表
hive (hive_db1)> create table student2(id int , name string) partitioned by (month string) row format delimited fields terminated by '\t';
(2)插入一个分区的数据
hive (hive_db1)> insert into table student2 partition(month = '10') values(1,'kevin'),(2,'jay'),(3,'daniel');
+--------------+----------------+-----------------+--+
| student2.id | student2.name | student2.month |
+--------------+----------------+-----------------+--+
| 1 | kevin | 10 |
| 2 | jay | 10 |
| 3 | daniel | 10 |
+--------------+----------------+-----------------+--+
(3)通过已有hive表分区数据,插入hive表新增分区中
hive (hive_db1)> insert overwrite table student2 partition(month = '11') select id,name from student2 where month = '10';
+--------------+----------------+-----------------+--+
| student2.id | student2.name | student2.month |
+--------------+----------------+-----------------+--+
| 1 | kevin | 10 |
| 2 | jay | 10 |
| 3 | daniel | 10 |
| 1 | kevin | 11 |
| 2 | jay | 11 |
| 3 | daniel | 11 |
+--------------+----------------+-----------------+--+
(4)通过已有hive表分区数据,批量插入hive表新增分区汇中
hive (hive_db1)> from student2
> insert overwrite table student2 partition(month = '1')
> select id,name where month = '10'
> insert into table student2 partition(month = '2')
> select id,name where month = '10';
+--------------+----------------+-----------------+--+
| student2.id | student2.name | student2.month |
+--------------+----------------+-----------------+--+
| 1 | kevin | 1 |
| 2 | jay | 1 |
| 3 | daniel | 1 |
| 1 | kevin | 10 |
| 2 | jay | 10 |
| 3 | daniel | 10 |
| 1 | kevin | 11 |
| 2 | jay | 11 |
| 3 | daniel | 11 |
| 1 | kevin | 2 |
| 2 | jay | 2 |
| 3 | daniel | 2 |
+--------------+----------------+-----------------+--+
3. As select
依据已有hive表的查询结果创建新表,将数据添加到新hive表中
0: jdbc:hive2://hadoop100:10000> create table if not exists student3 as select id,name from student2;
4. Location
在HiveQL的DDL操作(二)中有提过,若hdfs的某目录下本身有文件,
可以通过创建外部表external的方式,指定表生成路径location ‘HDFS~path’,
这样外部表为虚表,查询其中数据即为HDFS指定目录下文件数据。
hive (hive_db1)> create external table if not exists student5(id int, name string)
row format delimited fields terminated by '\t'
location '/student;
5. Import
(1)这种方式可以用sqoop将mysql等关系型数据库中表,导入hive表中;
(2)也可以从HDFS中将数据导入hive表。
hive (hive_db1)> import table student2 partition(month='201709') from
'/user/hive/warehouse/export/student';