![ba3368fab619139cc310a2b36a36cf17.png](https://i-blog.csdnimg.cn/blog_migrate/edf466c759adf55fb518b47b4128d206.jpeg)
I. Hive Data Operations --- DML
1. Importing Data
Method 1: Load data into a table (Load)
//Syntax
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
- load data: load data into the table
- local: load from the local filesystem; without it, load from HDFS
- inpath: the path of the data to load
- overwrite: overwrite the data already in the table; otherwise the data is appended
- into table: the table to load into
- student: the specific table name
- partition: load into the specified partition
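The options above combine into a few common forms. A sketch (the paths and the partition column are illustrative, and the last form assumes the target table is partitioned):

```sql
-- append from the local filesystem (the file is copied into the warehouse)
load data local inpath '/opt/module/datas/student.txt' into table student;

-- overwrite from the local filesystem (existing rows are replaced)
load data local inpath '/opt/module/datas/student.txt' overwrite into table student;

-- append from HDFS (the file is moved, not copied)
load data inpath '/student.txt' into table student;

-- load into a specific partition of a partitioned table
load data local inpath '/opt/module/datas/student.txt' into table student partition (month='202006');
```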
Steps:
(1) Create a table named student5
hive (default)> create table student5(id int, name string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.114 seconds
hive (default)>
(2) Load a local file into Hive
hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' into table default.student5;
Loading data to table default.student5
Table default.student5 stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.378 seconds
hive (default)>
(3) Load data, overwriting the data already in the table
hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' overwrite into table default.student5;
Loading data to table default.student5
Table default.student5 stats: [numFiles=1, numRows=0, totalSize=39, rawDataSize=0]
OK
Time taken: 0.461 seconds
hive (default)>
//Querying the table afterwards shows the data
hive (default)> select * from student5;
OK
student5.id student5.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.102 seconds, Fetched: 3 row(s)
hive (default)>
//Load the local file into Hive once more
hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' into table default.student5;
Loading data to table default.student5
Table default.student5 stats: [numFiles=2, numRows=0, totalSize=78, rawDataSize=0]
OK
Time taken: 0.426 seconds
hive (default)>
//Query the table again: the data has grown (3 rows before, 6 now)
hive (default)> select * from student5;
OK
student5.id student5.name
1001 zhangshan
1002 lishi
1003 zhaoliu
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.099 seconds, Fetched: 6 row(s)
hive (default)>
//Loading with overwrite replaces the existing data (the 6 rows are overwritten)
hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' overwrite into table default.student5;
Loading data to table default.student5
Table default.student5 stats: [numFiles=1, numRows=0, totalSize=39, rawDataSize=0]
OK
Time taken: 0.479 seconds
hive (default)>
hive (default)> select * from student5;
OK
student5.id student5.name
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.102 seconds, Fetched: 3 row(s)
hive (default)>
(4) Upload student.txt to the HDFS root directory with the following commands:
[root@hadoop101 ~]# cd /usr/local/hadoop/module/datas/
[root@hadoop101 datas]# hadoop fs -put student.txt /
[root@hadoop101 datas]#
![d47e6a9c9fd1d47303233e27d5856126.png](https://i-blog.csdnimg.cn/blog_migrate/f90a82d058dc7d96afe7d046ceacb424.jpeg)
Load the file from the HDFS root directory into Hive
hive (default)> load data inpath '/student.txt' into table default.student5;
Loading data to table default.student5
Table default.student5 stats: [numFiles=2, numRows=0, totalSize=78, rawDataSize=0]
OK
Time taken: 0.432 seconds
hive (default)>
//Query the data again (as shown below, 3 more rows were added)
hive (default)> select * from student5;
OK
student5.id student5.name
1001 zhangshan
1002 lishi
1003 zhaoliu
1001 zhangshan
1002 lishi
1003 zhaoliu
Time taken: 0.091 seconds, Fetched: 6 row(s)
hive (default)>
Originally there were 3 rows; now there are 6. Meanwhile, check in HDFS whether student.txt still exists at the root: it does not, because loading from an HDFS path moves the file into the table's warehouse directory instead of copying it.
![4ca5878f86b7ab4ddfe6a6647bcec48b.png](https://i-blog.csdnimg.cn/blog_migrate/3bb148a8448b6411a51aef775475f791.jpeg)
Method 2: Insert data into a table via a query (Insert)
Steps:
//Check whether the current database has a partitioned table
hive (default)> show tables;
OK
tab_name
db_hive1
dept
dept_partition2
emp
hive_test
sqoop_test
stu2
stu_partition
student
student1
student3
student4
student5
Time taken: 1.484 seconds, Fetched: 13 row(s)
hive (default)>
hive (default)> desc stu2;
OK
col_name data_type comment
id int
name string
month string
day string
# Partition Information
# col_name data_type comment
month string
day string
Time taken: 0.69 seconds, Fetched: 10 row(s)
hive (default)>
As shown above, stu2 is a partitioned table. If you don't already have one, create it with the following SQL:
hive (default)> create table stu2(id int, name string) partitioned by (month string, day string)
row format delimited fields terminated by '\t';
Skip this step if the partitioned table already exists. Next, insert data into the partitioned table (stu2):
hive (default)> insert into table stu2 partition(month='202006',day='26') values(1,'wangwu');
Query ID = root_20200102135829_bda9ac50-448a-4038-b617-33a3d1f448b4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577971593473_0001, Tracking URL = http://hadoop101:8088/proxy/application_1577971593473_0001/
Kill Command = /usr/local/hadoop/module/hadoop-2.7.2/bin/hadoop job -kill job_1577971593473_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-01-02 13:58:52,919 Stage-1 map = 0%, reduce = 0%
2020-01-02 13:59:05,561 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.8 sec
MapReduce Total cumulative CPU time: 2 seconds 800 msec
Ended Job = job_1577971593473_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop101:9000/user/hive/warehouse/stu2/month=202006/day=26/.hive-staging_hive_2020-01-02_13-58-29_834_7263697218847751236-1/-ext-10000
Loading data to table default.stu2 partition (month=202006, day=26)
Partition default.stu2{month=202006, day=26} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.8 sec HDFS Read: 3660 HDFS Write: 97 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 800 msec
OK
_col0 _col1
Time taken: 38.147 seconds
Query the stu2 table again
hive (default)> select * from stu2;
OK
stu2.id stu2.name stu2.month stu2.day
1001 zhangshan 202006 23
1002 lishi 202006 23
1003 zhaoliu 202006 23
1 wangwu 202006 26
Time taken: 0.336 seconds, Fetched: 4 row(s)
hive (default)>
//Inserting duplicate rows is allowed
hive (default)> insert into table stu2 partition(month='202006',day='26') values(1,'wangwu');
hive (default)> select * from stu2;
OK
stu2.id stu2.name stu2.month stu2.day
1001 zhangshan 202006 23
1002 lishi 202006 23
1003 zhaoliu 202006 23
1 wangwu 202006 26
1 wangwu 202006 26
Time taken: 0.112 seconds, Fetched: 5 row(s)
hive (default)>
![bbbe24ca2415a0d064ac7aac67114c6e.png](https://i-blog.csdnimg.cn/blog_migrate/4545f1874635a16d6692618fe51e1cd3.jpeg)
Basic insert mode (based on the query result of a single table)
hive (default)> insert into table stu2 partition(month=202006,day=29)
> select * from student;
![46f3c144938010d9e7bc624829929afc.png](https://i-blog.csdnimg.cn/blog_migrate/18420bc1a19e96c9bf89b3223ef4c52a.jpeg)
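Besides insert into, Hive also supports insert overwrite for a partition, and a multi-insert form that scans the source table once and feeds several partitions. A sketch using the tables above (the partition values with month='202007' are hypothetical):

```sql
-- overwrite a partition with the query result instead of appending
insert overwrite table stu2 partition(month='202006', day='29')
select id, name from student;

-- multi-insert: a single scan of student feeds two partitions
from student
insert into table stu2 partition(month='202007', day='01') select id, name
insert into table stu2 partition(month='202007', day='02') select id, name;
```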
Method 3: Create a table from a query and load the data (As Select)
Create a table from a query result (the query result is written into the newly created table)
create table if not exists student3
as select id, name from student;
Method 4: Specify the data path with Location when creating a table
(1) Create a table
hive (default)> create table student2 like student;
OK
Time taken: 0.247 seconds
hive (default)>
![d6aa39f674331a79de98de2cb0d5e41c.png](https://i-blog.csdnimg.cn/blog_migrate/0caee28ff0e6893af6af55a250daa67b.jpeg)
Next, upload the local file /usr/local/hadoop/module/datas/student.txt to the HDFS directory /user/hive/warehouse/student2:
hive (default)> dfs -put /usr/local/hadoop/module/datas/student.txt /user/hive/warehouse/student2 ;
![61b917426c378c6d7baf0fdd846a04e3.png](https://i-blog.csdnimg.cn/blog_migrate/56653a2085c42a0d0d948ae05089400d.jpeg)
Now the student2 table returns data when queried
hive (default)> select * from student2;
OK
student2.id student2.name student2.age
1001 zhangshan NULL
1002 lishi NULL
1003 zhaoliu NULL
Time taken: 0.081 seconds, Fetched: 3 row(s)
hive (default)>
Next, create a MrZhou directory under /user on HDFS
hive (default)> dfs -mkdir -p /user/MrZhou;
hive (default)>
![6068325045625822aafd9b55a87cc017.png](https://i-blog.csdnimg.cn/blog_migrate/5047e334f62c347c93565a6050cf246d.jpeg)
Next, upload the local file (/usr/local/hadoop/module/datas/student.txt) to the HDFS directory (/user/MrZhou)
hive (default)> dfs -put /usr/local/hadoop/module/datas/student.txt /user/MrZhou;
hive (default)>
Next, create a new table and point it at that directory with location; querying the table then returns the data
hive (default)> create table student6 like student
> location '/user/MrZhou';
OK
Time taken: 0.181 seconds
hive (default)>
hive (default)> select * from student6;
OK
student6.id student6.name student6.age
1001 zhangshan NULL
1002 lishi NULL
1003 zhaoliu NULL
Time taken: 0.118 seconds, Fetched: 3 row(s)
hive (default)>
Method 5: Import data into a specified Hive table
hive (default)> import table student from '/user/MrZhou';
FAILED: SemanticException [Error 10027]: Invalid path
hive (default)>
//The error above says the import path is invalid
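The path fails because import only accepts a directory that was written by Hive's export command (which stores both data and metadata), not an arbitrary data directory like /user/MrZhou. A sketch of a matching export/import pair (the export path and the table name student7 are illustrative):

```sql
-- export the table, data plus metadata, to an HDFS directory
export table student to '/user/hive/warehouse/export/student';

-- import that exported directory into a new table
import table student7 from '/user/hive/warehouse/export/student';
```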
2. Exporting Data
(1) Insert export (write query results to a local directory)
hive (default)> insert overwrite local directory '/usr/local/hadoop/module/datas/stu1'
> select * from student;
Query ID = root_20200102224956_684283c4-a161-44d6-883e-b888b0b1e5d7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577971593473_0006, Tracking URL = http://hadoop101:8088/proxy/application_1577971593473_0006/
Kill Command = /usr/local/hadoop/module/hadoop-2.7.2/bin/hadoop job -kill job_1577971593473_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-01-02 22:50:09,422 Stage-1 map = 0%, reduce = 0%
2020-01-02 22:50:19,536 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.41 sec
MapReduce Total cumulative CPU time: 1 seconds 410 msec
Ended Job = job_1577971593473_0006
Copying data to local directory /usr/local/hadoop/module/datas/stu1
Copying data to local directory /usr/local/hadoop/module/data
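The export above writes files with Hive's default ^A field separator. To control the delimiter, or to write to HDFS instead of the local filesystem, a sketch (the output paths are illustrative):

```sql
-- export with an explicit field delimiter
insert overwrite local directory '/usr/local/hadoop/module/datas/stu2'
row format delimited fields terminated by '\t'
select * from student;

-- drop the local keyword to write to an HDFS directory instead
insert overwrite directory '/user/export/student'
select * from student;
```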