Can't query data after inserting into a Hive partitioned table - Hive Getting Started in Practice (3): DML Data Operations


I. Hive Data Operations --- DML Operations

1. Data Import

Method 1: Load data into a table (Load)

// Syntax
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
  • load data: load data into the table
  • local: load from the local filesystem; without it, the data is loaded from HDFS
  • inpath: path of the data to load
  • overwrite: replace the existing data in the table; without it, the data is appended
  • into table: which table to load into
  • student: the target table
  • partition: load into the specified partition (see the sketch after this list)
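
Before the step-by-step demo, here is a minimal sketch of loading a file directly into one partition of a partitioned table (assuming a table stu2 partitioned by month and day, as used later in this post; the file is the same sample file):

hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt'
              > into table stu2 partition(month='202006', day='23');

In this Hive version the partition values must be given explicitly; LOAD does not derive them from the file contents.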

Steps:

(1) Create a table named student5

hive (default)> create table student5(id int, name string)
              > row format delimited fields terminated by '\t';
              
OK
Time taken: 0.114 seconds
hive (default)> 

(2) Load a local file into Hive

hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' into table default.student5;

Loading data to table default.student5
Table default.student5 stats: [numFiles=1, totalSize=39]
OK
Time taken: 0.378 seconds
hive (default)> 

(3) Load data and overwrite the existing data in the table

hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' overwrite   into table default.student5;

Loading data to table default.student5
Table default.student5 stats: [numFiles=1, numRows=0, totalSize=39, rawDataSize=0]
OK
Time taken: 0.461 seconds
hive (default)> 

// Query the table afterwards; it now has data
hive (default)> select * from student5;

OK
student5.id	student5.name
1001	zhangshan
1002	lishi
1003	zhaoliu
Time taken: 0.102 seconds, Fetched: 3 row(s)
hive (default)> 

// Load the same local file into Hive once more
hive (default)>  load data local inpath '/usr/local/hadoop/module/datas/student.txt' into table default.student5;

Loading data to table default.student5
Table default.student5 stats: [numFiles=2, numRows=0, totalSize=78, rawDataSize=0]
OK
Time taken: 0.426 seconds
hive (default)> 


// Query the table again; the data has grown (originally 3 rows, now 6)
hive (default)> select * from student5;

OK
student5.id	student5.name
1001	zhangshan
1002	lishi
1003	zhaoliu
1001	zhangshan
1002	lishi
1003	zhaoliu
Time taken: 0.099 seconds, Fetched: 6 row(s)
hive (default)> 


// If we load with overwrite, the existing data is replaced (the 6 rows above are overwritten)
hive (default)> load data local inpath '/usr/local/hadoop/module/datas/student.txt' overwrite   into table default.student5;

Loading data to table default.student5
Table default.student5 stats: [numFiles=1, numRows=0, totalSize=39, rawDataSize=0]
OK
Time taken: 0.479 seconds
hive (default)> 

hive (default)> select * from student5;
OK
student5.id	student5.name
1001	zhangshan
1002	lishi
1003	zhaoliu
Time taken: 0.102 seconds, Fetched: 3 row(s)
hive (default)> 

(4) Upload student.txt to the HDFS root directory with the following commands:

[root@hadoop101 ~]# cd /usr/local/hadoop/module/datas/
[root@hadoop101 datas]# hadoop fs -put student.txt /
[root@hadoop101 datas]# 


Then load the file from the HDFS root directory into Hive (note there is no local keyword this time):

hive (default)> load data  inpath '/student.txt' into table default.student5;

Loading data to table default.student5
Table default.student5 stats: [numFiles=2, numRows=0, totalSize=78, rawDataSize=0]
OK
Time taken: 0.432 seconds
hive (default)> 

// Query the data (as shown below, 3 more rows have been added)
hive (default)>  select * from student5;

OK
student5.id	student5.name
1001	zhangshan
1002	lishi
1003	zhaoliu
1001	zhangshan
1002	lishi
1003	zhaoliu
Time taken: 0.091 seconds, Fetched: 6 row(s)
hive (default)> 

There were originally 3 rows and now there are 6. At the same time, check in HDFS whether student.txt still exists under the root directory; a LOAD from an HDFS path moves the file into the table's warehouse directory, so it should no longer be there.
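
To confirm this from the Hive CLI (a quick check; dfs simply runs the corresponding hadoop fs command):

hive (default)> dfs -ls /;
hive (default)> dfs -ls /user/hive/warehouse/student5;

The first listing should no longer contain /student.txt, while the table directory should now contain the file that was moved in.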


Method 2: Insert data into a table from a query (Insert)

Steps:

// Check whether there is a partitioned table among the current tables
hive (default)> show tables;

OK
tab_name
db_hive1
dept
dept_partition2
emp
hive_test
sqoop_test
stu2
stu_partition
student
student1
student3
student4
student5
Time taken: 1.484 seconds, Fetched: 13 row(s)
hive (default)> 

hive (default)> desc stu2;

OK
col_name	data_type	comment
id                  	int                 	                    
name                	string              	                    
month               	string              	                    
day                 	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
month               	string              	                    
day                 	string              	                    
Time taken: 0.69 seconds, Fetched: 10 row(s)
hive (default)> 

As shown above, stu2 is a partitioned table. If you do not have a partitioned table yet, create one; the SQL is as follows:

hive (default)> create table stu2(id int, name string) partitioned by (month string, day string)
row format delimited fields terminated by '\t';

If you already have a partitioned table, skip this step. Next, insert data into the partitioned table (stu2) with a basic INSERT ... VALUES:

hive (default)> insert into table stu2 partition(month='202006',day='26') values(1,'wangwu');

Query ID = root_20200102135829_bda9ac50-448a-4038-b617-33a3d1f448b4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577971593473_0001, Tracking URL = http://hadoop101:8088/proxy/application_1577971593473_0001/
Kill Command = /usr/local/hadoop/module/hadoop-2.7.2/bin/hadoop job  -kill job_1577971593473_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-01-02 13:58:52,919 Stage-1 map = 0%,  reduce = 0%
2020-01-02 13:59:05,561 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.8 sec
MapReduce Total cumulative CPU time: 2 seconds 800 msec
Ended Job = job_1577971593473_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop101:9000/user/hive/warehouse/stu2/month=202006/day=26/.hive-staging_hive_2020-01-02_13-58-29_834_7263697218847751236-1/-ext-10000
Loading data to table default.stu2 partition (month=202006, day=26)
Partition default.stu2{month=202006, day=26} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.8 sec   HDFS Read: 3660 HDFS Write: 97 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 800 msec
OK
_col0	_col1
Time taken: 38.147 seconds

Query the stu2 table again:

hive (default)> select * from stu2;

OK
stu2.id	stu2.name	stu2.month	stu2.day
1001	zhangshan	202006	23
1002	lishi	202006	23
1003	zhaoliu	202006	23
1	wangwu	202006	26
Time taken: 0.336 seconds, Fetched: 4 row(s)
hive (default)> 

// Inserting duplicate rows is allowed
hive (default)> insert into table stu2 partition(month='202006',day='26') values(1,'wangwu');

hive (default)> select * from stu2;
OK
stu2.id	stu2.name	stu2.month	stu2.day
1001	zhangshan	202006	23
1002	lishi	202006	23
1003	zhaoliu	202006	23
1	wangwu	202006	26
1	wangwu	202006	26
Time taken: 0.112 seconds, Fetched: 5 row(s)
hive (default)> 


Basic-mode insert (based on the query result of a single table):

hive (default)> insert into table stu2 partition(month=202006,day=29)
              > select * from student;
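
Note that the SELECT must produce exactly the non-partition columns of the target table (id and name for stu2); list them explicitly if the source table has extra columns. An overwrite variant of the same query-based insert might look like this (a sketch, with illustrative partition values):

hive (default)> insert overwrite table stu2 partition(month='202006',day='29')
              > select id, name from student;

OVERWRITE replaces only the data in that one partition; other partitions are left untouched.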


Method 3: Create a table from a query and load data (As Select)

Create a table from a query result (the rows returned by the query are stored in the newly created table):

create table if not exists student3
as select id, name from student;
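
The query can be arbitrary, so the same pattern also works with a filter. A sketch (the table name student_zhang is illustrative):

create table if not exists student_zhang
as select id, name from student where name like 'zhang%';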

Method 4: Specify the data path with LOCATION when creating the table

(1) Create a table

hive (default)> create table student2 like student; 
OK
Time taken: 0.247 seconds
hive (default)> 


Next, upload the local file /usr/local/hadoop/module/datas/student.txt to /user/hive/warehouse/student2 in HDFS:

hive (default)> dfs -put /usr/local/hadoop/module/datas/student.txt /user/hive/warehouse/student2 ;


Now the student2 table returns data when queried (the age column is NULL because the uploaded file only contains id and name):

hive (default)> select * from student2;

OK
student2.id	student2.name	student2.age
1001	zhangshan	NULL
1002	lishi	NULL
1003	zhaoliu	NULL
Time taken: 0.081 seconds, Fetched: 3 row(s)
hive (default)> 

Next, create a MrZhou directory under /user in HDFS:

hive (default)> dfs -mkdir -p /user/MrZhou;
hive (default)> 


Next, upload the local file (/usr/local/hadoop/module/datas/student.txt) to the HDFS directory /user/MrZhou:

hive (default)> dfs -put  /usr/local/hadoop/module/datas/student.txt /user/MrZhou;
hive (default)> 

Next, create a new table and point its LOCATION at that directory; queries on the table then read the data that is already there:

hive (default)> create table student6 like student
              > location '/user/MrZhou';
              
OK
Time taken: 0.181 seconds
hive (default)> 
hive (default)> select * from student6;

OK
student6.id	student6.name	student6.age
1001	zhangshan	NULL
1002	lishi	NULL
1003	zhaoliu	NULL
Time taken: 0.118 seconds, Fetched: 3 row(s)
hive (default)> 
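
To confirm where a table's data actually lives, inspect its metadata; the Location field in the output should point at /user/MrZhou for student6:

hive (default)> desc formatted student6;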

Method 5: Import data into a specified Hive table

hive (default)> import table student from  '/user/MrZhou';
FAILED: SemanticException [Error 10027]: Invalid path
hive (default)> 

// The error above means the import path is not valid: IMPORT only accepts a directory produced by EXPORT (it must contain the _metadata file that EXPORT writes), so pointing it at a plain data directory fails.
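
A working pairing therefore exports the table first and then imports from that export directory. A minimal sketch (the export path and the target table name student_imp are illustrative):

hive (default)> export table student5 to '/user/export/student5';
hive (default)> import table student_imp from '/user/export/student5';

IMPORT creates student_imp if it does not already exist and loads both the table metadata and the data files written by EXPORT.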

2. Data Export

(1) Insert export (export a query result to a local directory)

hive (default)>  insert overwrite local directory '/usr/local/hadoop/module/datas/stu1'
              > select * from student;
              
Query ID = root_20200102224956_684283c4-a161-44d6-883e-b888b0b1e5d7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1577971593473_0006, Tracking URL = http://hadoop101:8088/proxy/application_1577971593473_0006/
Kill Command = /usr/local/hadoop/module/hadoop-2.7.2/bin/hadoop job  -kill job_1577971593473_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-01-02 22:50:09,422 Stage-1 map = 0%,  reduce = 0%
2020-01-02 22:50:19,536 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.41 sec
MapReduce Total cumulative CPU time: 1 seconds 410 msec
Ended Job = job_1577971593473_0006
Copying data to local directory /usr/local/hadoop/module/datas/stu1
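
By default the exported files use Hive's internal field separator (^A), which is hard to read. A delimited variant of the same export might look like this (a sketch; the output directory is illustrative and will be overwritten):

hive (default)> insert overwrite local directory '/usr/local/hadoop/module/datas/stu1_tab'
              > row format delimited fields terminated by '\t'
              > select * from student;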