1. Data Import
1.1 Loading Data from a File
1.1.1 Syntax
hive> load data [local] inpath '/input/test' [overwrite] into table student [partition (partcol1=val1, …)];
(1) load data: load data into a table
(2) local: load from the local filesystem; without it, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the existing data in the table; without it, the data is appended
(5) into table: the table to load into
(6) student: the target table name
(7) partition: load into the specified partition
1.1.2 Example
Create a table:
create table student(id string, name string)
row format delimited fields terminated by '\t';
Contents of the local file:
[root@dn2 test]# vi person
chb 18 male
chb2 20 male
Load the local file:
load data local inpath '/uardata1/hivetest/student' into table student;
To load a file that is already on HDFS, omit the local keyword.
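As a sketch (the HDFS path below is hypothetical), first upload the file to HDFS, then load it without local. Note that a load from HDFS moves the source file into the table's directory rather than copying it:

```sql
-- upload the local file to a (hypothetical) HDFS path
dfs -put /uardata1/hivetest/student /tmp/hivetest/student;
-- no "local": the file is taken from HDFS and moved into the table's location
load data inpath '/tmp/hivetest/student' into table student;
```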
1.2 Inserting Data via a Query
1.2.1 Prepare the data
1. A regular table
create table pt2(id int, name string, gender string);
-- insert data
insert into table pt2 values (1, 'chb', 'male'), (2, 'ss', 'female');
2. A partitioned table
create table pt1(id int, name string) partitioned by (gender string);
1.2.2 Inserting data from the regular table into the partitioned table: the partition must be specified, and the queried columns must match the target table's non-partition columns.
Without specifying a partition:
hive> insert into pt1 select * from pt2;
FAILED: SemanticException 1:12 Need to specify partition columns
because the destination table is partitioned. Error encountered near token 'pt1'
With a column count mismatch:
hive> insert into pt1 partition(gender='gd1') select * from pt2;
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table
because column number/types are different ''gd1'': Table insclause-0 has 2 columns,
but query has 3 columns.
# insert from the regular table into the partitioned table
hive> insert into pt1 partition(gender='female') select id,name from pt2 where gender='female';
Insert from the partitioned table into the regular table:
hive> insert into pt2 select * from pt1;
1.3 Creating a Table and Loading Data from a Query (As Select)
Create a table from a query result (the result of the query is loaded into the new table):
create table if not exists student3 as select id, name from student;
1.4 Specifying the Data Path with Location at Table Creation
1. Create a table and specify its location on HDFS
hive (default)> create table if not exists student5(
id int, name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student';
2. Upload the data to HDFS
hive (default)> dfs -put /uardata1/hivetest/student /user/hive/warehouse/student;
3. Query the data
hive (default)> select * from student5;
1.5 Importing Data into a Hive Table with Import
Note: the data must first be exported with export, then imported.
-- first export
hive > export table student to '/tmp/hivetest/export/student';
-- then import from the exported data
hive > import table student6 from '/tmp/hivetest/export/student';
1.6 Dynamic Partition Loading
Create a partitioned table t12:
create table t12(id int,name string)
partitioned by (year int,month int)
row format delimited fields terminated by '\t';
Load data into t12:
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=8);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=9);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=10);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=11);
Create another partitioned table with the same schema:
create table t13(id int,name string)
partitioned by (year int,month int)
row format delimited fields terminated by '\t';
Inserting into t13 directly fails:
hive> insert into t13 select * from t12;
FAILED: SemanticException 1:12 Need to specify partition columns
because the destination table is partitioned. Error encountered near token 't13'
hive>
When specifying the partition, at least one partition column must be given a static value (here year). month is not fixed, so the month column must be added to the select list:
insert into table t13 partition(year=2018,month) select id,name,month from t12 where year=2018;
If there are many partitions, the statement above has to be repeated for each one. To insert all partitions at once:
insert into table t13 partition(year,month) select id,name,year, month from t12;
Result:
hive> insert into table t13 partition(year,month) select id,name,year, month from t12;
FAILED: SemanticException [Error 10096]:
Dynamic partition strict mode requires at least one static partition column. (the cause)
To turn this off set hive.exec.dynamic.partition.mode=nonstrict (the fix)
hive>
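Following the hint in the error message, switching the dynamic partition mode to nonstrict for the session allows the fully dynamic insert. A sketch:

```sql
-- enable dynamic partitioning and relax strict mode for this session
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- now both partition columns may be dynamic
insert into table t13 partition(year,month) select id,name,year,month from t12;
```

Strict mode exists as a safeguard: a fully dynamic insert over a large table can create a very large number of partitions at once.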
2. Exporting Data
2.1 Insert Export
1. Export the query result to the local filesystem
insert overwrite local directory '/tmp/hivetest/export/studentbak' select * from student;
2. Export the query result to the local filesystem with formatting
hive(default)>insert overwrite local directory '/tmp/hivetest/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
3. Export the query result to HDFS (no local)
hive (default)> insert overwrite directory '/tmp/hivetest/export/student2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
2.2 Exporting to Local with a Hadoop Command
hive (default)> dfs -get /user/hive/warehouse/hivetest2.db/student/student /tmp/hivetest/export/student3.txt;
2.3 Exporting with the Hive Shell
Basic syntax: (hive -f/-e <statement or script> > file)
[root@chb2 ~]# bin/hive -e 'select * from hivetest.student;' > /tmp/hivetest/export/student4.txt;
2.4 Exporting to HDFS with Export
hive> export table default.student to '/uardata1/hivetest/export/student';
2.5 Sqoop Export
Copy a file (or directory) on HDFS:
$ hadoop fs -cp source destination
Using DIRECTORY:
hive>INSERT OVERWRITE [LOCAL] DIRECTORY '...' SELECT ...FROM...WHERE ...;
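This section's title mentions Sqoop; a minimal sketch of exporting a Hive table's warehouse directory into a relational database with sqoop export could look like the following (the JDBC URL, credentials, and target database table are hypothetical, and the field delimiter must match the Hive table's):

```
sqoop export \
  --connect jdbc:mysql://dbhost:3306/testdb \
  --username dbuser --password dbpass \
  --table student \
  --export-dir /user/hive/warehouse/student \
  --input-fields-terminated-by '\t'
```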
Ways to save the result of a select query:
1. Save the query result into a new Hive table
create table t_tmp
as
select * from t_p;
2. Save the query result into an existing Hive table
insert into table t_tmp
select * from t_p;
3. Save the query result to a specified directory (either local or on HDFS)
insert overwrite local directory '/home/hadoop/test'
select * from t_p;
insert overwrite directory '/aaa/test'
select * from t_p;
3. Truncating Table Data
Note: truncate can only clear data from managed (internal) tables; it cannot be used on external tables.
hive (default)> truncate table student;
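As a sketch (the external table below is hypothetical), running truncate against an external table fails, because Hive does not own the underlying data:

```sql
-- a hypothetical external table pointing at data Hive does not manage
create external table student_ext(id string, name string)
row format delimited fields terminated by '\t'
location '/tmp/hivetest/student_ext';
-- this fails with a SemanticException: truncate is only allowed on managed tables
truncate table student_ext;
```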