1. Data Import
1.1 Loading Data from a File
1.1.1 Syntax
hive> load data [local] inpath '/input/test' [overwrite] into table student [partition (partcol1=val1, …)];
(1) load data: load data into a table
(2) local: load from the local filesystem; without it, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the existing data in the table; without it, the data is appended
(5) into table: the table to load into
(6) student: the target table name
(7) partition: load into the specified partition
1.1.2 Example
Create a table:
create table student(id string, name string)
row format delimited fields terminated by '\t';
Contents of the local file:
[root@dn2 test]# vi person
chb 18 male
chb2 20 male
Load the local file:
load data local inpath '/uardata1/hivetest/student' into table student;
To load a file that is already on HDFS, omit the local keyword.
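As a sketch (the HDFS path below is hypothetical), first upload the file to HDFS, then load it without local. Note that a load from HDFS moves the source file into the table's directory rather than copying it:

```sql
-- upload the local file to a (hypothetical) HDFS path
dfs -put /uardata1/hivetest/student /tmp/hivetest/student;
-- no "local": the file is taken from HDFS and moved into the table's location
load data inpath '/tmp/hivetest/student' into table student;
```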
1.2 Inserting Data via a Query
1.2.1 Prepare the data
1. A regular table
create table pt2(id int, name string, gender string);
-- insert data
insert into table pt2 values (1, 'chb', 'male'), (2, 'ss', 'female');
2. A partitioned table
create table pt1(id int, name string) partitioned by (gender string);
1.2.2 Inserting data from the regular table into the partitioned table: the partition must be specified, and the queried columns must match the target table's non-partition columns.
Without specifying a partition:
hive> insert into pt1 select * from pt2;
FAILED: SemanticException 1:12 Need to specify partition columns
because the destination table is partitioned. Error encountered near token 'pt1'
With a column count mismatch:
hive> insert into pt1 partition(gender='gd1') select * from pt2;
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table
because column number/types are different ''gd1'': Table insclause-0 has 2 columns,
but query has 3 columns.
# insert from the regular table into the partitioned table
hive> insert into pt1 partition(gender='female') select id,name from pt2 where gender='female';
Insert from the partitioned table into the regular table:
hive> insert into pt2 select * from pt1;
1.3 Creating a Table and Loading Data from a Query (As Select)
Create a table from a query result (the result of the query is loaded into the new table):
create table if not exists student3 as select id, name from student;
1.4 Specifying the Data Path with Location at Table Creation
1. Create a table and specify its location on HDFS
hive (default)> create table if not exists student5(
id int, name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student';
2. Upload the data to HDFS
hive (default)> dfs -put /uardata1/hivetest/student /user/hive/warehouse/student;
3. Query the data
hive (default)> select * from student5;
1.5 Importing Data into a Hive Table with Import
Note: the data must first be exported with export, then imported.
-- first export
hive > export table student to '/tmp/hivetest/export/student';
-- then import from the exported data
hive > import table student6 from '/tmp/hivetest/export/student';
1.6 Dynamic Partition Loading
Create a partitioned table t12:
create table t12(id int,name string)
partitioned by (year int,month int)
row format delimited fields terminated by '\t';
Load data into t12:
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=8);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=9);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=10);
load data local inpath '/uardata/test/t12' into table t12 partition(year=2018,month=11);
Create another partitioned table with the same schema:
create table t13(id int,name string)
partitioned by (year int,month int)
row format delimited fields terminated by '\t';
Inserting into t13 directly fails:
hive> insert into t13 select * from t12;
FAILED: SemanticException 1:12 Need to specify partition columns
because the destination table is partitioned. Error encountered near token 't13'
hive>
When specifying the partition, at least one partition column must be given a static value (here year). month is not fixed, so the month column must be added to the select list:
insert into table t13 partition(year=2018,month) select id,name,month from t12 where year=2018;
If there are many partitions, the statement above has to be repeated for each one. To insert all partitions at once:
insert into table t13 partition(year,month) select id,name,year, month from t12;
Result:
hive> insert into table t13 partition(year,month) select id,name,year, month from t12;
FAILED: SemanticException [Error 10096]:
Dynamic partition strict mode requires at least one static partition column. (the cause)
To turn this off set hive.exec.dynamic.partition.mode=nonstrict (the fix)
hive>
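Following the hint in the error message, switching the dynamic partition mode to nonstrict for the session allows the fully dynamic insert. A sketch:

```sql
-- enable dynamic partitioning and relax strict mode for this session
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- now both partition columns may be dynamic
insert into table t13 partition(year,month) select id,name,year,month from t12;
```

Strict mode exists as a safeguard: a fully dynamic insert over a large table can create a very large number of partitions at once.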
2. Exporting Data
2.1 Insert Export
1. Export the query result to the local filesystem
insert overwrite local directory '/tmp/hivetest/export/studentbak' select * from student;
2. Export the query result to the local filesystem with formatting
hive(default)>insert overwrite local directory '/tmp/hivetest/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
3. Export the query result to HDFS (no local)
hive (default)> insert overwrite directory '/tmp/hivetest/export/student2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
2.2 Exporting to Local with a Hadoop Command
hive (default)> dfs -get /user/hive/warehouse/hivetest2.db/student/student /tmp/hivetest/export/student3.txt;
2.3 Exporting with the Hive Shell
Basic syntax: (hive -f/-e <statement or script> > file)
[root@chb2 ~]# bin/hive -e 'select * from hivetest.student;' > /tmp/hivetest/export/student4.txt;
2.4 Exporting to HDFS with Export
hive> export table default.student to '/uardata1/hivetest/export/student';
2.5 Sqoop Export
Copy a file (or directory) on HDFS:
$ hadoop fs -cp source destination
Using DIRECTORY:
hive>INSERT OVERWRITE [LOCAL] DIRECTORY '...' SELECT ...FROM...WHERE ...;
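This section's title mentions Sqoop; a minimal sketch of exporting a Hive table's warehouse directory into a relational database with sqoop export could look like the following (the JDBC URL, credentials, and target database table are hypothetical, and the field delimiter must match the Hive table's):

```
sqoop export \
  --connect jdbc:mysql://dbhost:3306/testdb \
  --username dbuser --password dbpass \
  --table student \
  --export-dir /user/hive/warehouse/student \
  --input-fields-terminated-by '\t'
```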
Ways to save the result of a select query:
1. Save the query result into a new Hive table
create table t_tmp
as
select * from t_p;
2. Save the query result into an existing Hive table
insert into table t_tmp
select * from t_p;
3. Save the query result to a specified directory (either local or on HDFS)
insert overwrite local directory '/home/hadoop/test'
select * from t_p;
insert overwrite directory '/aaa/test'
select * from t_p;
3. Truncating Table Data
Note: truncate can only clear data from managed (internal) tables; it cannot be used on external tables.
hive (default)> truncate table student;
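As a sketch (the external table below is hypothetical), running truncate against an external table fails, because Hive does not own the underlying data:

```sql
-- a hypothetical external table pointing at data Hive does not manage
create external table student_ext(id string, name string)
row format delimited fields terminated by '\t'
location '/tmp/hivetest/student_ext';
-- this fails with a SemanticException: truncate is only allowed on managed tables
truncate table student_ext;
```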