Getting Started with Hive: Basic Hive Commands

阿年、嗯啊

Last modified: 2022-08-13 18:53:02

First published: 2021-04-12 19:19:33

Article link: https://blog.csdn.net/qq_45796486/article/details/115177084

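This article walks through basic Hive operations: viewing, creating, and dropping databases and tables, as well as partitions, data import and export, and changing table properties. It covers DML operations such as inserting and deleting data, and the management and use of managed (internal) tables, external tables, and bucketed tables. Hive's internal execution flow has four stages (parsing, compiling, optimizing, and executing) that turn a SQL statement into MapReduce jobs.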

Viewing databases:

show databases;
-- show detailed information about a database
desc database test;
-- show extended information about a database
desc database extended test;

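Creating a database:

create database [if not exists] test01;
create database test02 comment 'this is a database' location '/myCreateDatabase/';

The part in square brackets is optional. It means: create the database test01 only if it does not already exist.

In the second statement, comment sets a description for the database. Stock Hive does not handle Chinese text in comments out of the box; supporting it requires changing the encoding configuration. location sets the storage path of the database, which is a path on HDFS.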
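
As a small sketch of how these options combine, the statements below create a database only if it is missing, attach a comment and an HDFS location, and then inspect the result. The database name test03 and the path /user/hive/demo/test03 are hypothetical, chosen only for illustration.

```sql
-- Hypothetical database name and HDFS path, for illustration only
create database if not exists test03
comment 'demo database'
location '/user/hive/demo/test03';

-- Shows the comment and location that were just set
desc database test03;
```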

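Dropping a database:

-- drop an empty database
drop database test;
-- drop the database whether or not it is empty (its tables are dropped too)
drop database test cascade;

Command for viewing database information:

describe database test01;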

Viewing tables:

show tables;
-- show table information
desc test;
-- show detailed table information
desc formatted test;

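Creating a table:

use test01;
create table table01(name String , age int);
create table table02(name String , age int) row format delimited fields terminated by ',';

The first statement creates a simple table. The second creates a table with an explicit column delimiter, which matters when data is loaded into the table.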

create table test
(id int comment "ID",name string comment "Name")
comment "Test Table"
row format delimited fields terminated by "\t"
location "/test.table"
tblproperties("aaa"="bbb");

comment adds a description, and tblproperties sets property values on the table.

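Altering a table:

-- rename table test to test2
alter table test rename to test2;
-- change the data type of the id column to string
alter table test2 change id id string;
-- add a class column of type string
alter table test2 add columns(class string);
-- replace the whole column list with the columns given here
alter table test2 replace columns(id double,name string);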

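Dropping a table:

drop table test2;

Creating an external table:

External table: after the table is dropped, its data is still there.
Add the external keyword when creating the table.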

create external table test
(id int,name string)
row format delimited fields terminated by "\t";

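Converting a managed (internal) table to an external table (the property key EXTERNAL must be written in uppercase):

alter table test set tblproperties("EXTERNAL"="TRUE");

Converting an external table to a managed (internal) table:

alter table test set tblproperties("EXTERNAL"="FALSE");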

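Single-level partitioned table:

create table stu_par
(id int,name string)
partitioned by (class string)
row format delimited fields terminated by "\t";

When loading data into a partitioned table, the partition must be specified:

load data local inpath "/student.txt" into table stu_par partition(class="01");

When creating a partitioned table, the partition column must not reuse a column name already declared in the table, and its type must be given. Here class is the partition column.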
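
To illustrate how the partition column behaves once data is loaded, here is a minimal sketch that assumes the stu_par table and the class="01" partition from the statements above. The partition column can be selected and filtered like an ordinary column, even though its value comes from the partition directory rather than from the data file.

```sql
-- class is not stored in student.txt; its value comes from the partition
select id, name, class from stu_par where class = "01";

-- Filtering on the partition column lets Hive read only that partition
select count(*) from stu_par where class = "01";
```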

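Viewing a table's partitions:

show partitions stu_par;

When partition data exists but the metadata does not:

1. Add the partition to the table:
alter table stu_par add partition(class="03");

2. Repair all missing partitions in one command:
msck repair table stu_par;

3. Specify the partition directly when loading:
load data local inpath "/student.txt" into table stu_par partition(class="04");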

Two-level partitioned table:

Creating a two-level partitioned table:

create table stu_par2
(id int,name string)
partitioned by(grade string,class string)
row format delimited fields terminated by "\t";

Loading data into a two-level partitioned table:

load data local inpath "/student.txt" into table stu_par2 partition(grade="01",class="01");

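Adding, dropping, and viewing partitions:

-- add a partition
alter table stu_par add partition(class="05");
-- add several partitions at once (separated by spaces)
alter table stu_par add partition(class="06") partition(class="07");
-- drop a partition
alter table stu_par drop partition(class="05");
-- drop several partitions at once (separated by commas)
alter table stu_par drop partition(class="06") , partition(class="07");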
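
Adding a partition that already exists, or dropping one that does not, normally raises an error. As a hedged sketch, the if not exists and if exists clauses can make both statements safe to re-run; the partition value is the same illustrative class="05" used above.

```sql
-- Does nothing if the partition is already registered
alter table stu_par add if not exists partition(class="05");

-- Does nothing if the partition is missing
alter table stu_par drop if exists partition(class="05");

-- Confirm the current list of partitions
show partitions stu_par;
```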

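DML data import:

1. Import from the local disk or from HDFS
load data [local] inpath '/student.txt' [overwrite] into table student
[partition(partcol1=val1,...)];

Options:
local: with local, data is imported from the local file system; without it, data is imported from HDFS.
overwrite: replaces all existing data; without it, data is appended.
An import from HDFS moves the file, whereas an import from the local file system copies it.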
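
The two options can be contrasted with a short sketch against the stu_par table defined earlier. The HDFS path /tmp/student.txt is a hypothetical location, assumed here only to show the form of the statement.

```sql
-- Copy a local file and append it to partition class=01
load data local inpath '/student.txt' into table stu_par partition(class="01");

-- Move an HDFS file (hypothetical path) and replace the partition's contents
load data inpath '/tmp/student.txt' overwrite into table stu_par partition(class="01");
```

After the second statement the source file no longer sits at its original HDFS path, because it was moved into the table's directory.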

2. Import with insert
insert into table student select id,name from stu_par where class="01";

3. Create the table with as select
create table student2
as select id, name from student;

4. Load through location at table creation (usually for an external table); the path is a path on HDFS
create external table student3
(id int ,name string)
row format delimited fields terminated by '\t'
location '/xxx';

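Data export:

1. Export with insert (no row format is given, so fields are separated by Hive's default delimiter \001 (Ctrl-A), which most viewers do not display, and the columns look run together)
insert overwrite local directory '/opt/module/datas/export/student'
select * from student;

2. Export with a format
insert overwrite local directory '/opt/module/datas/export/student1'
row format delimited fields terminated by '\t'
select * from student;

3. Export from the bash command line
hive -e 'select * from default.student;' >/opt/module/datas/export/test.txt

4. Export a whole table to HDFS
-- the path is a path on HDFS
export table student to '/export/student';
-- import the exported result back into Hive
import table student3 from '/export/student';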

Deleting data:

-- removes only the table's data, not the table itself (works on managed tables only)
truncate table student;

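Creating a bucketed table:

create table table04(name String ,age int) clustered by (name) into 3 buckets;

The table is bucketed on name into 3 buckets. A bucketed table splits its data into multiple files for storage.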
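
Rows are assigned to bucket files when they are written through a query. A plain load data generally just moves or copies files without hashing rows. The following is a minimal sketch of filling table04 that assumes table01 from earlier still exists and holds some rows. On Hive 1.x and earlier, the setting hive.enforce.bucketing=true had to be enabled first; it is shown only as a comment because later versions removed the property and always enforce bucketing.

```sql
-- Hive 1.x and earlier only: set hive.enforce.bucketing=true;

-- Each row is written to one of the 3 bucket files based on the hash of name
insert into table table04
select name, age from table01;
```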

Sampling queries:

select * from student tablesample(bucket 1 out of 4 on id);
Meaning: the rows of student are divided into 4 groups by the hash of id, and the first group is returned.
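
Two variations on the same clause, given as a sketch against the same student table: picking a different group, and sampling randomly with rand() instead of a column.

```sql
-- Return the second of the 4 groups formed by hashing id
select * from student tablesample(bucket 2 out of 4 on id);

-- Assign rows to groups randomly, so each run may return different rows
select * from student tablesample(bucket 1 out of 4 on rand());
```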

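Viewing detailed table information:

desc does not show enough detail to tell a bucketed table from an ordinary one, so use desc formatted table04 instead.

Table Type: MANAGED_TABLE means this is a managed (internal) table.

Num Buckets: 3 means the table has 3 buckets.

Bucket Columns: [name] is the bucketing column.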

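Modifying table properties

Converting between managed (internal) and external tables
```
alter table table01 set tblproperties ('EXTERNAL'='TRUE');
```
This converts table01 to an external table; a managed table has the table type MANAGED_TABLE.
Suppose table02 is partitioned by age (the table02 created earlier has no partition column, so this assumes a different definition). Add a new partition age=8:
```
alter table table02 add partition (age='8');
```
View the partitions
```
show partitions table02;
```

Drop the partition

alter table table02 drop partition (age='8');

View the partitions

show partitions table02;

Drop the table

drop table [if exists] table01;

The part in square brackets is optional.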

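How HiveQL is converted into an MR program

Hive's internal execution flow: the parser (parses the SQL statement), the compiler (compiles the SQL statement into a MapReduce program), the optimizer (optimizes the MapReduce program), and the executor (runs the MapReduce program and writes the results to HDFS).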
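
One way to observe the outcome of these stages is the explain statement, which prints the plan Hive has built for a query instead of running it. The query below is only an illustrative example against the stu_par table from earlier, and the exact plan text depends on the Hive version and execution engine.

```sql
-- Print the stage plan for the query without executing it
explain
select class, count(*) from stu_par group by class;
```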