Hive database and table operations

EsmeZhao

Article link: https://blog.csdn.net/m0_37658639/article/details/120779232

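1. Database operations

1.1 Creating a database

(1) Create a database named school. By default, a database is stored on HDFS at /user/hive/warehouse/*.db (here, /user/hive/warehouse/school.db).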

hive (default)> create database school;

(2) To avoid an error when the database already exists, add the if not exists clause.

hive (default)> create database if not exists school_1;

(3) Create a database and specify where it is stored on HDFS (note: the path must go all the way down to the database directory itself).

hive (default)> create database if not exists school_2 location '/school.db';

1.2 Querying databases

(1) List all databases

hive (default)> show databases;

(2) Filter the listed databases by a pattern

hive (default)> show databases like "school*";
OK
database_name
school
school_1
school_2
Time taken: 0.036 seconds, Fetched: 3 row(s)

(3) Show detailed information about a database

hive (default)> desc database school;

1.3 Switching databases

Switch from the current database default to the school database.

hive (default)> use school;
OK
Time taken: 0.036 seconds
hive (school)>

1.4 Dropping a database

(1) Drop an empty database

hive (school)> drop database school_1;

(2) Dropping a database that does not exist raises an error, so it is best to add if exists to check first.

hive (school)> drop database if exists school_1;

(3) If the database is not empty, add the cascade keyword to force the drop.

hive (school)> drop database school_2 cascade;
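
The database commands in this section can be combined into one script. The following is a minimal sketch; the database name demo_db and the path /demo_db.db are hypothetical and not part of the original walkthrough.

```sql
-- Hypothetical database name and HDFS path, used only for illustration.
create database if not exists demo_db location '/demo_db.db';

-- Confirm it exists and inspect its location.
show databases like 'demo*';
desc database demo_db;

-- Switch to it, then switch back before dropping it.
use demo_db;
use default;

-- cascade also drops any tables the database still contains.
drop database if exists demo_db cascade;
```

Because the create and drop statements use if not exists and if exists, the script can be rerun without raising errors.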

2. Table operations

Table creation syntax

create [external] table [if not exists] table_name [(col_name data_type [comment col_comment], ...)]

[comment table_comment]

[partitioned by (col_name data_type [comment col_comment], ...)]

[clustered by (col_name, col_name, ...) into num_buckets buckets]

[row format row_format]

[stored as file_format]

[location hdfs_path]

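Explanation of the clauses

(1) create table creates a table with the given name. If a table with the same name already exists, an exception is raised; the if not exists option suppresses it.

(2) The external keyword creates an external table, and a path to the actual data can be given with location when the table is created. For an internal (managed) table, Hive moves the data into the path managed by the data warehouse. For an external table, Hive only records where the data lives and does not move it. When a table is dropped, an internal table loses both its metadata and its data, whereas an external table loses only its metadata and the data stays in place.

(3) comment: adds a comment to the table or to a column.

(4) partitioned by: creates a partitioned table.

(5) clustered by: creates a bucketed table.

(6) stored as: specifies the storage file format. Common formats are sequencefile (binary sequence file), textfile (plain text), and rcfile (columnar storage format). If the data is plain text, use stored as textfile. If the data needs to be compressed, use stored as sequencefile.

(7) location: specifies where the table is stored on HDFS.

(8) like: copies the structure of an existing table without copying its data.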
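
As a sketch of how these clauses fit together, the statement below uses most of them in one definition. The table name stu_demo, its columns, and the path /data/stu_demo are assumptions made up for this example.

```sql
-- Hypothetical external table. Each class value becomes one partition,
-- rows are hashed on id into 3 buckets, and dropping the table
-- leaves the files under /data/stu_demo in place.
create external table if not exists stu_demo (
  id   int    comment 'student id',
  name string comment 'student name'
)
comment 'demo table combining several clauses'
partitioned by (class int)
clustered by (id) into 3 buckets
row format delimited fields terminated by ','
stored as textfile
location '/data/stu_demo';

-- like copies only the structure, not the data.
create table if not exists stu_demo_copy like stu_demo;
```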

2.1 Creating an internal table

(1) Create an internal table stu1 with an int column id and a string column name, with fields separated by a comma (",")

hive (school)> create table if not exists stu1(id int,name string) row format delimited fields terminated by ',';

(2) Create a table from a query result (the rows returned by the query are inserted into the new table)

hive (school)> create table if not exists stu2 as select id,name from stu1;

(3) Create a table from the structure of an existing table

hive (school)> create table if not exists stu3 like stu1;

(4) Show detailed information about a table

hive (school)> desc formatted stu1;

2.2 Creating an external table

Create an external table tea with an int column id and a string column name, with fields separated by a comma (",")

hive (school)> create external table if not exists tea(id int,name string) row format delimited fields terminated by ',';

2.3 Converting between internal and external tables

(1) Change the internal table stu1 into an external table

hive (school)> alter table stu1 set tblproperties('EXTERNAL'='TRUE');

(2) Change the external table tea into an internal table

hive (school)> alter table tea set tblproperties('EXTERNAL'='FALSE');
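
One way to confirm the conversion, sketched here under the assumption that stu1 and tea exist as created above, is to check the Table Type field in the desc formatted output.

```sql
-- After setting 'EXTERNAL'='TRUE', Table Type should read EXTERNAL_TABLE.
desc formatted stu1;

-- After setting 'EXTERNAL'='FALSE', Table Type should read MANAGED_TABLE.
desc formatted tea;
```

The property name and value are written in upper case, matching the commands above; other capitalizations may not be recognized.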

2.4 Loading data

(1) Load the local file /export/data/student.txt into table stu1

hive (school)> load data local inpath '/export/data/student.txt' into table stu1;

(2) Load the HDFS file /teacher.txt into table tea

hive (school)> load data inpath '/teacher.txt' into table tea;
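
The two forms of load data differ in what happens to the source file. The sketch below reuses the paths from the steps above and adds the overwrite keyword, which the original commands do not use.

```sql
-- local: the file is copied from the local file system into the table's directory.
load data local inpath '/export/data/student.txt' into table stu1;

-- no local: the file is moved within HDFS, so /teacher.txt is gone from its old path afterwards.
load data inpath '/teacher.txt' into table tea;

-- overwrite replaces the table's existing data instead of appending to it.
load data local inpath '/export/data/student.txt' overwrite into table stu1;
```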

2.5 Altering a table

(1) Rename a table

Rename table stu1 to stu

hive (school)> alter table stu1 rename to stu;

(2) Add a column

Add an int column class to table stu

hive (school)> alter table stu add columns(class int);

(3) Change a column

Change the id column of stu into a string column named number

hive (school)> alter table stu change column id number string;
OK
Time taken: 0.123 seconds
hive (school)> desc stu;
OK
col_name	data_type	comment
number              	string              	                    
name                	string              	                    
class               	int

(4) Replace columns

Replace the columns of stu with an int column id, a string column name, and an int column class

hive (school)> alter table stu replace columns(id int,name string,class int);
OK
Time taken: 0.107 seconds
hive (school)> desc stu;
OK
col_name	data_type	comment
id                  	int                 	                    
name                	string              	                    
class               	int

2.6 Dropping a table

(1) Drop a table

hive (school)> drop table tea;

(2) Truncate a table to remove its data. This works only on internal tables, not on external tables.

hive (school)> truncate table stu;

2.7 Partitioned tables

(1) Create a partitioned table student with an int column id and a string column name, partitioned by an int column class, with fields separated by a comma (",")

hive (school)> create table student(id int,name string) partitioned by(class int) row format delimited fields terminated by ',';

(2) Add partitions

Add a single partition

hive (school)> alter table student add partition(class=1);

Add multiple partitions, separated by spaces

hive (school)> alter table student add partition(class=2) partition(class=3);

(3) Show the partitions of a table

hive (school)> show partitions student;
OK
partition
class=1
class=2
class=3

(4) Load data into the matching partition with a select query filtered by a where clause

hive (school)> insert into table student partition(class=1) select id,name from stu where class=1;

(5) Query a specific partition

hive (school)> select * from student where class = 1;
OK
student.id	student.name	student.class
1	xiaoming	1
2	xiaohong	1
3	xiaogang	1

(6) Drop partitions

Drop a single partition

hive (school)> alter table student drop partition(class=1);

Drop multiple partitions, separated by commas

hive (school)> alter table student drop partition(class=2),partition(class=3);
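
Adding a partition that already exists raises an error, much like creating an existing database. A minimal sketch using the if not exists and if exists clauses on the student table, so the statements are safe to rerun:

```sql
-- Does nothing if the partition already exists.
alter table student add if not exists partition(class=1);

-- Safe to run even if the partition is already gone.
alter table student drop if exists partition(class=1);

-- Check the result.
show partitions student;
```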

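2.8 Bucketed tables

(1) Enable bucketing enforcement in Hive (this property applies to Hive 1.x; from Hive 2.0 onward bucketing is always enforced and the setting should no longer be needed)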

hive (school)> set hive.enforce.bucketing = true;

(2) Create a bucketed table stu_cluster with an int column id, a string column name, and an int column class, bucketed by class into 3 buckets, with fields separated by a comma (",")

hive (school)> create table stu_cluster(id int,name string,class int) clustered by(class) into 3 buckets row format delimited fields terminated by ',';

(3) Load data into the buckets from an intermediate table

hive (school)> insert overwrite table stu_cluster select * from stu cluster by(class);
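
To check that the rows were distributed into buckets, one option is to sample a single bucket with tablesample. This is a sketch that assumes stu_cluster was populated as above; the query itself is not part of the original steps.

```sql
-- Read only the rows that hash into the first of the 3 buckets on class.
select * from stu_cluster tablesample(bucket 1 out of 3 on class);
```

Since the table has 3 buckets, the table directory on HDFS should also contain 3 data files after the insert.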
