Hive Database and Table Operations (Part 1)

蜗牛杨哥

Article link: https://blog.csdn.net/u014635374/article/details/105025132

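Hive Database Operations

1. Creating a database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

The keywords have the following meanings:

IF NOT EXISTS: creates the database only if it does not already exist; if it exists, the statement is ignored.

COMMENT: adds a description to the database.

LOCATION: specifies the database's directory in HDFS. If omitted, the data warehouse directory is used.

WITH DBPROPERTIES: attaches properties to the database; both the property names and the property values are user-defined.

The keywords DATABASE and SCHEMA are interchangeable; both refer to a database.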

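For example, create the database db_hive; if the database already exists, an exception is thrown:

hive> create database db_hive;

Create the database db_hive only if it does not already exist (no exception is thrown):

hive> create database if not exists db_hive;

Create the database db_hive2 and specify its storage location in HDFS:

hive> create database db_hive2 location '/input/db_hive.db';

Create the database db_hive and define custom properties. Note that if db_hive already exists, IF NOT EXISTS causes the whole statement to be skipped, so the properties are not applied:

hive> create database if not exists db_hive with dbproperties('creator'='hadoop','date'='2019-02-12');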
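The optional clauses in the syntax above can be combined in one statement. The following is a minimal sketch rather than part of the original walkthrough: the database name db_demo, its comment, the HDFS path /tmp/hive_demo/db_demo.db, and the property 'purpose' are all assumptions made for illustration.

```sql
-- Hypothetical database that combines COMMENT, LOCATION and DBPROPERTIES.
-- The clauses follow the order given in the CREATE DATABASE syntax.
CREATE DATABASE IF NOT EXISTS db_demo
COMMENT 'demo database for illustration'
LOCATION '/tmp/hive_demo/db_demo.db'
WITH DBPROPERTIES ('creator' = 'hadoop', 'purpose' = 'demo');

-- Display the comment, location and properties stored for the database.
DESCRIBE DATABASE EXTENDED db_demo;
```

Running DESCRIBE DATABASE EXTENDED right after the CREATE is a quick way to confirm that the location and properties were stored as intended.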

2. Modifying a database

(1) Modifying custom properties

The syntax for modifying a database's custom properties is:

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES
(property_name=property_value, ...);

The SET DBPROPERTIES clause adds custom properties.

For example, create the database testdb, then use the desc command to view its default description (the Beeline CLI is used here because its output is easier to read):

0: jdbc:hive2://centoshadoop1:10000> desc database extended testdb;

Run the following command to add the custom property createtime to the database testdb:

alter database testdb set dbproperties('createtime'='2019-02-32');

Query the database description again with the same desc command to confirm that the property was added.

(2) Changing the database owner

alter (database|schema) database_name set owner [user|role] user_or_role;

For example, to change the owner of testdb to the user root:

alter database testdb set owner user root;
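The two ALTER forms can be run back to back and checked with one describe. This is a sketch under stated assumptions: the property name 'editor' and both property values are hypothetical; only testdb, createtime and the user root come from the text above.

```sql
-- Set several custom properties in one statement.
-- 'editor' is a hypothetical property name; the values are illustrative.
ALTER DATABASE testdb SET DBPROPERTIES ('createtime' = '2019-02-12', 'editor' = 'hadoop');

-- Change the owner to the user root, as in the example above.
ALTER DATABASE testdb SET OWNER USER root;

-- Inspect the owner and parameters reported for the database.
DESCRIBE DATABASE EXTENDED testdb;
```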

3. Selecting a database

Select a database as the current database for subsequent HiveQL statements:

USE database_name;

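4. Dropping a database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

The keywords have the following meanings:

IF EXISTS: if the database does not exist, the statement is ignored and no exception is thrown.

RESTRICT|CASCADE: restrict or cascade. The default is RESTRICT, under which the drop fails if the database still contains tables. With CASCADE, the database is dropped together with all the tables it contains.

Examples:

drop database if exists testdb;   -- ignored if the database does not exist
drop database testdb;             -- succeeds if the database is empty; throws an exception if it still contains tables
drop database testdb cascade;     -- drops the database and all of its tables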
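The difference between the two modes is easiest to see on a throwaway database. The sketch below uses the hypothetical names db_drop_demo and t1, which do not appear in the original article.

```sql
-- Hypothetical database and table used only to demonstrate the two modes.
CREATE DATABASE IF NOT EXISTS db_drop_demo;
CREATE TABLE IF NOT EXISTS db_drop_demo.t1 (id INT);

-- RESTRICT is the default, so the next statement would fail while t1 exists.
-- It is commented out so that the script can run to completion.
-- DROP DATABASE db_drop_demo RESTRICT;

-- CASCADE drops the contained tables and then the database itself.
DROP DATABASE IF EXISTS db_drop_demo CASCADE;
```

Because CASCADE removes every table in the database, it is safer to try the default mode first and switch to CASCADE only when the loss of the tables is intended.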

5. Showing databases

List all databases in the current Hive instance:

hive> show databases;

List only the databases whose names start with db_hive:

0: jdbc:hive2://centoshadoop1:10000> show databases like 'db_hive*';
+----------------+
| database_name  |
+----------------+
| db_hive        |
| db_hive2       |
+----------------+

Show the database currently in use:

hive> select current_database();

Show a database's properties and description:

desc database extended testdb;
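The pattern accepted by SHOW DATABASES LIKE is not a full regular expression. In the Hive releases that use the 'db_hive*' style shown above, '*' matches any sequence of characters and '|' separates alternatives; other releases may behave differently, so treat the following as a sketch. It assumes the databases db_hive, db_hive2 and testdb from the earlier examples exist.

```sql
-- '*' matches any sequence of characters; '|' separates alternative patterns.
SHOW DATABASES LIKE 'db_hive*|testdb';

-- Switch to one of the databases and confirm which one is current.
USE db_hive;
SELECT current_database();
```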

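Hive Table Operations

1. A Hive table consists of the actual stored data and its metadata. The data is usually stored in HDFS, and the metadata is usually stored in a relational database.

The syntax for creating a table in Hive is:

create [temporary] [external] table [if not exists] [db_name.]table_name
[(col_name data_type [comment col_comment], ... [constraint_specification])]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...) [sorted by (col_name [asc|desc], ...)]
into num_buckets buckets]
[skewed by (col_name, col_name, ...)
on ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[stored as directories]]
[
[row format row_format]
[stored as file_format]
| stored by 'storage.handler.class.name' [with serdeproperties (...)]
]
[location hdfs_path]
[tblproperties (property_name=property_value, ...)]
[as select_statement];

The commonly used keywords have the following meanings:

create table: creates a table with the given name.

temporary: declares a temporary table.

external: declares an external table.

if not exists: if the table already exists, the statement is ignored and no exception is thrown.

comment: adds a comment to the table or to a column.

partitioned by: creates partitions.

clustered by: creates buckets.

sorted by: sorts the rows within each bucket by the listed columns.

skewed by ... on: marks specific values of specific columns as skewed data.

row format: specifies a custom SerDe (short for Serializer/Deserializer) format or the default SerDe format. If the clause is omitted or set to DELIMITED, the default SerDe is used. A custom SerDe can also be specified when the table's columns are declared.

stored as: the storage format of the data files. Hive supports built-in and custom-developed file formats. Commonly used built-in formats are textfile (plain text, the default), sequencefile (binary sequence file, optionally compressed), orc (ORC file), avro (Avro file) and jsonfile (JSON file).

stored by: a non-native table format specified by the user through a storage handler.

with serdeproperties: sets properties of the SerDe.

location: specifies the table's storage location in HDFS.

tblproperties: defines custom table properties.

The like keyword can also be used to copy the structure of another table into a new table without copying its data. The syntax is:

create [temporary] [external] table [if not exists] [db_name.]table_name
like existing_table_or_view_name
[location hdfs_path];

Note that there are two ways to choose the database in which a table is created: first, run the use command before creating the table to set the current database; second, qualify the table name with the database name, for example database_name.table_name.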
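To make the clause order in the CREATE TABLE syntax concrete, here is a minimal sketch that uses several of the optional clauses at once and then copies the structure with LIKE. The table names page_view_demo and page_view_demo_copy, their columns, and the table property are hypothetical; the database testdb is the one used elsewhere in this article and is assumed to exist.

```sql
-- Hypothetical table showing the order of the optional clauses.
CREATE TABLE IF NOT EXISTS testdb.page_view_demo (
  view_time INT    COMMENT 'time of the page view',
  user_id   BIGINT COMMENT 'id of the viewing user',
  page_url  STRING COMMENT 'url of the page'
)
COMMENT 'hypothetical page-view table'
PARTITIONED BY (dt STRING COMMENT 'date partition')
CLUSTERED BY (user_id) SORTED BY (view_time ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES ('creator' = 'hadoop');

-- Copy only the structure (columns, partitioning, format) into a new, empty table.
CREATE TABLE IF NOT EXISTS testdb.page_view_demo_copy LIKE testdb.page_view_demo;
```

Qualifying the table names with testdb corresponds to the second of the two ways of choosing the database; the first would be to run use testdb; beforehand and drop the prefix.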

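2. Internal tables

Ordinary tables created in Hive by default are called managed tables or internal tables. The data of an internal table is managed by Hive and is stored under the data warehouse directory, which is /home/hadoop/hive/data/ in this environment. The warehouse directory can be changed through the hive.metastore.warehouse.dir property in the Hive configuration file hive-site.xml.

When an internal table is dropped, its data and its metadata are deleted together.

2.1 Creating a table

Run the following commands to switch to the database testdb and create the table student:

use testdb;
create table student(id INT,name STRING);

Listing the warehouse directory shows that a folder named student has been created under the testdb.db folder. This folder is the data directory of the table student:

hadoop fs -ls -R /home/hadoop/hive/data/
drwxrwxrwx - hadoop supergroup 0 2020-03-22 10:36 /home/hadoop/hive/data/testdb.db
drwxrwxrwx - hadoop supergroup 0 2020-03-22 10:36 /home/hadoop/hive/data/testdb.db/student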

2.2 Viewing the table structure

hive (testdb)> desc student;
col_name data_type comment
id int
name string

The following command shows the detailed table structure, including the table type and the table's location in the data warehouse:

hive (testdb)> desc formatted student;
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: testdb
Owner: hadoop
CreateTime: Sun Mar 22 10:36:40 CST 2020
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/home/hadoop/hive/data/testdb.db/student
Table Type: MANAGED_TABLE

2.3 Inserting data into the table

insert into student values(1000,'xiaoming');

Hive converts the insert statement into a MapReduce job. Listing the warehouse directory shows that a file named 000000_0 has been generated under the folder of the table student:

-rw-r--r-- 3 hadoop supergroup 14 2020-03-22 10:52 /home/hadoop/hive/data/testdb.db/student/.hive-staging_hive_2020-03-22_10-51-10_686_3386880682452879695-1/_tmp.-ext-10002/000000_0

Run the following command to view the contents of the file 000000_0:

hadoop fs -cat /home/hadoop/hive/data/testdb.db/student/.hive-staging_hive_2020-03-22_10-51-10_686_3386880682452879695-1/_tmp.-ext-10002/000000_0

1000 xiaoming

2.4 Querying the table

hive (testdb)> select * from student;
student.id student.name
1000 xiaoming
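Insert and query can be combined into a slightly richer round trip. The sketch below assumes the student table from section 2.1 still exists in testdb; the two extra rows are made-up values, not data from the original article.

```sql
USE testdb;

-- Several rows can be inserted with a single VALUES clause (hypothetical rows).
INSERT INTO student VALUES (1001, 'xiaohong'), (1002, 'xiaogang');

-- Select specific columns, filter on id and sort the result.
SELECT id, name FROM student WHERE id >= 1001 ORDER BY id;
```

Each INSERT ... VALUES statement is executed as a job, so inserting many rows in one statement is usually preferable to issuing one statement per row.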

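2.5 Loading a local file into Hive

Data in a local file can be loaded directly into a Hive table, but the format of the data in the file must be declared when the table is created.

(1) Create a student score table named score, in which the student number sno is an integer, the name is a string and the score is an integer, and specify the Tab character as the field delimiter:

hive (testdb)> create table score(
> sno INT,
> name STRING,
> score INT)
> row format delimited fields terminated by '\t';
Time taken: 0.388 seconds

(2) Create the file score.txt in the local directory /home/hadoop with the following content, using the Tab key to separate the columns:

1001 张三 98
1002 李四 92
1003 王五 87

Then load the file into the table score:

hive (testdb)> load data local inpath '/home/hadoop/score.txt' into table score;
Loading data to table testdb.score

(3) Query all the data in the table score:

hive (testdb)> select * from score;
score.sno score.name score.score
1001 张三 98
1002 李四 92
1003 王五 87

(4) Check the corresponding data files in the HDFS data warehouse. The file score.txt has been uploaded to the folder score:

hadoop fs -ls -R /home/hadoop/hive/data
drwxrwxrwx - hadoop supergroup 0 2020-03-22 11:19 /home/hadoop/hive/data/testdb.db
drwxrwxrwx - hadoop supergroup 0 2020-03-22 11:25 /home/hadoop/hive/data/testdb.db/score
-rwxrwxrwx 3 hadoop supergroup 45 2020-03-22 11:25 /home/hadoop/hive/data/testdb.db/score/score.txt

(5) Run the following command to view the contents of score.txt:

[hadoop@centoshadoop1 ~]$ hadoop fs -cat /home/hadoop/hive/data/testdb.db/score/score.txt
1001 张三 98
1002 李四 92
1003 王五 87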
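The LOAD statement has two common variations that are worth knowing alongside the example above. The following is a sketch, not part of the original walkthrough: the HDFS path /tmp/score_hdfs.txt is hypothetical and the file must already exist there for the second statement to succeed.

```sql
USE testdb;

-- LOCAL: the file is read from the local file system and copied into the table directory.
-- OVERWRITE replaces the data already in the table instead of appending to it.
LOAD DATA LOCAL INPATH '/home/hadoop/score.txt' OVERWRITE INTO TABLE score;

-- Without LOCAL the path refers to HDFS, and the file is moved (not copied)
-- into the table directory. '/tmp/score_hdfs.txt' is a hypothetical path.
LOAD DATA INPATH '/tmp/score_hdfs.txt' INTO TABLE score;
```

After the second statement the source file no longer exists at its original HDFS path, which is a frequent source of surprise when the same file is meant to be loaded more than once.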

3. Dropping a table

Run the following command to drop the student table in the testdb database:

hive (testdb)> drop table if exists testdb.student;

Listing the warehouse directory again shows that the student folder has been removed:

[hadoop@centoshadoop1 ~]$ hadoop fs -ls -R /home/hadoop/hive/data
drwxrwxrwx - hadoop supergroup 0 2020-03-22 11:35 /home/hadoop/hive/data/testdb.db
drwxrwxrwx - hadoop supergroup 0 2020-03-22 11:25 /home/hadoop/hive/data/testdb.db/score
-rwxrwxrwx 3 hadoop supergroup 45 2020-03-22 11:25 /home/hadoop/hive/data/testdb.db/score/score.txt

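Note: the Hive LOAD statement only copies or moves the data files into the location of the Hive table in the data warehouse; it does not transform the data in any way while loading. Therefore, manually copying the data files into the table's location has the same effect as running LOAD.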
