4) Hive (DDL: Data Definition Language)

念达

Category: 大数据之Hive (Big Data: Hive)

Article link: https://blog.csdn.net/weixin_44757575/article/details/102556551

Creating a Database

Create a database. By default, a database is stored on HDFS under /user/hive/warehouse/*.db:
create database if not exists db_hive;
Create a database and specify where it is stored on HDFS:
create database if not exists db_hive2 location '/test/db_hive.db';

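Querying Databases

Filter the list of databases:
show databases like 'db_hive*';
View database details:
-- show basic database information
desc database db_hive;
-- show detailed database information
desc database extended db_hive;
Switch the current database:
use db_hive;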

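Altering a Database

The alter database command sets key-value pairs in a database's dbproperties, which describe the database. (Note: in the Hive versions this tutorial targets, the rest of a database's metadata cannot be changed, including the database name and the directory where the database is stored.)
alter database db_hive set dbproperties('createtime'='20060606');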
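
To check that a property was actually stored, the database can be described again in extended form. A minimal sketch: the property name owner_team and its value are hypothetical, chosen only for illustration.

```sql
-- Attach a second, hypothetical property to the same database.
alter database db_hive set dbproperties('owner_team'='analytics');

-- The extended description lists the dbproperties along with the
-- database name and location.
desc database extended db_hive;
```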

Dropping a Database

Drop an empty database:
drop database if exists db_hive2;
Drop a non-empty database (cascade drops the tables it contains as well):
drop database db_hive2 cascade;

Creating a Table

create [external] table [if not exists] tb_name
[(col_name data_type [comment col_comment], ...)]
[comment tb_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...)
  [sorted by (col_name [asc|desc], ...)] into num_buckets buckets]
[row format row_format]
[stored as file_format]
[location hdfs_path]

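Clause reference:
① create table creates a table with the given name. If a table with the same name already exists, an exception is thrown; the if not exists option suppresses it.
② external creates an external table; the table definition points at the path of the actual data (location).
③ comment: adds a comment to the table or to a column.
④ partitioned by: creates a partitioned table.
⑤ clustered by: creates a bucketed table.
⑥ sorted by: sorts the data within each bucket (rarely used).

⑦ row format
delimited [fields terminated by 'char'] [collection items terminated by 'char']
[map keys terminated by 'char']
[lines terminated by 'char']
| serde serde_name [with serdeproperties
(property_name=property_value, property_name=property_value, ...)]
When creating a table, you can supply a custom SerDe or use a built-in one. If ROW FORMAT is omitted, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. You also declare the table's columns at creation time, and the SerDe is chosen alongside them; Hive uses the SerDe to map the stored data onto those columns.

⑧ stored as: the storage file format (common formats: sequencefile for serialized binary files, textfile for plain text, rcfile for columnar storage).
⑨ location: the table's storage location on HDFS.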
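
The delimiters in ⑦ matter most when a table has collection columns. The following is a minimal sketch; the table person_info, its columns, and the sample line are hypothetical.

```sql
-- Hypothetical table with one scalar, one array and one map column.
create table if not exists person_info(
  name     string,
  friends  array<string>,
  children map<string, int>
)
row format delimited
  fields terminated by ','
  collection items terminated by '_'
  map keys terminated by ':'
  lines terminated by '\n'
stored as textfile;

-- A matching line of input would look like:
--   alice,bob_carol,dave:18_erin:19
-- Fields are split on ',', array and map entries on '_',
-- and each map key is separated from its value by ':'.
select name, friends[0], children['dave'] from person_info;
```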

Simple example:

create table if not exists student(id int, name string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student';
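
For comparison, here is a sketch that exercises more of the optional clauses in the order the syntax expects them. The table stu_buck, its columns, the partition column grade, and the bucket count are all hypothetical.

```sql
-- Hypothetical table combining comments, a partition column and buckets.
create table if not exists stu_buck(
  id   int    comment 'student id',
  name string comment 'student name'
)
comment 'bucketed student table (illustration only)'
partitioned by (grade string)
clustered by (id) sorted by (id asc) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;
```

The partition column grade is declared only in partitioned by, not in the regular column list.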

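Managed tables (internal tables):
- For a managed table, Hive moves loaded data into the table's directory under the warehouse path; dropping the table deletes both the metadata and the underlying data.
- Create a table from a query result (the result rows are inserted into the new table):
  create table if not exists stu2 as select * from stu1;
- Create a table with the same structure as an existing table:
  create table if not exists stu3 like stu;
- Check the table type:
  desc formatted stu;
  Table Type: MANAGED_TABLE
External tables:
- Creating an external table only records the path where the data lives and does not move the data; dropping an external table deletes only the metadata, not the underlying data.
- Create an external table:
  create external table if not exists student(id int, name string)
  row format delimited fields terminated by '\t'
  location '/user/hive/warehouse/student';
- When to use managed versus external tables:
  For example, website logs are collected every day and periodically loaded into HDFS text files. Extensive statistical analysis is done on top of an external table (the raw log table); the intermediate and result tables are stored as managed tables, with data entering them via SELECT + INSERT.
- Converting between managed and external tables (the property key and value are fixed and case-sensitive):
  -- external to managed
  alter table student set tblproperties('EXTERNAL'='FALSE');
  -- managed to external
  alter table student set tblproperties('EXTERNAL'='TRUE');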
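
The log scenario described above could be sketched as follows. The table names raw_log and url_pv, their columns, and the path /data/logs/raw are assumptions made for illustration, not part of the original tutorial.

```sql
-- External table over raw log files that another process writes to HDFS.
create external table if not exists raw_log(
  ip         string,
  url        string,
  visit_time string
)
row format delimited fields terminated by '\t'
location '/data/logs/raw';

-- Managed table holding a derived result.
create table if not exists url_pv(url string, pv bigint);

-- Data enters the managed table through SELECT + INSERT.
insert overwrite table url_pv
select url, count(*) as pv from raw_log group by url;

-- For raw_log this reports Table Type: EXTERNAL_TABLE.
desc formatted raw_log;
```

With this split, dropping raw_log would leave the files under /data/logs/raw in place, while dropping url_pv would delete its data.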

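Partitioned Tables

Overview:
- Each partition of a partitioned table corresponds to a separate directory on HDFS that holds all of that partition's data files. A partition in Hive is simply a subdirectory.
- A large dataset is split into smaller ones according to business needs. At query time, an expression in the WHERE clause selects only the partitions needed, which can make queries much more efficient.
Basic operations on partitioned tables:

① Motivating layout, with logs stored in one directory per day: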

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

② Create a partitioned table:

create table if not exists dept_partition
(deptno int, dname string, loc string)
partitioned by (month string)
row format delimited fields terminated by '\t';

③ Load data into the partitioned table:

load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');

④ Query data in a single partition:
select * from dept_partition where month='201707';
⑤ Query several partitions together:

select * from dept_partition where month='201709'
union 
select * from dept_partition where month='201708'
union 
select * from dept_partition where month='201707';
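
The same three partitions can also be read with a single predicate on the partition column. This is a sketch of an alternative, not the form the original tutorial uses. Note that union removes duplicate rows whereas this form returns every row; union all would keep duplicates as well.

```sql
-- One scan restricted to three partitions via the partition column.
select * from dept_partition
where month in ('201707', '201708', '201709');
```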

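⑥ Add partitions:
Add a single partition, then several partitions at once (separated by spaces):
alter table dept_partition add partition(month='201706');
alter table dept_partition add partition(month='201705') partition(month='201704');
⑦ Drop a single partition, then several partitions at once (separated by commas):
alter table dept_partition drop partition(month='201704');
alter table dept_partition drop partition(month='201705'), partition(month='201706');
⑧ List the partitions of a partitioned table:
show partitions dept_partition;
⑨ View the structure of a partitioned table:
desc formatted dept_partition;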

Notes on partitioned tables:
① Create a table with two partition levels:

create table if not exists dept_partition2(
   deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';

② Load data the usual way:

load data local inpath '/opt/module/hive/student.txt' 
into table default.dept_partition2 partition(month='201709',day='14');

-- query the data in that partition
select * from dept_partition2 where month='201709' and day='14';

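③ Three ways to associate data with the table when files are uploaded directly into a partition directory:
1) Upload the data, then repair the table:
msck repair table dept_partition2;
2) Upload the data, then add the partition:
alter table dept_partition2 add partition(month='201709', day='11');
3) Create the directory, then load data into the partition (the common approach):
load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');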
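
Method 1) can be sketched end to end from inside the Hive CLI, using the two-level partitioned table from ① (called dept_partition2 here). The sketch assumes the table lives under the default warehouse path and uses a hypothetical day value of 12; the partition directories must follow the key=value naming for the repair to find them.

```sql
-- Create the partition directory and upload a file straight to HDFS.
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;

-- The metastore does not know about the new directory yet;
-- the repair registers it as a partition.
msck repair table dept_partition2;

select * from dept_partition2 where month='201709' and day='12';
```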

Altering a Table

Rename a table: alter table student1 rename to student2;
Change a column:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add or replace columns:
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD appends new columns after all existing columns (but before the partition columns); REPLACE replaces all of the table's columns.
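
Applied to the dept_partition table created earlier, the statements above might be used as follows. The column names deptdesc and desc_text are hypothetical.

```sql
-- Append a new column after loc (and before the partition column month).
alter table dept_partition add columns (deptdesc string comment 'department description');

-- Rename the new column; the type has to be restated even if unchanged.
alter table dept_partition change column deptdesc desc_text string;

-- Replace the whole column list, which drops desc_text again.
alter table dept_partition replace columns (deptno int, dname string, loc string);

-- Inspect the resulting schema.
desc dept_partition;
```

These statements change only the table's metadata; the data files themselves are not rewritten.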

Dropping a Table

drop table tb_name;
