Hive: Database and Table DDL Operations

Article link: https://blog.csdn.net/houkai18792669930/article/details/105902356

1. Database DDL

Create a database

create database test;
create database if not exists test;
-- use a custom storage directory
create database if not exists test location '/hive';

List databases

show databases;

Show database details

desc database test;
desc database extended test;

Switch to a database

use test;

Alter a database

alter database test set dbproperties('createtime'='20200503');
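
To check that the property was actually stored, the extended description can be queried afterwards. A minimal sketch, assuming the `test` database and the `createtime` property used above:

```sql
-- set a custom key/value property on the database
alter database test set dbproperties('createtime'='20200503');
-- the extended description also lists the database properties
desc database extended test;
```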

Drop a database

-- empty database
drop database test;
-- non-empty database: CASCADE drops the tables it contains as well
drop database test cascade;

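2. Hive Data Types

Primitive data types

Hive type	Java type	Size	Example
TINYINT	byte	1-byte signed integer	20
SMALLINT	short	2-byte signed integer	20
INT	int	4-byte signed integer	20
BIGINT	long	8-byte signed integer	20
BOOLEAN	boolean	Boolean, true or false	TRUE FALSE
FLOAT	float	Single-precision floating point	3.14159
DOUBLE	double	Double-precision floating point	3.14159
STRING	string	Character sequence. A character set can be specified. Single or double quotes may be used.	'now is the time' "for all good men"
TIMESTAMP		Timestamp type
BINARY		Byte array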

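Collection data types

Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it wraps a set of named fields. Complex data types can be nested to any depth.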
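
As an illustration of nesting, here is a small sketch. The table `nested_demo` and all of its column names are hypothetical and exist only for this example:

```sql
-- hypothetical table, for illustration only
create table if not exists nested_demo (
  id int,
  tags array<string>,
  scores map<string, array<int>>,
  address struct<city:string, geo:struct<lat:double, lon:double>>
);

-- element access: index for ARRAY, key for MAP, dot notation for STRUCT
select tags[0], scores['math'][0], address.geo.lat
from nested_demo;
```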

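3. Key Concepts for Hive Table DDL

Managed vs. external tables: tables are managed (internal) by default. The EXTERNAL keyword creates an external table, and a LOCATION can be given at creation time to point at the actual data. For a managed table, Hive moves the data into the directory under the warehouse path; for an external table, it only records where the data lives and leaves the data where it is. Dropping a managed table deletes both its metadata and its data, whereas dropping an external table deletes only the metadata and keeps the data. A typical pattern: keep the raw logs in an external table, run the heavy statistical analysis on top of it, and store the intermediate and result tables as managed tables, populated through SELECT + INSERT.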
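
The usage pattern described above could look roughly like the following sketch. The table names `raw_logs` and `log_stats` and the path `/data/raw_logs` are assumptions made up for this example:

```sql
-- external table over raw log files that already sit in HDFS (hypothetical path)
create external table if not exists raw_logs (line string)
location '/data/raw_logs';

-- managed table for a derived result
create table if not exists log_stats (line_count bigint);

-- SELECT + INSERT from the external table into the managed table
insert overwrite table log_stats
select count(*) from raw_logs;

-- removes only the metadata; the files under /data/raw_logs stay in place
drop table raw_logs;
-- removes the metadata and the data stored in the warehouse directory
drop table log_stats;
```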

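Partitioned tables: each partition corresponds to a separate directory on HDFS. Partitioning in Hive simply means splitting into directories: one large data set is divided into smaller ones according to business needs. At query time, the expressions in the WHERE clause select only the partitions the query needs, which improves query efficiency.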

Table creation syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

4. Examples

①. Create the table

create external table if not exists test_table(
name string comment 'name',
friends array<string> comment 'friends',
properties map<string, string> comment 'properties',
father struct<name:string, age:int> comment 'father'
)
partitioned by (month string)
-- fields (columns) are separated by ','
row format delimited fields terminated by ','
-- elements of ARRAY, MAP, and STRUCT values are separated by '_'
collection items terminated by '_'
-- map entries are written as key:value
map keys terminated by ':'
-- rows are separated by newlines
lines terminated by '\n';

②. Load data

# create the data file (shell)
touch test.txt
vim test.txt
# contents of test.txt
zhangsan,lisi_wangwu,age:18_gender:man,zhangsanfather_39
lisi,zhangsan_wangwu,age:19_gender:woman,lisifather_40
-- load the file into a partition (Hive)
load data local inpath 'test.txt' into table test_table partition(month='202005');

③. Query data

select * from test_table;
select * from test_table where month='202005';
select * from test_table where month='202005' union select * from test_table where month='202004';
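
Beyond `select *`, individual elements of the complex columns can be addressed directly. A minimal sketch against the `test_table` defined above; the column aliases are chosen only for this example:

```sql
select name,
       friends[0]        as first_friend,  -- ARRAY: zero-based index
       properties['age'] as age,           -- MAP: lookup by key
       father.name       as father_name,   -- STRUCT: dot notation
       father.age        as father_age
from test_table
where month = '202005';
```

For the first sample row, this should yield `lisi` as `first_friend`, `18` as `age`, and `39` as `father_age`.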

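④. Converting between managed and external tables

-- convert a managed table to an external table
alter table test_table set tblproperties('EXTERNAL'='TRUE');
-- convert an external table to a managed table
alter table test_table set tblproperties('EXTERNAL'='FALSE');
-- Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed forms and are case-sensitive. When inserting data into test_table, remember that the table is partitioned, so the target partition must be specified.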
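
An insert that names the target partition might look like the sketch below. The row values are invented for illustration, and a `SELECT` without a `FROM` clause is assumed to be available (Hive 0.13 or later):

```sql
-- static partition insert: the partition value is given explicitly
insert into table test_table partition (month='202005')
select 'wangwu',
       array('zhangsan', 'lisi'),                       -- array<string>
       map('age', '20', 'gender', 'man'),               -- map<string,string>
       named_struct('name', 'wangwufather', 'age', 41); -- struct<name:string,age:int>
```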

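⑤. Partition operations

-- add a single partition
alter table test_table add partition(month='202004');
-- add several partitions at once (separated by spaces)
alter table test_table add partition(month='202003') partition(month='202002');
-- drop a single partition
alter table test_table drop partition(month='202004');
-- drop several partitions at once (separated by commas)
alter table test_table drop partition(month='202003'), partition (month='202002');
-- list the partitions of a partitioned table
show partitions test_table;
-- show the structure of a partitioned table
desc formatted test_table;
-- create a table with two partition levels
create table test_partitions(name string) partitioned by (month string, day string)
row format delimited fields terminated by '\t';
# create the data file for the two-level partitioned table (shell)
touch test_partition.txt
vim test_partition.txt
# contents of test_partition.txt
zhangsan
lisi
-- load the data
load data local inpath 'test_partition.txt' into table test_partitions partition(month='202005', day='3');
-- query one partition of the two-level partitioned table (partition columns are strings, so quote the values)
select * from test_partitions where month='202005' and day='3';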
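
To see how the two partition levels map to directories, the partition list and the partition's own metadata can be inspected. A short sketch, assuming the `test_partitions` table and the partition loaded above:

```sql
-- lists entries of the form month=202005/day=3
show partitions test_partitions;
-- the Location field in the output shows the directory of this partition
describe formatted test_partitions partition (month='202005', day='3');
```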

⑥. Rename a table

Syntax: ALTER TABLE table_name RENAME TO new_table_name
alter table test_table rename to test1;

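⑦. Add columns

Syntax: ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment], ...)
-- Note: ADD COLUMNS appends the new columns after all existing columns (but before the partition columns)
alter table test_table add columns(tabledesc string);
Note: ALTER TABLE table_name ADD COLUMNS (desc1 string COMMENT 'tttt') CASCADE; when adding a column to a partitioned table, include CASCADE so that the metadata of the existing partitions is updated as well.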

⑧. Replace columns

Syntax: ALTER TABLE table_name REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...) [CASCADE|RESTRICT]
-- Note: REPLACE COLUMNS replaces the table's entire list of non-partition columns with the list given
alter table test_table replace columns(tabledesc string, tabledesc1 string);

⑨. Change a column

Syntax: ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
alter table test_table change column tabledesc tabledesc1 string;
Note: the new column type must be compatible with the old one.
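
A compatible change is typically a widening one, such as `int` to `bigint`. The sketch below uses a hypothetical table `type_demo` that is not part of the original examples:

```sql
-- hypothetical table, for illustration only
create table if not exists type_demo (id int, age int);

-- keep the column name, widen the type from int to bigint
alter table type_demo change column age age bigint;

-- rename a column and attach a comment, keeping its type
alter table type_demo change column id user_id int comment 'user identifier';
```

CHANGE only rewrites the table metadata; the data files themselves are not modified.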

⑩. Drop a table