Getting Started with Hive

孤狼25

Article link: https://blog.csdn.net/Batista25/article/details/73403319

I. Hive Data Types

Primitive data types
1. Integer types: tinyint/smallint/int/bigint
2. Floating-point types: float/double
3. Boolean type: boolean
4. String type: string
Complex data types
1. Array
2. Map
3. Struct
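
To make the type list concrete, here is a minimal sketch of a table that uses each primitive family together with the three complex types. The table name `employee_profile`, its columns, and the delimiters are all made up for illustration; they are not from the original article.

```sql
-- Hypothetical table: every name and delimiter here is illustrative.
create table employee_profile (
  id      int,                               -- integer type
  name    string,                            -- string type
  salary  double,                            -- floating-point type
  active  boolean,                           -- boolean type
  skills  array<string>,                     -- ordered list of values
  scores  map<string, int>,                  -- key/value pairs
  address struct<city:string, zip:string>    -- named fields
)
row format delimited
  fields terminated by ','
  collection items terminated by '|'
  map keys terminated by ':';

-- Array elements are read by index, map values by key, struct fields by name.
select name, skills[0], scores['math'], address.city
from employee_profile;
```

With these delimiters, a matching input line would look like `1,Alice,5200.5,true,hive|spark,math:90|sql:85,Beijing|100000`.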

 
II. Hive Data Model

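Internal tables
1. Conceptually similar to a table in a relational database
2. Each table has a corresponding directory in Hive where its data is stored
3. All table data (excluding external tables) is stored in that directory
4. Dropping the table deletes both the metadata and the data
Partitioned tables
1. A partition is analogous to a dense index on the partition column in a relational database
2. In Hive, each partition of a table corresponds to a subdirectory of the table's directory, and all data for that partition is stored in that subdirectory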
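
The lifecycle of an internal table can be sketched as follows. The table name `managed_student` and the input path are assumptions made for this example, and the statements assume a default Hive setup where internal tables live under the warehouse directory configured by `hive.metastore.warehouse.dir`.

```sql
-- Hypothetical internal (managed) table; name and path are illustrative.
create table managed_student (id int, name string, age int)
row format delimited fields terminated by ',';

-- Loading from an HDFS path moves the file into the table's own directory.
load data inpath '/input/student.csv' into table managed_student;

-- Shows the table's Location and its Table Type (MANAGED_TABLE).
describe formatted managed_student;

-- Removes the metadata and the data files under the table's directory.
drop table managed_student;
```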

    -- gender is the partition column; it is not repeated in the column list
    create table partition_table
    (id int, name string)
    partitioned by (gender string)
    row format delimited fields terminated by ',';
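
As a rough illustration of how partitions map to directories, the sketch below loads a file into one partition of `partition_table` and then queries it. The local file path and the partition value `'M'` are assumptions for the example.

```sql
-- Hypothetical local file; each line holds id,name (no gender column in the file).
load data local inpath '/tmp/male_users.csv'
into table partition_table partition (gender = 'M');

-- Lists the partitions; this one is stored in a subdirectory named gender=M.
show partitions partition_table;

-- Filtering on the partition column lets Hive read only the matching directory.
select id, name from partition_table where gender = 'M';
```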

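External tables
1. Point to data that already exists in HDFS; they can also be partitioned
2. Their metadata is organized the same way as an internal table's, but the actual data is stored quite differently
3. An external table is set up in a single step: creating the table and loading the data happen together. The data is not moved into the data warehouse directory; the table only holds a link to the external data. Dropping an external table removes only that link, and the data files are left in place.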

 
    -- The table reads the files that already exist under /input/student
    create external table external_student
    (id int, name string, age int)
    location '/input/student';
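
The drop behavior of `external_student` can be checked with a short sequence like the one below. It is a sketch that assumes the statements are run from the Hive CLI, where `dfs` commands are available; in other clients the directory listing would need to be done with `hdfs dfs -ls` from a shell instead.

```sql
-- Table Type is reported as EXTERNAL_TABLE, Location as /input/student.
describe formatted external_student;

-- Drops only the metadata (the link to the data).
drop table external_student;

-- The files are still there after the drop (Hive CLI only).
dfs -ls /input/student;

-- Re-creating the table over the same location makes the data queryable again.
create external table external_student
(id int, name string, age int)
location '/input/student';
```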

 
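Bucketed tables
1. Rows are hashed on the clustering column and distributed across a fixed number of buckets

    -- Rows are hashed on name into 5 buckets
    create table bucket_table
    (id int, name string, age int)
    clustered by (name) into 5 buckets;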
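
A possible way to populate and sample `bucket_table` is sketched below. Using `external_student` as the source is an assumption made for this example because it has the same columns. On Hive 1.x and earlier, `hive.enforce.bucketing` had to be set to `true` before the insert; from Hive 2.0 onward bucketing is always enforced, so the setting is only mentioned in a comment here.

```sql
-- On Hive 1.x and earlier, run first: set hive.enforce.bucketing = true;

-- Populate through an insert so Hive hashes each row into one of the 5 buckets.
insert overwrite table bucket_table
select id, name, age from external_student;

-- Read a single bucket; useful for sampling roughly one fifth of the rows.
select * from bucket_table tablesample (bucket 1 out of 5 on name) s;
```

Data should be written with `insert ... select` rather than `load data`, since `load data` only copies or moves files and does not distribute rows into buckets.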

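Views
1. A view is a virtual table, a purely logical concept; it can span multiple tables
2. A view is built on top of existing tables, which are called its base tables
3. Views can simplify complex queries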

 
    -- Join the user table to the partitioned table on the user id
    create view view_test as
    select p.phone, g.gender
    from yxyx_user p
    join partition_table g on p.user_id = g.id;
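
Once defined, `view_test` can be queried like an ordinary table. The statements below are a small usage sketch; the filter value `'M'` is an assumption, since the article does not show the actual data.

```sql
-- The join in the view definition is executed each time the view is queried.
select phone, gender from view_test where gender = 'M' limit 10;

-- Table Type is reported as VIRTUAL_VIEW, along with the view's defining query.
describe formatted view_test;

-- Dropping the view removes only its definition; the base tables are untouched.
drop view if exists view_test;
```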
