Introduction to Hive (3)

不会敲代码的小力

Category: Hive

Article link: https://blog.csdn.net/weixin_45492179/article/details/107957918

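Hive data types
- Primitive data types
- Complex data types
Loading data
External tables
Managed (internal) tables
Partitioned tables
- Important note
- Purpose
Bucketed tables
- Important note
- Bucketing logic
- Uses and benefits of bucketing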

Hive data types

Primitive data types

INT BIGINT FLOAT DOUBLE DECIMAL STRING VARCHAR CHAR BINARY TIMESTAMP DATE INTERVAL

Complex data types

ARRAY MAP STRUCT UNIONTYPE

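Create a table that copies both the structure and the data of an existing table:
create table stu3 as select * from stu2;
Create a table that copies only the structure, without the data:
create table stu4 like stu2;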

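Loading data

Load data from the local Linux filesystem into Hive:
load data local inpath 'data_path' into table table_name;
Load data from HDFS into Hive, overwriting the table's existing data:
load data inpath 'data_path' overwrite into table table_name;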

External tables

create external table techer (t_id string,t_name string) row format delimited fields terminated by '\t';

Load the data:
load data local inpath '/export/servers/hivedatas/techer.csv' into table techer;
View the table's data in HDFS:
hadoop fs -ls /user/hive/warehouse/myhive.db/techer
Query it in Hive:
select * from techer;
Drop the table techer:
drop table techer;
Check HDFS again:
hadoop fs -ls /user/hive/warehouse/myhive.db/techer
(The data still exists: dropping an external table removes only its metadata.)

Managed (internal) tables

create table student(t_id string,t_name string) row format delimited fields terminated by '\t';
Load the data:
load data local inpath '/export/servers/hivedatas/student.csv' into table student;

View the table's data in HDFS:
hadoop fs -ls /user/hive/warehouse/myhive.db/student
Query it in Hive:
select * from student;
Drop the table student:
drop table student;
Check HDFS again:
hadoop fs -ls /user/hive/warehouse/myhive.db/student
(The data no longer exists: dropping a managed table deletes its data as well.)

Partitioned tables

A common partitioning rule in enterprises: partition by day (one partition per day).

Statements for creating partitioned tables:
create table score(s_id string,c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

create table score2 (s_id string,c_id string,s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

Loading data:

load data local inpath '/opt/hive/score.csv' into table score partition (month='201806');

load data local inpath '/opt/hive/score.csv' into table score2 partition(year='2018',month='06',day='02');

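Important note:

A partition column must never have the same name as a column that already exists in the table.

Purpose:

Partitioning splits the data into separate regions, so a query that filters on the partition column does not scan irrelevant data and runs faster.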
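
To make the effect concrete, here is a small Python sketch (a toy model, not Hive code) of how a partitioned table is laid out as one directory per partition value, named `column=value`, and how a filter on the partition columns narrows down which directories need to be read. The warehouse path for `score2`, the extra partition values, and the helper names `partition_dir` and `prune` are assumptions made up for illustration.

```python
# Toy model of Hive partition layout and partition pruning.
# The table path, partition values and helper names are hypothetical.

def partition_dir(table_path, spec):
    """Build the directory for one partition, e.g. .../year=2018/month=06/day=02."""
    parts = [f"{col}={val}" for col, val in spec]
    return "/".join([table_path] + parts)

def prune(partitions, wanted):
    """Keep only the partitions whose values match every column in `wanted`."""
    return [
        spec for spec in partitions
        if all(dict(spec).get(col) == val for col, val in wanted.items())
    ]

table = "/user/hive/warehouse/myhive.db/score2"  # assumed location
partitions = [
    [("year", "2018"), ("month", "06"), ("day", "01")],
    [("year", "2018"), ("month", "06"), ("day", "02")],
    [("year", "2018"), ("month", "07"), ("day", "01")],
]

# A query filtering on year='2018' and month='06' only has to read these directories.
selected = prune(partitions, {"year": "2018", "month": "06"})
for spec in selected:
    print(partition_dir(table, spec))
```

The July partition is never touched, which is the whole point: the more selective the filter on the partition columns, the less data is scanned.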

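Bucketed tables

A bucketed table adds an extra layer of structure on top of the existing table structure.

Enable bucketing enforcement in Hive (this property applies to Hive 1.x; Hive 2.x and later always enforce bucketing):
set hive.enforce.bucketing=true;
Set the number of reducers to match the number of buckets:
set mapreduce.job.reduces=3;
Create the bucketed table:
create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';
Create a plain (non-bucketed) table:
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';
Load data into the plain table:
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;
Insert into the bucketed table by querying the plain table:
insert overwrite table course select * from course_common cluster by(c_id);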
Check the data in each bucket:
[root@node01 hive]# hadoop fs -cat /user/hive/warehouse/course/000000_0
03 英语 03
[root@node01 hive]# hadoop fs -cat /user/hive/warehouse/course/000001_0
01 语文 02
[root@node01 hive]# hadoop fs -cat /user/hive/warehouse/course/000002_0
02 数学 01

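Important note:

The bucketing column must be an existing column of the table.

Bucketing logic:

Hive computes a hash of the bucketing column's value and takes the remainder of that hash divided by the number of buckets. The remainder is the index of the bucket the row is placed in.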
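
The hash-then-modulo rule can be sketched in a few lines of Python. This is a simulation under stated assumptions, not Hive's actual code: it assumes a string column is hashed with the Java-style rolling hash (`h = 31*h + byte`) over its bytes, which is my understanding of the older, pre-Hive 3 behavior; newer versions may use a different hash function. The function names are hypothetical.

```python
# Sketch of hash-mod bucket assignment for a string bucketing column.
# Assumption: Java-style rolling hash over the UTF-8 bytes, kept as a
# signed 32-bit integer. Function names are made up for illustration.

def string_hash(value):
    h = 0
    for b in value.encode("utf-8"):
        # Java bytes are signed, so map 128..255 to -128..-1.
        signed = b - 256 if b > 127 else b
        h = (31 * h + signed) & 0xFFFFFFFF  # wrap around like a 32-bit int
    # Reinterpret the 32-bit pattern as a signed integer.
    return h - (1 << 32) if h >= (1 << 31) else h

def bucket_for(value, num_buckets):
    # Clear the sign bit so the remainder is never negative.
    return (string_hash(value) & 0x7FFFFFFF) % num_buckets

for c_id in ["01", "02", "03"]:
    print(c_id, "-> bucket", bucket_for(c_id, 3))
```

Under these assumptions, `01` lands in bucket 1, `02` in bucket 2 and `03` in bucket 0, which agrees with the contents of the files `000001_0`, `000002_0` and `000000_0` in the listing earlier in this section.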

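Uses and benefits of bucketing

1. Bucketing can optimize and speed up joins (provided the join column is the bucketing column).
2. Bucketing supports data sampling (extracting a sample of the data).
Use the record ID as the bucketing column, so that each bucket contains data of every "level".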
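
The sampling benefit can be illustrated with a toy Python model. It assumes an integer ID hashes to itself, so the bucket is simply `id % num_buckets`; the rows, the `level` attribute and the `bucketize` helper are invented for this example.

```python
# Toy illustration of sampling by bucket.
# Assumption: an integer ID hashes to itself, so bucket = id % num_buckets.
# The rows and the "level" attribute are made-up example data.

def bucketize(rows, key, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[key] % num_buckets].append(row)
    return buckets

# 12 hypothetical rows with sequential IDs, spread over four levels.
rows = [{"id": i, "level": "ABCD"[i % 4]} for i in range(1, 13)]

buckets = bucketize(rows, "id", 3)

# Reading a single bucket yields about one third of the rows...
sample = buckets[0]
print(len(sample), "of", len(rows), "rows sampled")
# ...and that sample still covers every level.
print(sorted({row["level"] for row in sample}))
```

Because the ID is unrelated to the level, one bucket is a fair cross-section of the table. In Hive itself, reading a single bucket is what the `TABLESAMPLE (BUCKET x OUT OF y ON column)` clause is for.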
