Hadoop Series (12): Hive in Detail, Part 1

斯特凡今天也很帅

Categories: Big Data, HIVE. Tags: hive, mysql, big data

Article link: https://blog.csdn.net/weixin_41311528/article/details/110880515

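1. Preparation

Add a hive user to the MySQL database

-- MySQL 5.x syntax; MySQL 8.0 no longer accepts IDENTIFIED BY in GRANT and needs CREATE USER first
grant all privileges on *.* to 'hive'@'%' identified by 'hive' with grant option;
flush privileges; -- reload the privilege tables

Connect to Hive

beeline
!connect jdbc:hive2://hadoop101:10000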

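Tables are either internal or external

Internal tables (managed tables)

Stored in HDFS as a subdirectory of the owning database's directory
The data is fully managed by Hive; dropping the table (metadata) also deletes the data

External tables

The data is stored at a specified HDFS path
Hive does not fully manage the data; dropping the table (metadata) does not delete the data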

Hive CREATE TABLE statements

Create an internal table

create table if not exists student(
id int,name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '<storage path>';

Check the table type

desc formatted student;
show create table student;

Breakdown of the Hive CREATE TABLE statement

[Figure: breakdown of the CREATE TABLE statement; image not available]

Create an external table

create external table if not exists employee_external(
name string,
work_place array<string>,
sex_age struct<sex:string,age:int>,
skill_score map<string,int>,
depart_title map<string,array<string>>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
stored as textfile
location '/data/hive/employee_external';

Query the table

select * from employee_external;

Upload the data file to the table's directory

# employee.dat is the data file to upload
hdfs dfs -put employee.dat <upload path>

View the full CREATE TABLE statement

show create table employee_external;

View the table metadata

desc formatted employee_external;

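Hive table creation: delimiters

Default delimiters in Hive

Fields: ^A (\001)
Collection items: ^B (\002)
Map keys: ^C (\003)

Syntax for specifying delimiters when creating a table in Hive

row format delimited
-- fields (columns)
fields terminated by '|'
-- collection items (array elements, struct members, map entries)
collection items terminated by ','
-- map keys (separates a key from its value)
map keys terminated by ':'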
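
To make the three delimiter levels concrete, here is a minimal Python sketch of how one text row could be split into the first four columns of employee_external. The sample row and the helper name parse_employee_line are assumptions made for illustration; this only mimics the splitting rules and is not how Hive itself reads the file.

```python
def parse_employee_line(line):
    """Split one text row with the delimiters declared in the DDL:
    '|' between fields, ',' between collection items, ':' between map key and value."""
    # Only the first four columns are handled; nested collections are out of scope here.
    name, work_place, sex_age, skill_score = line.split("|")[:4]
    sex, age = sex_age.split(",")          # struct members use the collection delimiter
    return {
        "name": name,
        "work_place": work_place.split(","),            # array<string>
        "sex_age": {"sex": sex, "age": int(age)},       # struct<sex:string,age:int>
        "skill_score": {                                # map<string,int>
            key: int(value)
            for key, value in (pair.split(":") for pair in skill_score.split(","))
        },
    }


# Hypothetical sample row, not taken from the original data file.
row = parse_employee_line("Michael|Montreal,Toronto|Male,30|DB:80,Python:90")
print(row)
```

The same character (',') separates array elements, struct members and map entries, which is why only three delimiters are needed for this table.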

Insert data

insert into table employee2
select * from employee_external;

Download the data and inspect it

hdfs dfs -get <storage path>

The data looks as follows:
[Figure: downloaded data; image not available]

[Figure: example using OpenCSV; image not available]

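Create a temporary table

A temporary table lets an application automatically manage intermediate data generated during complex queries

The table is visible only within the current session and is dropped automatically when the session ends
Its data is stored under /tmp/hive-<user_name> (for security reasons)
If a temporary table is created with the name of an existing table, that name refers to the temporary table for the rest of the session

-- example columns
create temporary table if not exists employee (id int, name string);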

Advanced table creation in Hive: CTAS and WITH

CTAS: create a table with AS SELECT

create table ctas_employee as select * from employee;

CTE (CTAS with a Common Table Expression)

create table cte_employee as
with
r1 as (select name from employee_external where name='Michael'),
r2 as (select name from employee_external where sex_age.sex='Male'),
r3 as (select name from employee_external where sex_age.sex='Male')
select * from r1 union all select * from r3;
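
The CTAS-with-CTE pattern can be tried without a Hive cluster. The sketch below runs the same statement shape against an in-memory SQLite database from Python. SQLite is only a stand-in here: it has no struct type, so a flat sex column replaces sex_age.sex, and the three sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the employee table (flat columns, hypothetical rows).
conn.execute("CREATE TABLE employee (name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Michael", "Male"), ("Will", "Male"), ("Lucy", "Female")],
)

# The named subqueries r1 and r2 are defined once, then combined in the final SELECT,
# and the result is materialized as a new table.
conn.execute("""
    CREATE TABLE cte_employee AS
    WITH
      r1 AS (SELECT name FROM employee WHERE name = 'Michael'),
      r2 AS (SELECT name FROM employee WHERE sex = 'Male')
    SELECT * FROM r1 UNION ALL SELECT * FROM r2
""")

names = sorted(r[0] for r in conn.execute("SELECT name FROM cte_employee"))
print(names)
```

Because UNION ALL keeps duplicates, a row that matches both named subqueries appears twice in the new table.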

LIKE: copy only the structure of an existing table

create table employee_like like employee;

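Dropping and altering tables

Drop a table

drop table if exists employee [purge];
-- PURGE (optional) deletes the data immediately; without it the data is moved to the .Trash directory
truncate table employee; -- remove all rows

Alter a table (ALTER changes metadata only)
Rename a table

alter table employee rename to new_employee;

Change table properties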

alter table c_employee set TBLPROPERTIES('comment'='New name,comments');

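Loading data: LOAD

LOAD moves data files into Hive tables

load data local inpath '<path of the data to load>'
overwrite into table employee;
-- With the LOCAL keyword the source file is on the local Linux file system, and the data is copied
LOAD DATA LOCAL INPATH '<path>'
OVERWRITE INTO TABLE <table name> PARTITION(year=2014,month=12);
-- Without the LOCAL keyword the file is in HDFS, and the data is moved directly
LOAD DATA INPATH '/tmp/employee.txt'
OVERWRITE INTO TABLE employee_partitioned PARTITION(year=2017,month=12);

LOCAL: the file is on the local file system; the data is copied when the statement runs
OVERWRITE: replaces the data already in the table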

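Hive partitions

Partitioning is mainly used to improve performance
The values of the partition columns split the table into separate directories
Partitions are either static or dynamic

Static partitions

Create a single-level partitioned table

create table dept_partition(
deptno int,
dname string,
loc string)
partitioned by(month string)
row format delimited fields terminated by '\t';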

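Add partitions

alter table dept_partition add partition(month='201906');
-- several partitions can be added in one statement
alter table dept_partition add partition(month='201905') partition(month='201904');

Drop partitions

alter table dept_partition drop partition(month='201906');
-- several partitions can be dropped in one statement, separated by commas
alter table dept_partition drop partition(month='201905'), partition(month='201904');

List the partitions of a partitioned table

show partitions dept_partition;

Load data into a partitioned table

-- date is a reserved word in Hive, so the partition column is quoted with backticks
load data local inpath '/opt/datas/employee.txt'
into table employee_partition2 partition(month='202012',`date`='01');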

Create a two-level partitioned table

create table dept_partition2(
deptno int,dname string,loc string)
partitioned by(month string,day string)
row format delimited fields terminated by '\t';

Load data into the two-level partitioned table

load data local inpath '/<file path>/<file name>' into table dept_partition2
partition(month='201905',day='13');
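
Since each partition is a directory, the load above ends up in a nested folder named after the partition columns and their values. The Python sketch below builds that path. The helper partition_path is hypothetical, and the warehouse location /user/hive/warehouse is assumed to be the default; a real cluster may be configured differently.

```python
def partition_path(table_dir, partition_spec):
    """Build the directory for a partition: one 'column=value' level per
    partition column, in the order the columns were declared."""
    levels = [f"{column}={value}" for column, value in partition_spec.items()]
    return "/".join([table_dir.rstrip("/")] + levels)


# Assumed default warehouse location; adjust to the cluster's configuration.
table_dir = "/user/hive/warehouse/dept_partition2"
path = partition_path(table_dir, {"month": "201905", "day": "13"})
print(path)
```

A query that filters on month and day only has to read the matching directory, which is where the performance benefit of partitioning comes from.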

Dynamic partitions

[Figure: dynamic partitioning; image not available]

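Hive bucketing

Partitions correspond to directories
Buckets correspond to files in HDFS

More efficient query processing
Makes sampling more efficient
Rows are usually assigned to buckets by hashing the bucketing column

Bucketing is always dynamic

SET hive.enforce.bucketing=true;

Define the buckets

CLUSTERED BY (employee_id) INTO 2 BUCKETS
-- the bucketing column must be an existing column of the table; the bucket count is best chosen as a power of 2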
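
The hash-then-modulo idea behind bucket assignment can be sketched in a few lines of Python. This assumes the older integer scheme in which an INT value hashes to itself; newer Hive versions may use a different hash function, so the actual file a row lands in can differ. The function name bucket_for and the sample ids are made up for illustration.

```python
def bucket_for(employee_id, num_buckets):
    """Assign a row to a bucket: hash the bucketing column, force the hash
    non-negative, then take it modulo the number of buckets.
    Assumption: an INT column hashes to its own value."""
    return (employee_id & 0x7FFFFFFF) % num_buckets


# Hypothetical employee ids spread over the 2 buckets declared above.
buckets = {0: [], 1: []}
for employee_id in [1, 2, 3, 4, 5, 6]:
    buckets[bucket_for(employee_id, 2)].append(employee_id)
print(buckets)
```

Every row with the same employee_id always lands in the same bucket, which is what lets sampling read only a subset of the files.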

Data must be loaded into a bucketed table with INSERT
