Apache Hive Basics

Author: 习惯de味道

Category: hive. Tags: hive, hsqldb

Article link: https://blog.csdn.net/timicai/article/details/108620690

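Advantages and features of Hive

Provides a simple optimization model
HQL has SQL-like syntax, which simplifies MapReduce development
Runs on different compute frameworks (for example MapReduce, Tez, or Spark)
Supports ad hoc queries on data in HDFS and HBase
Supports user-defined functions and formats
Mature JDBC and ODBC drivers for ETL and BI
Stable, reliable batch processing, proven in real production environments
Has a large, active community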

Hive architecture

[Figure: Hive architecture diagram]

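Command-line interface (Hive Interface)

There are two tools: Beeline and the Hive command line (CLI)
Each can be used in two modes: command-line mode and interactive mode
Command-line mode: a statement or script is passed as an argument (for example with -e or -f) and the tool exits when it finishes
Interactive mode: the tool opens a prompt and statements are entered one at a time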

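Differences between the hive CLI and Beeline (the HiveServer2 client):

The hive CLI does not need a separate service to be started before use
Beeline needs the server (HiveServer2) to be started first; the client then connects to it
Beeline is described as more efficient for queries than the hive CLI. Whether UPDATE and DELETE are available depends on the table being transactional (ACID), not on which client issues the statement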

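Hive data types

Primitive types

Similar to SQL data types, for example INT, BIGINT, DOUBLE, DECIMAL, BOOLEAN, STRING, DATE, and TIMESTAMP
Note: in the original table, the commonly used types were shown in bold

Complex data types

ARRAY: an ordered collection of elements that all have the same type
MAP: key-value pairs in which all keys share one type and all values share one type
STRUCT: encapsulates a group of named fields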

Hive metadata structure

[Figure: Hive metadata structure]

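Database

A collection of tables, represented in HDFS as a directory
Stored by default under the directory set by the hive.metastore.warehouse.dir property
If no database is specified, the default database is used

create database db_name;  -- create a database
show databases;           -- list databases
use db_name;              -- switch to a database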

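Tables

Tables are either internal or external
Internal (managed) tables:
Stored in HDFS as a subdirectory of the owning database's directory
Data is fully managed by Hive; dropping the table removes both the metadata and the data
External tables:
Data is kept at a specified HDFS path
Hive does not fully manage the data; dropping the table removes only the metadata and leaves the data in place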

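-- square brackets mark optional parts of the statement
create [external] table employee (
    name string,                              -- list the columns and their data types
    address array<string>,
    personalInfo struct<sex:string,age:int>,
    technol map<string,int>,
    jobs map<string,string>)
row format delimited
fields terminated by '|'                      -- field delimiter
collection items terminated by ','            -- delimiter between array, struct, and map entries
map keys terminated by ':'                    -- delimiter between a map key and its value
lines terminated by '\n'                      -- row delimiter
[stored as textfile]                          -- file storage format
[location '/user/root/employee'];             -- storage path in HDFS

desc table_name;            -- show the table structure
select * from table_name;   -- show the table contents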
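
To make the three delimiters concrete, here is a minimal Python sketch of how one text row could map onto the employee columns. It is only an illustration, not how Hive itself is implemented (Hive delegates parsing to its SerDe), and the sample row and the helper names parse_map and parse_employee are invented for this example.

```python
# Simplified model of how a delimited text row maps onto the employee columns.
# The sample row is made up; real Hive parsing is done by its SerDe.

FIELD_SEP = "|"   # fields terminated by '|'
ITEM_SEP = ","    # collection items terminated by ','
KEY_SEP = ":"     # map keys terminated by ':'


def parse_map(text, value_type=str):
    """Split 'k1:v1,k2:v2' into a dict, converting values with value_type."""
    result = {}
    for entry in text.split(ITEM_SEP):
        key, value = entry.split(KEY_SEP, 1)
        result[key] = value_type(value)
    return result


def parse_employee(line):
    """Parse one row into the five columns of the employee table."""
    name, address, info, technol, jobs = line.rstrip("\n").split(FIELD_SEP)
    sex, age = info.split(ITEM_SEP)  # struct members are positional
    return {
        "name": name,
        "address": address.split(ITEM_SEP),             # array<string>
        "personalInfo": {"sex": sex, "age": int(age)},  # struct<sex:string,age:int>
        "technol": parse_map(technol, int),             # map<string,int>
        "jobs": parse_map(jobs),                        # map<string,string>
    }


# Hypothetical sample row, one field per column, separated by '|'.
sample_row = "Alice|Dalian,Shenyang|F,30|hive:80,sql:90|dev:lead\n"
employee = parse_employee(sample_row)
print(employee["address"], employee["technol"]["sql"])
```

Note that struct members and array elements share the same collection delimiter, which is why the struct is split positionally.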

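Steps shown in the original screenshots:
Create an internal table
Create an external table and upload its data to HDFS
View the table structure
Add the file's contents to the corresponding HDFS directory
View the table contents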

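Creating temporary tables

A temporary table lets an application automatically manage the intermediate data generated during a complex query
The table is visible only in the current session and is dropped automatically when the session ends
Its data is stored under /tmp/hive-<user_name> (for security reasons)
If a temporary table is created with the name of an existing table, that name refers to the temporary table for the rest of the session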

create temporary table tmp_table_name1 (c1 string);
create temporary table tmp_table_name2 AS..
create temporary table tmp_table_name3 LIKE..

create external table emp_hr(
name string,
id int,
phone string,
`date` string)  -- date is a reserved keyword in newer Hive versions, hence the backticks
row format delimited
fields terminated by '|';

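Table operations

Dropping tables

drop table table_name;      -- drop the table
truncate table table_name;  -- remove all rows but keep the table

Altering tables

alter table table_name rename to new_table_name;              -- rename the table
alter table table_name change col_name new_col_name new_type; -- rename a column or change its type
alter table table_name add columns (col_name col_type);       -- add a column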

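Hive partitions

Partitioning is mainly used to improve performance
The values of the partition columns divide the table into segments (directories)
In queries, partition columns are used much like regular columns
At query time, Hive automatically skips partitions the query does not need, which is where the performance gain comes from
Partitions are either static or dynamic

Defining partitions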

create table employee_partition(
name string,
address array<string>,
info struct<gender:string,age:int>,
technol map<string,int>,
jobs map<string,string>)
partitioned by (country string,`add` string)  -- define the partition columns
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
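
As a rough illustration of how partition values become directories and why filtering on a partition column lets Hive skip data, the following Python sketch models the layout. The table directory, the us/Boston partition, and the helper names partition_path and prune are assumptions made for the example, not values taken from a real cluster.

```python
# Simplified model: each combination of partition values is a nested directory
# named column=value under the table directory. Paths are illustrative.
import posixpath

# Assumed table location; the real one depends on hive.metastore.warehouse.dir.
TABLE_DIR = "/opt/hive/warehouse/hivetest.db/employee_partition"


def partition_path(table_dir, spec):
    """Build the directory for one partition; spec keeps the declared column order."""
    parts = [f"{column}={value}" for column, value in spec.items()]
    return posixpath.join(table_dir, *parts)


def prune(partitions, **wanted):
    """Keep only partitions whose values match the filter (partition pruning)."""
    return [
        spec for spec in partitions
        if all(spec.get(column) == value for column, value in wanted.items())
    ]


# Hypothetical partitions; only the first two appear in the statements of this post.
partitions = [
    {"country": "china", "add": "LiaoNing"},
    {"country": "china", "add": "GuangZhou"},
    {"country": "us", "add": "Boston"},
]
china_dirs = [partition_path(TABLE_DIR, spec) for spec in prune(partitions, country="china")]
print(china_dirs)
```

A query that filters on country='china' only needs to read the directories in china_dirs; the remaining partition is never opened.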

Static partitions

alter table employee_partition add partition (country='china',`add`='LiaoNing') partition (country='china',`add`='GuangZhou');  -- add partitions
alter table employee_partition drop partition (country='china',`add`='GuangZhou');  -- drop a partition

Example:
Load a local file into a partition of the table

load data local inpath '/root/employee.txt'
into table employee_partition
partition (country='china',`add`='LiaoNing');

Load a file from HDFS into a partition of the table

hadoop fs -put employee.txt /opt/hive/warehouse/hivetest.db/employee    # upload the local file to HDFS
load data inpath '/opt/hive/warehouse/hivetest.db/employee/employee.txt'    -- read the file from HDFS
into table employee_partition                     -- target table
partition (country='china',`add`='ninjing');      -- target partition

The result can be checked in the HDFS web UI on port 50070

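Dynamic partitions

-- dynamic partitioning requires these properties
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Using dynamic partitions

The statements below name the partition value explicitly (person='sam'), which is the static form. In the dynamic form the value is left out of the partition clause, as in partition (person), and Hive takes it from the last column of each inserted row.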

insert into p_test partition (person='sam') values(1,'a')
,(2,'b'),(3,'c');
insert into p_test partition (person='bob') values(4,'d')
,(5,'e'),(6,'f');
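
The idea behind dynamic partitioning, where the partition is chosen per row from a trailing column rather than written in the statement, can be sketched in Python as follows. This is a simplified model under that assumption; route_rows and the sample rows are hypothetical and only mirror the p_test example loosely.

```python
# Simplified model of dynamic partitioning: the partition value is read from
# the last column of each row instead of being written in the statement.
from collections import defaultdict


def route_rows(rows):
    """Group rows by their trailing partition value; keep the other columns as data."""
    partitions = defaultdict(list)
    for *data, person in rows:
        partitions[f"person={person}"].append(tuple(data))
    return dict(partitions)


# Hypothetical rows: (id, value, person), with person acting as the partition column.
rows = [(1, "a", "sam"), (2, "b", "sam"), (4, "d", "bob")]
routed = route_rows(rows)
print(sorted(routed))
```

Each key of routed corresponds to one partition directory, created on demand when the first row with that value arrives.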


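Hive bucketing

Each bucket corresponds to a file in HDFS
Bucketing enables more efficient query processing
It makes sampling more efficient
Rows are assigned to buckets by hashing the value of the bucketing column
Bucketing is always dynamic: Hive decides which bucket each row goes to
set hive.enforce.bucketing = true;  -- needed in Hive 1.x; Hive 2.x and later always enforce bucketing
Defining buckets
CLUSTERED BY (employee_id) INTO 2 BUCKETS
Data must be loaded with INSERT so that Hive can hash each row into its bucket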
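
The hash-then-modulo assignment can be sketched in Python for an integer bucketing column. This is a simplified model, assuming the classic scheme in which the hash of an INT is the value itself and the bucket is (hash & Integer.MAX_VALUE) % number of buckets; other column types and newer bucketing versions hash differently, and the function names here are invented for the example.

```python
# Simplified model of bucket assignment for an INT bucketing column.
# Assumption: the hash of an INT is the value itself, and the bucket is
# (hash & Integer.MAX_VALUE) % num_buckets; other types and newer bucketing
# versions hash differently.
INT_MAX = 0x7FFFFFFF


def bucket_for(employee_id, num_buckets):
    """Return the bucket index for an integer key."""
    return (employee_id & INT_MAX) % num_buckets


def assign_buckets(ids, num_buckets):
    """Group ids into num_buckets lists, one per bucket file."""
    buckets = [[] for _ in range(num_buckets)]
    for employee_id in ids:
        buckets[bucket_for(employee_id, num_buckets)].append(employee_id)
    return buckets


# Hypothetical employee ids spread over 2 buckets, as in INTO 2 BUCKETS.
buckets = assign_buckets([100, 101, 102, 103, 104], 2)
print(buckets)
```

Because the bucket depends only on the key and the bucket count, sampling one bucket reads one file, and rows with equal keys always land in the same file.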

create external table emp_bucket(
name string,
id int,
address array<string>,
personalInfo struct<sex:string,age:int>,
workAndSal map<string,int>,
jobAndRole map<string,string>)
clustered by(id) into 3 buckets
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile
location '/usr/test/bucket';

set hive.enforce.bucketing = true;
