Hive Summary

qq_40127822

Article link: https://blog.csdn.net/qq_40127822/article/details/85235103

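Hive's basic environment: a working Hadoop cluster (which stores the data) and a relational database, usually MySQL (which stores the metadata).
Hive's four core concepts: databases, tables, partitions, and buckets.
Hive data types: primitive types and complex types (ARRAY, MAP, STRUCT).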
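As a rough illustration of the complex types mentioned above, the following sketch declares a table with ARRAY, MAP, and STRUCT columns. The table name t_emp_profile, its columns, and the delimiter characters are assumptions made up for this example; they do not come from the original notes.

```sql
-- Hypothetical table showing Hive's three main complex types.
-- A matching text line could look like:
--   e01,java|sql,math:90|english:85,Beijing|100000
create table t_emp_profile(
  eid     string,
  skills  array<string>,                   -- ordered list of values
  scores  map<string,int>,                 -- key/value pairs
  address struct<city:string,zip:string>   -- fixed set of named fields
)
row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by ':';

-- Element access: index for arrays, key for maps, dot for structs.
select eid, skills[0], scores['math'], address.city
from t_emp_profile;
```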

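Basic Hive commands:
Enter the Hive command line: hive
List all databases: show databases;
Create a database: create database log;
Switch to a database: use log;
The statements are largely the same as standard SQL syntax.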

Create a Hive table:
create table t_emp(
eid string,
ename string,
did string
)
row format delimited
fields terminated by ',';
View the table's schema: desc t_emp;
View detailed information about the table: desc extended t_emp;
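Besides desc and desc extended, two other standard statements are often easier to read when inspecting a table. This is only a small sketch against the t_emp table defined above.

```sql
-- Same information as "desc extended", laid out in labelled sections
-- (columns, storage location, table type, and so on).
describe formatted t_emp;

-- Prints a CREATE TABLE statement that would recreate the table.
show create table t_emp;
```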

Ways to load data into Hive:
1. Load from the local file system
(1) load data local inpath …
(2) E.g.: LOAD DATA LOCAL INPATH '/usr/software/hive-test.txt' INTO TABLE t_emp;
2. Load from HDFS
(1) load data inpath …
(2) E.g.: LOAD DATA INPATH '/flume/hive-test.txt' INTO TABLE t_emp;
3. Specify the location of the data files when creating the table
(1) create table … location 'path'
(2) E.g.: create table t_emp(
eid string,
ename string,
did string
)
row format delimited
fields terminated by ','
location '/flume';
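One point worth keeping in mind with location: for a regular (managed) table, dropping the table also deletes the files under that path. A hedged alternative, assuming the files under /flume are shared with other tools and should survive a drop, is an external table. The name t_emp_ext is made up for this sketch.

```sql
-- Hypothetical external variant of the table above.
-- "external" tells Hive it does not own the files under /flume.
create external table t_emp_ext(
  eid   string,
  ename string,
  did   string
)
row format delimited
fields terminated by ','
location '/flume';

-- Dropping an external table removes only the metadata;
-- the data files under /flume are left in place.
drop table t_emp_ext;
```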

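4. First map the data files to a source table, then define the result table according to your needs, and finally use insert into … select from … on the source table so that the data ends up in the result table
(1) Create the source table: use the command from 3 (2).
(2) Create the result table, dropping the did column:
(3) create table t_result(
eid string,
ename string
)
row format delimited
fields terminated by ',';
(4) Insert data into the result table:
insert into t_result select eid,ename from t_emp;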
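Two common variations on the source-table/result-table pattern above can be sketched as follows. The table name t_result2 is an assumption introduced only for this example.

```sql
-- Variation 1: create-table-as-select (CTAS) builds the result table
-- and fills it in a single statement; column names and types come
-- from the select list. t_result2 is a hypothetical name.
create table t_result2
row format delimited
fields terminated by ','
as
select eid, ename
from t_emp;

-- Variation 2: "insert overwrite" replaces the existing contents of
-- t_result, whereas "insert into" appends to them.
insert overwrite table t_result
select eid, ename
from t_emp;
```

CTAS saves a step when the result table does not exist yet; insert overwrite is the safer choice when the statement may be re-run and duplicates are unwanted.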

5. Partitioning
(1) Static partitioning:
create table t_emp(
name string,
age string
)
partitioned by (sex string)
row format delimited
fields terminated by ',';

Load the data:
LOAD DATA LOCAL INPATH '/usr/software/hive-test.txt' INTO TABLE t_emp partition (sex='man');

Contents of the local Linux file:
aa01,20
aa02,23
aa03,18
aa04,22
aa05,19
aa06,20

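With static partitioning you have to classify the data yourself before loading it, and the loaded data must not contain the partition column.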
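A short sketch of how the statically partitioned t_emp table might be inspected and extended. The second file path and the partition value 'woman' are hypothetical and serve only as an illustration.

```sql
-- List the partitions that exist so far.
show partitions t_emp;

-- The partition column can be queried like a normal column; filtering
-- on it lets Hive read only the matching partition directory.
select name, age
from t_emp
where sex = 'man';

-- Every further partition has to be loaded explicitly, from a file
-- that was already split by hand (hypothetical path).
load data local inpath '/usr/software/hive-test-woman.txt'
into table t_emp partition (sex='woman');
```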
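(2) Dynamic partitioning:
1) Create the partitioned table:
create table t_emp(
name string,
age string
)
partitioned by (sex string)
row format delimited
fields terminated by ',';
2) Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
3) Insert the data. With dynamic partitioning, the partition values are only determined when the SQL runs, so the data must be inserted with a SQL statement; in other words, dynamic partitioning requires a source table and a result table.
a. Create the source table
create table t_emp01(
name string,
sex string,
age string
)
row format delimited
fields terminated by ','
location '/flume';
b. Insert the data
insert into table t_emp
partition (sex)
select name,age,sex
from t_emp01;
The partition column must be the last column in the select list.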
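After a dynamic-partition insert like the one above, it can help to check which partitions were actually created. The following is a minimal sketch; the two limit settings are real Hive properties, but the values shown are only illustrative and may need tuning for a given cluster.

```sql
-- Optional safety limits on how many partitions one statement may
-- create in total and per node (illustrative values).
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;

-- One partition should appear for each distinct sex value in t_emp01.
show partitions t_emp;

-- Row count per partition, as a quick sanity check.
select sex, count(*) as cnt
from t_emp
group by sex;
```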

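6. Bucketing
(1) Create the table, specifying that it is split into 4 buckets by age.
This only declares that the table is bucketed; the actual buckets are produced when the data is loaded. The best way to load the data is insert into table. The bucketing column is usually one whose values rarely repeat.

create table t_emp(
name string,
sex string,
age string
)
clustered by (age) sorted by(age) into 4 buckets
row format delimited
fields terminated by ',';
(2) Insert the data
insert into table t_emp
select name,sex,age
from t_emp01;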
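A hedged sketch of two things that usually accompany a bucketed table: the enforcement switch needed on older Hive releases, and bucket sampling on the query side. It assumes the bucketed t_emp table defined above.

```sql
-- On Hive releases before 2.0, run this before the insert so that
-- the data is really written into 4 bucket files. On Hive 2.0 and
-- later the property no longer exists (bucketing is always
-- enforced), so skip this line there.
set hive.enforce.bucketing=true;

-- Read only the first of the 4 buckets, for example to sample the
-- data cheaply.
select name, sex, age
from t_emp tablesample(bucket 1 out of 4 on age);
```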

7. Partitioning together with bucketing: simply combine the two approaches above.
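Combining the two could look roughly like this. The table name t_emp_pb is an assumption for the sketch; the source table t_emp01 is the one created in the dynamic partitioning step.

```sql
-- Hypothetical table that is partitioned by sex and, inside each
-- partition, bucketed by age. "partitioned by" must come before
-- "clustered by" in the DDL.
create table t_emp_pb(
  name string,
  age  string
)
partitioned by (sex string)
clustered by (age) into 4 buckets
row format delimited
fields terminated by ',';

-- Dynamic partitioning fills the partitions; the partition column
-- goes last in the select list.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table t_emp_pb
partition (sex)
select name, age, sex
from t_emp01;
```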
