Hive DDL, DML, and SQL Operations

Latest recommended article published on 2024-03-01 14:18:50

zjh_746140129

Category: hive. Tags: hive

Article link: https://blog.csdn.net/zjh_746140129/article/details/79720985

I. Hive DDL Operations

1. Data preparation: student.txt (students), score.txt (scores), subject.txt (subjects)

2. Create the Hive tables

(1) Create the student table:

create table student(
id int,
name string,
sex string,
age int
)
row format delimited fields terminated by ',';

(2) Create the score table:

create table score(
id int,
sid int,
cids array<int>,
scores array<int>
)
row format delimited fields terminated by ','
collection items terminated by ' ';
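
To make the two delimiters concrete, here is a minimal sketch of how one line of score.txt would map onto the array columns. The sample row is hypothetical (the original does not show the file contents); only the table definition comes from the text above.

```sql
-- Hypothetical line in score.txt (illustrative only, not from the original data):
--   1,1,1 2 3,90 85 77
-- Fields are split on ',' and array elements on ' ', so this row would parse as
--   id = 1, sid = 1, cids = [1,2,3], scores = [90,85,77]

-- Array elements are zero-indexed; size() returns the number of elements.
select sid,
       cids[0]      as first_cid,
       scores[0]    as first_score,
       size(scores) as n_scores
from score;
```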

(3) Create the subject table:

create table subject(
id int,
name string
)
row format delimited fields terminated by '\t';

Figures 1 and 2 show the result after the tables are created successfully:

Figure 1

Figure 2
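
As a quick check that does not depend on the screenshots, the following standard Hive statements list the tables and show their definitions. This is only a sketch of one way to verify the result.

```sql
-- List the tables in the current database; student, score and subject should appear.
show tables;

-- Show the column names and types of a table.
describe student;

-- Show extended metadata, including the HDFS location and the field delimiters.
describe formatted score;
```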

3. Load data into the Hive tables

(1) Load student.txt into the student table:

load data local inpath '/home/hadoop/student.txt' into table student;

(2) Load score.txt into the score table:

load data local inpath '/home/hadoop/score.txt' into table score;

(3) Load subject.txt into the subject table:

load data local inpath '/home/hadoop/subject.txt' into table subject;

Figures 1 and 2 show the result after the data is loaded successfully:

Figure 1

Figure 2
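
A minimal sketch of how the load could be checked from the Hive shell. Note that `load data ... into table` appends, so running the same load twice duplicates rows; the `overwrite` keyword replaces the existing contents instead.

```sql
-- Inspect a few rows and count what was loaded.
select * from student limit 5;
select count(*) from score;

-- Reload student.txt, replacing the table's current contents rather than appending.
load data local inpath '/home/hadoop/student.txt' overwrite into table student;
```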

II. Hive DML Operations
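
A minimal sketch of common DML statements against the student table. The table name `student_male`, the value 'male' in the sex column, and the export directory are assumptions made for illustration; none of them appear in the original.

```sql
-- Hypothetical target table with the same schema as student.
create table student_male like student;

-- Replace the contents of the target table with the query result.
-- 'male' is an assumed value; use whatever encoding student.txt actually contains.
insert overwrite table student_male
select id, name, sex, age
from student
where sex = 'male';

-- Append rows instead of replacing them.
insert into table student_male
select id, name, sex, age
from student
where sex = 'male' and age > 20;

-- Export a query result to a local directory (hypothetical path) as comma-delimited text.
insert overwrite local directory '/tmp/student_export'
row format delimited fields terminated by ','
select * from student;
```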

III. Hive SQL Operations
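
As an illustrative query over the three tables, the sketch below lists each student's score per subject. It assumes that score.sid refers to student.id, that each element of cids refers to subject.id, and that cids and scores are aligned by position; the original does not state these relationships. `posexplode` requires Hive 0.13 or later.

```sql
select s.name,
       sub.name as subject_name,
       e.score
from (
  -- Explode both arrays with their positions and keep only matching positions,
  -- which pairs every subject id with the score at the same index.
  select sc.sid, c.cid, g.score
  from score sc
  lateral view posexplode(sc.cids)   c as cpos, cid
  lateral view posexplode(sc.scores) g as gpos, score
  where c.cpos = g.gpos
) e
join student s   on s.id = e.sid
join subject sub on sub.id = e.cid;
```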

Table

Hive tables are divided into internal (managed) tables and external tables.
Each Hive table corresponds to a directory on HDFS. The HDFS directory is:
/user/hadoop/hive/warehouse/[databasename.db]/table
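
The difference between the two kinds of table can be sketched as follows. The table name `student_ext` and its location are hypothetical. Dropping an internal table deletes its warehouse directory, while dropping an external table removes only the metadata and leaves the files in place.

```sql
-- External table over an existing HDFS directory (hypothetical path).
create external table student_ext(
  id int,
  name string,
  sex string,
  age int
)
row format delimited fields terminated by ','
location '/user/hadoop/external/student';

-- "Table Type" reads EXTERNAL_TABLE here and MANAGED_TABLE for an internal table;
-- "Location" shows the HDFS directory backing the table.
describe formatted student_ext;

-- Removes only the table definition; files under the location above are kept.
drop table student_ext;
```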

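Partition
Each partition in Hive corresponds to a subdirectory under the table's directory on HDFS. For example, if the order_partition table has a partition event_month=2014-05, the data for that partition is stored on HDFS under /user/hadoop/hive/warehouse/[databasename.db]/order_partition/event_month=2014-05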
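
A minimal sketch of a table that would produce the directory layout described above. Only the table name `order_partition` and the partition column `event_month` come from the text; the data columns and the input file are hypothetical.

```sql
-- order_no and order_time are assumed columns for illustration.
create table order_partition(
  order_no string,
  order_time string
)
partitioned by (event_month string)
row format delimited fields terminated by '\t';

-- Load a local file (hypothetical path) into one partition; Hive creates the
-- event_month=2014-05 subdirectory under the table's directory.
load data local inpath '/home/hadoop/order.txt'
into table order_partition partition (event_month='2014-05');

-- List the partitions, then read only one of them.
show partitions order_partition;
select * from order_partition where event_month = '2014-05';
```

Filtering on the partition column lets Hive read only the matching subdirectory instead of scanning the whole table.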

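Bucket
Hive computes a hash of the specified column and splits the data by that hash value to enable parallel processing; each bucket corresponds to one file. For example, to spread the empno column of the emp table across 10 buckets, Hive first computes the hash of each empno value and takes it modulo 10. Rows whose result is 0 (for example, hash values 0 and 10) are stored in the HDFS file:
/user/hadoop/hive/warehouse/[databasename.db]/emp/part-00000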
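
The bucketing described above can be sketched as follows. The table `emp_bucket` and the `ename` column are assumptions (only emp and empno appear in the text), and the `pmod(hash(...))` expression only approximates how Hive assigns buckets; it is shown to make the modulo step visible, not as the exact internal formula.

```sql
-- Hypothetical bucketed copy of emp: 10 buckets on empno.
create table emp_bucket(
  empno int,
  ename string
)
clustered by (empno) into 10 buckets
row format delimited fields terminated by '\t';

-- On Hive 1.x and earlier, uncomment the next line before inserting;
-- Hive 2.0 and later always enforce bucketing.
-- set hive.enforce.bucketing = true;

-- Populate through a query so that Hive distributes rows into bucket files.
insert overwrite table emp_bucket
select empno, ename from emp;

-- Approximate bucket number for each row: hash of the column modulo the bucket count.
select empno, pmod(hash(empno), 10) as bucket_id from emp;

-- Read a single bucket.
select * from emp_bucket tablesample(bucket 1 out of 10 on empno);
```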
