HIVE 基础（二）

最新推荐文章于 2024-07-11 16:34:57 发布

快乐画饼

最新推荐文章于 2024-07-11 16:34:57 发布

阅读量595

点赞数 1

分类专栏： hive Hadoop 大数据文章标签：大数据 hadoop hive

本文链接：https://blog.csdn.net/qq_57151931/article/details/129089023

版权

大数据同时被 3 个专栏收录

47 篇文章 2 订阅

订阅专栏

Hadoop

16 篇文章 3 订阅

订阅专栏

hive

12 篇文章 2 订阅

订阅专栏

Hive数据类型 - 集合数据类型

ARRAY：存储的数据为相同类型

MAP：具有相同类型的键值对

STRUCT：封装了一组字段

类型	格式	定义	示例
ARRAY	['Apple','Orange','Mongo']	ARRAY<string>	a[0] = 'Apple'
MAP	{'A':'Apple','O':'Orange'}	MAP<string,string>	b['A'] = 'Apple'
STRUCT	{'Apple',2}	STRUCT<fruit:string,weight:int>	c.weight = 2

Hive数据结构

数据结构	描述	逻辑关系	物理存储（HFDS）
Database	数据库	表的集合	文件夹
Table	表	行数据的集合	文件夹
Partition	分区	用于分割数据	文件夹
Buckets	分桶	用于分布数据	文件
Row	行	行记录	文件中的行
Columns	列	列记录	每行中指定的位置
Views	视图	逻辑概念，可跨越多张表	不存储数据
Index	索引	记录统计数据信息	文件夹

在opt下面创建一个 employee.txt 文件，把数据加到txt文件中

数据内容：

Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer Lead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

创建静态表

create table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

row format delimited 含义是：分隔符设置开始语句

fields terminated by '|' 含义是：设置字段与字段之间的分隔符为“|”
collection items terminated by ',' 含义是：设置一个复杂类型（array,struct)字段的各个item之间的分隔符为 “,”
map keys terminated by ':' 含义是：设置一个复杂类型(Map)字段的key value之间的分隔符为 “:”
lines terminated by '\n'; 含义是：设置行与行之间的分隔符为 “\n”

加载本地文件

0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee;

加载HDFS文件

这步需要先把txt文件上传到hdfs上面

0: jdbc:hive2://192.168.152.192:10000> load data inpath '/employee.txt' into table employee;

加载数据后覆盖原数据

0: jdbc:hive2://192.168.152.192:10000> load data inpath '/employee.txt' overwrite into table employee;

创建分区表

create table employee2(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

partitioned by (age int) 含义是：创建分区以age分区

分区表插入数据

0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);

0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);

查看分区表信息：

show partitions employee2;

多字段分区

create table employee3(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
partitioned by (age int , gender string)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

插入数据

0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='0');

0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='1');

数据表

数据表分为内部表和外部表

内部表（管理表）

HDFS中为所属数据库目录下的子文件夹
数据完全由Hive管理，删除表(元数据)会删除数据

外部表（External Tables）

数据保存在指定位置的HDFS路径中
Hive不完全管理数据，删除表(元数据)不会删除数据

上传数据内容

hdfs dfs -put ./employee.txt /tmp/hivedata/employee/

创建外部

create external table if not exists employee(
    name string,
    work_place array<string>,
    gender_age struct<gender:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,string>
)
row format delimited 
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n
location '/tmp/hivedata/employee';