目录
Hive数据类型 - 集合数据类型
ARRAY:存储的数据为相同类型
MAP:具有相同类型的键值对
STRUCT:封装了一组字段
类型 | 格式 | 定义 | 示例 |
---|---|---|---|
ARRAY | ['Apple','Orange','Mongo'] | ARRAY<string> | a[0] = 'Apple' |
MAP | {'A':'Apple','O':'Orange'} | MAP<string,string> | b['A'] = 'Apple' |
STRUCT | {'Apple',2} | STRUCT<fruit:string,weight:int> | c.weight = 2 |
Hive数据结构
数据结构 | 描述 | 逻辑关系 | 物理存储(HFDS) |
---|---|---|---|
Database | 数据库 | 表的集合 | 文件夹 |
Table | 表 | 行数据的集合 | 文件夹 |
Partition | 分区 | 用于分割数据 | 文件夹 |
Buckets | 分桶 | 用于分布数据 | 文件 |
Row | 行 | 行记录 | 文件中的行 |
Columns | 列 | 列记录 | 每行中指定的位置 |
Views | 视图 | 逻辑概念,可跨越多张表 | 不存储数据 |
Index | 索引 | 记录统计数据信息 | 文件夹 |
在opt下面创建一个 employee.txt 文件,把数据加到txt文件中
数据内容:
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer Lead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
创建静态表
create table if not exists employee(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
row format delimited 含义是:分隔符设置开始语句
fields terminated by '|' 含义是:设置字段与字段之间的分隔符为“|”
collection items terminated by ',' 含义是:设置一个复杂类型(array,struct)字段的各个item之间的分隔符为 “,”
map keys terminated by ':' 含义是:设置一个复杂类型(Map)字段的key value之间的分隔符为 “:”
lines terminated by '\n'; 含义是:设置行与行之间的分隔符为 “\n”
加载本地文件
0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee;
加载HDFS文件
这步需要先把txt文件上传到hdfs上面
0: jdbc:hive2://192.168.152.192:10000> load data inpath '/employee.txt' into table employee;
加载数据后覆盖原数据
0: jdbc:hive2://192.168.152.192:10000> load data inpath '/employee.txt' overwrite into table employee;
创建分区表
create table employee2(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
partitioned by (age int)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
partitioned by (age int) 含义是:创建分区 以age分区
分区表插入数据
0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=20);
0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee2 partition(age=30);
查看分区表信息:
show partitions employee2;
多字段分区
create table employee3(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
partitioned by (age int , gender string)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
插入数据
0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='0');
0: jdbc:hive2://192.168.152.192:10000> load data local inpath '/opt/employee.txt' into table employee3 partition(age=20,gender='1');
数据表
数据表分为内部表和外部表
内部表(管理表)
- HDFS中为所属数据库目录下的子文件夹
- 数据完全由Hive管理,删除表(元数据)会删除数据
外部表(External Tables)
- 数据保存在指定位置的HDFS路径中
- Hive不完全管理数据,删除表(元数据)不会删除数据
上传数据内容
hdfs dfs -put ./employee.txt /tmp/hivedata/employee/
创建外部
create external table if not exists employee(
name string,
work_place array<string>,
gender_age struct<gender:string,age:int>,
skills_score map<string,int>,
depart_title map<string,string>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n
location '/tmp/hivedata/employee';
创建外部表要在create后面加上一个 external
location '/tmp/hivedata/employee'; 含义是:指定数据存储路径(HDFS)