Hive中对库和表的操作
一.前提条件
hive安装配置完成
hive安装教程链接 Hive单机版安装
二.对库的操作
- 建库 (库名hivetest)
create database hivetest;
- 删库
drop database hivetest;
- 进入库
use hivetest;
- hive对库的操作和mysql相同
三.对表的操作
- 建表语句
create table employee(
name string,
address array<string>,
personnalinfo array<string>,
technol map<string,int>,
job map<string,string>)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
- 将素材上传至根目录下
- 素材内容
Michael|Montreal,Toronto|Male,30|DB:80|Product:DeveloperLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89|Sales:Lead
- 将素材内容导入建好的表employee中
hdfs dfs -put employee.txt /opt/hive/warehouse/hivetest.db/employee
- 文件路径从网页获得: 50070端口
- 查询
select * from employee;
select name,address[0],job["Product"] from employee where job["Product"] is not null;
- 查看表结构
desc employee;
- 改变列名,并修改属性
alter table employee change personnalinfo info struct<gender:string,age:int>;
四.Hive分区(Partitions)
-
分区主要用于提高性能
分区列的值将表划分为setments(文件夹)
查询时使用分区列和常规列类似
查询时Hive自动过滤掉不用于提高性能的分区 -
分为静态分区和动态分区
-
Hive分区操作
create table employee_partition(
name string,
address array<string>,
info struct<gender:string,age:int>,
technol map<string,int>,
jobs map<string,string>)
partitioned by (country string,add string)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';
- 将根目录下的employee.txt文件导入employee_partition表中
load data local inpath '/root/employee.txt' into table employee_partition partition (country='china',add='liaoning');
load data inpath '/opt/hive/warehouse/hivetest.db/employee/employee.txt' into table employee_partition partition (country="china",add="nanjing");
select * from employee_partition where add="nanjing";