Hive Basics (Part 1)

Author: 罗刹海是市式市世视士

Tags: big data, database, hadoop, hive

Article link: https://blog.csdn.net/berbai/article/details/132797403

I. Hive Data Types: Primitive Types

*Similar to SQL data types.

Type	Example	Type	Example
tinyint	10	smallint	10
int	10	double	1.234
float	1.342	binary	1010
decimal	3.14	string	'Book' or "Book"
char	'YES' or "YES"	varchar	'Book' or "Book"
date	'2013-01-31'	timestamp	'2020-01-31 00:13:00.345'
boolean	true	bigint	100L
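
As a quick illustration of the literals in the table, the sketch below evaluates one value of several primitive types in a single query. It assumes a Hive version that accepts select without a from clause (0.13 or later); the column aliases are made up for this example.

```sql
-- Literal suffixes: Y = tinyint, S = smallint, L = bigint
select
    10Y                                          as tiny_val,
    10S                                          as small_val,
    100L                                         as big_val,
    cast(3.14 as decimal(3,2))                   as dec_val,
    cast('2013-01-31' as date)                   as date_val,
    cast('2020-01-31 00:13:00.345' as timestamp) as ts_val,
    true                                         as bool_val;
```

Date and timestamp values are written as strings and cast explicitly here so that the resulting column types are unambiguous.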

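II. Hive Data Types: Collection Types

Array: an ordered list of elements that all share one type.

Map: key-value pairs; all keys share one type and all values share one type.

Struct: a group of named fields, each with its own type.

Type	Format	Definition	Example
array	['apple','orange']	array<string>	a[0]='apple'
map	{'a':'apple','o':'pen'}	map<string,string>	b['a']='apple'
struct	{'apple',2}	struct<name:string,id:int>	c.id=2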
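
The access syntax in the last column can be tried without creating a table. The following is a minimal sketch that builds the three values from the table with the built-in constructors array(), map() and named_struct(); the CTE name t and the column aliases are assumptions made for illustration.

```sql
with t as (
    select array('apple','orange')             as a,
           map('a','apple','o','pen')          as b,
           named_struct('name','apple','id',2) as c
)
select a[0]    as first_item,   -- array element by zero-based index
       b['a']  as value_of_a,   -- map value by key
       c.id    as struct_id,    -- struct field by name
       size(a) as item_count    -- number of array elements
from t;
```

With these inputs the query should return apple, apple, 2 and 2.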

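III. Hive Data Structures

Structure	Description	Logical role	Physical storage (HDFS)
database	Database	A collection of tables	Directory
table	Table	A collection of rows	Directory
partition	Partition	Splits the table's data	Directory
buckets	Bucket	Distributes the data	File
row	Row	A record	A line in a file
columns	Column	A field of a record	A fixed position within each line
views	View	A logical concept; can span multiple tables	Stores no data
index	Index	Records statistics about the data	Directory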

Database

show databases ;

Table
use kb23db; -- switch to the kb23db database

(1) Example: the demo table
create table demo (id int, name string);             -- create table demo with columns id and name
insert into demo values(1,"zhangsan");               -- append a row
insert into demo values(2,"lisi");                   -- append another row
insert overwrite table demo values(3,"wangwang");    -- overwrite all existing rows
show create table demo;
describe demo;
select * from demo;

(2) Example: renaming table demo2 to stu
create table if not exists demo2(id int, name string);         -- create table demo2 with columns id and name
show create table demo2;
desc demo2;
alter table demo2 rename to stu;                               -- rename table demo2 to stu
alter table stu change name uname string;                      -- rename column name to uname
alter table stu add columns(age int comment 'user age');       -- add the age column
alter table stu add columns(address string,email string);      -- add several columns at once
alter table stu replace columns(id int,uname string,age int,address string);   -- replace the entire column list
select * from stu ;

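IV. Table Types

Tables are either internal (managed) or external:

Internal table (managed table)	External table
a. Stored in HDFS as a subdirectory of the owning database's directory	a. Data is stored at a user-specified HDFS path
b. Data is fully managed by Hive; dropping the table (metadata) also deletes the data	b. Hive does not fully manage the data; dropping the table (metadata) does not delete the data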
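
To check which kind a given table is, its metadata can be inspected. The sketch below uses the demo table created earlier, which was declared without the external keyword.

```sql
-- demo was created without the external keyword, so it is a managed table
describe formatted demo;
-- In the output, look for:
--   Location:    the HDFS directory that holds the table's files
--   Table Type:  MANAGED_TABLE for an internal table, EXTERNAL_TABLE for an external one
```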

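(A) Internal table (managed table)

use kb23hivedb;                        -- select the database
create table if not exists student(    -- "if not exists" is optional: the statement is skipped when the table already exists
    id int,                            -- list every column with its data type, with no comma after the last column
    name string,
    hobbies array<string>,
    address map<string,string>
)
comment 'this is a managed table'                 -- comment is optional
row format delimited fields terminated by ','     -- field delimiter, e.g. a comma, |, or ^A (\001)
collection items terminated by '-'                -- delimiter between array elements and between map entries, e.g. - or ^B (\002)
map keys terminated by ':'                        -- delimiter between a map key and its value, e.g. a colon
lines terminated by '\n';                         -- line delimiter (only \n is supported)

This flexible customization makes it easy to use Hive to process files produced by other tools and by all kinds of ETL (extract, transform, load) programs.

truncate table student;    -- remove all rows from the table
drop table student;        -- drop the table

Method 1: load data through the table's own HDFS directory
① Upload from the VM into the table directory: hdfs dfs -put ./student.txt /hive312/warehouse/kb23hivedb.db/student
Here ./student.txt is the local source file and /hive312/warehouse/kb23hivedb.db/student is the HDFS path the table reads from.

Method 2: load data from a different HDFS directory
① Upload from the VM to another HDFS path: hdfs dfs -put ./student.txt /kb23/hadoopstu
② Load it in Hive: load data inpath '/kb23/hadoopstu/student.txt' into table student;
Here /kb23/hadoopstu is an HDFS path outside the table directory.

Method 3: load data from the local file system
① load data local inpath '/opt/kb23/student.txt' into table student;

select * from student;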

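Note: Hive's default delimiters are ^A (\001) for fields, ^B (\002) for collection items, and ^C (\003) for map keys. Other commonly used delimiters include , | - *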
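
To make the delimiters concrete, here is a sketch of how one input line maps onto the student table's columns. The sample line is hypothetical (the original student.txt is not shown) and assumes the delimiters declared in the create table statement above.

```sql
-- Hypothetical line in student.txt:
--   1,zhangsan,reading-swimming,city:nanjing-street:xinjiekou
-- Commas split the four fields, hyphens split array elements and map entries,
-- and colons split each map key from its value.
select id,
       name,
       hobbies[0]      as first_hobby,   -- first array element
       size(hobbies)   as hobby_count,   -- number of array elements
       address['city'] as city           -- map lookup by key
from student;
```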

(B) External table

create external table if not exists student_external(
    id int,
    name string,
    hobbies array<string>,
    address map<string,string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n'
location '/opt/kb23/hadoopstu/stufile';
select * from student_external;
show create table student_external;
show tables;
drop table if exists student_external;

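V. Hive Partitions

(A) Partitions are mainly used to improve performance

The values of the partition columns split the table into separate directories, one per partition.
In queries, partition columns are used with the same syntax as regular columns.
When a query filters on a partition column, Hive reads only the matching partitions, which speeds up the query.

Partitions are either static or dynamic.
Defining partitions: partitioned by (...)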

(1) Static partitions

create table student2(
    id int,
    name string,
    hobbies array<string>,
    address map<string,string>
)
partitioned by (age int)                      -- single-level partitioning
-- partitioned by (age int, gender string)    -- multi-level partitioning: use this clause instead of the one above
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

(2) Static partition operations

-- add a partition

alter table student2 add partition (age=20);                 -- single-level partitioned table
alter table student2 add partition (age=20, gender="man");   -- multi-level partitioned table

-- drop a partition

alter table student2 drop partition (age=20);                -- single-level partitioned table
alter table student2 drop partition (age=20, gender="man");  -- multi-level partitioned table

-- list the table's partitions

show partitions student2;

-- load data, specifying the value of the partition column

load data local inpath '/opt/kb23/student.txt' into table student2 partition (age=20);

-- load data into a multi-level partitioned table

load data local inpath '/opt/kb23/student.txt' into table student2 partition (age=20, gender="man");
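
Once data is loaded, the partition column behaves like any other column in a query. The sketch below assumes the single-level student2 table defined above with an age=20 partition already populated.

```sql
-- Each partition is a subdirectory named column=value under the table directory,
-- for example .../student2/age=20/
select id, name
from student2
where age = 20;        -- only the age=20 partition directory is scanned

-- The partition column can also be selected and grouped like a regular column
select age, count(*) as cnt
from student2
group by age;
```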

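(3) Dynamic partition tables

-- source table: age and gender are ordinary columns
create table studenttp(
    id int,
    name string,
    age int,
    gender string,
    hobbies array<string>,
    address map<string,string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

-- target table: age and gender are partition columns, so they must not be repeated in the column list
create table studenttp1(
    id int,
    name string,
    hobbies array<string>,
    address map<string,string>
)
partitioned by (age int, gender string)   -- multi-level partitioning
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

*Edit student.txt so that each row also carries the two attributes age and gender.*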

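(4) Dynamic partition operations

Load the data with load

load data local inpath '/opt/kb23/student.txt' into table studenttp;

Set the dynamic partition properties

set hive.exec.dynamic.partition=true; -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict; -- the default is strict; switch to non-strict mode

Insert data into dynamic partitions with insert (the partition columns must come last in the select list, in the order declared)

insert into studenttp1 partition (age, gender)
select id,name,hobbies,address,age,gender from studenttp;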

Query the data with select

select id,name,hobbies,address,age,gender from studenttp1;
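
After a dynamic insert it is worth checking which partitions were actually created. The sketch below assumes the partitioned target table is studenttp1, as in the insert statement above; the two limits are standard Hive properties, and the values shown are, to my knowledge, their defaults.

```sql
-- Upper bounds on how many partitions one statement may create
set hive.exec.max.dynamic.partitions=1000;          -- total across the job
set hive.exec.max.dynamic.partitions.pernode=100;   -- per mapper or reducer node

-- List the partitions produced by the dynamic insert, e.g. age=20/gender=man
show partitions studenttp1;
```

If the source data contains more distinct (age, gender) combinations than these limits allow, the insert fails, so the limits may need to be raised for high-cardinality partition columns.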

VI. Advanced Table Creation Statements: CTAS and WITH

CTAS: temporary tables

Create a temporary table and view its data, method 1 (CTAS, create table as select)

create temporary table tmp_employee as select name,work_place from employee_external;

show tables;

select * from employee_external where name='Will'
union
select * from employee_external where gender_age.gender='Male'
union
select * from employee_external where gender_age.gender='Female';

Create a temporary table and view its data, method 2 (CTAS with a common table expression, CTE)

create temporary table ctas_employee as
with
r1 as (select * from employee_external where name='Will'),
r2 as (select * from employee_external where gender_age.gender='Male'),
r3 as (select * from employee_external where gender_age.gender='Female')
select * from r1 union select * from r2 union select * from r3;

select * from ctas_employee;
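
A CTE can also be used in a plain query without creating a table. The sketch below reuses employee_external and contrasts union all with the union used above; whether any row actually appears twice depends on the data, which is not shown in this article.

```sql
-- union removes duplicate rows, union all keeps them
with
r1 as (select name from employee_external where name='Will'),
r2 as (select name from employee_external where gender_age.gender='Male')
select name from r1
union all
select name from r2;
```

If the employee named Will is also recorded as Male, this query lists that name twice, whereas the union form would list it once.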

VII. Hive Statements: like and rlike

like

Match names that end with a given string

select name,brithdate from fiftysql_student where name like '%四';

Match names that start with a given string

select name,brithdate from fiftysql_student where name like '李%';

Match names that contain a given string

select name,brithdate from fiftysql_student where name like '%李%';

rlike

Match names containing 李 or 兰, version 1: rlike

select name,brithdate from fiftysql_student where name rlike '.*(李|兰).*';

Match names containing 李 or 兰, version 2: like

select name,brithdate from fiftysql_student where name like '%李%' or name like '%兰%';

Notes:

(.): matches any single character

(*): matches the preceding element zero or more times

(李|兰): the expression (x|y) matches either x or y
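
rlike uses Java regular expressions and is true when any substring of the value matches, so the surrounding .* is optional. The sketch below checks this on string literals; the aliases are made up for illustration, and select without a from clause assumes Hive 0.13 or later.

```sql
select '李四' rlike '李|兰'  as contains_li_or_lan,   -- true: a substring matches
       '张三' rlike '李|兰'  as no_match,             -- false: neither character occurs
       '李四' rlike '^李'    as starts_with_li,       -- true: ^ anchors at the start
       '李四' rlike '四$'    as ends_with_si;         -- true: $ anchors at the end
```

The anchored patterns are the rlike counterparts of like '李%' and like '%四'.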

VIII. Other Hive Statements

Example 1: case

-- xm1 1 0
-- xm2 1 0
select name,gender,
       case when gender="boy" then 1 else 0 end as man,
       case when gender="girl" then 1 else 0 end as woman,
       case when gender="boy" or gender="girl" then 1 else 0 end as tag
from studenttp;
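
The same 0/1 flags become counts when wrapped in sum. A minimal sketch, assuming the studenttp table above with gender values "boy" and "girl":

```sql
select sum(case when gender="boy"  then 1 else 0 end) as boys,    -- rows flagged as boy
       sum(case when gender="girl" then 1 else 0 end) as girls,   -- rows flagged as girl
       count(*)                                       as total    -- all rows
from studenttp;
```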

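Example 2: in and exists

in: suited to small tables
exists: the outer query returns rows only when the subquery is true (returns at least one row) and returns nothing when it is false; suited to large tables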
select * from A where id=1 and age>20 and exists (select * from A where name='zhangsan');
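
The more common form of exists is correlated, where the subquery refers to the outer row. The sketch below shows the same filter written both ways; table B and its id column are hypothetical and only serve the illustration.

```sql
-- in: keep rows of A whose id appears in B
select a.* from A a
where a.id in (select b.id from B b);

-- exists: the same filter written as a correlated subquery
select a.* from A a
where exists (select 1 from B b where b.id = a.id);
```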

Example 3: split

select split(line,' ') as word from docs;
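
split returns an array<string>, so it is usually combined with explode to get one row per element. The sketch below is the classic word count over the same docs table; the aliases w, word and cnt are made up for illustration.

```sql
select word, count(*) as cnt
from docs
lateral view explode(split(line, ' ')) w as word   -- one output row per word
group by word
order by cnt desc;
```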
