The Full Reveal! Taking Big Data from 0 to 1: Hive Data Types

拥抱大数据

Tags: hive, big data, hadoop, data warehouse, data analysis

Article link: https://blog.csdn.net/weixin_42044506/article/details/130781901

Hive Data Types

In Hive, data types fall into two broad groups: primitive types and complex types.

Data types

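Category	Type	Description	Literal example
Primitive	BOOLEAN	true/false	TRUE
	TINYINT	1-byte signed integer, -128 to 127	1Y
	SMALLINT	2-byte signed integer, -32768 to 32767	1S
	INT	4-byte signed integer	1
	BIGINT	8-byte signed integer	1L
	FLOAT	4-byte single-precision floating-point number	1.0
	DOUBLE	8-byte double-precision floating-point number	1.0
	DECIMAL	Arbitrary-precision signed decimal	1.0
	STRING	Variable-length string	"a", 'b'
	VARCHAR	Variable-length string; a maximum length must be declared	"a", 'b'
	CHAR	Fixed-length string	"a", 'b'
	BINARY	Byte array	No literal form
	TIMESTAMP	Timestamp with nanosecond precision	122327493795, or a string in "yyyy-MM-dd HH:mm:ss" format
	DATE	Date	'2016-03-29'
Complex	ARRAY	Ordered collection of elements of the same type	array(1,2)
	MAP	Key-value pairs; keys must be primitive types, values can be any type	map('a',1,'b',2)
	STRUCT	Collection of fields whose types may differ	struct('1',1,1.0), named_struct('col1','1','col2',1,'col3',1.0)
	UNION	A single value drawn from a limited set of alternatives	create_union(1,'a',63)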
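
The constructor functions listed in the literal column can be tried directly in a query. The following is a minimal sketch, assuming a Hive version that accepts select without a from clause (0.13 or later); the column aliases are made up for illustration.

```sql
-- Build one value of each complex type from literals.
-- struct() names its fields col1, col2, col3 automatically;
-- named_struct() takes the field names explicitly.
select
  array(1, 2) as arr_col,
  map('a', 1, 'b', 2) as map_col,
  struct('1', 1, 1.0) as struct_col,
  named_struct('col1', '1', 'col2', 1, 'col3', 1.0) as named_col;
```

Running it once is a quick way to see how the client prints each complex type before defining any tables.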

Example:

create table if not exists datatype1(
id1 tinyint,
id2 smallint,
id3 int,
id4 bigint,
salary float,
comm double,
isok boolean,
content binary,
dt timestamp
)
row format delimited 
fields terminated by ','
;

233,12,342523,455345345,30000,60000,nihao,helloworld,2017-06-02
126,13,342526,455345346,80000,100000,true,helloworld1,2017-06-02 11:41:30

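- A timestamp given in year-month-day hour:minute:second form must be written out in full (yyyy-MM-dd HH:mm:ss) for the value to map successfully.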

load data local inpath './data/datatype.txt' into table datatype1;

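- In arithmetic, a type with a narrower range is automatically converted to the type with the wider range before the operation is performed.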
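
The automatic widening can be observed on the datatype1 table defined above. This is a small sketch; the output aliases are illustrative only.

```sql
-- id1 is TINYINT and id4 is BIGINT: id1 is widened, the sum is BIGINT.
-- id3 is INT and comm is DOUBLE: id3 is widened, the sum is DOUBLE.
select id1 + id4 as tiny_plus_big,
       id3 + comm as int_plus_double
from datatype1;
```

No explicit cast is needed in either expression, because each conversion goes from the narrower type to the wider one.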

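Complex data type: array

The definition format is as follows:

create table if not exists table_name(
  column_name column_type,
  ........ 
  column_name array<element_type> 
)row format delimited fields terminated by 'field_delimiter' 
collection items terminated by 'array_element_delimiter'

Note: indexes start at 0; an out-of-range index does not raise an error and returns NULL instead.

Sample data: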

zhangsan	78,89,92,96
lisi	67,75,83,94
王五	23,12

create table if not exists arr1(
name string,
scores array<String>
)
row format delimited 
fields terminated by '\t'
;

drop table arr2;
create table if not exists arr2(
name string,
scores array<String>
)
row format delimited 
fields terminated by '\t'
collection items terminated by ',';

load data local inpath './data/arr1.txt' into table arr1;
load data local inpath './data/arr1.txt' into table arr2;

Queries:

select * from arr1;
-- access a single element by its index
select name,scores[1] from arr2;
-- an out-of-range index does not raise an error; it returns NULL
select name,scores[10] from arr2;
-- get the length of the array
select size(scores) from arr2;

select name,scores[1] from arr2 where size(scores) > 3;

-- total score of each person in arr2; nvl supplies a default when the value is NULL
select scores[0]+scores[1]+nvl(scores[2],0)+nvl(scores[3],0) from arr2;

The goal is to expand the elements of the array so that each element gets its own row, in other words, to turn a column into rows.

zhangsan	78,89,92,96
lisi	67,75,83,94
王五	23,12
Convert the data above into the form below, which makes it easier to compute each person's total score.
zhangsan        78
zhangsan        89
zhangsan        92
zhangsan        96
lisi	67
lisi	75
lisi	83
lisi	94
王五	23
王五	12

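Using the explode function

Overview

explode:
A table-generating function (one kind of UDTF): it takes one input row and returns multiple output rows.
lateral view: a virtual table.
It places the rows produced by the UDTF into a virtual table, and that virtual table is then joined with the input row so the original columns and the expanded values appear side by side.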
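
Both pieces can be tried without any table, using literals. This is a minimal sketch, assuming a Hive version that accepts select without a from clause (0.13 or later); the aliases t, v and cj are illustrative.

```sql
-- explode on its own: a single three-element array yields three rows
select explode(array(78, 89, 92)) as cj;

-- lateral view: each exploded value is paired with the columns of its source row
select t.name, v.cj
from (select 'zhangsan' as name, array(78, 89, 92) as scores) t
lateral view explode(t.scores) v as cj;
```

The second query returns the name three times, once per array element, which is the join described above.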

Examples:

- select explode(scores) score from arr2;

- select name,cj from arr2  lateral view explode(scores) mytable as cj;

- Total score of each student:
select name,sum(cj) as totalscore from arr2 lateral view explode(scores) mytable as cj 
group by name;

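Requirement: populate an array column dynamically, using insert rather than load.

Prepare the data

Materialize the data from the virtual table into a physical table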
create table arr_temp
as
select name,cj from arr2 lateral view explode(scores) score as cj;

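Use the collect_set function, an aggregate that gathers the values of many rows into one array and removes duplicates. The collect_list function does the same without removing duplicates.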

drop table arr3;
create table if not exists arr3(
name string,
scores array<string>
)
row format delimited 
fields terminated by ' '
collection items terminated by ','
;

Write the data in array form:
insert into arr3
select name,collect_set(cj) from arr_temp group by name;
Data:
zhangsan 78
zhangsan 89
zhangsan 92
zhangsan 96
Becomes: zhangsan [78,89,92,96]
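
The difference between the two aggregates is easiest to see side by side. The sketch below runs against the arr_temp table created earlier; the output aliases are illustrative.

```sql
-- collect_set drops duplicate scores within each group;
-- collect_list keeps one entry per input row.
select name,
       collect_set(cj)  as distinct_scores,
       collect_list(cj) as all_scores
from arr_temp
group by name;
```

If a student has the same score in two subjects, only collect_list keeps both entries, so collect_list is likely the safer choice when every score matters. The order of elements inside either result is probably best not relied on.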

Query each person's score in the last subject
select name,scores[size(scores)-1] lastsubject from arr3;

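Complex data type: map

The definition format is as follows:

create table if not exists table_name(
column_name column_type, 
........ 
column_name map<key_type,value_type> 
)row format delimited fields terminated by 'field_delimiter' 
collection items terminated by 'collection_item_delimiter' 
map keys terminated by 'key_value_delimiter'

Sample data: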

zhangsan	chinese:90,math:87,english:63,nature:76
lisi	chinese:60,math:30,english:78,nature:0
wangwu	chinese:89,math:25

create table if not exists map1(
name string,
score map<string,int>
)
row format delimited 
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
;

load data local inpath './data/map1.txt' into table map1;

Queries:

select * from map1; 
-- get the value for a given key: map_column['key_name']
select name,score['chinese'] from map1; 
-- a key that does not exist returns NULL
select name,score['XXXXX'] from map1;
-- English and nature scores of students whose math score is above 35:
select 
m.name,
m.score['english'] ,
m.score['nature']
from map1 m
where m.score['math'] > 35
;

-- sum of each person's first two subjects
select 
m.name,
m.score['chinese']+m.score['math']
from map1 m;

Expanded queries

- Expanded result
zhangsan	chinese		90
zhangsan	math	87
zhangsan	english 	63
zhangsan	nature		76

- Expanding the data with explode:
select explode(score) as (m_subject,m_score) from map1;

- Combining lateral view with explode:
select name,m_subject,m_score from map1 lateral view explode(score) score as 
m_subject,m_score;


- Total score of each person
select name,sum(m_score)
from map1 lateral view explode(score) score as 
m_subject,m_score
group by name;

Writing data into a map column dynamically

Convert data in the following format
zhangsan        chinese 90
zhangsan        math    87
zhangsan        english 63
zhangsan        nature  76
lisi    chinese 60
lisi    math    30
lisi    english 78
lisi    nature  0
wangwu  chinese 89
wangwu  math    25
wangwu  english 81
wangwu  nature  9
into:
zhangsan chinese:90,math:87,english:63,nature:76
lisi chinese:60,math:30,english:78,nature:0
wangwu chinese:89,math:25,english:81,nature:9

Prepare the data

create table map_temp
as
select name,m_subject,m_score from map1 lateral view explode(score) t1 as m_subject,m_score;

Start writing:

Step 1: combine the subject and the score with concat
select name,concat(m_subject,':',m_score) as score from map_temp;

Step 2: gather all the entries that belong to the same person
select name,collect_set(concat(m_subject,":",m_score)) 
from map_temp
group by name
;

Step 3: turn the array into a single string with concat_ws
select name,concat_ws(",",collect_set(concat(m_subject,":",m_score)))
from map_temp
group by name
;

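Step 4: convert the string into a map with the function str_to_map(text, delimiter1, delimiter2)
text: the input string
delimiter1: the separator between key-value pairs
delimiter2: the separator between a key and its value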

select
name,
str_to_map(concat_ws(",",collect_set(concat(m_subject,":",m_score))),',',':')
from map_temp
group by name
;

Step 5: store the result in a prepared table
create table map2 as
select
name,
str_to_map(concat_ws(",",collect_set(concat(m_subject,":",m_score))),',',':') score
from map_temp
group by name
;
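
One detail worth checking after step 5: str_to_map returns map<string,string>, so the values in map2 are strings rather than integers as in map1. The sketch below, which assumes map2 was created as above, casts the value before comparing it; the alias math is illustrative.

```sql
-- score in map2 is map<string,string>; cast the value before a numeric comparison
select name,
       cast(score['math'] as int) as math
from map2
where cast(score['math'] as int) > 35;
```

Making the cast explicit keeps the numeric intent clear instead of leaving the string-to-number conversion to Hive.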

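Complex data type: struct

Overview

A struct type resembles a class in the Java programming language, that is, the template from which object instances are created. For example, a struct for an address: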

public class Address{
	String provinces;
	String city;
	String street;
	// ...
}

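To define a column with a struct type, the syntax is:

create table if not exists table_name( 
  column_name column_type, 
  ........ 
  column_name struct<field_name:field_type,field_name:field_type.....> 
)row format delimited fields terminated by 'field_delimiter' 

Note: add the following clauses as the data requires 
collection items terminated by 'collection_item_delimiter' 
map keys terminated by 'key_value_delimiter'

Access syntax:

colName.subName

Example:

1) Data preparation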

zhangsan	90,87,63,76
lisi	60,30,78,0
wangwu	89,25,81,9

create table if not exists struct1(
name string,
score struct<chinese:int,math:int,english:int,nature:int>
)
row format delimited 
fields terminated by '\t'
collection items terminated by ',';

Load the data:
load data local inpath './data/arr1.txt' into table struct1;

2) Requirement: query the English and Chinese scores of students whose math score is above 35:

select name,
score.english,
score.chinese
from struct1
where score.math > 35
;
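
As with array and map, a struct value can also be built inside a query instead of being loaded from a file. The following is a sketch only, reusing the map1 table from the map section and the named_struct function; the alias score_struct is illustrative.

```sql
-- Build a struct per student from the map column of map1.
select name,
       named_struct('chinese', score['chinese'],
                    'math', score['math']) as score_struct
from map1;
```

In an outer query the fields would then be read with the same colName.subName syntax, for example score_struct.math.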
