Hive Lab 2: Hive Complex Data Types, Built-in Functions, and User-Defined Functions


所念皆星河115


Tags: hive, hadoop, big data

Article link: https://blog.csdn.net/qq_60980343/article/details/122072661


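Summary (auto-generated by CSDN): This document describes a Hive lab covering the Array, Map, and Struct data types, including creating tables, loading data, and operations such as querying a collection's length or checking whether it contains a given element. It also covers Hive's built-in functions (mathematical, string, and aggregate functions) and how to create and use a custom UDF.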


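I. [Lab Preparation]

1. Start the Hadoop cluster

Follow the "Hadoop Installation and Deployment" lab to install and start the Hadoop cluster.

2. Install Hive

Follow the "Hive Installation and Deployment" lab to install Hive.

On the master node, run the following command to start the HiveServer2 service in the background: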

hive --service hiveserver2 &

Then, still on the master node, run the following commands to open the beeline client and connect to Hive:

beeline
!connect jdbc:hive2://localhost:10000

When prompted, enter the username and password: root/root

3. Working directory

The working directory for this lab is /course/hive/hive_type. On the master node, create it and change into it with the following commands:

mkdir -p /course/hive/hive_type
cd /course/hive/hive_type

4. Extract Hive

Right-click the desktop to open a terminal, then use the following commands to extract Hive into the ~/dev directory:

cd ~/dev
cp /cgsrc/apache-hive-1.2.1-bin.tar.gz ./
tar -zxvf apache-hive-1.2.1-bin.tar.gz
rm -rf apache-hive-1.2.1-bin.tar.gz

II. [Lab Steps] Using the Array data type

1. Create the table

In the Hive console, run the following statement to create the table array_test:

create table array_test(  
    name string,  
    score array<int>  
)row format delimited fields terminated by '@'  
collection items terminated by ',';

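Note: the delimiters must be specified. Here '@' separates the name and score fields, and ',' separates the elements of the array.

2. Load the data

Right-click the desktop to open a terminal, then use the following command to enter the master node:

docker exec -it --privileged master bash

In the directory /course/hive/hive_type, create the file array_test.txt and write the following data into it: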

zhangshan@89,88,97,80
lisi@90,95,99,97
wangwu@90,77,88,79
zhaoliu@91,79,98,89
zhuba@89,99,98,90
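One way to create this file without an editor is a shell heredoc, followed by a quick format check. This is only a sketch: `DATA_DIR`, `ARRAY_FILE`, and `BAD_LINES` are names chosen for this example, and by default it writes to a scratch directory rather than the lab directory.

```shell
# Scratch directory by default; set DATA_DIR=/course/hive/hive_type on the master
# node to write the real lab file (DATA_DIR is a name assumed for this sketch).
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
ARRAY_FILE="$DATA_DIR/array_test.txt"

# The quoted 'EOF' stops the shell from expanding anything inside the data.
cat > "$ARRAY_FILE" <<'EOF'
zhangshan@89,88,97,80
lisi@90,95,99,97
wangwu@90,77,88,79
zhaoliu@91,79,98,89
zhuba@89,99,98,90
EOF

# Count lines that do NOT look like name@int,int,int,int.
# grep exits non-zero when nothing matches, hence the "|| true".
BAD_LINES=$(grep -Evc '^[a-z]+@[0-9]+(,[0-9]+){3}$' "$ARRAY_FILE" || true)
echo "malformed lines: $BAD_LINES"
```

A count of 0 means every line uses '@' and ',' exactly where the table definition expects them.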

In the Hive console, run the following statement to load array_test.txt from the master node into the table array_test:

load data local inpath '/course/hive/hive_type/array_test.txt' into table array_test;

3. Work with the table

1) Query the table:

select * from array_test;

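2) Access array data

Elements of an array column are accessed as column_name[index], with the index starting at 0. The index can be used both in the select list and in the WHERE condition. The query below returns the second score of every student whose first score is greater than 90: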

select score[1] from array_test where score[0]>90;
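To cross-check what this query should return, the same filter can be reproduced outside Hive with awk on a copy of the data. This is a rough sketch rather than a substitute for running the query; `DATA_FILE` and `RESULT` are names assumed for the example.

```shell
# Work on a temporary copy of the lab data (DATA_FILE is an assumed name).
DATA_FILE="$(mktemp)"
cat > "$DATA_FILE" <<'EOF'
zhangshan@89,88,97,80
lisi@90,95,99,97
wangwu@90,77,88,79
zhaoliu@91,79,98,89
zhuba@89,99,98,90
EOF

# awk's split() fills a 1-based array, so Hive's score[0] is s[1] here
# and Hive's score[1] is s[2].
RESULT=$(awk -F'@' '{ split($2, s, ","); if (s[1] + 0 > 90) print s[2] }' "$DATA_FILE")
echo "$RESULT"
```

With the sample data only zhaoliu has a first score above 90, so a single value is printed.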

3) Query the length of the array column

select size(score) from array_test;

4) View the table structure

desc array_test;

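5) Check whether an array contains a given element

array_contains() returns true if an array column contains the given value. Use it in a WHERE clause, and negate it to test for absence. Query the names of students whose scores include 90: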

select name from array_test where array_contains(score,90);

Query the names of students whose scores do not include 59:

select name from array_test where !array_contains(score,59);
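The two array_contains() queries can be sanity-checked outside Hive with a small awk helper. This sketch only mimics the membership test on the text file; `names_where`, `DATA_FILE`, `CONTAINS_90`, and `LACKS_59` are hypothetical names introduced here.

```shell
DATA_FILE="$(mktemp)"
cat > "$DATA_FILE" <<'EOF'
zhangshan@89,88,97,80
lisi@90,95,99,97
wangwu@90,77,88,79
zhaoliu@91,79,98,89
zhuba@89,99,98,90
EOF

# names_where VALUE NEGATE: print names whose score list contains VALUE
# (NEGATE=0) or does not contain it (NEGATE=1).
names_where() {
  awk -F'@' -v target="$1" -v negate="$2" '
    function has(list, x,    n, i, a) {
      n = split(list, a, ",")
      for (i = 1; i <= n; i++) if (a[i] == x) return 1
      return 0
    }
    { hit = has($2, target); if (negate ? !hit : hit) print $1 }
  ' "$DATA_FILE"
}

CONTAINS_90=$(names_where 90 0)   # mirrors array_contains(score,90)
LACKS_59=$(names_where 59 1)      # mirrors !array_contains(score,59)
echo "$CONTAINS_90"
echo "$LACKS_59"
```

No student in the sample data has a score of 59, so the second list contains every name.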

III. [Lab Steps] Using the Map data type

1. Create the table

In the Hive console, run the following statement to create the table map_test:

create table map_test(  
    name string,  
    score map<string,int>  
)row format delimited fields terminated by '@'  
collection items terminated by ','
map keys terminated by ':';

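Note: the delimiters must be specified. In the statement above, '@' separates the name field from the score field, ',' separates the map entries (key-value pairs), and ':' separates each key from its value.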
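The three delimiter levels can be illustrated by taking one line apart with plain shell parameter expansion. This is a sketch on a sample line in the format this lab uses; the variable names are chosen for the example and the code does not involve Hive itself.

```shell
LINE='lisi@math:98,english:79,java:96,hive:92'

NAME=${LINE%%@*}             # text before '@'  -> the name column
ENTRIES=${LINE#*@}           # text after '@'   -> the score column (the whole map)
FIRST_ENTRY=${ENTRIES%%,*}   # text before the first ',' -> one map entry
KEY=${FIRST_ENTRY%%:*}       # text before ':'  -> the entry's key
VALUE=${FIRST_ENTRY#*:}      # text after ':'   -> the entry's value

echo "name=$NAME first_key=$KEY first_value=$VALUE"
```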

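2. Load the data

Right-click the desktop to open a terminal, then use the following command to enter the master node:

docker exec -it --privileged master bash

In the directory /course/hive/hive_type, create the file map_test.txt and write the following data into it: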

zhangshan@math:90,english:89,java:88,hive:80
lisi@math:98,english:79,java:96,hive:92
wangwu@math:88,english:86,java:89,hive:88
zhaoliu@math:89,english:78,java:79,hive:77
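As with the array data, a heredoc is one way to create this file, and a pattern check can confirm that every line follows the name@key:value,... layout. This is a sketch: `DATA_DIR`, `MAP_FILE`, and `BAD_LINES` are assumed names, and the default target is a scratch directory.

```shell
# Scratch directory by default; set DATA_DIR=/course/hive/hive_type on the master
# node to write the real lab file (DATA_DIR is a name assumed for this sketch).
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
MAP_FILE="$DATA_DIR/map_test.txt"

cat > "$MAP_FILE" <<'EOF'
zhangshan@math:90,english:89,java:88,hive:80
lisi@math:98,english:79,java:96,hive:92
wangwu@math:88,english:86,java:89,hive:88
zhaoliu@math:89,english:78,java:79,hive:77
EOF

# Count lines that do NOT look like name@key:int,key:int,key:int,key:int.
# grep exits non-zero when nothing matches, hence the "|| true".
BAD_LINES=$(grep -Evc '^[a-z]+@[a-z]+:[0-9]+(,[a-z]+:[0-9]+){3}$' "$MAP_FILE" || true)
echo "malformed lines: $BAD_LINES"
```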

In the Hive console, run the following statement to load map_test.txt from the master node into the table map_test:

load data local inpath '/course/hive/hive_type/map_test.txt' into table map_test;

3. Work with the table

1) View the table structure

desc map_test;

2) Query all rows in the table

select * from map_test;

3) Query the whole map column

select score from map_test;

4) Query all scores for the student named lisi

select score from map_test where name='lisi';

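5) Query all scores for the student named lisi, using a UDTF (explode) to turn the result into multiple rows, one per key-value pair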

select explode(score) from map_test where name='lisi';
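The row-per-entry shape that explode() produces for a map can be previewed outside Hive by splitting lisi's line with awk. This is only an approximation for checking the expected keys and values; `DATA_FILE` and `EXPLODED` are assumed names, and the row order Hive returns is not guaranteed to match.

```shell
DATA_FILE="$(mktemp)"
cat > "$DATA_FILE" <<'EOF'
zhangshan@math:90,english:89,java:88,hive:80
lisi@math:98,english:79,java:96,hive:92
wangwu@math:88,english:86,java:89,hive:88
zhaoliu@math:89,english:78,java:79,hive:77
EOF

# For the row whose name is lisi, print one "key<TAB>value" line per map entry.
EXPLODED=$(awk -F'@' '$1 == "lisi" {
  n = split($2, entries, ",")
  for (i = 1; i <= n; i++) {
    split(entries[i], kv, ":")
    print kv[1] "\t" kv[2]
  }
}' "$DATA_FILE")
echo "$EXPLODED"
```

Each of the four subjects ends up on its own line, which corresponds to the key and value columns of the exploded result.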
