Hive Hands-On Starter Project: WordCount

weixin_42948399

Categories: Big Data, Hive. Tags: hive, big data, data warehouse

Article link: https://blog.csdn.net/weixin_42948399/article/details/120040644

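Implementing WordCount in Hive
Functions used:

Function	Usage	Explanation
split	split(string, "delimiter")	Splits a string on the given delimiter and returns the pieces as an array
explode	explode(array)	Expands one row into multiple rows, one per element; its argument must be a map or an array. It is often used together with split()
count	count(column)	Returns the number of rows in which the given column is not NULL; combined with group by, it gives the number of occurrences of each value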
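To make the three functions concrete, here is a minimal sketch that runs them on literal values rather than on a table. It assumes a Hive version that accepts a select without a from clause (Hive 0.13 or later); the sample strings and the alias word are made up for illustration.

```sql
-- split: one string in, one array out
select split('hello world', ' ');

-- explode: one array in, one row per element out
select explode(split('hello world', ' ')) as word;

-- count: number of non-NULL values among the exploded rows
select count(t.word)
from (select explode(array('a', 'b', 'a')) as word) t;
```

The first query should return a single array containing hello and world, the second should return those two words as separate rows, and the third should count three values.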

WordCount

Data file: wordcount.txt

world hadoop
dog fish
hadoop hello
spark
hello world
dog fish
hadoop spark
spark world
hello world
dog fish
hadoop
spark

Create the data table

create table wordcount1(line string);

Load the data

load data local inpath '/root/wordcount.txt' into table wordcount1;

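Data currently in the table
After loading, each line of the file is stored as one row of wordcount1, in the single column line.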

First, use split to break each line apart on spaces; every row becomes an array of words
select split(line," ") from wordcount1;

Next, nest split inside explode so that the elements of each array are expanded into separate rows, one word per row
select explode(split(line," ")) as c1 from wordcount1;
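When explode is used directly in the select list, it cannot be combined with other columns in the same select. If the original line is needed next to each word, Hive's lateral view is the usual alternative. A minimal sketch, where the aliases w and word are names chosen here for illustration:

```sql
-- Each input row is joined with the rows produced by exploding its own array
select line, word
from wordcount1
lateral view explode(split(line, ' ')) w as word;
```

For the WordCount task itself the subquery form used in this article is enough; lateral view mainly helps once more columns are involved.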

Finally, group by word and count the rows in each group
select t.c1,count(t.c1)
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
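The same query can be made easier to read by naming the output columns and sorting the result. This is only a sketch; the aliases word and num are chosen here and are not part of the original query.

```sql
-- Same grouping as above, with named columns and the most frequent words first
select t.c1 as word, count(t.c1) as num
from (select explode(split(line, ' ')) as c1 from wordcount1) t
group by t.c1
order by num desc, word;
```

For the sample data above, assuming the file contains no stray whitespace, hadoop, spark and world each appear 4 times, and dog, fish and hello each appear 3 times.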

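Finally, save the result to a table
Method 1: create the table and load the data in one statement (create table ... as select)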

create table wordcount_result1 as 
select t.c1,count(t.c1)
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
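Because the count expression above has no alias, Hive typically gives that column an auto-generated name such as _c1, which is awkward to query later. A sketch of the same statement with explicit aliases follows; the table name wordcount_result1_named and the aliases word and num are hypothetical.

```sql
-- Same statement with explicit column names
create table wordcount_result1_named as
select t.c1 as word, count(t.c1) as num
from (select explode(split(line, ' ')) as c1 from wordcount1) t
group by t.c1;

-- Inspect the column names and types of the new table
describe wordcount_result1_named;
```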

Method 2: create the table manually, then insert the data
First create the table:

create table wordcount_result2(
	word string,
	num int
);

Insert the data

insert into wordcount_result2 
select t.c1,count(t.c1)
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
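Two optional refinements are worth sketching here. In Hive, count returns a bigint, so declaring the count column as bigint avoids a narrowing conversion. Also, insert into appends rows, so running it twice duplicates the result, whereas insert overwrite replaces the table contents on each run. The table name wordcount_result3 below is hypothetical.

```sql
-- Result table whose count column matches the bigint returned by count()
create table if not exists wordcount_result3(
    word string,
    num bigint
);

-- Replace the previous contents instead of appending to them
insert overwrite table wordcount_result3
select t.c1, count(t.c1)
from (select explode(split(line, ' ')) as c1 from wordcount1) t
group by t.c1;
```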
