A small Hive exercise: word count

曦彧

Original link: https://baike.baidu.com/item/Hadoop%E5%AE%9E%E6%88%98%EF%BC%88%E7%AC%AC2%E7%89%88%EF%BC%89

su -l hadoop

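# enter the password when prompted

vi word.txt  # create word.txt, which will serve as our data file

Type in a few words, using a single space (" ") as the separator: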

hello world
hello terese
hello myfriend
hello everyone

Press Esc, then type :wq to save and quit.

hive  # return to the Hive command line

create table text (line string); -- create a table named text with one string column

load data local inpath '/home/hadoop/word.txt' into table text; -- load the data file into the table

select * from text; -- view the text table

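How do we count how often each word appears across these lines?

First split each line of text into individual words with the split function, which yields an array whose elements are the words. Then use the explode function to turn every element of that array into its own row. The result is a one-word-per-row form that Hive can process directly with group by.

The split function cuts each line of text into individual words.

The explode function expands an array into rows: each element of the array becomes a separate row.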

select explode(split(line, ' ')) as word from text;
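To make the two steps concrete, here is a minimal Python sketch that models what split and explode do to the four sample lines. It is not Hive code: the names `lines`, `arrays`, and `words` are made up for illustration, and Python's `str.split(" ")` stands in for Hive's `split(line, ' ')` on the assumption that words are separated by exactly one space.

```python
# Plain-Python model of split + explode (illustrative only, not Hive code).
lines = [
    "hello world",
    "hello terese",
    "hello myfriend",
    "hello everyone",
]

# split: each row becomes one array of words
arrays = [line.split(" ") for line in lines]

# explode: every array element becomes a row of its own
words = [word for array in arrays for word in array]

print(arrays)
print(words)
```

After the explode step there is one word per row, which is the shape that group by needs.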

select w.word, count(*) from (select explode(split(line, ' ')) as word from text) as w group by w.word;

-- group by is needed here so that the rows are counted per word.
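As a rough analogy, the grouping and counting can be modeled in Python with `collections.Counter`. This is only a sketch of the semantics, not of how Hive executes the query; the `words` list is assumed to be the one-word-per-row output of the explode step on the sample data.

```python
from collections import Counter

# One word per row, as produced by the explode step on the sample data (assumed input).
words = [
    "hello", "world",
    "hello", "terese",
    "hello", "myfriend",
    "hello", "everyone",
]

# Rough equivalent of: group by word, then count(*) within each group
counts = Counter(words)

for word, c in counts.items():
    print(word, c)
```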

select w.word, count(*) c from (select explode(split(line, ' ')) as word from text) as w group by w.word order by c desc limit 3;

-- sort by count in descending order and keep the top three
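The ordering and limit can be modeled the same way. The sketch below assumes the per-word counts for the sample data and uses hypothetical names (`counts`, `top3`). Note that four of the five words are tied at a count of 1: Python's stable sort keeps them in insertion order, whereas the query orders only by c, so which tied words Hive returns is not guaranteed.

```python
# Per-word counts for the sample data (assumed input, matching the four sample lines).
counts = {"hello": 4, "world": 1, "terese": 1, "myfriend": 1, "everyone": 1}

# Rough equivalent of: order by c desc limit 3
top3 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:3]

for word, c in top3:
    print(word, c)
```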

create table count as select w.word, count(*) c from (select explode(split(line, ' ')) as word from text) as w group by w.word order by c desc limit 3;

-- store the query result in another table

select * from count; -- view the count table

References:

陆嘉恒 (Lu Jiaheng), 《Hadoop实战第2版》, 机械工业出版社 (China Machine Press).
