Implementing wc (Word Count) with Hive

muyingmiao

Category: Hive

Article link: https://blog.csdn.net/muyingmiao/article/details/102580149

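Word counting in Hive relies mainly on two built-in functions: split, which breaks a string into an array, and explode, which turns each array element into its own row.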

hive (test)> desc function extended split;
OK
tab_name
split(str, regex) - Splits str around occurances that match regex
Example:
  > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
  ["one", "two", "three"]
Time taken: 0.005 seconds, Fetched: 4 row(s)

hive (test)> desc function extended explode;
OK
tab_name
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns 
Time taken: 0.003 seconds, Fetched: 1 row(s)
hive (test)>
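To see how the two functions combine, here is a minimal sketch that splits a literal string and explodes the result into rows. The input string is made up for illustration, and the FROM-less SELECT assumes Hive 0.13 or later; on older versions the same expression would need a FROM clause against some existing table.

```sql
-- split() produces the array ["hello", "hello", "world"];
-- explode() then emits one row per array element.
SELECT explode(split('hello hello world', ' ')) AS word;
```

The query returns three rows, one per word, which is the shape needed for a GROUP BY count.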

1. The sample data, stored in wctest.data, is as follows:

hello   hello   hello
world   world
welcome

2. Start Hive, create the table, and load the data

CREATE TABLE IF NOT EXISTS wc (sentence STRING);

LOAD DATA LOCAL INPATH '/home/hadoop/data/wctest.data' OVERWRITE INTO TABLE wc;

3. Run the word count
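The query for this step is not shown in the text, so the following is only a sketch of one common way to write it, not necessarily the author's version. It assumes the words in wctest.data are separated by whitespace (tabs or spaces), which is why the split pattern is '\\s+'; the aliases word, cnt, and t are illustrative names.

```sql
-- Inner query: split each line into words and emit one row per word.
-- Outer query: group identical words and count them.
SELECT word, count(1) AS cnt
FROM (
  SELECT explode(split(sentence, '\\s+')) AS word
  FROM wc
) t
WHERE word != ''   -- drop empty strings produced by leading or trailing whitespace
GROUP BY word
ORDER BY cnt DESC;
```

The subquery is used because explode is a table-generating function and cannot appear alongside other expressions in the same SELECT list; a LATERAL VIEW would be an equivalent alternative.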
