Implementing WordCount in Hive
Functions used:
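Function | Usage | Description |
---|---|---|
split | split(string, "delimiter") | Splits a string on the given delimiter and returns the pieces as an array |
explode | explode(array) | Expands one row into several rows, one per element. Its argument must be a map or an array. Often used together with split() |
count | count(column) | Returns the number of rows in which the given column is not NULL |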
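As a quick illustration of how split and explode behave, here is a small sketch that runs them on a string literal instead of a table. It assumes a Hive version that accepts select without a from clause (0.13 or later); the alias word is only there for illustration.

```sql
-- split returns an array: ["hello","world"]
select split('hello world', ' ');

-- explode emits one row per array element: hello, then world
select explode(split('hello world', ' ')) as word;
```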
WordCount
Data file: wordcount.txt
world hadoop
dog fish
hadoop hello
spark
hello world
dog fish
hadoop spark
spark world
hello world
dog fish
hadoop
spark
Create the data table
create table wordcount1(line string);
Load the data
load data local inpath '/root/wordcount.txt' into table wordcount1;
At this point each row of the table holds one unsplit line of the file.
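To check that the load worked, a query along these lines can be run. The limit value is arbitrary.

```sql
-- Show a few rows; every row is one line of text from the file
select line from wordcount1 limit 5;
```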
First, use split to break each line on spaces
select split(line," ") from wordcount1;
Wrapping the split call in explode then turns each array element into its own row
select explode(split(line," ")) as c1 from wordcount1;
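Hive does not allow a table-generating function such as explode to sit next to other expressions in the same select list. When the original line is needed alongside each word, lateral view is the usual way to get both. A minimal sketch, in which the aliases w and word are illustrative:

```sql
-- Pair every word with the line it came from
select line, w.word
from wordcount1
lateral view explode(split(line, ' ')) w as word;
```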
Finally, group by word and count the occurrences
select t.c1,count(t.c1)
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
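For the sample file the counts can be checked by hand. The variant below is a sketch that names the output columns, sorts by frequency, and splits on runs of whitespace (the regular expression '\\s+') so that repeated spaces between words do not produce empty strings. The aliases word and num are assumptions and not part of the original query.

```sql
select t.word as word, count(*) as num
from (select explode(split(line, '\\s+')) as word from wordcount1) t
group by t.word
order by num desc, word;

-- Assuming the file holds exactly the 12 sample lines shown earlier,
-- this should return:
-- hadoop  4
-- spark   4
-- world   4
-- dog     3
-- fish    3
-- hello   3
```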
Then save the output to a result table
Method 1: create the table and load the data in one statement
create table wordcount_result1 as
select t.c1 as word,count(t.c1) as num
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
Method 2: create the table manually, then insert the data
First create the table:
create table wordcount_result2(
word string,
num int
);
Insert the data
insert into wordcount_result2
select t.c1,count(t.c1)
from (select explode(split(line," ")) as c1 from wordcount1) t
group by t.c1;
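Note that insert into appends rows, so running the statement twice would store every word twice. One possible alternative for reruns is insert overwrite, followed by a quick look at the result. This is a sketch, not the only way to do it:

```sql
-- Replace the table contents instead of appending to them
insert overwrite table wordcount_result2
select t.c1, count(t.c1)
from (select explode(split(line, ' ')) as c1 from wordcount1) t
group by t.c1;

-- Inspect the stored counts, most frequent words first
select word, num from wordcount_result2 order by num desc;
```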