Word count in Hive relies mainly on two built-in functions, split and explode. Their descriptions in the Hive CLI are:
hive (test)> desc function extended split;
OK
tab_name
split(str, regex) - Splits str around occurances that match regex
Example:
> SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
["one", "two", "three"]
Time taken: 0.005 seconds, Fetched: 4 row(s)
hive (test)> desc function extended explode;
OK
tab_name
explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
Time taken: 0.003 seconds, Fetched: 1 row(s)
hive (test)>
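To see how the two functions combine, here is a minimal sketch that splits a literal string on spaces and then expands the resulting array into one row per word. It assumes Hive 0.13 or later, where SELECT without a FROM clause is allowed; the alias word is only an illustrative name.

```sql
-- split() turns the string into an array: ["hello", "hello", "world"].
-- explode() then emits one row per array element.
-- Assumes Hive 0.13+ (SELECT without FROM); `word` is an illustrative alias.
SELECT explode(split('hello hello world', ' ')) AS word;
```

Each array element becomes its own row, which is what makes it possible to group and count the words afterwards.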
1. The input data, wctest.data, is as follows:
hello hello hello
world world
welcome
2. Start Hive, create the table, and load the data:
create table IF NOT EXISTS wc(sentence string );
load data local inpath '/home/hadoop/data/wctest.data' overwrite into table wc;
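As a quick sanity check (a sketch, not part of the original steps), selecting from wc should return one row per line of wctest.data, since the table has a single string column and uses the default text format:

```sql
-- Each line of the input file is stored as one row in the column `sentence`.
SELECT sentence FROM wc;
```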
3. Run the word count: