add file word_count.py;
select transform(word) using 'python word_count.py' as word, cnt
from
(
    select word
    from table_a
    distribute by word sort by word
) t0;
The code for word_count.py is as follows:
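# coding:utf8
import sys

# Input arrives sorted by word, so identical words are adjacent.
last_word = ''
cnt = 0
for line in sys.stdin:
    word = line.strip()
    if last_word != word:
        # A new word starts: emit the count of the previous one.
        if last_word != '':
            sys.stdout.write(last_word + '\t' + str(cnt) + '\n')
        last_word = word
        cnt = 0
    cnt += 1
# Emit the final word after the input is exhausted.
if last_word != '':
    sys.stdout.write(last_word + '\t' + str(cnt) + '\n')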
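The query combines distribute by and sort by so that all rows with the same key go to a single reducer and reach it in sorted order. distribute by word sends every occurrence of a word to the same reducer, and sort by word orders the rows within that reducer, so identical words are adjacent on the script's stdin. The script relies on this: it only compares each word with the previous one.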
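
To see why the ordering matters, here is a minimal local sketch of the same counting logic, written with itertools.groupby instead of the manual last_word bookkeeping. This is a swapped-in technique, not the script above. The function name count_sorted_words and the sample lists are hypothetical, chosen only for illustration; the lists stand in for the rows a single reducer might receive.

```python
from itertools import groupby


def count_sorted_words(lines):
    """Yield (word, count) for each run of identical adjacent words.

    Like word_count.py, this only merges neighbours, so the input
    must already be sorted by word for the totals to be correct.
    """
    words = (line.strip() for line in lines)
    for word, run in groupby(words):
        if word != '':  # blank lines are dropped, as in word_count.py
            yield word, sum(1 for _ in run)


# Hypothetical sample: rows as one reducer might see them after
# "distribute by word sort by word" (sorted, duplicates adjacent).
reducer_input = ['apple\n', 'apple\n', 'banana\n',
                 'cherry\n', 'cherry\n', 'cherry\n']
result = list(count_sorted_words(reducer_input))
print(result)  # [('apple', 2), ('banana', 1), ('cherry', 3)]

# Without sorting, the same word shows up in several partial counts.
unsorted_result = list(count_sorted_words(['a\n', 'b\n', 'a\n']))
print(unsorted_result)  # [('a', 1), ('b', 1), ('a', 1)]
```

The second call shows what would go wrong if sort by were omitted: the word a is reported twice with partial counts instead of once with the total. To use a function like this inside the transform script, pass sys.stdin as lines and write each pair to stdout as a tab-separated line.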