Sample data
hello word
hello python
Map function: mapper.py
#!/usr/bin/env python
import sys

# Emit one "word<TAB>1" pair for every word read from standard input.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))
Reduce function: reducer.py
# -*- coding: utf-8 -*-
import sys
from operator import itemgetter

wordcount = {}
for line in sys.stdin:
    line = line.strip()
    # Each input line is "word<TAB>count"; split on the first tab only.
    word, count = line.split('\t', 1)
    count = int(count)
    if word in wordcount:
        wordcount[word] += count
    else:
        wordcount[word] = count

# Print the totals sorted by word.
word2count = sorted(wordcount.items(), key=itemgetter(0))
for word, count in word2count:
    print("%s\t%s" % (word, count))
Output:
hello	2
python	1
word	1
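
To try the word count without a Hadoop cluster, the map, sort and reduce phases can be imitated in a single process. The following is a minimal sketch and not part of the original scripts: the names map_phase, reduce_phase and run_wordcount are hypothetical, and the in-memory sort only stands in for the sort-by-key step that Hadoop Streaming performs between the mapper and the reducer.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Yield (word, 1) for every whitespace-separated word, like mapper.py."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the counts per word; pairs must already be sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_wordcount(lines):
    """Hypothetical helper: map, sort by key (stand-in for the shuffle), reduce."""
    shuffled = sorted(map_phase(lines), key=itemgetter(0))
    return list(reduce_phase(shuffled))


if __name__ == "__main__":
    # Same two lines as the sample data.
    sample = ["hello word", "hello python"]
    for word, count in run_wordcount(sample):
        print("%s\t%s" % (word, count))
```

Because the pairs reach reduce_phase already sorted by word, groupby can total each word as it streams past, so this sketch does not need the dictionary that reducer.py keeps in memory.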