Writing a MapReduce Program in Python

Latest recommended article published on 2024-05-04 22:53:57

chuimie3724

Tags: python, big data, operating system

Original link: https://my.oschina.net/wolfoxliu/blog/901912

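This article shows how to write a MapReduce program in Python on an existing Hadoop platform. It covers writing mapper.py and reducer.py, setting execute permissions, handling file-format problems, uploading a test file to HDFS through Hue, and running the MapReduce job with the hadoop jar command.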

This article assumes the Hadoop platform that is already set up in our lab.

Reference: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
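
Before looking at the individual scripts, it may help to see the overall data flow in one place. The sketch below imitates, in a single process, what happens when the job runs: the map step emits one record per word, the framework sorts the records, and the reduce step sums consecutive records with the same word. The function names (map_words, reduce_counts, word_count) and the sample input are hypothetical and exist only for this illustration; a real job runs mapper.py and reducer.py as separate processes connected through standard input and output.

```python
from itertools import groupby


def map_words(lines):
    """Yield one '<word> 1' record per word, in the same format mapper.py prints."""
    for line in lines:
        for word in line.strip().split():
            yield '%s %s' % (word, 1)


def reduce_counts(sorted_records):
    """Sum the counts of consecutive records that share the same word."""
    parsed = (record.split(' ', 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


def word_count(lines):
    """Map, sort, then reduce; the sort stands in for Hadoop's shuffle phase."""
    records = sorted(map_words(lines))
    return list(reduce_counts(records))


# Hypothetical sample input, used only to exercise the sketch.
sample = ['foo bar foo', 'bar baz foo']
for word, total in word_count(sample):
    print('%s %s' % (word, total))
```

The sort in the middle is what makes the simple reducer possible: once identical words are adjacent, a single pass with one running total is enough.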

1. Write mapper.py

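#!/usr/bin/python2.6
# mapper.py: read lines from standard input and emit "<word> 1" for each word.
import sys

for line in sys.stdin:
    # Remove leading and trailing whitespace, then split on runs of whitespace.
    line = line.strip()
    words = line.split()
    for word in words:
        # Word and count are separated by a single space; reducer.py splits on the same character.
        print('%s %s' % (word, 1))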
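
One detail worth checking is how the mapper's space-separated output interacts with Hadoop Streaming. By default, Streaming treats everything before the first tab character as the key, and a line without a tab becomes the key in its entirety with an empty value. The sketch below imitates that rule to show what the framework would see for this mapper's records. The helper names split_key_value and mapper_output are hypothetical, written only for this illustration, and are not part of any Hadoop API.

```python
def split_key_value(line, separator='\t'):
    """Imitate the default Streaming rule: the key is the text before the first separator."""
    if separator in line:
        key, value = line.split(separator, 1)
        return key, value
    # No separator present: the whole line is the key and there is no value.
    return line, None


def mapper_output(line):
    """Return the records mapper.py would print for one input line."""
    return ['%s %s' % (word, 1) for word in line.strip().split()]


# Hypothetical sample line, used only to exercise the sketch.
for record in mapper_output('hello world hello'):
    print(split_key_value(record))
```

Because identical words produce identical lines, they still end up next to each other after sorting, so a reducer that splits each line on the first space can count them. The referenced tutorial separates word and count with a tab instead, which makes the word itself the key.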

2. Write reducer.py

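#!/usr/bin/python
# reducer.py: sum the counts for each word read from standard input.
# Assumes the input is sorted by word, as Hadoop does before the reduce phase.

import sys
from operator import itemgetter

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    # Split "<word> <count>" at the first space, matching the mapper's output format.
    word, count = line.split(' ', 1)

    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not a number.
        continue

    # Consecutive lines with the same word accumulate into one running total.
    if current_word == word:
        current_count += count
    else:
        # A new word begins: emit the finished total for the previous word first.
        if current_word: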
            print '%s %s' % (current_word, c