I found a very good introduction to Python for Hadoop, suitable for people with little or no Hadoop background: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
Here is another, similar document: http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2_--_Running_WordCount_in_Python
Below are an introductory article and an advanced article on Hadoop Streaming programming, which also seem quite good:
http://dongxicheng.org/mapreduce/hadoop-streaming-programming/
http://dongxicheng.org/mapreduce/hadoop-streaming-advanced-programming/
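
For a rough idea of what these tutorials cover, here is a minimal word count sketch in the Hadoop Streaming style. It is not taken from any of the linked articles. It relies only on the general Streaming contract: the mapper and reducer read lines from stdin and write tab-separated key/value lines to stdout, and the reducer receives its input sorted by key. The function names `map_lines` and `reduce_lines`, and the convention of passing `map` or `reduce` as a command-line argument, are my own assumptions for illustration.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run the file with "map" or "reduce" as its only argument.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(record + "\n" for record in step(sys.stdin))
```

Saved as, say, wc.py (a hypothetical file name), it can be tried locally without a cluster by piping a text file through `python wc.py map`, then `sort`, then `python wc.py reduce`, which imitates the sort that Hadoop performs between the map and reduce phases.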