1. Write the mapper and reducer scripts:
gedit wordcount_mapper.py
gedit wordcount_reducer.py
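The notes never show the contents of the two scripts, so here is a minimal sketch of what they might look like (hypothetical, not the actual files): Hadoop Streaming feeds input lines on stdin and expects tab-separated key/value pairs on stdout, with the reducer receiving mapper output already sorted by key.

```python
#!/usr/bin/env python
# Hypothetical sketch of wordcount_mapper.py / wordcount_reducer.py,
# written as two functions so both halves fit in one file.
from itertools import groupby  # not needed here, but handy for joins


def mapper(lines):
    # Emit "word\t1" for every word on every input line.
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


def reducer(pairs):
    # Streaming sorts mapper output by key, so identical words arrive
    # consecutively; sum the counts per word.
    current, total = None, 0
    for pair in pairs:
        word, count = pair.rsplit("\t", 1)
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)


if __name__ == "__main__":
    # Local demo of the mapper -> sort -> reducer pipeline.
    mapped = sorted(mapper(["hello world", "hello hadoop"]))
    for out in reducer(mapped):
        print(out)
```

In the real scripts, `mapper` would iterate over `sys.stdin` and `print` each pair instead of yielding it.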
2. Make both scripts executable:
chmod 775 wordcount_mapper.py
chmod 775 wordcount_reducer.py
3. Upload the test files to HDFS:
hadoop fs -put testfile1 input/py
hadoop fs -put testfile2 input/py
4. Run the job with Hadoop Streaming.
From /usr/hadoop/hadoop-2.6.0/share/hadoop/tools/lib:
hadoop jar hadoop-streaming-2.6.0.jar -input input/py -output pyout/ -mapper /usr/hadoop/pytest/wordcount_mapper.py -reducer /usr/hadoop/pytest/wordcount_reducer.py
(the output directory pyout/ must not already exist in HDFS)
http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html#Hadoop_Streaming
12/8
Update: I finally got the Python word count to run with streaming:
jxxy@node7:/usr/hadoop/hadoop-2.6.0/share/hadoop/tools/lib hadoop jar hadoop-streaming-2.6.0.jar -input input/py/* -output py_out6 -mapper /home/jxxy/hadoop/Wordcount_mapper.py -reducer /home/jxxy/hadoop/Wordcount_reducer.py -file /home/jxxy/hadoop/Wordcount_mapper.py -file /home/jxxy/hadoop/Wordcount_reducer.py
After much digging: first, the Python code has to be indented consistently; check it in PyCharm or a similar editor. You also need to know exactly where every file lives. And, sure enough, the -file options are what's needed to point at the script locations and ship them to the cluster.
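Before submitting to the cluster, the pipeline can be sanity-checked locally: Hadoop Streaming is essentially mapper | sort | reducer over each partition, so a plain Unix pipe simulates it (here with tr/uniq standing in for the Python scripts, whose contents aren't shown in these notes):

```shell
# Local simulation of the streaming word count pipeline:
# split words onto lines (mapper), sort, count runs (reducer).
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```

The same idea works with the real scripts: cat a test file into the mapper, pipe through sort, then into the reducer.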
12/9
Today I'm using streaming to do a data join.
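The notes don't show the join scripts, but a reduce-side join with streaming typically works like this: each mapper tags its records with which input they came from, and the reducer matches records that share a key. A hypothetical sketch (the tags "L"/"R" and all record formats are made up for illustration):

```python
#!/usr/bin/env python
# Hypothetical sketch of a streaming reduce-side join.
from itertools import groupby


def join_mapper(lines, tag):
    # Emit "key\ttag:rest" so the reducer can tell the sources apart.
    for line in lines:
        key, _, rest = line.strip().partition("\t")
        yield "%s\t%s:%s" % (key, tag, rest)


def join_reducer(pairs):
    # Pairs arrive sorted by key; collect both sides for each key,
    # then emit the cross product of left and right records.
    for key, group in groupby(pairs, key=lambda p: p.split("\t", 1)[0]):
        left, right = [], []
        for pair in group:
            tag, val = pair.split("\t", 1)[1].split(":", 1)
            (left if tag == "L" else right).append(val)
        for l in left:
            for r in right:
                yield "%s\t%s\t%s" % (key, l, r)


if __name__ == "__main__":
    # Local demo: join names with subjects on a shared numeric key.
    a = list(join_mapper(["1\tAlice", "2\tBob"], "L"))
    b = list(join_mapper(["1\tmath", "1\tart"], "R"))
    for row in join_reducer(sorted(a + b)):
        print(row)
```

On the cluster, the sorted(a + b) step is what Hadoop's shuffle/sort phase does for free between the mapper and reducer.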