The command is as follows:
hadoop jar /usr/local/hadoop/hadoop-streaming-0.23.6.jar \
-input /hdfs/input/path -output /hdfs/output/path \
-mapper "python mapper.py" -reducer "python reducer.py" \
-file mapper.py -file reducer.py
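Notes:
Run the command as the hdfs user;
-input and -output are HDFS paths, and the output path must not already exist;
For Python scripts, -mapper and -reducer must be given as the full command "python *.py", not just the script name;
-file is required: it packages the local *.py files and ships them to the cluster so that the other machines can execute them.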
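
The original does not show what mapper.py and reducer.py contain, so the following is only a minimal sketch of what such scripts might look like, assuming a word-count job. Streaming passes input lines to the mapper on stdin, sorts the mapper output by key, and passes it to the reducer on stdin; both write tab-separated key/value records to stdout. The names wordcount.py, map_lines and reduce_lines are hypothetical, and both steps are kept in one file here only for brevity.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts per key (input must be sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention (not from the original):
#   python wordcount.py map      -> acts as the mapper
#   python wordcount.py reduce   -> acts as the reducer
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Under these assumptions the script can be checked locally before submitting the job, for example with: cat input.txt | python wordcount.py map | sort | python wordcount.py reduce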