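I recently worked through a small machine learning example: a Hadoop MapReduce program, written in Python, that computes the mean and variance of a data set. When I ran it on the cluster, the job failed with the error shown above. This is the command I used: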
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.1.3.jar \
-input /user/* \
-output /user/mr-output13 \
-file /python3/Mapper.py -mapper 'Mapper.py' \
-file /python3/Reducer.py -reducer 'Reducer.py'
The fix suggested online is to put #!/usr/bin/env python at the very top of both Mapper.py and Reducer.py. I did not really understand that line.
What finally worked:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.1.3.jar \
-input /user/* \
-output /user/mr-output13 \
-file /python3/Mapper.py -mapper "python3 Mapper.py" \
-file /python3/Reducer.py -reducer "python3 Reducer.py"
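
A likely reason the second command works: with -mapper 'Mapper.py' the script is executed directly, so it needs a shebang line (the #!/usr/bin/env python line mentioned above, which tells the system which interpreter to use) and execute permission. With -mapper "python3 Mapper.py" the interpreter is named explicitly, so neither is required. For reference, here is a minimal sketch of what the mean/variance logic could look like. It is not the original Mapper.py or Reducer.py; the function names, the tab-separated record format, and the "map"/"reduce" argument are all assumptions made for illustration, and it assumes every partial result reaches a single reducer.

```python
#!/usr/bin/env python3
# Hypothetical sketch, not the original scripts: mapper and reducer logic
# for mean/variance kept in one file so it can be tested locally.
import sys


def map_partial(lines):
    """Mapper side: fold one input split into (count, sum, sum of squares)."""
    n, s, sq = 0, 0.0, 0.0
    for line in lines:
        for token in line.split():  # assumes whitespace-separated numbers
            x = float(token)
            n += 1
            s += x
            sq += x * x
    return n, s, sq


def format_partial(partial):
    """Serialize a partial result as one tab-separated line."""
    n, s, sq = partial
    return f"{n}\t{s!r}\t{sq!r}"


def reduce_partials(lines):
    """Reducer side: merge partials into (count, mean, population variance)."""
    n, s, sq = 0, 0.0, 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cnt, part_sum, part_sq = line.split("\t")
        n += int(cnt)
        s += float(part_sum)
        sq += float(part_sq)
    if n == 0:
        return 0, float("nan"), float("nan")
    mean = s / n
    # Var(X) = E[X^2] - (E[X])^2
    return n, mean, sq / n - mean * mean


if __name__ == "__main__":
    # Assumed convention: the role is passed as the only argument.
    if len(sys.argv) == 2 and sys.argv[1] == "map":
        print(format_partial(map_partial(sys.stdin)))
    elif len(sys.argv) == 2 and sys.argv[1] == "reduce":
        count, mean, var = reduce_partials(sys.stdin)
        print(f"{count}\t{mean}\t{var}")
```

Each mapper emits only three numbers per split, so the reducer has very little to merge. Saved under a hypothetical name such as stats.py, the sketch can be checked locally before submitting the job by piping a data file through "python3 stats.py map" and then "python3 stats.py reduce".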