Command line: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/mahout-examples-0.12.2-job.jar org.apache.mahout.clustering.syntheticcontrol.canopy.Job
prob1: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s).
The client-side job submission needs to know where the cluster's scheduler is, so the scheduler address must be configured; if never changed, the default is 0.0.0.0:8030.
The key is to give yarn.resourcemanager.scheduler.address a real IP and port.
So, edit the configuration in /usr/local/hadoop/etc/hadoop/yarn-site.xml. Since this is a local pseudo-distributed server, add the following between <configuration> and </configuration>:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
</property>
Save the file.
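The ResourceManager only reads yarn-site.xml at startup, so restart YARN for the new addresses to take effect. A quick sanity check might look like this (assuming a tarball install with $HADOOP_HOME set, and that ss is available):

```shell
# Restart YARN so the new scheduler address is picked up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Confirm the scheduler is now bound to localhost:8030 instead of 0.0.0.0
ss -tln | grep 8030
```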
prob2: Remember to switch MapReduce to YARN mode: mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
./sbin/start-yarn.sh
./sbin/mr-jobhistory-daemon.sh start historyserver
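Renaming the template is not enough by itself: mapred-site.xml must actually declare YARN as the MapReduce framework, otherwise jobs still run in the local runner. A minimal mapred-site.xml for this setup:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```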
prob3:Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/testdata
Open Eclipse, create a testdata folder at the corresponding HDFS path, and upload synthetic_control.data into it.
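If Eclipse is not handy, the same fix can be done from the command line (assuming synthetic_control.data is in the current directory and the hdfs command is on the PATH):

```shell
# Create the input directory the job expects and upload the dataset
hdfs dfs -mkdir -p /user/hadoop/testdata
hdfs dfs -put synthetic_control.data /user/hadoop/testdata/
```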