The overall process follows:
http://qindongliang.iteye.com/blog/2224797
(secondary reference: http://blog.csdn.net/wind520/article/details/43458925)
Additional details below:
1)
vi /etc/profile; my settings are:
JAVA_HOME='/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64'
HADOOP_HOME='/root/hadoop2.6'
SCALA_HOME='/root/scala2.10.4'
SPARK_HOME='/root/spark1.4.0'
MASTER='local-cluster[3,2,1024]' # 3 local workers, 2 cores and 1024 MB each
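The profile entries are typically paired with a PATH extension so the java, hadoop, scala, and spark binaries resolve from any shell; a minimal sketch using the paths above (the PATH line is my addition, not from the referenced post):

```shell
# The variables from /etc/profile above, plus a PATH extension
# so java, hadoop, scala and spark-shell resolve (paths as configured here).
export JAVA_HOME='/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64'
export HADOOP_HOME='/root/hadoop2.6'
export SCALA_HOME='/root/scala2.10.4'
export SPARK_HOME='/root/spark1.4.0'
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin"
# Apply to the current shell without re-logging-in:
# source /etc/profile
```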
2)
Configure spark-env.sh. Set JAVA_HOME to the actual absolute path rather than a variable reference, and set SPARK_MASTER_IP to an IP address:
export JAVA_HOME='/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64' (the referenced post has export JAVA_HOME=$JAVA_HOME)
export SPARK_MASTER_IP=192.168.22.250 (the referenced post has SPARK_MASTER_IP=master)
My final configuration:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64
export SCALA_HOME=$SCALA_HOME
export HADOOP_HOME=/root/hadoop2.6
export HADOOP_CONF_DIR=/root/hadoop2.6/etc/hadoop
export SPARK_MASTER_IP=192.168.22.250
export SPARK_DRIVER_MEMORY=1G
The slaves file needs no changes.
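Note that a fresh Spark unpack has no spark-env.sh; it is created from the template Spark ships in conf/. A self-contained sketch of the copy-then-append step (demonstrated in a temp dir so it runs anywhere; in practice operate on $SPARK_HOME/conf):

```shell
# Demonstrate the template-copy step in a temp dir; in practice the
# directory is /root/spark1.4.0/conf and the template is shipped by Spark.
CONF=$(mktemp -d)
echo '# spark-env.sh shipped template' > "$CONF/spark-env.sh.template"
cp "$CONF/spark-env.sh.template" "$CONF/spark-env.sh"
# Append the settings from this section:
cat >> "$CONF/spark-env.sh" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64
export SPARK_MASTER_IP=192.168.22.250
export SPARK_DRIVER_MEMORY=1G
EOF
```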
4)
Start the master: sbin/start-master.sh (the referenced post uses sbin/start-all.sh)
Check the logs under /root/spark1.4.0/logs:
15/08/25 14:01:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@192.168.22.250:7077]
15/08/25 14:01:16 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/08/25 14:01:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/25 14:01:17 INFO server.AbstractConnector: Started SelectChannelConnector@mk-vm:6066
15/08/25 14:01:17 INFO util.Utils: Successfully started service on port 6066.
15/08/25 14:01:17 INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066
15/08/25 14:01:17 INFO master.Master: Starting Spark master at spark://192.168.22.250:7077
15/08/25 14:01:17 INFO master.Master: Running Spark version 1.4.0
15/08/25 14:01:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/08/25 14:01:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080
15/08/25 14:01:17 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
15/08/25 14:01:17 INFO ui.MasterWebUI: Started MasterWebUI at http://192.168.22.250:8080
15/08/25 14:01:17 INFO master.Master: I have been elected leader! New state: ALIVE
5) Open port 8080 in the firewall
/sbin/iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
/etc/init.d/iptables save
service iptables restart
Then visit http://192.168.22.250:8080
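Before opening a browser, the port can be probed from the shell; check_ui below is a hypothetical helper using bash's built-in /dev/tcp redirection, not a Spark tool:

```shell
# Hypothetical helper: attempt a TCP connect to host:port via bash's
# /dev/tcp; prints "open" if the connection succeeds, "closed" otherwise.
check_ui() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo open || echo closed
}
# Usage (assumes the master UI from this section is running):
# check_ui 192.168.22.250 8080
```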
7)
Start the workers, e.g.: sbin/start-slaves.sh spark://192.168.22.250:7077
Refresh the web UI; the workers now appear.
8) Stop and restart
./sbin/stop-master.sh
After a server reboot, restart Spark with:
cd spark1.4.0/
sbin/start-all.sh
Verify at: http://192.168.22.250:8080/
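The reboot-recovery steps above can be wrapped in a small shell function; spark_restart is a hypothetical helper for this setup, not a script Spark provides:

```shell
# Hypothetical convenience wrapper around the stop/start scripts above.
spark_restart() {
    home="${SPARK_HOME:-/root/spark1.4.0}"
    "$home/sbin/stop-all.sh"     # stop master and workers, if running
    "$home/sbin/start-all.sh"    # start master, then workers from conf/slaves
}
# spark_restart   # afterwards verify at http://192.168.22.250:8080/
```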
9) Reduce INFO log output
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties (i.e. drop the ".template" extension).
Edit the new file and change the INFO entries (e.g. log4j.rootCategory=INFO, console) to WARN.
pyspark output will then be much quieter.
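The INFO-to-WARN edit can also be scripted with sed instead of hand-editing; shown here on a temp stand-in for the template so the commands are self-contained, with the real invocation commented:

```shell
# Demonstrate the replacement on a stand-in file; the real template's
# root logger line is 'log4j.rootCategory=INFO, console'.
TMPL=$(mktemp)
echo 'log4j.rootCategory=INFO, console' > "$TMPL"
sed 's/rootCategory=INFO/rootCategory=WARN/' "$TMPL" > "$TMPL.props"
# In practice:
# sed 's/rootCategory=INFO/rootCategory=WARN/' \
#   "$SPARK_HOME/conf/log4j.properties.template" > "$SPARK_HOME/conf/log4j.properties"
```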