Preparation
Setting up a Spark cluster in standalone mode is fairly straightforward. Environment:
Five CentOS 6.5 virtual machines: client, node01, node02, node03, node04
Cluster plan:
Prerequisite: configure the JDK on every node. The JDK version used for this setup:
[root@node01 ~]# java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[root@node01 ~]#
Configuring the network, setting up passwordless SSH login, and synchronizing time across the nodes are all prerequisites for building a Spark cluster. If you have not done these yet, see the first six steps of this post: https://blog.csdn.net/Chris_MZJ/article/details/83033471
Setup steps:
1. Extract
Extract spark-1.6.3-bin-hadoop2.6.tgz into /opt/software/spark/:
[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# tar -xvf spark-1.6.3-bin-hadoop2.6.tgz
2. Rename
Rename the extracted Spark directory to spark-1.6.3:
[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6 spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# mv spark-1.6.3-bin-hadoop2.6 spark-1.6.3
[root@node01 spark]# ls
spark-1.6.3 spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]#
3. Edit the configuration files
1) In the spark-1.6.3/conf directory, rename slaves.template to slaves and add the Spark worker nodes: node02, node03, node04. Note that each node must be on its own line, with no trailing spaces after the node name:
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh.template
[root@node01 conf]# mv slaves.template slaves
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves spark-defaults.conf.template spark-env.sh.template
[root@node01 conf]#
[root@node01 conf]# vim slaves
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
node02
node03
node04
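Since a single trailing space in the slaves file can break worker startup, a quick sanity check may help. The sketch below writes a sample file under /tmp for illustration; point the grep at your actual conf/slaves instead:

```shell
# Write a sample slaves file, then check it for trailing whitespace.
printf 'node02\nnode03\nnode04\n' > /tmp/slaves.sample
if grep -q '[[:space:]]$' /tmp/slaves.sample; then
  echo "trailing whitespace found"
else
  echo "slaves file clean"
fi
```

If the check reports trailing whitespace, clean the offending lines before starting the cluster.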
2) In spark-1.6.3/conf/, rename spark-env.sh.template to spark-env.sh and append the following settings:
export JAVA_HOME=/usr/local/jdk1.8.0_121      # JDK install path
SPARK_MASTER_IP=node01                        # host running the Master
SPARK_MASTER_PORT=7077                        # Master RPC port (7077 is the default)
SPARK_WORKER_CORES=3                          # CPU cores each Worker may use
SPARK_WORKER_MEMORY=2g                        # memory each Worker may use
SPARK_WORKER_INSTANCES=1                      # Worker processes per node
SPARK_WORKER_DIR=/opt/software/spark/worker   # Worker scratch/log directory
[root@node01 conf]# mv spark-env.sh.template spark-env.sh
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves spark-defaults.conf.template spark-env.sh
[root@node01 conf]# vim spark-env.sh
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
export JAVA_HOME=/usr/local/jdk1.8.0_121
SPARK_MASTER_IP=node01
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/opt/software/spark/worker
4. Distribute the configured directory
Send the configured spark-1.6.3 to the other four nodes:
[root@node01 spark]# scp -r spark-1.6.3 root@client:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node02:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node03:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node04:`pwd`
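The four scp commands above can also be generated with a loop. The sketch below only echoes the commands for illustration (assuming the same /opt/software/spark target path); drop the `echo` to actually copy:

```shell
# Print the distribution command for each target node.
for host in client node02 node03 node04; do
  echo "scp -r spark-1.6.3 root@${host}:/opt/software/spark/"
done
```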
5. Configure environment variables
Configuring environment variables lets you operate the Spark cluster from any directory. Before doing so, be sure to rename the start-all.sh script in spark-1.6.3/sbin/ to avoid a conflict with the Hadoop cluster's start-all.sh command; here it is renamed to start-spark.sh:
[root@node01 sbin]# mv start-all.sh start-spark.sh
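The environment variables themselves are not shown above; a possible addition to /etc/profile would be the following (the SPARK_HOME path follows the install location used in this post; adjust it to yours):

```shell
export SPARK_HOME=/opt/software/spark/spark-1.6.3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
```

After editing, run `source /etc/profile` on each node so the change takes effect in the current shell.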
6. Start the cluster
Start the Spark cluster with the start-spark.sh script:
[root@node01 sbin]# ./start-spark.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.master.Master-1-node01.out
node02: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
node02: failed to launch org.apache.spark.deploy.worker.Worker:
node02: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: failed to launch org.apache.spark.deploy.worker.Worker:
node03: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: failed to launch org.apache.spark.deploy.worker.Worker:
node04: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
[root@node01 sbin]#
The startup output contains warnings like `node02: failed to launch org.apache.spark.deploy.worker.Worker...`. These can be spurious and safely ignored here; run jps on each node to confirm that the cluster actually started:
[root@node01 sbin]# jps
1667 Master
1741 Jps
[root@node02 ~]# jps
1607 Worker
1645 Jps
[root@node03 ~]# jps
1632 Jps
1584 Worker
[root@node04 ~]# jps
1577 Worker
1611 Jps
The cluster started successfully.
7. Check cluster status in the browser
Open node01's IP with port 8080 in a browser to view the cluster status:
8. Submit a test job
If Spark's environment variables are not configured, submit the job from the spark-1.6.3/bin/ directory:
[root@node01 bin]# ./spark-submit --master spark://node01:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.3-hadoop2.6.0.jar 10
After the job is submitted, switch back to node01:8080 in the browser and you will see the following:
The job just submitted shows as RUNNING; once it finishes, the result can be seen on node01:
This job estimates the value of pi, and its precision grows with the input parameter. The argument given above was 10, so the precision is low. With that, the standalone Spark cluster setup is complete.
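The SparkPi example estimates pi by sampling random points in the unit square and counting the fraction that lands inside the quarter circle. A standalone awk sketch of the same Monte Carlo idea (sample count and seed chosen arbitrarily):

```shell
# Monte Carlo estimate of pi: the fraction of random points inside
# the quarter circle approximates pi/4, so multiply by 4.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1.0) inside++
  }
  printf "pi ~= %.3f\n", 4.0 * inside / n
}'
```

On the cluster, Spark parallelizes exactly this sampling across the workers, which is why a larger argument (more partitions, hence more samples) yields a more precise estimate.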