Setting Up a Spark Cluster

Preparation

Setting up a Spark cluster in standalone mode is fairly simple. The environment used here:

Five CentOS 6.5 virtual machines: client, node01, node02, node03, node04

Cluster plan: node01 serves as the Master and node02, node03, node04 as Workers (a copy of Spark is also distributed to client in step 4).
[Figure: cluster planning table]

Prerequisite: configure the JDK on every node. The JDK version used in this setup:

[root@node01 ~]# java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[root@node01 ~]# 

Configuring the network, setting up passwordless SSH login, and synchronizing the clocks of all nodes are also prerequisites for building the Spark cluster. If you have not done these yet, see the first six steps of this post: https://blog.csdn.net/Chris_MZJ/article/details/83033471
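
If passwordless SSH is not yet configured, here is a minimal sketch of setting it up from node01 (the Master) to the worker nodes, assuming root is used on every machine as in the rest of this walkthrough:

[root@node01 ~]# ssh-keygen -t rsa        # accept the defaults at each prompt
[root@node01 ~]# ssh-copy-id root@node02
[root@node01 ~]# ssh-copy-id root@node03
[root@node01 ~]# ssh-copy-id root@node04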

Setup steps:

1. Extract the archive

Extract spark-1.6.3-bin-hadoop2.6.tgz into /opt/software/spark/:

[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# tar -xvf spark-1.6.3-bin-hadoop2.6.tgz

2. Rename the directory

Rename the extracted Spark package to spark-1.6.3:

[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6  spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# mv spark-1.6.3-bin-hadoop2.6 spark-1.6.3
[root@node01 spark]# ls
spark-1.6.3  spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# 

3. Edit the configuration files

1) Go into the conf directory under spark-1.6.3, rename slaves.template to slaves, and add the Spark cluster's worker nodes: node02, node03, node04. Note that each node goes on its own line and that there must be no trailing spaces after the hostnames:

[root@node01 conf]# ls
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves.template  spark-defaults.conf.template  spark-env.sh.template
[root@node01 conf]# mv slaves.template slaves
[root@node01 conf]# ls
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves  spark-defaults.conf.template  spark-env.sh.template
[root@node01 conf]# 
[root@node01 conf]# vim slaves 

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
node02
node03
node04

2) In spark-1.6.3/conf/, rename spark-env.sh.template to spark-env.sh and add the configuration below (SPARK_MASTER_IP tells the workers where the Master runs, SPARK_MASTER_PORT is the port the Master listens on, SPARK_WORKER_CORES and SPARK_WORKER_MEMORY cap the cores and memory each worker offers, SPARK_WORKER_INSTANCES is the number of worker processes per node, and SPARK_WORKER_DIR is the working directory for worker scratch space and logs):

export JAVA_HOME=/usr/local/jdk1.8.0_121
SPARK_MASTER_IP=node01
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/opt/software/spark/worker

[root@node01 conf]# mv spark-env.sh.template spark-env.sh
[root@node01 conf]# ls
docker.properties.template  fairscheduler.xml.template  log4j.properties.template  metrics.properties.template  slaves  spark-defaults.conf.template  spark-env.sh

[root@node01 conf]# vim spark-env.sh 

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
export JAVA_HOME=/usr/local/jdk1.8.0_121
SPARK_MASTER_IP=node01
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/opt/software/spark/worker

4. Distribute the configuration
Send the configured spark-1.6.3 directory to the other four nodes:

[root@node01 spark]# scp -r spark-1.6.3 root@client:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node02:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node03:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node04:`pwd`

5. Configure environment variables
Configuring environment variables makes it possible to operate the Spark cluster from any directory. Before doing that, be sure to rename the start-all.sh script under spark-1.6.3/sbin/ so that it does not clash with the Hadoop cluster's start-all.sh command; here it is renamed to start-spark.sh:

[root@node01 sbin]# mv start-all.sh start-spark.sh
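
With the script renamed, the environment variables themselves can be added. A minimal sketch for node01, assuming /etc/profile is used (any shell profile works):

[root@node01 ~]# vim /etc/profile
# append the following two lines at the end of the file
export SPARK_HOME=/opt/software/spark/spark-1.6.3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@node01 ~]# source /etc/profile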

6. Start the cluster
Start the Spark cluster with the start-spark.sh script:

[root@node01 sbin]# ./start-spark.sh 
starting org.apache.spark.deploy.master.Master, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.master.Master-1-node01.out
node02: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
node02: failed to launch org.apache.spark.deploy.worker.Worker:
node02: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: failed to launch org.apache.spark.deploy.worker.Worker:
node03: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: failed to launch org.apache.spark.deploy.worker.Worker:
node04: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
[root@node01 sbin]# 

During startup, warnings such as "node02: failed to launch org.apache.spark.deploy.worker.Worker…" appeared; these can be ignored here. Use jps on each node to make sure the cluster processes are actually running:

[root@node01 sbin]# jps
1667 Master
1741 Jps

[root@node02 ~]# jps
1607 Worker
1645 Jps

[root@node03 ~]# jps
1632 Jps
1584 Worker

[root@node04 ~]# jps
1577 Worker
1611 Jps

The cluster has started successfully.
7. Check the cluster status in the web UI
Enter node01's IP with port 8080 in a browser to view the cluster status:
[Figure: Spark Master web UI at node01:8080]
8. Submit a test job
If the Spark environment variables are not configured, submit the job from the spark-1.6.3/bin/ directory:

[root@node01 bin]# ./spark-submit --master spark://node01:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.3-hadoop2.6.0.jar 10

After the job is submitted successfully, switch to node01:8080 in the browser, where the following can be seen:
[Figure: the submitted application listed as RUNNING in the web UI]
The job just submitted is shown as running; once it completes, the result is printed on node01:
[Figure: SparkPi result printed in the node01 console]
This example estimates the value of Pi, and its precision grows with the input parameter (the number of slices). The argument given above was 10, so the accuracy is fairly low. With that, the standalone Spark cluster setup is complete.
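
As a final check, once the environment variables from step 5 are in place, the same example can be re-run from any directory with a larger slice count for a closer estimate of Pi. A minimal sketch (the jar path assumes the layout shown above):

[root@node01 ~]# spark-submit --master spark://node01:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/lib/spark-examples-1.6.3-hadoop2.6.0.jar 100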
