Preparation
Setting up a Spark cluster in standalone mode is fairly straightforward. Environment:
Five CentOS 6.5 virtual machines: client, node01, node02, node03, node04
Cluster plan:
Prerequisite: configure the JDK on every node. The JDK version used for this setup:
[root@node01 ~]# java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[root@node01 ~]#
Configuring the network, setting up passwordless SSH login, and synchronizing time across the nodes are all prerequisites for building a Spark cluster. If you have not done these yet, see the first six steps of this post: https://blog.csdn.net/Chris_MZJ/article/details/83033471
Setup steps:
1. Extract
Extract spark-1.6.3-bin-hadoop2.6.tgz into /opt/software/spark/:
[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# tar -xvf spark-1.6.3-bin-hadoop2.6.tgz
2. Rename
Rename the extracted Spark directory to spark-1.6.3:
[root@node01 spark]# ls
spark-1.6.3-bin-hadoop2.6 spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]# mv spark-1.6.3-bin-hadoop2.6 spark-1.6.3
[root@node01 spark]# ls
spark-1.6.3 spark-1.6.3-bin-hadoop2.6.tgz
[root@node01 spark]#
3. Edit the configuration files
1) In the spark-1.6.3/conf directory, rename slaves.template to slaves and add the Spark worker nodes: node02, node03, node04. Note that each node must be on its own line, with no trailing spaces after the node name:
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh.template
[root@node01 conf]# mv slaves.template slaves
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves spark-defaults.conf.template spark-env.sh.template
[root@node01 conf]#
[root@node01 conf]# vim slaves
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
node02
node03
node04
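Since a single trailing space in the slaves file can break worker startup, a quick sanity check may help. The sketch below writes a sample file under /tmp for illustration; point the grep at your actual conf/slaves instead:

```shell
# Write a sample slaves file, then check it for trailing whitespace.
printf 'node02\nnode03\nnode04\n' > /tmp/slaves.sample
if grep -q '[[:space:]]$' /tmp/slaves.sample; then
  echo "trailing whitespace found"
else
  echo "slaves file clean"
fi
```

If the check reports trailing whitespace, clean the offending lines before starting the cluster.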
2) In spark-1.6.3/conf/, rename spark-env.sh.template to spark-env.sh and append the following settings:
export JAVA_HOME=/usr/local/jdk1.8.0_121      # JDK install path
SPARK_MASTER_IP=node01                        # host running the Master
SPARK_MASTER_PORT=7077                        # Master RPC port (7077 is the default)
SPARK_WORKER_CORES=3                          # CPU cores each Worker may use
SPARK_WORKER_MEMORY=2g                        # memory each Worker may use
SPARK_WORKER_INSTANCES=1                      # Worker processes per node
SPARK_WORKER_DIR=/opt/software/spark/worker   # Worker scratch/log directory
[root@node01 conf]# mv spark-env.sh.template spark-env.sh
[root@node01 conf]# ls
docker.properties.template fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves spark-defaults.conf.template spark-env.sh
[root@node01 conf]# vim spark-env.sh
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
export JAVA_HOME=/usr/local/jdk1.8.0_121
SPARK_MASTER_IP=node01
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/opt/software/spark/worker
4. Distribute the configured directory
Send the configured spark-1.6.3 to the other four nodes:
[root@node01 spark]# scp -r spark-1.6.3 root@client:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node02:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node03:`pwd`
[root@node01 spark]# scp -r spark-1.6.3 root@node04:`pwd`
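The four scp commands above can also be generated with a loop. The sketch below only echoes the commands for illustration (assuming the same /opt/software/spark target path); drop the `echo` to actually copy:

```shell
# Print the distribution command for each target node.
for host in client node02 node03 node04; do
  echo "scp -r spark-1.6.3 root@${host}:/opt/software/spark/"
done
```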
5. Configure environment variables
Configuring environment variables lets you operate the Spark cluster from any directory. Before doing so, be sure to rename the start-all.sh script in spark-1.6.3/sbin/ to avoid a conflict with the Hadoop cluster's start-all.sh command; here it is renamed to start-spark.sh:
[root@node01 sbin]# mv start-all.sh start-spark.sh
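The environment variables themselves are not shown above; a possible addition to /etc/profile would be the following (the SPARK_HOME path follows the install location used in this post; adjust it to yours):

```shell
export SPARK_HOME=/opt/software/spark/spark-1.6.3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
```

After editing, run `source /etc/profile` on each node so the change takes effect in the current shell.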
6. Start the cluster
Start the Spark cluster with the start-spark.sh script:
[root@node01 sbin]# ./start-spark.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.master.Master-1-node01.out
node02: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: starting org.apache.spark.deploy.worker.Worker, logging to /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
node02: failed to launch org.apache.spark.deploy.worker.Worker:
node02: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node02.out
node03: failed to launch org.apache.spark.deploy.worker.Worker:
node03: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node03.out
node04: failed to launch org.apache.spark.deploy.worker.Worker:
node04: full log in /opt/software/spark/spark-1.6.3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-node04.out
[root@node01 sbin]#
The startup output contains warnings like `node02: failed to launch org.apache.spark.deploy.worker.Worker...`. These can be spurious and safely ignored here; run jps on each node to confirm that the cluster actually started:
[root@node01 sbin]# jps
1667 Master
1741 Jps
[root@node02 ~]# jps
1607 Worker
1645 Jps
[root@node03 ~]# jps
1632 Jps
1584 Worker
[root@node04 ~]# jps
1577 Worker
1611 Jps
The cluster started successfully.
7. Check cluster status in the browser
Open node01's IP with port 8080 in a browser to view the cluster status:
8. Submit a test job
If Spark's environment variables are not configured, submit the job from the spark-1.6.3/bin/ directory:
[root@node01 bin]# ./spark-submit --master spark://node01:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.3-hadoop2.6.0.jar 10
After the job is submitted, switch back to node01:8080 in the browser and you will see the following:
The job just submitted shows as RUNNING; once it finishes, the result can be seen on node01:
This job estimates the value of pi, and its precision grows with the input parameter. The argument given above was 10, so the precision is low. With that, the standalone Spark cluster setup is complete.
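The SparkPi example estimates pi by sampling random points in the unit square and counting the fraction that lands inside the quarter circle. A standalone awk sketch of the same Monte Carlo idea (sample count and seed chosen arbitrarily):

```shell
# Monte Carlo estimate of pi: the fraction of random points inside
# the quarter circle approximates pi/4, so multiply by 4.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1.0) inside++
  }
  printf "pi ~= %.3f\n", 4.0 * inside / n
}'
```

On the cluster, Spark parallelizes exactly this sampling across the workers, which is why a larger argument (more partitions, hence more samples) yields a more precise estimate.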