Deploying a Spark 2.4.8 + Hadoop 2.7.3 Cluster



Spark Cluster Deployment

The project uses Spark Core and Spark SQL to implement its analysis requirements, so a Spark cluster must be deployed on top of the existing highly available Hadoop cluster.

Download the Spark and Scala Packages

Spark package download: https://archive.apache.org/dist/spark/spark-2.4.8/

Download spark-2.4.8-bin-hadoop2.7.tgz.

Scala package download: https://scala-lang.org/download/2.11.8.html

Download scala-2.11.8.tgz.


Cluster Planning

| No. | IP | Hostname | Roles | Clusters |
| --- | --- | --- | --- | --- |
| 1 | 192.168.137.110 | node1 | NameNode (Active), DFSZKFailoverController (ZKFC), ResourceManager, mysql, RunJar (Hive metastore), RunJar (HiveServer2), Master | Hadoop, Spark |
| 2 | 192.168.137.111 | node2 | DataNode, JournalNode, QuorumPeerMain, NodeManager, RunJar (Hive client, when started), Worker | ZooKeeper, Hadoop, Spark |
| 3 | 192.168.137.112 | node3 | DataNode, JournalNode, QuorumPeerMain, NodeManager, RunJar (Hive client, when started), Worker | ZooKeeper, Hadoop, Spark |
| 4 | 192.168.137.113 | node4 | DataNode, JournalNode, QuorumPeerMain, NodeManager, RunJar (Hive client, when started), Worker | ZooKeeper, Hadoop, Spark |
| 5 | 192.168.137.114 | node5 | NameNode (Standby), DFSZKFailoverController (ZKFC), ResourceManager, JobHistoryServer, RunJar (Hive client, when started), Worker | Hadoop, Spark |


Deploy Scala

Spark is developed in Scala and many applications use Scala code, so Scala must be deployed on every node of the cluster.

Upload the package to node1 and extract it:

tar -zxvf scala-2.11.8.tgz -C /opt/soft_installed/

Configure environment variables

# edit the environment variables
vim /etc/profile

# Scala settings
SCALA_HOME=/opt/soft_installed/scala-2.11.8

PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin

export PATH JAVA_HOME JRE_HOME CLASSPATH HADOOP_HOME HADOOP_LOG_DIR YARN_LOG_DIR HADOOP_CONF_DIR HADOOP_HDFS_HOME HADOOP_YARN_HOME ZOOKEEPER_HOME HIVE_HOME SCALA_HOME

source /etc/profile

Verify the Scala deployment

[root@master scala-2.11.8]# scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171).
Type in expressions for evaluation. Or try :help.

scala> 2022 + 1991
res0: Int = 4013

Distribute Scala to the other nodes

Repeat the same deployment on the remaining cluster nodes (node2 to node5), for example with scp, as shown below:

scp -r /opt/soft_installed/scala-2.11.8 node2:/opt/soft_installed
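To save typing, a small loop can copy the Scala directory to all of the remaining nodes in one go. This is only a convenience sketch; it assumes passwordless SSH from node1 to the other nodes and the same /opt/soft_installed path everywhere:

# distribute Scala to node2~node5 in a single loop (assumes passwordless SSH)
for host in node2 node3 node4 node5; do
  scp -r /opt/soft_installed/scala-2.11.8 ${host}:/opt/soft_installed/
done

Remember to add the same SCALA_HOME/PATH entries to /etc/profile on each node and run source /etc/profile there as well.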

Upload and extract the Spark package

Upload the Spark package to node1 with MobaXterm.


# extract
tar -zxvf spark-2.4.8-bin-hadoop2.7.tgz -C /opt/soft_installed/

# remove the archive
rm -rf spark-2.4.8-bin-hadoop2.7.tgz


Modify the Spark Configuration

Configure spark-env.sh

cd /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh

# append the following to the end of spark-env.sh
vim spark-env.sh


# JDK and Hadoop settings
export JAVA_HOME=/opt/soft_installed/jdk1.8.0_171
export HADOOP_HOME=/opt/soft_installed/hadoop-2.7.3
export HADOOP_CONF_DIR=/opt/soft_installed/hadoop-2.7.3/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SCALA_HOME=/opt/soft_installed/scala-2.11.8
export YARN_CONF_DIR=/opt/soft_installed/hadoop-2.7.3/etc/hadoop

# SPARK
export SPARK_MASTER_IP=node1
export SPARK_MASTER_HOST=node1
export SPARK_MASTER_PORT=7077
export SPARK_LOG_DIR=/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/logs
export SPARK_HOME=/opt/soft_installed/spark-2.4.8-bin-hadoop2.7
export SPARK_PID_DIR=${SPARK_HOME}/pids
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=19022 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://lh1/spark/history"

# SPARK WORKER
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=1


Configure spark-defaults.conf

cd /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf
# append the following to the end of spark-defaults.conf
vim spark-defaults.conf

spark.master    spark://node1:7077
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://lh1/spark/history
spark.eventLog.compress true
spark.yarn.historyServer.address        node5:19022
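Both spark.eventLog.dir and the history server point at hdfs://lh1/spark/history, and the event log directory must exist in HDFS before the first job runs, otherwise writing the event log fails. Assuming the lh1 nameservice of the existing HA cluster, it can be created once like this:

# create the Spark history directory in HDFS (run once, as a user with write access)
hdfs dfs -mkdir -p /spark/history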


Configure slaves

cd /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/conf
cp slaves.template slaves

# add the following to the end of slaves
vim slaves
# A Spark Worker will be started on each of the machines listed below.
node2
node3
node4
node5
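sbin/start-all.sh starts a Worker over SSH on every host listed in slaves, so passwordless SSH from node1 to all of these hosts must already work. A quick check, assuming the SSH keys distributed when the Hadoop cluster was built:

# confirm passwordless SSH to every worker host
for host in node2 node3 node4 node5; do
  ssh ${host} hostname
done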


Configure yarn-site.xml

cd $HADOOP_HOME/etc/hadoop
vim yarn-site.xml



        <!-- ratio of virtual memory to physical memory allowed per container -->
        <property>
                <name>yarn.nodemanager.vmem-pmem-ratio</name>
                <value>4</value>
        </property>

        <!-- the test VMs have very little memory, so disable the memory checks to keep tasks from being killed unexpectedly -->
        <!-- whether to check the physical memory used by each task and kill it when it exceeds the allocation; default is true -->
        <property>
                <name>yarn.nodemanager.pmem-check-enabled</name>
                <value>false</value>
        </property>
        <!-- whether to check the virtual memory used by each task and kill it when it exceeds the allocation; default is true -->
        <property>
                <name>yarn.nodemanager.vmem-check-enabled</name>
                <value>false</value>
        </property>

        <!-- enable log aggregation -->
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <!-- keep aggregated logs for 7 days -->
        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>604800</value>
        </property>
        <!-- log server address -->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://node5:19888/jobhistory/logs</value>
        </property>
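These yarn-site.xml changes only take effect after the ResourceManager and NodeManagers are restarted. Once the file has been copied to all nodes (see the next section), one straightforward way, assuming the standard Hadoop 2.7 sbin scripts and that a brief YARN outage is acceptable, is:

# restart YARN so the new NodeManager settings are picked up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh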



Distribute the Packages

Copy the configured Spark directory from node1 to the other nodes, and distribute the updated yarn-site.xml as well:

cd /opt/soft_installed/

scp -r /opt/soft_installed/spark-2.4.8-bin-hadoop2.7 node2:`pwd`
scp -r /opt/soft_installed/spark-2.4.8-bin-hadoop2.7 node3:`pwd`
scp -r /opt/soft_installed/spark-2.4.8-bin-hadoop2.7 node4:`pwd`
scp -r /opt/soft_installed/spark-2.4.8-bin-hadoop2.7 node5:`pwd`

cd $HADOOP_HOME/etc/hadoop

scp yarn-site.xml node2:`pwd` 
scp yarn-site.xml node3:`pwd` 
scp yarn-site.xml node4:`pwd` 
scp yarn-site.xml node5:`pwd` 


Start the Cluster

Before starting the Spark cluster, the following must already be running (a reference sequence is sketched below):

  • the Hadoop cluster
  • Hive (if Hive is needed)
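A minimal sketch of bringing up the prerequisites, assuming ZooKeeper is installed on node2~node4 and the standard Hadoop sbin scripts are on the PATH (adapt to your own layout):

# 1. start ZooKeeper on each ZooKeeper node (node2~node4)
zkServer.sh start

# 2. start HDFS and YARN from node1
start-dfs.sh
start-yarn.sh

Then start the Spark standalone cluster from node1: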
/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/start-all.sh

Check the jps output on node1 and node5 after start-up.



Verify the Cluster

Verify the cluster status via the web UI

Open http://node1:8099/ (the SPARK_MASTER_WEBUI_PORT configured above).

Submit a jar job

# When submitted to YARN in cluster mode the result is not printed on the client;
# it can be found via yarn application / the aggregated logs (see the sketch below the command).
/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 2 \
--queue default \
/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.8.jar 1000
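For the cluster-mode run above, one way to find the Pi result afterwards is through the aggregated YARN logs. This is only a sketch; replace <application_id> with the id reported by spark-submit or by yarn application -list:

# list finished applications to find the id of the SparkPi run
yarn application -list -appStates FINISHED

# pull the aggregated logs and search for the result (log aggregation was enabled in yarn-site.xml above)
yarn logs -applicationId <application_id> | grep "Pi is roughly"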

# in client mode the result is printed directly on the client
/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 2 \
--queue default \
/opt/soft_installed/spark-2.4.8-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.8.jar 1000
Pi is roughly 3.1450157250786255



spark-shell test

[root@yarnserver hadooptest]# /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/10 00:39:59 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://yarnserver.phlh123.cn:4040
Spark context available as 'sc' (master = yarn, app id = application_1665297503457_0018).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.


scala> sc.textFile("hdfs://lh1/wordcount/input/class19_3.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect()
res0: Array[(String, Int)] = Array((大数据专业,3), ("",1), (贵州师范大学,3), (19,3), (20220908,1))

scala>


Configure a one-key start/stop script

[root@master scripts]# cat onekeyspark.sh
#! /bin/bash

# one-key start/stop for the Spark cluster (history server and extra master run on node5)

case "$1" in
"start")
        echo "==========  now start spark cluster  =========="
        /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/start-all.sh
        ssh node5 "source /etc/profile; /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/start-history-server.sh"
        ssh node5 "source /etc/profile; /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/start-master.sh";;
"stop")
        echo "==========  now stop spark cluster  =========="
        /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/stop-all.sh
        ssh node5 "source /etc/profile; /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/stop-history-server.sh"
        ssh node5 "source /etc/profile; /opt/soft_installed/spark-2.4.8-bin-hadoop2.7/sbin/stop-master.sh";;
*)
        echo "Invalid args!"
        echo "Usage: $(basename $0) start|stop";;
esac
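Usage is then simply:

# bring the Spark cluster up (Workers from slaves, plus the history server and extra master on node5)
./onekeyspark.sh start

# and shut it down again
./onekeyspark.sh stop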
