Setting up the Spark on YARN mode is straightforward: Spark only needs to be installed on a single node of the YARN cluster,
and that node then serves as the client for submitting Spark applications to YARN. Spark's own Master and Worker daemons do not need to be started; a Spark on YARN deployment does not depend on a Standalone cluster.
1: Download the Scala installation package
The download address is as follows:
https://www.scala-lang.org/download/2.12.7.html
Run the following commands to install Scala. First, create the installation directory:
mkdir -p /home/hadoop/scala
Extract the scala-2.12.7.tgz package into the installation directory:
tar -zxvf ~/tools/scala-2.12.7.tgz -C /home/hadoop/scala/
Configure the Scala environment variables:
vi ~/.bash_profile
# scala
export SCALA_HOME=/home/hadoop/scala/scala-2.12.7
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bash_profile
From any directory, run scala -version; the output should look like:
Scala code runner version 2.12.7 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
From any directory, run scala to enter the interactive shell (REPL):
scala> val str:String="yanghong"
str: String = yanghong
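If you prefer to verify the installation outside the REPL, the same check can be written as a small compiled program; this is just an illustrative sketch (the ScalaCheck object name is not from the original text):

```scala
// A minimal program to confirm the Scala toolchain works end to end:
// compile with `scalac ScalaCheck.scala`, then run with `scala ScalaCheck`.
object ScalaCheck {
  def main(args: Array[String]): Unit = {
    val str: String = "yanghong"
    // scala.util.Properties reports the version of the running Scala library
    println(s"$str is running on Scala ${scala.util.Properties.versionNumberString}")
  }
}
```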
2: Download Spark
The download address is as follows:
http://spark.apache.org/downloads.html
Under "Choose a Spark release", select your version, 2.4.5.
Under "Choose a package type", select the package pre-built for Hadoop 2.7 and later.
Click the .tgz file link next to "Download Spark" to download it.
Run the following commands to install Spark. First, create the installation directory:
mkdir -p /home/hadoop/spark
Extract the spark-2.4.5-bin-hadoop2.7.tgz package into the /home/hadoop/spark installation directory:
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /home/hadoop/spark/
Edit the spark-env.sh configuration file in the conf directory (if it does not exist yet, create it from conf/spark-env.sh.template) and add the following settings:
export HADOOP_HOME=/home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
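The spark-env.sh change above can also be scripted; a minimal sketch, assuming Spark was unpacked to /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7 as in the earlier step and that spark-env.sh has not been created yet:

```shell
# Create spark-env.sh from the template shipped with Spark,
# then append the Hadoop settings Spark on YARN needs.
cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
cat >> spark-env.sh <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
```

HADOOP_CONF_DIR is what lets spark-submit find the YARN ResourceManager address, so it must point at the directory containing your yarn-site.xml.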
Once this is done, you can run Spark applications. For example, run Spark's bundled Pi (SparkPi) example in Spark on YARN cluster mode (note that Hadoop's HDFS and YARN services must already be running):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.5.jar
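The SparkPi example estimates pi by Monte Carlo sampling: it throws random points into a square and counts how many land inside the inscribed unit circle. The core computation, stripped of Spark's parallelism, can be sketched in plain Scala (the PiEstimate name and fixed seed are illustrative, not taken from the Spark source):

```scala
import scala.util.Random

object PiEstimate {
  // Sample n random points in the square [-1, 1] x [-1, 1]; the fraction that
  // falls inside the unit circle approaches pi/4, so 4 * fraction estimates pi.
  def estimate(n: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit =
    println(f"Pi is roughly ${estimate(100000, 42L)}%1.4f")
}
```

SparkPi performs the same counting, but distributes the sampling across YARN executors as a map/reduce over an RDD, which is why it makes a useful smoke test for the cluster.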
The log output is as follows:
20/04/08 14:05:36 INFO yarn.Client: Submitting application application_1586223718291_0006 to ResourceManager
20/04/08 14:05:37 INFO impl.YarnClientImpl: Submitted application application_1586223718291_0006
20/04/08 14:05:38 INFO yarn.Client: Application report for application_1586223718291_0006 (state: ACCEPTED)
20/04/08 14:05:38 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1586325937194
final status: UNDEFINED
tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/
user: hadoop
20/04/08 14:05:58 INFO yarn.Client: Application report for application_1586223718291_0006 (state: ACCEPTED)
20/04/08 14:05:59 INFO yarn.Client: Application report for application_1586223718291_0006
20/04/08 14:06:05 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: centoshadoop3
ApplicationMaster RPC port: 33564
queue: default
start time: 1586325937194
final status: UNDEFINED
tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/
user: hadoop
20/04/08 14:06:06 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)
20/04/08 14:06:07 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)
20/04/08 14:06:26 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)
20/04/08 14:06:27 INFO yarn.Client: Application report for application_1586223718291_0006 (state: FINISHED)
20/04/08 14:06:27 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: centoshadoop3
ApplicationMaster RPC port: 33564
queue: default
start time: 1586325937194
final status: SUCCEEDED
tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/
user: hadoop
20/04/08 14:06:27 INFO util.ShutdownHookManager: Shutdown hook called
20/04/08 14:06:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-299a795c-6432-47b3-86ec-c571ed324c58
20/04/08 14:06:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6b89e823-ba7f-49b2-ae7b-2ead7a75691e
Watch how the final status field changes in the log: it ends as SUCCEEDED. The tracking URL shown above opens the YARN web UI, where you can examine the application's detailed execution information. Note that because the job ran in cluster mode, the driver's "Pi is roughly ..." result is printed in the ApplicationMaster's log rather than on the client console; with YARN log aggregation enabled, it can be retrieved with yarn logs -applicationId application_1586223718291_0006.