【Spark 19】Spark on YARN Deployment

This post covers the two ways to deploy Spark on YARN: yarn-client and yarn-cluster. In yarn-client mode the Driver runs on the client machine while the Executors run in the YARN cluster; in yarn-cluster mode the Driver is launched inside YARN as the ApplicationMaster. When submitting a Spark application, the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable must be set. In yarn-client mode the Driver's status can be viewed at http://hadoop.master:4040, whereas in yarn-cluster mode it has to be checked through YARN's Job History Server.

Somehow this is already the 19th post in the Spark series. The series is not very systematic; I basically write about whatever I happen to be learning, rather than writing from a position of thorough mastery. I will reorganize these posts once I have a deeper understanding of Spark. After all, I have only been working with Spark for ten days, so onward!

In the previous posts, Spark was always run in the default pseudo-distributed way; I never looked at Spark from a system-deployment perspective. The current state is roughly "Spark runs and the examples work". Up to now, the Spark configuration file (conf/spark-env.sh) has contained:

 

export SCALA_HOME=/home/hadoop/software/scala-2.11.4
export JAVA_HOME=/home/hadoop/software/jdk1.7.0_67

### Does localhost here mean the Master node runs on this machine?
export SPARK_MASTER=localhost
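### (As far as I know, SPARK_MASTER is not a variable the stock spark-env.sh template reads; in Spark 1.x the master's bind host is normally set via SPARK_MASTER_IP.)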

### What does this one mean?
export SPARK_LOCAL_IP=localhost
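### (SPARK_LOCAL_IP is the IP address that Spark processes bind to on this node.)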
export HADOOP_HOME=/home/hadoop/software/hadoop-2.5.2
export SPARK_HOME=/home/hadoop/software/spark-1.2.0-bin-hadoop2.4
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native

### What is this for? If Spark runs standalone, the YARN-related options should not be needed.
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
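### (As discussed below: YARN_CONF_DIR points Spark at the Hadoop/YARN configuration directory, and it is only needed for the yarn-client and yarn-cluster deploy modes.)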

 

Two ways to deploy Spark on YARN

  • yarn-client
  • yarn-cluster

 1. yarn-client

In this mode, the Spark driver runs on the client machine and asks YARN to launch the executors that run the Tasks. In other words, the Driver and YARN are separate: the Driver program acts as a client of the YARN cluster, in a classic client-server arrangement.

 2. yarn-cluster

In this mode, the Spark driver is first launched inside the YARN cluster as an ApplicationMaster; the ApplicationMaster then requests resources from the ResourceManager to start the executors that run the Tasks. In other words, with this deployment mode the Driver program runs inside the YARN cluster, as sketched below.
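For comparison with the yarn-client command shown later, a yarn-cluster submission looks roughly like the following. This is only a sketch reusing the SparkWordCount class and jar from later in this post; note that in yarn-cluster mode the driver's console output ends up in the YARN container logs rather than on the client:

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-cluster --executor-memory 512M SparkWordCount.jar README.md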

 

To deploy a Spark application on YARN, submit it with Spark's bin/spark-submit. Unlike Standalone or Mesos, you do not need to pass a URL as the value of the master parameter; it is enough to specify yarn-cluster or yarn-client, because Spark obtains the cluster information from the Hadoop (or, more specifically, YARN) configuration files. And precisely because Spark reads that information from the Hadoop configuration, the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR must be set.
So, in addition to the configuration above, add one more entry to conf/spark-env.sh, and add the same line to /etc/profile:

 

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

 

yarn-client deployment

1. Submission command:

 

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-client --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md

 

Compare this with the earlier submission command, where Spark manages the compute resources itself (Standalone mode):

 

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master spark://hadoop.master:7077 --executor-memory 512M --total-executor-cores 1 SparkWordCount.jar README.md
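One detail worth flagging when comparing the two commands: in Spark 1.x, --total-executor-cores is a Standalone/Mesos option and is not honored on YARN, where the resource knobs are --num-executors and --executor-cores instead. A sketch of the YARN-flavored equivalent of the command above (same class, jar, and input assumed):

./spark-submit --name SparkWordCount --class spark.examples.SparkWordCount --master yarn-client --executor-memory 512M --num-executors 1 --executor-cores 1 SparkWordCount.jar README.md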

 

2. Notes:

 

2.1. In yarn-client mode, because the driver runs on the client, its state can be inspected through the web UI, by default at http://hadoop.master:4040, while the YARN web UI is reachable at http://hadoop.master:8088.
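The same application can also be inspected from the command line with the standard YARN CLI. The application id below is the one that appears in the log further down; yarn logs additionally requires log aggregation to be enabled:

yarn application -list
yarn application -status application_1420859110621_0002
yarn logs -applicationId application_1420859110621_0002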

2.2 Submitting a single job produces a log that makes the whole process look rather involved:

 

[hadoop@hadoop bin]$ sh submitSparkApplicationYarnClient.sh // submit the job in yarn-client mode
Delete the HDFS output directory // remove the HDFS output directory left over from the previous run
15/01/10 07:27:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hadoop/SortedWordCountRDDInSparkApplication
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/01/10 07:27:52 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:27:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:27:53 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/01/10 07:27:53 INFO Remoting: Starting remoting
15/01/10 07:27:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 35401.
15/01/10 07:27:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:35401]
15/01/10 07:27:54 INFO spark.SparkEnv: Registering MapOutputTracker
15/01/10 07:27:54 INFO spark.SparkEnv: Registering BlockManagerMaster
15/01/10 07:27:54 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150110072754-dcdf
15/01/10 07:27:54 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/01/10 07:27:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/10 07:27:56 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-8f55f6ec-399b-4371-9ab4-d648047381c5
15/01/10 07:27:56 INFO spark.HttpServer: Starting HTTP Server
15/01/10 07:27:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:57 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:52196
15/01/10 07:27:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 52196.
15/01/10 07:27:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/01/10 07:27:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/01/10 07:27:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/01/10 07:27:58 INFO ui.SparkUI: Started SparkUI at http://localhost:4040
15/01/10 07:27:58 INFO spark.SparkContext: Added JAR file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/SparkWordCount.jar at http://localhost:52196/jars/SparkWordCount.jar with timestamp 1420892878400
// At this point the Spark client-side setup is done, and the job is handed off to YARN
15/01/10 07:28:00 INFO client.RMProxy: Connecting to ResourceManager at hadoop.master/192.168.26.136:8032 // connecting to the ResourceManager
15/01/10 07:28:02 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers // requesting resources
15/01/10 07:28:02 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/01/10 07:28:02 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead // allocating one unit of resources: the ApplicationMaster container
15/01/10 07:28:02 INFO yarn.Client: Setting up container launch context for our AM // setting up the AM container
15/01/10 07:28:02 INFO yarn.Client: Preparing resources for our AM container 
15/01/10 07:28:03 INFO yarn.Client: Uploading resource file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> hdfs://hadoop.master:9000/user/hadoop/.sparkStaging/application_1420859110621_0002/spark-assembly-1.2.0-hadoop2.4.0.jar
// So spark-assembly-1.2.0-hadoop2.4.0.jar gets uploaded to HDFS???
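// (Yes: the client uploads the Spark assembly to the application's .sparkStaging directory on HDFS on every submission. In Spark 1.2 this repeated upload can be avoided by putting the assembly on HDFS once and pointing the spark.yarn.jar configuration at that location.)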
15/01/10 07:28:22 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/10 07:28:22 INFO spark.SecurityManager: Changing view acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/01/10 07:28:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/01/10 07:28:22 INFO yarn.Client: Submitting application 2 to ResourceManager
// the application is submitted
15/01/10 07:28:22 INFO impl.YarnClientImpl: Submitted application application_1420859110621_0002
15/01/10 07:28:23 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:23 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1420892902791
	 final status: UNDEFINED
	 tracking URL: http://hadoop.master:8088/proxy/application_1420859110621_0002/
	 user: hadoop

/// What is this pile of lines below? A status report every second? How much junk log output must that produce?
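/// (As far as I can tell, the YARN client simply polls the ResourceManager for the application's state about once per second until the application leaves the ACCEPTED state; in Spark's YARN client this polling interval is controlled by spark.yarn.report.interval, in milliseconds.)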
15/01/10 07:28:24 INFO yarn.Client: Application report for application_1420859110621_0002 (state: ACCEPTED)
15/01/10 07:28:26 INFO