Ways to submit Spark jobs

1. Spark currently supports three cluster managers

  • Standalone: Spark's built-in cluster manager; it makes it easy to stand up a cluster.
  • Apache Mesos: a general-purpose cluster manager that can also run Hadoop MapReduce and service applications.
  • Hadoop YARN: the resource manager in Hadoop 2.
  • Tip 1: if the cluster is not particularly large and there is no need to run MapReduce and Spark side by side, Standalone mode is the most efficient choice.
  • Tip 2: Spark can schedule resources both across applications (via the cluster manager) and within an application (when a single SparkContext runs multiple jobs). How each cluster manager is selected at submit time is sketched right after this list.
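
The cluster manager is chosen through the --master URL passed to spark-submit. A minimal sketch follows; the host names and ports are illustrative defaults (7077 for a standalone master, 5050 for Mesos), not values taken from this cluster:

# Standalone: point --master at the Spark master process
./bin/spark-submit --master spark://master-host:7077 --class org.apache.spark.examples.SparkPi lib/spark-examples*.jar 10

# Mesos: point --master at the Mesos master
./bin/spark-submit --master mesos://mesos-host:5050 --class org.apache.spark.examples.SparkPi lib/spark-examples*.jar 10

# YARN: no host is given; the ResourceManager address is read from the Hadoop configuration (HADOOP_CONF_DIR)
./bin/spark-submit --master yarn --class org.apache.spark.examples.SparkPi lib/spark-examples*.jar 10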

2. Running Spark on YARN

Cluster mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10

Client mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10

Reference for the spark-submit parameters:
(image: table of spark-submit parameters)
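
Since the parameter table above was an image, here is a rough, non-exhaustive reminder of the spark-submit options used most often (all of them are standard options of the tool):

# --class <main-class>             main class of the application (for Java/Scala jars)
# --master <master-url>            spark://..., mesos://..., yarn, or local[N]
# --deploy-mode <client|cluster>   whether the driver runs on the submitting machine or inside the cluster
# --driver-memory 4g               memory for the driver process
# --executor-memory 2g             memory for each executor
# --executor-cores 1               CPU cores per executor
# --num-executors 2                number of executors to request (YARN)
# --queue <queue-name>             YARN queue to submit into
# --conf <key>=<value>             any other Spark configuration property
# --jars a.jar,b.jar               extra jars shipped with the application
# --files f1,f2                    extra files placed in each executor's working directory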

3. Distinguishing client, cluster, and local modes

The figure below shows the typical client mode: the Spark driver runs on the machine from which the job is submitted.
(image: client-mode architecture diagram)
The figure below shows cluster mode: the Spark driver runs inside YARN.
(image: cluster-mode architecture diagram)
Comparison of the three modes:
(image: comparison table of local, client, and cluster modes)
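
For completeness, there is also local mode, where no cluster manager is involved at all and the driver plus executors run inside a single JVM on the submitting machine. A minimal sketch using the same example jar (local[2] means two worker threads):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master local[2] \
    lib/spark-examples*.jar \
    10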

4. Problems encountered while debugging

Running the following command in client mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10

prints the result in the local console output:
(image: console output showing the computed value of Pi)

Running the following command instead:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10

the program kept producing the following feedback:
(image: console output repeatedly reporting the same application state)
A search on Stack Overflow showed that YARN did not have enough resources, so I reduced the requested memory and continued:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 1g \
    --executor-memory 1g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10
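
Before (or instead of) shrinking the job, it is worth checking how much memory YARN actually has left. One way, assuming the ResourceManager web port 8088 seen in the tracking URLs below is reachable, is its REST API:

# availableMB vs. totalMB in the response shows how much memory YARN can still allocate
curl -s http://sparkproject:8088/ws/v1/cluster/metrics

If the limits themselves are too low, the usual fix is to raise yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in yarn-site.xml and restart YARN. Here I simply kept the smaller request.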

Rerunning with the reduced memory settings produced the following log:

19/04/08 20:46:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/08 20:46:14 INFO client.RMProxy: Connecting to ResourceManager at sparkproject/192.168.48.140:8032
19/04/08 20:46:15 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
19/04/08 20:46:15 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/04/08 20:46:15 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/04/08 20:46:15 INFO yarn.Client: Setting up container launch context for our AM
19/04/08 20:46:15 INFO yarn.Client: Setting up the launch environment for our AM container
19/04/08 20:46:15 INFO yarn.Client: Preparing resources for our AM container
19/04/08 20:46:16 INFO yarn.Client: Uploading resource file:/usr/local/spark/lib/spark-assembly-1.5.1-hadoop2.4.0.jar -> hdfs://sparkproject:9000/user/root/.sparkStaging/application_1554721157685_0005/spark-assembly-1.5.1-hadoop2.4.0.jar
19/04/08 20:46:18 INFO yarn.Client: Uploading resource file:/usr/local/spark/lib/spark-examples-1.5.1-hadoop2.4.0.jar -> hdfs://sparkproject:9000/user/root/.sparkStaging/application_1554721157685_0005/spark-examples-1.5.1-hadoop2.4.0.jar
19/04/08 20:46:21 INFO yarn.Client: Uploading resource file:/tmp/spark-9ae21a02-046a-48ee-8125-23197cb44bd5/__spark_conf__1466562735680986305.zip -> hdfs://sparkproject:9000/user/root/.sparkStaging/application_1554721157685_0005/__spark_conf__1466562735680986305.zip
19/04/08 20:46:21 INFO spark.SecurityManager: Changing view acls to: root
19/04/08 20:46:21 INFO spark.SecurityManager: Changing modify acls to: root
19/04/08 20:46:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
19/04/08 20:46:22 INFO yarn.Client: Submitting application 5 to ResourceManager
19/04/08 20:46:22 INFO impl.YarnClientImpl: Submitted application application_1554721157685_0005
19/04/08 20:46:23 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:23 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.root
         start time: 1554727582584
         final status: UNDEFINED
         tracking URL: http://sparkproject:8088/proxy/application_1554721157685_0005/
         user: root
19/04/08 20:46:24 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:25 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:26 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:28 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:29 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:30 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:31 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:32 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:34 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:35 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:36 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:37 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:38 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:39 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:40 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:41 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:42 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:43 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:44 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:45 INFO yarn.Client: Application report for application_1554721157685_0005 (state: ACCEPTED)
19/04/08 20:46:46 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:46 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.48.140
         ApplicationMaster RPC port: 0
         queue: root.root
         start time: 1554727582584
         final status: UNDEFINED
         tracking URL: http://sparkproject:8088/proxy/application_1554721157685_0005/
         user: root
19/04/08 20:46:47 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:48 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:49 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:50 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:51 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:52 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:53 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:54 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:55 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:56 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:57 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:58 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:46:59 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:00 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:01 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:02 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:03 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:04 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:05 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:06 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:07 INFO yarn.Client: Application report for application_1554721157685_0005 (state: RUNNING)
19/04/08 20:47:08 INFO yarn.Client: Application report for application_1554721157685_0005 (state: FINISHED)
19/04/08 20:47:08 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.48.140
         ApplicationMaster RPC port: 0
         queue: root.root
         start time: 1554727582584
         final status: SUCCEEDED
         tracking URL: http://sparkproject:8088/proxy/application_1554721157685_0005/A
         user: root
19/04/08 20:47:08 INFO util.ShutdownHookManager: Shutdown hook called
19/04/08 20:47:08 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9ae21a02-046a-48ee-8125-23197cb44bd5

This raised a question: the job clearly succeeded, so why was the value of Pi never printed?
Going into the YARN web UI to look at the container's stderr and stdout logs:
(image: YARN web UI page with links to the container's stderr and stdout)
stderr log:

19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
19/04/08 20:47:06 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
19/04/08 20:47:07 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
19/04/08 20:47:07 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.48.140:36495
19/04/08 20:47:07 INFO scheduler.DAGScheduler: Stopping DAGScheduler
19/04/08 20:47:07 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
19/04/08 20:47:07 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
19/04/08 20:47:07 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. sparkproject:45540
19/04/08 20:47:07 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. sparkproject:40228
19/04/08 20:47:07 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/04/08 20:47:07 INFO storage.MemoryStore: MemoryStore cleared
19/04/08 20:47:07 INFO storage.BlockManager: BlockManager stopped
19/04/08 20:47:07 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/04/08 20:47:07 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/04/08 20:47:07 INFO spark.SparkContext: Successfully stopped SparkContext
19/04/08 20:47:07 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
19/04/08 20:47:07 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
19/04/08 20:47:07 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
19/04/08 20:47:07 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
19/04/08 20:47:07 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
19/04/08 20:47:07 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
19/04/08 20:47:07 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1554721157685_0005
19/04/08 20:47:07 INFO util.ShutdownHookManager: Shutdown hook called
19/04/08 20:47:07 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1554721157685_0005/spark-715fec72-7b70-40f7-b626-b237881e7ad1

stdout log:

Pi is roughly 3.144588

Putting this together with the earlier discussion: in client mode the driver runs on the local machine, so the result is printed locally; in cluster mode the driver runs on YARN, so the result goes to the container's stdout on YARN rather than to the local submission log. That explains it completely.
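
When YARN log aggregation is enabled (an assumption about this cluster's configuration), the same stdout can also be pulled back to the submitting machine without opening the web UI, using the application ID that spark-submit reported:

# Dumps all aggregated container logs, including the driver's stdout with the Pi result
yarn logs -applicationId application_1554721157685_0005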

References:
https://blog.csdn.net/OiteBody/article/details/53542036
Learning Spark (Chinese edition: 《Spark快速大数据分析》)
https://www.iteblog.com/archives/1223.html
https://blog.csdn.net/high2011/article/details/67637338
https://stackoverflow.com/questions/30828879/application-report-for-application-state-accepted-never-ends-for-spark-submi
