SparkCore (8): Spark Submit Command Examples in Standalone and on-YARN Modes


I. Purpose

II. Prerequisites

III. Standalone mode

1. Prerequisites

2. Commands

2.1 client mode

2.2 cluster mode: submissions go through the REST server port

IV. YARN mode

1. Prerequisites

2. Differences between client and cluster on YARN

3. Commands

3.1 client mode

3.2 cluster mode


I. Purpose

Putting Spark's Standalone and on-YARN submission modes side by side makes it much easier to see how the two differ.

II. Prerequisites

1. For the detailed scheduling flow of the Standalone and on-YARN modes, see:

https://blog.csdn.net/u010886217/article/details/101377596

2. The code used in this post comes from https://blog.csdn.net/u010886217/article/details/83317722; the jar built from it is packaged and uploaded to the server. A sketch of a typical build-and-ship flow follows.

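The Maven build and scp step below are my assumptions, not part of the original post; adapt the paths to your own setup (the server-side directory comes from the submit commands later in this post):

# Build the application jar locally (assumed Maven project)
mvn clean package
# Copy it to the submit host (assumed transfer step; server path from this post)
scp target/scalaProjectMaven.jar root@hadoop:/opt/project/scalaproject/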

III. Standalone mode

1. Prerequisites

(1) Start HDFS.

(2) Start the Spark services (the key step!):

sbin/start-all.sh

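A quick sanity check that the daemons actually came up, assuming the JDK's jps tool is on the PATH; the Master PID 4774 matches the "Started daemon" line in the log below:

jps
# Expected to include something like (PIDs will differ):
# 4774 Master
# 5012 Worker
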
(3) If no port was specified explicitly, check the Master log to find out which port accepts submissions (the port actually bound can vary, as happens to the web UI port below):

[root@hadoop logs]# cat /opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
Spark Command: /opt/jdk1.8.0_151/bin/java -cp /opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/conf/:/opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/jars/*:/opt/modules/cdh5.7.0/hadoop-2.6.0-cdh5.7.0/etc/hadoop/ -Xmx1g org.apache.spark.deploy.master.Master --host hadoop --port 7077 --webui-port 8080
========================================
19/11/14 12:50:45 INFO Master: Started daemon with process name: 4774@hadoop
19/11/14 12:50:45 INFO SignalUtils: Registered signal handler for TERM
19/11/14 12:50:45 INFO SignalUtils: Registered signal handler for HUP
19/11/14 12:50:45 INFO SignalUtils: Registered signal handler for INT
19/11/14 12:50:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/14 12:50:47 INFO SecurityManager: Changing view acls to: root
19/11/14 12:50:47 INFO SecurityManager: Changing modify acls to: root
19/11/14 12:50:47 INFO SecurityManager: Changing view acls groups to: 
19/11/14 12:50:47 INFO SecurityManager: Changing modify acls groups to: 
19/11/14 12:50:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/11/14 12:50:48 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
19/11/14 12:50:48 INFO Master: Starting Spark master at spark://hadoop:7077
19/11/14 12:50:48 INFO Master: Running Spark version 2.1.0
19/11/14 12:50:49 WARN Utils: Service 'MasterUI' could not bind on port 8080. Attempting port 8081.
19/11/14 12:50:49 INFO Utils: Successfully started service 'MasterUI' on port 8081.
19/11/14 12:50:49 INFO MasterWebUI: Bound MasterWebUI to hadoop, and started at http://192.168.0.8:8081
19/11/14 12:50:49 INFO Utils: Successfully started service on port 6066.
19/11/14 12:50:49 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
19/11/14 12:50:50 INFO Master: I have been elected leader! New state: ALIVE
19/11/14 12:50:54 INFO Master: Registering worker 192.168.0.8:58064 with 1 cores, 1388.0 MB RAM

The log shows 'MasterUI' started on port 8081; open http://192.168.0.8:8081/ to confirm the Spark master's submission port, 7077.

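If you only need the submission URL, grepping the Master log is quicker than opening the web UI; the output line below is taken verbatim from the log above:

grep "Starting Spark master" /opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/logs/spark-root-org.apache.spark.deploy.master.Master-1-hadoop.out
# 19/11/14 12:50:48 INFO Master: Starting Spark master at spark://hadoop:7077
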
2. Commands

2.1 client mode

date=`date +"%Y%m%d%H%M"`
/opt/modules/spark-2.1.0-bin-2.7.3/bin/spark-submit \
--master spark://bigdata.ibeifeng.com:7077 \
--deploy-mode client \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/datas/lib/scalaProjectMaven.jar

→ The same submission on the wc-hadoop01 environment:

date=`date +"%Y%m%d%H%M"`
/opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/bin/spark-submit \
--master spark://hadoop:7077 \
--deploy-mode client \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/project/scalaproject/scalaProjectMaven.jar

Output on wc-hadoop01:

19/11/14 18:18:08 INFO SparkContext: Running Spark version 2.1.0
19/11/14 18:18:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/14 18:18:09 INFO SecurityManager: Changing view acls to: root
19/11/14 18:18:09 INFO SecurityManager: Changing modify acls to: root
19/11/14 18:18:09 INFO SecurityManager: Changing view acls groups to: 
19/11/14 18:18:09 INFO SecurityManager: Changing modify acls groups to: 
19/11/14 18:18:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/11/14 18:18:09 INFO Utils: Successfully started service 'sparkDriver' on port 41408.
19/11/14 18:18:09 INFO SparkEnv: Registering MapOutputTracker
19/11/14 18:18:09 INFO SparkEnv: Registering BlockManagerMaster
19/11/14 18:18:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/11/14 18:18:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/11/14 18:18:10 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b70893fc-caa4-4500-9e99-11aaa41e8cfe
19/11/14 18:18:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
19/11/14 18:18:10 INFO SparkEnv: Registering OutputCommitCoordinator
19/11/14 18:18:10 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/14 18:18:10 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/11/14 18:18:10 INFO SparkUI: Bound SparkUI to hadoop, and started at http://192.168.0.8:4041
19/11/14 18:18:10 INFO SparkContext: Added JAR file:/opt/project/scalaproject/scalaProjectMaven.jar at spark://192.168.0.8:41408/jars/scalaProjectMaven.jar with timestamp 1573726690750
19/11/14 18:18:10 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://hadoop:7077...
19/11/14 18:18:11 INFO TransportClientFactory: Successfully created connection to hadoop/192.168.0.8:7077 after 68 ms (0 ms spent in bootstraps)
19/11/14 18:18:11 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20191114181811-0000
19/11/14 18:18:11 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40505.
19/11/14 18:18:11 INFO NettyBlockTransferService: Server created on 192.168.0.8:40505
19/11/14 18:18:11 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/11/14 18:18:11 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.8, 40505, None)
19/11/14 18:18:11 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.8:40505 with 413.9 MB RAM, BlockManagerId(driver, 192.168.0.8, 40505, None)
19/11/14 18:18:11 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.8, 40505, None)
19/11/14 18:18:11 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.8, 40505, None)
19/11/14 18:18:11 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20191114181811-0000/0 on worker-20191114125053-192.168.0.8-58064 (192.168.0.8:58064) with 1 cores
19/11/14 18:18:11 INFO StandaloneSchedulerBackend: Granted executor ID app-20191114181811-0000/0 on hostPort 192.168.0.8:58064 with 1 cores, 1024.0 MB RAM
19/11/14 18:18:12 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20191114181811-0000/0 is now RUNNING
19/11/14 18:18:12 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
19/11/14 18:18:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 225.8 KB, free 413.7 MB)
19/11/14 18:18:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 21.3 KB, free 413.7 MB)
19/11/14 18:18:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.8:40505 (size: 21.3 KB, free: 413.9 MB)
19/11/14 18:18:14 INFO SparkContext: Created broadcast 0 from textFile at Wordcount_product.scala:34
19/11/14 18:18:16 INFO FileInputFormat: Total input paths to process : 1
19/11/14 18:18:16 INFO SparkContext: Starting job: sortByKey at Wordcount_product.scala:36
19/11/14 18:18:17 INFO DAGScheduler: Registering RDD 3 (map at Wordcount_product.scala:35)
19/11/14 18:18:17 INFO DAGScheduler: Got job 0 (sortByKey at Wordcount_product.scala:36) with 2 output partitions
19/11/14 18:18:17 INFO DAGScheduler: Final stage: ResultStage 1 (sortByKey at Wordcount_product.scala:36)
19/11/14 18:18:17 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
19/11/14 18:18:17 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
19/11/14 18:18:17 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at Wordcount_product.scala:35), which has no missing parents
19/11/14 18:18:17 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 413.7 MB)
19/11/14 18:18:17 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 413.7 MB)
19/11/14 18:18:17 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.8:40505 (size: 2.8 KB, free: 413.9 MB)
19/11/14 18:18:17 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:996
19/11/14 18:18:17 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at Wordcount_product.scala:35)
19/11/14 18:18:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
19/11/14 18:18:18 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.0.8:35601) with ID 0
19/11/14 18:18:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.0.8, executor 0, partition 0, PROCESS_LOCAL, 6072 bytes)
19/11/14 18:18:18 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.8:59810 with 413.9 MB RAM, BlockManagerId(0, 192.168.0.8, 59810, None)
19/11/14 18:18:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.8:59810 (size: 2.8 KB, free: 413.9 MB)
19/11/14 18:18:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.8:59810 (size: 21.3 KB, free: 413.9 MB)
19/11/14 18:18:22 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.0.8, executor 0, partition 1, PROCESS_LOCAL, 6072 bytes)
19/11/14 18:18:22 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3699 ms on 192.168.0.8 (executor 0) (1/2)
19/11/14 18:18:22 INFO DAGScheduler: ShuffleMapStage 0 (map at Wordcount_product.scala:35) finished in 4.750 s
19/11/14 18:18:22 INFO DAGScheduler: looking for newly runnable stages
19/11/14 18:18:22 INFO DAGScheduler: running: Set()
19/11/14 18:18:22 INFO DAGScheduler: waiting: Set(ResultStage 1)
19/11/14 18:18:22 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 233 ms on 192.168.0.8 (executor 0) (2/2)
19/11/14 18:18:22 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
19/11/14 18:18:22 INFO DAGScheduler: failed: Set()
19/11/14 18:18:22 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at sortByKey at Wordcount_product.scala:36), which has no missing parents
19/11/14 18:18:22 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.1 KB, free 413.7 MB)
19/11/14 18:18:22 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 413.7 MB)
19/11/14 18:18:22 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.0.8:40505 (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:22 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:996
19/11/14 18:18:22 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at sortByKey at Wordcount_product.scala:36)
19/11/14 18:18:22 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
19/11/14 18:18:22 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 192.168.0.8, executor 0, partition 0, NODE_LOCAL, 5818 bytes)
19/11/14 18:18:22 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.0.8:59810 (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:22 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.0.8:35601
19/11/14 18:18:22 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
19/11/14 18:18:22 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 192.168.0.8, executor 0, partition 1, NODE_LOCAL, 5818 bytes)
19/11/14 18:18:22 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 352 ms on 192.168.0.8 (executor 0) (1/2)
19/11/14 18:18:22 INFO DAGScheduler: ResultStage 1 (sortByKey at Wordcount_product.scala:36) finished in 0.434 s
19/11/14 18:18:22 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 115 ms on 192.168.0.8 (executor 0) (2/2)
19/11/14 18:18:22 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
19/11/14 18:18:22 INFO DAGScheduler: Job 0 finished: sortByKey at Wordcount_product.scala:36, took 6.205353 s
19/11/14 18:18:23 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
19/11/14 18:18:23 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
19/11/14 18:18:23 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
19/11/14 18:18:23 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
19/11/14 18:18:23 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
19/11/14 18:18:23 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
19/11/14 18:18:23 INFO SparkContext: Starting job: saveAsTextFile at Wordcount_product.scala:46
19/11/14 18:18:23 INFO DAGScheduler: Registering RDD 5 (map at Wordcount_product.scala:36)
19/11/14 18:18:23 INFO DAGScheduler: Got job 1 (saveAsTextFile at Wordcount_product.scala:46) with 1 output partitions
19/11/14 18:18:23 INFO DAGScheduler: Final stage: ResultStage 4 (saveAsTextFile at Wordcount_product.scala:46)
19/11/14 18:18:23 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
19/11/14 18:18:23 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 3)
19/11/14 18:18:23 INFO DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[5] at map at Wordcount_product.scala:36), which has no missing parents
19/11/14 18:18:23 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.0.8:40505 in memory (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:23 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.0.8:59810 in memory (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:23 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.0 KB, free 413.7 MB)
19/11/14 18:18:23 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.4 KB, free 413.7 MB)
19/11/14 18:18:23 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.0.8:40505 (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:23 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:996
19/11/14 18:18:23 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[5] at map at Wordcount_product.scala:36)
19/11/14 18:18:23 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
19/11/14 18:18:23 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, 192.168.0.8, executor 0, partition 0, NODE_LOCAL, 5813 bytes)
19/11/14 18:18:23 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.0.8:59810 (size: 2.4 KB, free: 413.9 MB)
19/11/14 18:18:23 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, 192.168.0.8, executor 0, partition 1, NODE_LOCAL, 5813 bytes)
19/11/14 18:18:23 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 141 ms on 192.168.0.8 (executor 0) (1/2)
19/11/14 18:18:23 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 130 ms on 192.168.0.8 (executor 0) (2/2)
19/11/14 18:18:23 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
19/11/14 18:18:23 INFO DAGScheduler: ShuffleMapStage 3 (map at Wordcount_product.scala:36) finished in 0.255 s
19/11/14 18:18:23 INFO DAGScheduler: looking for newly runnable stages
19/11/14 18:18:23 INFO DAGScheduler: running: Set()
19/11/14 18:18:23 INFO DAGScheduler: waiting: Set(ResultStage 4)
19/11/14 18:18:23 INFO DAGScheduler: failed: Set()
19/11/14 18:18:23 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[11] at saveAsTextFile at Wordcount_product.scala:46), which has no missing parents
19/11/14 18:18:23 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 69.3 KB, free 413.6 MB)
19/11/14 18:18:23 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 25.4 KB, free 413.6 MB)
19/11/14 18:18:23 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.0.8:40505 (size: 25.4 KB, free: 413.9 MB)
19/11/14 18:18:23 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
19/11/14 18:18:23 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[11] at saveAsTextFile at Wordcount_product.scala:46)
19/11/14 18:18:23 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
19/11/14 18:18:23 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, 192.168.0.8, executor 0, partition 0, NODE_LOCAL, 6118 bytes)
19/11/14 18:18:23 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.0.8:59810 (size: 25.4 KB, free: 413.9 MB)
19/11/14 18:18:24 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 192.168.0.8:35601
19/11/14 18:18:24 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 151 bytes
19/11/14 18:18:24 INFO DAGScheduler: ResultStage 4 (saveAsTextFile at Wordcount_product.scala:46) finished in 0.985 s
19/11/14 18:18:24 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 988 ms on 192.168.0.8 (executor 0) (1/1)
19/11/14 18:18:24 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
19/11/14 18:18:24 INFO DAGScheduler: Job 1 finished: saveAsTextFile at Wordcount_product.scala:46, took 1.391035 s

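The final stage above is the job's saveAsTextFile (Wordcount_product.scala:46), which writes the word counts to HDFS. A sketch of inspecting the result; the output path is a placeholder, since the real path is set inside the job code:

# List and read the output written by saveAsTextFile (placeholder path)
hdfs dfs -ls /path/to/wordcount/output
hdfs dfs -cat /path/to/wordcount/output/part-00000
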
2.2 cluster mode: submissions go through the REST server port

date=`date +"%Y%m%d%H%M"`
/opt/modules/spark-2.1.0-bin-2.7.3/bin/spark-submit \
--master spark://bigdata.ibeifeng.com:6066 \
--deploy-mode cluster \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/datas/lib/scalaProjectMaven.jar

→ On the wc-hadoop01 environment, where the REST server port is likewise 6066:

date=`date +"%Y%m%d%H%M"`
/opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/bin/spark-submit \
--master spark://hadoop:6066 \
--deploy-mode cluster \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/project/scalaproject/scalaProjectMaven.jar

Result:

Running Spark using the REST application submission protocol.
19/11/14 18:34:26 INFO RestSubmissionClient: Submitting a request to launch an application in spark://hadoop:6066.
19/11/14 18:34:27 INFO RestSubmissionClient: Submission successfully created as driver-20191114183427-0001. Polling submission state...
19/11/14 18:34:27 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20191114183427-0001 in spark://hadoop:6066.
19/11/14 18:34:27 INFO RestSubmissionClient: State of driver driver-20191114183427-0001 is now SUBMITTED.
19/11/14 18:34:27 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20191114183427-0001",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20191114183427-0001",
  "success" : true
}

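The submission ID returned above can also be polled by hand. A minimal sketch, assuming the standalone REST server's status endpoint (/v1/submissions/status, the same one RestSubmissionClient polls in the log):

curl http://hadoop:6066/v1/submissions/status/driver-20191114183427-0001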

IV. YARN mode

Reference: http://spark.apache.org/docs/2.1.0/running-on-yarn.html

1. Prerequisites

(1) Start HDFS.

(2) Start YARN.

2. Differences between client and cluster on YARN

(1) Client

The driver runs on the client, i.e. the machine that submits the Spark job.

The client talks to the containers it has been granted in order to schedule and execute the job, so it cannot exit; if the client exits, the job ends with it.

Run logs are printed directly on the client.

(2) Cluster

The driver runs inside the ApplicationMaster.

The client can be closed once the job is submitted, because the job is already running on YARN.

Run logs must be fetched with the yarn command (see http://spark.apache.org/docs/2.1.0/running-on-yarn.html):

yarn logs -applicationId <app ID>

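For example, using the application ID from the cluster-mode run shown later in this post:

yarn logs -applicationId application_1573707031333_0002
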
3. Commands

3.1 client mode

date=`date +"%Y%m%d%H%M"`
/opt/modules/spark-2.1.0-bin-2.7.3/bin/spark-submit \
--master yarn \
--deploy-mode client \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/datas/lib/scalaProjectMaven.jar

→ On the wc-hadoop01 environment (setup described in the earlier post: https://blog.csdn.net/u010886217/article/details/83317722):

date=`date +"%Y%m%d%H%M"`
/opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/bin/spark-submit \
--master yarn \
--deploy-mode client \
--class _0722rdd.Wordcount_product \
--driver-memory 1G \
--driver-cores 1 \
--executor-memory 1G \
--executor-cores 1 \
--num-executors 1 \
--conf spark.app.coalesce=1 \
/opt/project/scalaproject/scalaProjectMaven.jar

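Note that in Spark 2.1 --num-executors applies only to YARN; the standalone submissions earlier in this post leave the executor count to the cluster defaults. The memory and core flags are accepted in both modes, though their defaults differ.
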
3.2 cluster mode

date=`date +"%Y%m%d%H%M"`
/opt/modules/spark-2.1.0-bin-2.7.3/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/datas/lib/scalaProjectMaven.jar

→ On the wc-hadoop01 environment:

date=`date +"%Y%m%d%H%M"`
/opt/modules/cdh5.7.0/spark-2.1.0-bin-2.6.0-cdh5.7.0/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--class _0722rdd.Wordcount_product \
--conf spark.app.coalesce=1 \
/opt/project/scalaproject/scalaProjectMaven.jar

Output on wc-hadoop01:

19/11/14 18:39:26 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.0.8
         ApplicationMaster RPC port: 0
         queue: root.root
         start time: 1573727952427
         final status: UNDEFINED
         tracking URL: http://hadoop:8088/proxy/application_1573707031333_0002/
         user: root

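While the application is running (final status UNDEFINED above), its state can also be checked from the command line with the standard YARN CLI:

yarn application -status application_1573707031333_0002
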
(Tested successfully.)
