Submitting a jar to a Spark cluster with spark-submit

1. First, package the finished program as a jar. In IDEA, export the jar with Maven as follows.

Add the following plugin to pom.xml (it writes a Class-Path into the jar's manifest, listing dependencies under a lib/ prefix and using their -SNAPSHOT names rather than timestamped unique versions):

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <useUniqueVersions>false</useUniqueVersions>
                            <classpathPrefix>lib/</classpathPrefix>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
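
With the plugin in place, the jar comes from the standard Maven build; a quick sketch (the artifact name matches the jar used in step 2, though in general it depends on your pom's artifactId and version):

mvn clean package
# the jar lands under target/, e.g. target/spark-1.0-SNAPSHOT.jar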

2. Then upload the exported jar to the cluster and run:

bin/spark-submit --master spark://master.hadoop:7077 --class nuc.sw.test.ScalaWordCount spark-1.0-SNAPSHOT.jar hdfs://master.hadoop:9000/spark/input/a.txt hdfs://master.hadoop:9000/spark/output

Command breakdown:

--master spark://master.hadoop:7077 specifies the master of the Spark cluster.

--class nuc.sw.test.ScalaWordCount specifies the fully qualified name of the main class.

Next comes the path to the jar (absolute, or relative to the directory you launch from); I placed mine in the Spark installation directory, so a bare filename works.

The final two arguments are the input path and the output path; in my case both are on HDFS.
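
For reference, here is a minimal sketch of what a ScalaWordCount program like the one submitted above could look like. The package and object names are taken from the --class argument; the body is an assumption, written only to be consistent with the operations the log below reports (textFile, map, sortBy, saveAsTextFile):

package nuc.sw.test

import org.apache.spark.{SparkConf, SparkContext}

object ScalaWordCount {
  def main(args: Array[String]): Unit = {
    // The app name is what the log reports as "Submitted application: ScalaWordCount"
    val conf = new SparkConf().setAppName("ScalaWordCount")
    val sc = new SparkContext(conf)

    sc.textFile(args(0))                 // input, e.g. hdfs://.../spark/input/a.txt
      .flatMap(_.split(" "))             // split each line into words
      .map((_, 1))                       // pair every word with a count of 1
      .reduceByKey(_ + _)                // sum the counts per word
      .sortBy(_._2, ascending = false)   // order by count, descending
      .saveAsTextFile(args(1))           // output, e.g. hdfs://.../spark/output
    sc.stop()
  }
}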

Run log:

[root@master spark-2.2.0]# bin/spark-submit --master spark://master.hadoop:7077 --class nuc.sw.test.ScalaWordCount spark-1.0-SNAPSHOT.jar hdfs://master.hadoop:9000/spark/input/a.txt hdfs://master.hadoop:9000/spark/output
18/09/18 09:41:34 INFO spark.SparkContext: Running Spark version 2.2.0
18/09/18 09:41:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/18 09:41:35 INFO spark.SparkContext: Submitted application: ScalaWordCount
18/09/18 09:41:35 INFO spark.SecurityManager: Changing view acls to: root
18/09/18 09:41:35 INFO spark.SecurityManager: Changing modify acls to: root
18/09/18 09:41:35 INFO spark.SecurityManager: Changing view acls groups to: 
18/09/18 09:41:35 INFO spark.SecurityManager: Changing modify acls groups to: 
18/09/18 09:41:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
18/09/18 09:41:36 INFO util.Utils: Successfully started service 'sparkDriver' on port 33330.
18/09/18 09:41:36 INFO spark.SparkEnv: Registering MapOutputTracker
18/09/18 09:41:36 INFO spark.SparkEnv: Registering BlockManagerMaster
18/09/18 09:41:36 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/09/18 09:41:36 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/09/18 09:41:36 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-f353c0d2-29ee-431b-8cfc-4c13fdf12a64
18/09/18 09:41:36 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MB
18/09/18 09:41:36 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/09/18 09:41:36 INFO util.log: Logging initialized @2890ms
18/09/18 09:41:36 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/09/18 09:41:36 INFO server.Server: Started @3040ms
18/09/18 09:41:36 INFO server.AbstractConnector: Started ServerConnector@44e046e6{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/09/18 09:41:36 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@72efb5c1{/jobs,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2449cff7{/jobs/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62da83ed{/jobs/job,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@384fc774{/jobs/job/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71e9a896{/stages,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@408b35bf{/stages/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15bcf458{/stages/stage,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@726386ed{/stages/stage/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14bb2297{/stages/pool,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@797501a{/stages/pool/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@57f791c6{/storage,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c4f9535{/storage/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30c31dd7{/storage/rdd,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@596df867{/storage/rdd/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@241a53ef{/environment,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2db2cd5{/environment/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@615f972{/executors,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73393584{/executors/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1827a871{/executors/threadDump,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7249dadf{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66238be2{/static,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ea502e0{/,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@473b3b7a{/api,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@133e019b{/jobs/job/kill,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7dac3fd8{/stages/stage/kill,null,AVAILABLE,@Spark}
18/09/18 09:41:36 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.2:4040
18/09/18 09:41:37 INFO spark.SparkContext: Added JAR file:/apps/spark-2.2.0/spark-1.0-SNAPSHOT.jar at spark://192.168.1.2:33330/jars/spark-1.0-SNAPSHOT.jar with timestamp 1537234897070
18/09/18 09:41:37 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://master.hadoop:7077...
18/09/18 09:41:37 INFO client.TransportClientFactory: Successfully created connection to master.hadoop/192.168.1.2:7077 after 47 ms (0 ms spent in bootstraps)
18/09/18 09:41:37 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180918094137-0003
18/09/18 09:41:37 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20180918094137-0003/0 on worker-20180918083222-192.168.1.4-44724 (192.168.1.4:44724) with 2 cores
18/09/18 09:41:37 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20180918094137-0003/0 on hostPort 192.168.1.4:44724 with 2 cores, 512.0 MB RAM
18/09/18 09:41:37 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20180918094137-0003/1 on worker-20180918083224-192.168.1.3-33253 (192.168.1.3:33253) with 2 cores
18/09/18 09:41:37 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20180918094137-0003/1 on hostPort 192.168.1.3:33253 with 2 cores, 512.0 MB RAM
18/09/18 09:41:37 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33829.
18/09/18 09:41:37 INFO netty.NettyBlockTransferService: Server created on 192.168.1.2:33829
18/09/18 09:41:37 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/18 09:41:37 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.2, 33829, None)
18/09/18 09:41:37 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.2:33829 with 413.9 MB RAM, BlockManagerId(driver, 192.168.1.2, 33829, None)
18/09/18 09:41:37 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.2, 33829, None)
18/09/18 09:41:37 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.2, 33829, None)
18/09/18 09:41:37 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20180918094137-0003/1 is now RUNNING
18/09/18 09:41:37 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20180918094137-0003/0 is now RUNNING
18/09/18 09:41:38 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6999cd39{/metrics/json,null,AVAILABLE,@Spark}
18/09/18 09:41:38 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/09/18 09:41:41 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 248.8 KB, free 413.7 MB)
18/09/18 09:41:41 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.8 KB, free 413.7 MB)
18/09/18 09:41:41 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.2:33829 (size: 23.8 KB, free: 413.9 MB)
18/09/18 09:41:41 INFO spark.SparkContext: Created broadcast 0 from textFile at ScalaWordCount.scala:16
18/09/18 09:41:43 INFO mapred.FileInputFormat: Total input paths to process : 1
18/09/18 09:41:45 INFO spark.SparkContext: Starting job: sortBy at ScalaWordCount.scala:24
18/09/18 09:41:46 INFO scheduler.DAGScheduler: Registering RDD 3 (map at ScalaWordCount.scala:20)
18/09/18 09:41:46 INFO scheduler.DAGScheduler: Got job 0 (sortBy at ScalaWordCount.scala:24) with 2 output partitions
18/09/18 09:41:46 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (sortBy at ScalaWordCount.scala:24)
18/09/18 09:41:46 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/09/18 09:41:46 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/09/18 09:41:48 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at ScalaWordCount.scala:20), which has no missing parents
18/09/18 09:41:53 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.7 KB, free 413.7 MB)
18/09/18 09:41:53 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 413.7 MB)
18/09/18 09:41:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.2:33829 (size: 2.8 KB, free: 413.9 MB)
18/09/18 09:41:53 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/09/18 09:41:54 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at ScalaWordCount.scala:20) (first 15 tasks are for partitions Vector(0, 1))
18/09/18 09:41:54 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
18/09/18 09:42:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/09/18 09:42:13 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.3:49064) with ID 1
18/09/18 09:42:14 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.3, executor 1, partition 0, ANY, 4855 bytes)
18/09/18 09:42:14 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.3, executor 1, partition 1, ANY, 4855 bytes)
18/09/18 09:42:21 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:35989 with 117.0 MB RAM, BlockManagerId(1, 192.168.1.3, 35989, None)
18/09/18 09:42:29 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.4:58296) with ID 0
18/09/18 09:42:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.4:42944 with 117.0 MB RAM, BlockManagerId(0, 192.168.1.4, 42944, None)
18/09/18 09:43:01 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.3:35989 (size: 2.8 KB, free: 117.0 MB)
18/09/18 09:43:11 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.3:35989 (size: 23.8 KB, free: 116.9 MB)
18/09/18 09:43:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 80653 ms on 192.168.1.3 (executor 1) (1/2)
18/09/18 09:43:34 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 80016 ms on 192.168.1.3 (executor 1) (2/2)
18/09/18 09:43:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/09/18 09:43:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at ScalaWordCount.scala:20) finished in 100.271 s
18/09/18 09:43:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/09/18 09:43:34 INFO scheduler.DAGScheduler: running: Set()
18/09/18 09:43:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/09/18 09:43:34 INFO scheduler.DAGScheduler: failed: Set()
18/09/18 09:43:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at sortBy at ScalaWordCount.scala:24), which has no missing parents
18/09/18 09:43:34 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 413.6 MB)
18/09/18 09:43:35 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 413.6 MB)
18/09/18 09:43:35 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.2:33829 (size: 2.4 KB, free: 413.9 MB)
18/09/18 09:43:35 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/09/18 09:43:35 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at sortBy at ScalaWordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/09/18 09:43:35 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
18/09/18 09:43:35 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 192.168.1.3, executor 1, partition 0, NODE_LOCAL, 4625 bytes)
18/09/18 09:43:35 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 192.168.1.3, executor 1, partition 1, NODE_LOCAL, 4625 bytes)
18/09/18 09:43:36 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.3:35989 (size: 2.4 KB, free: 116.9 MB)
18/09/18 09:43:37 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.1.3:49064
18/09/18 09:43:37 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
18/09/18 09:43:40 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 4426 ms on 192.168.1.3 (executor 1) (1/2)
18/09/18 09:43:40 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 4471 ms on 192.168.1.3 (executor 1) (2/2)
18/09/18 09:43:40 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/09/18 09:43:40 INFO scheduler.DAGScheduler: ResultStage 1 (sortBy at ScalaWordCount.scala:24) finished in 4.610 s
18/09/18 09:43:45 INFO scheduler.DAGScheduler: Job 0 finished: sortBy at ScalaWordCount.scala:24, took 119.931560 s
18/09/18 09:43:45 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.1.2:33829 in memory (size: 2.8 KB, free: 413.9 MB)
18/09/18 09:43:45 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/09/18 09:43:45 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.1.3:35989 in memory (size: 2.8 KB, free: 116.9 MB)
18/09/18 09:43:46 INFO spark.SparkContext: Starting job: saveAsTextFile at ScalaWordCount.scala:27
18/09/18 09:43:46 INFO scheduler.DAGScheduler: Registering RDD 5 (sortBy at ScalaWordCount.scala:24)
18/09/18 09:43:46 INFO scheduler.DAGScheduler: Got job 1 (saveAsTextFile at ScalaWordCount.scala:27) with 2 output partitions
18/09/18 09:43:46 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (saveAsTextFile at ScalaWordCount.scala:27)
18/09/18 09:43:46 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
18/09/18 09:43:46 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
18/09/18 09:43:47 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[5] at sortBy at ScalaWordCount.scala:24), which has no missing parents
18/09/18 09:43:47 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.2 KB, free 413.6 MB)
18/09/18 09:43:47 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.4 KB, free 413.6 MB)
18/09/18 09:43:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.1.2:33829 (size: 2.4 KB, free: 413.9 MB)
18/09/18 09:43:47 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
18/09/18 09:43:47 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[5] at sortBy at ScalaWordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/09/18 09:43:47 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, 192.168.1.3, executor 1, partition 0, NODE_LOCAL, 4614 bytes)
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 5, 192.168.1.3, executor 1, partition 1, NODE_LOCAL, 4614 bytes)
18/09/18 09:43:47 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.1.3:35989 (size: 2.4 KB, free: 116.9 MB)
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 481 ms on 192.168.1.3 (executor 1) (1/2)
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 5) in 502 ms on 192.168.1.3 (executor 1) (2/2)
18/09/18 09:43:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
18/09/18 09:43:47 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (sortBy at ScalaWordCount.scala:24) finished in 0.507 s
18/09/18 09:43:47 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/09/18 09:43:47 INFO scheduler.DAGScheduler: running: Set()
18/09/18 09:43:47 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4)
18/09/18 09:43:47 INFO scheduler.DAGScheduler: failed: Set()
18/09/18 09:43:47 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at saveAsTextFile at ScalaWordCount.scala:27), which has no missing parents
18/09/18 09:43:47 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 73.7 KB, free 413.6 MB)
18/09/18 09:43:47 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 26.8 KB, free 413.5 MB)
18/09/18 09:43:47 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.1.2:33829 (size: 26.8 KB, free: 413.9 MB)
18/09/18 09:43:47 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
18/09/18 09:43:47 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at saveAsTextFile at ScalaWordCount.scala:27) (first 15 tasks are for partitions Vector(0, 1))
18/09/18 09:43:47 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, 192.168.1.3, executor 1, partition 0, NODE_LOCAL, 4625 bytes)
18/09/18 09:43:47 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 7, 192.168.1.3, executor 1, partition 1, NODE_LOCAL, 4625 bytes)
18/09/18 09:43:47 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.1.3:35989 (size: 26.8 KB, free: 116.9 MB)
18/09/18 09:43:48 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 192.168.1.3:49064
18/09/18 09:43:48 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 154 bytes
18/09/18 09:43:57 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 9322 ms on 192.168.1.3 (executor 1) (1/2)
18/09/18 09:43:57 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 7) in 9317 ms on 192.168.1.3 (executor 1) (2/2)
18/09/18 09:43:57 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
18/09/18 09:43:57 INFO scheduler.DAGScheduler: ResultStage 4 (saveAsTextFile at ScalaWordCount.scala:27) finished in 9.338 s
18/09/18 09:43:57 INFO scheduler.DAGScheduler: Job 1 finished: saveAsTextFile at ScalaWordCount.scala:27, took 10.077742 s
18/09/18 09:43:57 INFO server.AbstractConnector: Stopped Spark@44e046e6{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/09/18 09:43:57 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.2:4040
18/09/18 09:43:57 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors
18/09/18 09:43:57 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/09/18 09:43:57 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/09/18 09:43:58 INFO memory.MemoryStore: MemoryStore cleared
18/09/18 09:43:58 INFO storage.BlockManager: BlockManager stopped
18/09/18 09:43:58 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/09/18 09:43:58 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/09/18 09:43:58 INFO spark.SparkContext: Successfully stopped SparkContext
18/09/18 09:43:58 INFO util.ShutdownHookManager: Shutdown hook called
18/09/18 09:43:58 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a485a1fa-3d72-401a-a38e-62403ce75437
[root@master spark-2.2.0]# sbin/stop-all.sh 
slave2.hadoop: stopping org.apache.spark.deploy.worker.Worker
slave1.hadoop: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
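
Once the job finishes, the output can be checked on HDFS; for example (paths taken from the submit command above; saveAsTextFile writes one part-NNNNN file per output partition, two here, plus a _SUCCESS marker):

hdfs dfs -ls hdfs://master.hadoop:9000/spark/output
hdfs dfs -cat hdfs://master.hadoop:9000/spark/output/part-00000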

 
