Intellij IDEA开发集群提交运行Spark代码
1、 右键项目:
选择倒第四个OpenModule Settings:
2、添加Spark依赖包:
添加进去:
3、进度:
4、编写Wc代码:
新建一个Scala类,开始编写spark代码:
import org.apache.spark.{SparkContext, SparkConf}
/**
* Created by admin on 2015/12/15.
*/
object SparkDemo1 {
def main (args: Array[String]) {
if(args.length<1){
System.err.print("Usage <File>")
System.exit(1)
}
val conf=new SparkConf()
val sc=new SparkContext(conf)
val line=sc.textFile(args(0))
line.flatMap(_.split("\t")).map((_,1)).reduceByKey(_+_).collect().foreach(println)
sc.stop()
}
}
Intellij IDA视图如下:
5、打包:
打好的包:
用的是Standalone集群:
主节点:
从节点:
6、集群运行代码:
[root@spark0 spark]# bin/spark-submit--class SparkDemo1 /usr/local/spark/ScalaDemo1.jar/usr/local/spark-1.5.2-bin-hadoop2.6/bigdata/wcDemo1.txt
运行结果:
整个运行过程:
[root@spark0 spark]# bin/spark-submit --class SparkDemo1 /usr/local/spark/ScalaDemo1.jar /usr/local/spark-1.5.2-bin-hadoop2.6/bigdata/wcDemo1.txt Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/12/15 14:44:21 INFO SparkContext: Running Spark version 1.5.2 15/12/15 14:44:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/12/15 14:44:21 INFO SecurityManager: Changing view acls to: root 15/12/15 14:44:21 INFO SecurityManager: Changing modify acls to: root 15/12/15 14:44:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 15/12/15 14:44:23 INFO Slf4jLogger: Slf4jLogger started 15/12/15 14:44:23 INFO Remoting: Starting remoting 15/12/15 14:44:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.6.2:40172] 15/12/15 14:44:23 INFO Utils: Successfully started service 'sparkDriver' on port 40172. 15/12/15 14:44:23 INFO SparkEnv: Registering MapOutputTracker 15/12/15 14:44:23 INFO SparkEnv: Registering BlockManagerMaster 15/12/15 14:44:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-dbd5e9f3-bfa1-4c03-9bbe-9ff2661049f2 15/12/15 14:44:23 INFO MemoryStore: MemoryStore started with capacity 534.5 MB 15/12/15 14:44:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ffda6806-746c-4740-9558-bd3a047fafe2/httpd-0ded5c18-fb57-4ab7-bcd2-3ef6a5fa550d 15/12/15 14:44:23 INFO HttpServer: Starting HTTP Server 15/12/15 14:44:23 INFO Utils: Successfully started service 'HTTP file server' on port 40716. 15/12/15 14:44:23 INFO SparkEnv: Registering OutputCommitCoordinator 15/12/15 14:44:24 INFO Utils: Successfully started service 'SparkUI' on port 4040. 15/12/15 14:44:24 INFO SparkUI: Started SparkUI at http://192.168.6.2:4040 15/12/15 14:44:24 INFO SparkContext: Added JAR file:/usr/local/spark/ScalaDemo1.jar at http://192.168.6.2:40716/jars/ScalaDemo1.jar with timestamp 1450161864561 15/12/15 14:44:24 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 15/12/15 14:44:24 INFO Executor: Starting executor ID driver on host localhost 15/12/15 14:44:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40076. 15/12/15 14:44:25 INFO NettyBlockTransferService: Server created on 40076 15/12/15 14:44:25 INFO BlockManagerMaster: Trying to register BlockManager 15/12/15 14:44:25 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40076 with 534.5 MB RAM, BlockManagerId(driver, localhost, 40076) 15/12/15 14:44:25 INFO BlockManagerMaster: Registered BlockManager 15/12/15 14:44:29 INFO MemoryStore: ensureFreeSpace(130448) called with curMem=0, maxMem=560497950 15/12/15 14:44:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 534.4 MB) 15/12/15 14:44:29 INFO MemoryStore: ensureFreeSpace(14276) called with curMem=130448, maxMem=560497950 15/12/15 14:44:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 534.4 MB) 15/12/15 14:44:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40076 (size: 13.9 KB, free: 534.5 MB) 15/12/15 14:44:29 INFO SparkContext: Created broadcast 0 from textFile at SparkDemo1.scala:14 15/12/15 14:44:30 INFO FileInputFormat: Total input paths to process : 1 15/12/15 14:44:30 INFO SparkContext: Starting job: collect at SparkDemo1.scala:16 15/12/15 14:44:30 INFO DAGScheduler: Registering RDD 3 (map at SparkDemo1.scala:16) 15/12/15 14:44:30 INFO DAGScheduler: Got job 0 (collect at SparkDemo1.scala:16) with 1 output partitions 15/12/15 14:44:30 INFO DAGScheduler: Final stage: ResultStage 1(collect at SparkDemo1.scala:16) 15/12/15 14:44:30 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 15/12/15 14:44:30 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0) 15/12/15 14:44:30 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkDemo1.scala:16), which has no missing parents 15/12/15 14:44:30 INFO MemoryStore: ensureFreeSpace(4064) called with curMem=144724, maxMem=560497950 15/12/15 14:44:30 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 534.4 MB) 15/12/15 14:44:30 INFO MemoryStore: ensureFreeSpace(2328) called with curMem=148788, maxMem=560497950 15/12/15 14:44:30 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 534.4 MB) 15/12/15 14:44:30 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:40076 (size: 2.3 KB, free: 534.5 MB) 15/12/15 14:44:30 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861 15/12/15 14:44:30 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkDemo1.scala:16) 15/12/15 14:44:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 15/12/15 14:44:30 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2213 bytes) 15/12/15 14:44:30 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 15/12/15 14:44:30 INFO Executor: Fetching http://192.168.6.2:40716/jars/ScalaDemo1.jar with timestamp 1450161864561 15/12/15 14:44:31 INFO Utils: Fetching http://192.168.6.2:40716/jars/ScalaDemo1.jar to /tmp/spark-ffda6806-746c-4740-9558-bd3a047fafe2/userFiles-e4174a0b-997b-424c-893b-0570668f9320/fetchFileTemp8876302010385452933.tmp 15/12/15 14:44:36 INFO Executor: Adding file:/tmp/spark-ffda6806-746c-4740-9558-bd3a047fafe2/userFiles-e4174a0b-997b-424c-893b-0570668f9320/ScalaDemo1.jar to class loader 15/12/15 14:44:36 INFO HadoopRDD: Input split: file:/usr/local/spark-1.5.2-bin-hadoop2.6/bigdata/wcDemo1.txt:0+142 15/12/15 14:44:36 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 15/12/15 14:44:36 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 15/12/15 14:44:36 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 15/12/15 14:44:36 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 15/12/15 14:44:36 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 15/12/15 14:44:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2253 bytes result sent to driver 15/12/15 14:44:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7131 ms on localhost (1/1) 15/12/15 14:44:37 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 15/12/15 14:44:38 INFO DAGScheduler: ShuffleMapStage 0 (map at SparkDemo1.scala:16) finished in 7.402 s 15/12/15 14:44:38 INFO DAGScheduler: looking for newly runnable stages 15/12/15 14:44:38 INFO DAGScheduler: running: Set() 15/12/15 14:44:38 INFO DAGScheduler: waiting: Set(ResultStage 1) 15/12/15 14:44:38 INFO DAGScheduler: failed: Set() 15/12/15 14:44:38 INFO DAGScheduler: Missing parents for ResultStage 1: List() 15/12/15 14:44:38 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at SparkDemo1.scala:16), which is now runnable 15/12/15 14:44:38 INFO MemoryStore: ensureFreeSpace(2288) called with curMem=151116, maxMem=560497950 15/12/15 14:44:38 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.2 KB, free 534.4 MB) 15/12/15 14:44:38 INFO MemoryStore: ensureFreeSpace(1375) called with curMem=153404, maxMem=560497950 15/12/15 14:44:38 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1375.0 B, free 534.4 MB) 15/12/15 14:44:38 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:40076 (size: 1375.0 B, free: 534.5 MB) 15/12/15 14:44:38 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:861 15/12/15 14:44:38 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at SparkDemo1.scala:16) 15/12/15 14:44:38 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks 15/12/15 14:44:38 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1955 bytes) 15/12/15 14:44:38 INFO Executor: Running task 0.0 in stage 1.0 (TID 1) 15/12/15 14:44:38 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks 15/12/15 14:44:38 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 96 ms 15/12/15 14:44:38 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1541 bytes result sent to driver 15/12/15 14:44:38 INFO DAGScheduler: ResultStage 1 (collect at SparkDemo1.scala:16) finished in 0.335 s 15/12/15 14:44:38 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 335 ms on localhost (1/1) 15/12/15 14:44:38 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 15/12/15 14:44:38 INFO DAGScheduler: Job 0 finished: collect at SparkDemo1.scala:16, took 8.480429 s (spark,5) (hive,3) (hadoop,5) (docker,1) (flume,1) (solr,1) (storm,1) (elasticsearch,1) (kafka,1) (sqoop,1) (redis,1) (hbase,1) 15/12/15 14:44:39 INFO SparkUI: Stopped Spark web UI at http://192.168.6.2:4040 15/12/15 14:44:39 INFO DAGScheduler: Stopping DAGScheduler 15/12/15 14:44:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 15/12/15 14:44:42 INFO MemoryStore: MemoryStore cleared 15/12/15 14:44:42 INFO BlockManager: BlockManager stopped 15/12/15 14:44:42 INFO BlockManagerMaster: BlockManagerMaster stopped 15/12/15 14:44:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 15/12/15 14:44:42 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 15/12/15 14:44:42 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 15/12/15 14:44:42 INFO SparkContext: Successfully stopped SparkContext 15/12/15 14:44:42 INFO ShutdownHookManager: Shutdown hook called 15/12/15 14:44:42 INFO ShutdownHookManager: Deleting directory /tmp/spark-ffda6806-746c-4740-9558-bd3a047fafe2 [root@spark0 spark]# |
7、关闭Spark集群: