Starting Spark

Original post: 2015-07-09 22:56:09

1: Starting the Master

Command:

./sbin/start-master.sh

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-jifeng-org.apache.spark.deploy.master.Master-1-feng03.out
Check the startup log:

[jifeng@feng03 logs]$ cat spark-jifeng-org.apache.spark.deploy.master.Master-1-feng03.out 
Spark Command: /home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/jifeng/hadoop-2.6.0/etc/hadoop/ -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.master.Master --ip feng03 --port 7077 --webui-port 8080
========================================
15/07/11 22:12:29 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/07/11 22:12:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 22:12:30 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 22:12:30 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 22:12:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 22:12:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/11 22:12:30 INFO Remoting: Starting remoting
15/07/11 22:12:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@feng03:7077]
15/07/11 22:12:31 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/07/11 22:12:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 22:12:31 INFO server.AbstractConnector: Started SelectChannelConnector@feng03:6066
15/07/11 22:12:31 INFO util.Utils: Successfully started service on port 6066.
15/07/11 22:12:31 INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066
15/07/11 22:12:31 INFO master.Master: Starting Spark master at spark://feng03:7077
15/07/11 22:12:31 INFO master.Master: Running Spark version 1.4.0
15/07/11 22:12:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 22:12:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080
15/07/11 22:12:31 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
15/07/11 22:12:31 INFO ui.MasterWebUI: Started MasterWebUI at http://192.168.0.110:8080
15/07/11 22:12:32 INFO master.Master: I have been elected leader! New state: ALIVE
15/07/11 22:13:43 INFO master.Master: Registering worker 192.168.0.110:35655 with 1 cores, 2.0 GB RAM


2: Starting the Slaves

Command (start-slaves.sh launches a Worker on every host listed in conf/slaves, as actually run below):

./sbin/start-slaves.sh <master-spark-URL>

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ sbin/start-slaves.sh spark://feng03:7077
feng03: starting org.apache.spark.deploy.worker.Worker, logging to /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-jifeng-org.apache.spark.deploy.worker.Worker-1-feng03.out
Check the startup log:


[jifeng@feng03 logs]$ cat spark-jifeng-org.apache.spark.deploy.worker.Worker-1-feng03.out 
Spark Command: /home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://feng03:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/11 22:13:39 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/07/11 22:13:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 22:13:41 INFO SecurityManager: Changing view acls to: jifeng
15/07/11 22:13:41 INFO SecurityManager: Changing modify acls to: jifeng
15/07/11 22:13:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 22:13:41 INFO Slf4jLogger: Slf4jLogger started
15/07/11 22:13:41 INFO Remoting: Starting remoting
15/07/11 22:13:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.0.110:35655]
15/07/11 22:13:41 INFO Utils: Successfully started service 'sparkWorker' on port 35655.
15/07/11 22:13:42 INFO Worker: Starting Spark worker 192.168.0.110:35655 with 1 cores, 2.0 GB RAM
15/07/11 22:13:42 INFO Worker: Running Spark version 1.4.0
15/07/11 22:13:42 INFO Worker: Spark home: /home/jifeng/spark-1.4.0-bin-hadoop2.6
15/07/11 22:13:42 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/07/11 22:13:42 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.110:8081
15/07/11 22:13:42 INFO Worker: Connecting to master akka.tcp://sparkMaster@feng03:7077/user/Master...
15/07/11 22:13:43 INFO Worker: Successfully registered with master spark://feng03:7077

3: Starting the Shell

./bin/spark-shell --master spark://IP:PORT

Note that the option is --master. The invocation below passes master=spark://feng03:7077 as a plain application argument instead, so the shell silently falls back to local mode (the log shows the driver's executor starting on localhost rather than registering with the standalone master).

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./bin/spark-shell master=spark://feng03:7077
15/07/11 23:09:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 23:09:26 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 23:09:26 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 23:09:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 23:09:26 INFO spark.HttpServer: Starting HTTP Server
15/07/11 23:09:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34613
15/07/11 23:09:27 INFO util.Utils: Successfully started service 'HTTP class server' on port 34613.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/11 23:09:36 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/11 23:09:36 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 23:09:36 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 23:09:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 23:09:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/11 23:09:37 INFO Remoting: Starting remoting
15/07/11 23:09:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.110:34690]
15/07/11 23:09:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 34690.
15/07/11 23:09:37 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/11 23:09:37 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/11 23:09:37 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-d8ab9b2d-bf0e-498a-9c7a-93fe904611e0/blockmgr-0531d884-7f97-46a0-8533-3b8c1abee2ee
15/07/11 23:09:37 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/11 23:09:38 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-d8ab9b2d-bf0e-498a-9c7a-93fe904611e0/httpd-e41d1cc4-870d-4882-9b66-8dbbc79645a3
15/07/11 23:09:38 INFO spark.HttpServer: Starting HTTP Server
15/07/11 23:09:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:38 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45995
15/07/11 23:09:38 INFO util.Utils: Successfully started service 'HTTP file server' on port 45995.
15/07/11 23:09:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/11 23:09:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:40 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/07/11 23:09:40 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/07/11 23:09:40 INFO ui.SparkUI: Started SparkUI at http://192.168.0.110:4040
15/07/11 23:09:40 INFO executor.Executor: Starting executor ID driver on host localhost
15/07/11 23:09:40 INFO executor.Executor: Using REPL class URI: http://192.168.0.110:34613
15/07/11 23:09:40 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41799.
15/07/11 23:09:40 INFO netty.NettyBlockTransferService: Server created on 41799
15/07/11 23:09:40 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/11 23:09:40 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:41799 with 267.3 MB RAM, BlockManagerId(driver, localhost, 41799)
15/07/11 23:09:40 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/11 23:09:41 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
15/07/11 23:09:42 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/07/11 23:09:42 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/07/11 23:09:43 INFO metastore.ObjectStore: ObjectStore, initialize called
15/07/11 23:09:43 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/07/11 23:09:43 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/07/11 23:09:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/11 23:09:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/11 23:09:48 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/11 23:09:49 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/07/11 23:09:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:54 INFO metastore.ObjectStore: Initialized ObjectStore
15/07/11 23:09:54 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/07/11 23:09:55 INFO metastore.HiveMetaStore: Added admin role in metastore
15/07/11 23:09:55 INFO metastore.HiveMetaStore: Added public role in metastore
15/07/11 23:09:56 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/07/11 23:09:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/11 23:09:56 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

The spark-class script was modified to echo the command it builds; the actual launch command is:

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./bin/spark-shell master=spark://feng03:7077
/home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/jifeng/hadoop-2.6.0/etc/hadoop/ -Dscala.usejavacp=true -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main spark-shell master=spark://feng03:7077

4: Testing WordCount

Read the file:

val textFile = sc.textFile("file:///home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md")

Count the lines:

textFile.count()

scala> val textFile = sc.textFile("file:///home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md")
15/07/11 23:29:21 INFO storage.MemoryStore: ensureFreeSpace(233640) called with curMem=109214, maxMem=280248975
15/07/11 23:29:21 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 228.2 KB, free 266.9 MB)
15/07/11 23:29:21 INFO storage.MemoryStore: ensureFreeSpace(20038) called with curMem=342854, maxMem=280248975
15/07/11 23:29:21 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 19.6 KB, free 266.9 MB)
15/07/11 23:29:21 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:41799 (size: 19.6 KB, free: 267.2 MB)
15/07/11 23:29:21 INFO spark.SparkContext: Created broadcast 1 from textFile at <console>:21
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

scala> textFile.count()
15/07/11 23:29:24 INFO mapred.FileInputFormat: Total input paths to process : 1
15/07/11 23:29:24 INFO spark.SparkContext: Starting job: count at <console>:24
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:24) with 1 output partitions (allowLocal=false)
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(count at <console>:24)
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at textFile at <console>:21), which has no missing parents
15/07/11 23:29:24 INFO storage.MemoryStore: ensureFreeSpace(3008) called with curMem=362892, maxMem=280248975
15/07/11 23:29:24 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 266.9 MB)
15/07/11 23:29:24 INFO storage.MemoryStore: ensureFreeSpace(1791) called with curMem=365900, maxMem=280248975
15/07/11 23:29:24 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1791.0 B, free 266.9 MB)
15/07/11 23:29:24 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:41799 (size: 1791.0 B, free: 267.2 MB)
15/07/11 23:29:24 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at textFile at <console>:21)
15/07/11 23:29:24 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/07/11 23:29:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1426 bytes)
15/07/11 23:29:24 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/11 23:29:24 INFO rdd.HadoopRDD: Input split: file:/home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md:0+3624
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/11 23:29:25 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1830 bytes result sent to driver
15/07/11 23:29:25 INFO scheduler.DAGScheduler: ResultStage 0 (count at <console>:24) finished in 0.308 s
15/07/11 23:29:25 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 263 ms on localhost (1/1)
15/07/11 23:29:25 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/07/11 23:29:25 INFO scheduler.DAGScheduler: Job 0 finished: count at <console>:24, took 0.651881 s
res2: Long = 98
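Note that sc.textFile and the transformations on it are lazy; nothing is read from disk until an action such as count() runs (which is why all the job scheduling above happens at count(), not at textFile). A minimal plain-Python sketch of the same lazy-then-count pattern (illustration only, not Spark code):

```python
# Illustration only (not Spark): mimic the lazy textFile + count() pattern.
# Nothing is read from disk until an action (here, counting) forces evaluation.
import os
import tempfile

def text_file(path):
    """Lazily yield lines, like sc.textFile (simplified: one partition, no HDFS)."""
    def gen():
        with open(path) as f:
            for line in f:
                yield line.rstrip("\n")
    return gen

def count(dataset):
    """Action: force evaluation and count the elements."""
    return sum(1 for _ in dataset())

# A small sample file standing in for README.md.
with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
    f.write("# Apache Spark\n\nSpark is a fast engine.\n")
    path = f.name

lines = text_file(path)   # lazy: the file has not been opened yet
n = count(lines)          # action: reads the file now
print(n)                  # → 3
os.remove(path)
```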

Count the occurrences of each word:

val count = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

collect() submits the job, runs it, and returns the results to the driver:

count.collect()

scala> val count=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:23

scala> count.collect()
15/07/11 23:37:42 INFO spark.SparkContext: Starting job: collect at <console>:26
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Registering RDD 5 (map at <console>:23)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Got job 1 (collect at <console>:26) with 1 output partitions (allowLocal=false)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 2(collect at <console>:26)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at map at <console>:23), which has no missing parents
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(4136) called with curMem=362892, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.0 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(2311) called with curMem=367028, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.3 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:41799 (size: 2.3 KB, free: 267.2 MB)
15/07/11 23:37:42 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:874
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at map at <console>:23)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1415 bytes)
15/07/11 23:37:42 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
15/07/11 23:37:42 INFO rdd.HadoopRDD: Input split: file:/home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md:0+3624
15/07/11 23:37:42 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 2056 bytes result sent to driver
15/07/11 23:37:42 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (map at <console>:23) finished in 0.465 s
15/07/11 23:37:42 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/07/11 23:37:42 INFO scheduler.DAGScheduler: running: Set()
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 468 ms on localhost (1/1)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/07/11 23:37:42 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: failed: Set()
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Missing parents for ResultStage 2: List()
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at reduceByKey at <console>:23), which is now runnable
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(2288) called with curMem=369339, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.2 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(1377) called with curMem=371627, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1377.0 B, free 266.9 MB)
15/07/11 23:37:42 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:41799 (size: 1377.0 B, free: 267.2 MB)
15/07/11 23:37:42 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:874
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (ShuffledRDD[6] at reduceByKey at <console>:23)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/11 23:37:42 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
15/07/11 23:37:42 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/07/11 23:37:42 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
15/07/11 23:37:42 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 7258 bytes result sent to driver
15/07/11 23:37:42 INFO scheduler.DAGScheduler: ResultStage 2 (collect at <console>:26) finished in 0.260 s
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Job 1 finished: collect at <console>:26, took 0.815686 s
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 263 ms on localhost (1/1)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
res3: Array[(String, Int)] = Array((package,1), (For,2), (Programs,1), (processing.,1), (Because,1), (The,1), (cluster.,1), (its,1), ([run,1), (APIs,1), (have,1), (Try,1), (computation,1), (through,1), (several,1), (This,2), ("yarn-cluster",1), (graph,1), (Hive,2), (storage,1), (["Specifying,1), (To,2), (page](http://spark.apache.org/documentation.html),1), (Once,1), (application,1), (prefer,1), (SparkPi,2), (engine,1), (version,1), (file,1), (documentation,,1), (processing,,2), (the,21), (are,1), (systems.,1), (params,1), (not,1), (different,1), (refer,2), (Interactive,2), (given.,1), (if,4), (build,3), (when,1), (be,2), (Tests,1), (Apache,1), (all,1), (./bin/run-example,2), (programs,,1), (including,3), (Spark.,1), (package.,1), (1000).count(),1), (Versions,1), (HDFS,1), (Data.,1), (>...
scala> 
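The flatMap / map / reduceByKey pipeline above can be sketched in plain Python (an illustration of the logic only; a Counter stands in for Spark's shuffle-based reduceByKey):

```python
# Plain-Python sketch of flatMap(split) -> map(word, 1) -> reduceByKey(_ + _).
from collections import Counter

def word_count(lines):
    # flatMap + map: one (word, 1) pair per whitespace-split token
    pairs = ((word, 1) for line in lines for word in line.split(" "))
    counts = Counter()
    for word, n in pairs:     # reduceByKey with _ + _ as the reducer
        counts[word] += n
    return dict(counts)

result = word_count(["to be or not", "to be"])
print(result)                 # → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```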


Save the result:

count.saveAsTextFile("README.md")

scala> count.saveAsTextFile("README.md")
15/07/11 23:43:21 INFO spark.SparkContext: Starting job: saveAsTextFile at <console>:26
15/07/11 23:43:21 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 143 bytes
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Got job 2 (saveAsTextFile at <console>:26) with 1 output partitions (allowLocal=false)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Final stage: ResultStage 4(saveAsTextFile at <console>:26)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[7] at saveAsTextFile at <console>:26), which has no missing parents
15/07/11 23:43:21 INFO storage.MemoryStore: ensureFreeSpace(127984) called with curMem=362892, maxMem=280248975
15/07/11 23:43:21 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 125.0 KB, free 266.8 MB)
15/07/11 23:43:21 INFO storage.MemoryStore: ensureFreeSpace(43257) called with curMem=490876, maxMem=280248975
15/07/11 23:43:21 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 42.2 KB, free 266.8 MB)
15/07/11 23:43:21 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:41799 (size: 42.2 KB, free: 267.2 MB)
15/07/11 23:43:21 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:874
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[7] at saveAsTextFile at <console>:26)
15/07/11 23:43:21 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
15/07/11 23:43:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 3, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/11 23:43:21 INFO executor.Executor: Running task 0.0 in stage 4.0 (TID 3)
15/07/11 23:43:22 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/07/11 23:43:22 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
15/07/11 23:43:23 INFO output.FileOutputCommitter: Saved output of task 'attempt_201507112343_0004_m_000000_3' to hdfs://feng01:9000/user/jifeng/README.md/_temporary/0/task_201507112343_0004_m_000000
15/07/11 23:43:23 INFO mapred.SparkHadoopMapRedUtil: attempt_201507112343_0004_m_000000_3: Committed
15/07/11 23:43:23 INFO executor.Executor: Finished task 0.0 in stage 4.0 (TID 3). 1828 bytes result sent to driver
15/07/11 23:43:23 INFO scheduler.DAGScheduler: ResultStage 4 (saveAsTextFile at <console>:26) finished in 1.990 s
15/07/11 23:43:23 INFO scheduler.DAGScheduler: Job 2 finished: saveAsTextFile at <console>:26, took 2.230414 s
15/07/11 23:43:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 3) in 1992 ms on localhost (1/1)
15/07/11 23:43:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
As the log shows, a path without a file:// prefix is resolved against the default filesystem, so the output was saved to HDFS (hdfs://feng01:9000/user/jifeng/README.md).
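The resolution rule can be sketched as follows (a hypothetical resolve helper, not Hadoop's actual code; the fs.defaultFS value and the /user/jifeng home directory are taken from the log above):

```python
# Hypothetical helper: how a bare path vs. a scheme-prefixed path resolves.
# DEFAULT_FS and the /user/jifeng home dir are assumptions taken from the log.
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://feng01:9000"        # stand-in for fs.defaultFS

def resolve(path, user="jifeng"):
    """Qualify a path against the default filesystem (simplified)."""
    if urlparse(path).scheme:            # scheme given (file://, hdfs://): keep as-is
        return path
    # Bare path: relative to the user's home directory on the default FS.
    return f"{DEFAULT_FS}/user/{user}/{path}"

print(resolve("README.md"))                 # → hdfs://feng01:9000/user/jifeng/README.md
print(resolve("file:///home/jifeng/x.md"))  # stays on the local filesystem
```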

