Spark WordCount in Scala

Code

A standalone WordCount application: it reads a local text file, splits each line on spaces, maps each word to a (word, 1) pair, and sums the counts per word with reduceByKey.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val inputFile = "file:///usr/local/spark/mycode/wordcount_2/word.txt"
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Read the local text file as an RDD of lines.
    val textFile = sc.textFile(inputFile)
    // Split each line into words, pair each word with a count of 1,
    // then sum the counts per word.
    val wordCount = textFile.flatMap(line => line.split(" "))
                            .map(word => (word, 1))
                            .reduceByKey((a, b) => a + b)
    // Print each (word, count) pair on the driver (fine in local mode).
    wordCount.foreach(println)
    sc.stop()
  }
}
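
The jar submitted in the log below, target/scala-2.11/simple-project_2.11-1.0.jar, is sbt's standard package output, so the project was presumably built with sbt package. The original build file is not shown; a minimal build definition that matches the versions visible in the log (Scala 2.11, Spark 1.6.0) and produces that artifact name would look roughly like this sketch:

// simple.sbt -- a sketch of the build definition (assumed, not shown in the
// original). sbt normalizes the name "Simple Project" to the artifact
// simple-project_2.11-1.0.jar seen in the run log.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"

Running sbt package from the project root then writes the jar under target/scala-2.11/.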

Run

Submitting the packaged jar with spark-submit produces the following driver log (Spark 1.6.0, local mode):

[root@master wordcount_2]# /usr/local/src/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" /usr/local/spark/mycode/wordcount_2/target/scala-2.11/simple-project_2.11-1.0.jar
17/08/15 22:30:29 INFO spark.SparkContext: Running Spark version 1.6.0
17/08/15 22:30:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/15 22:30:30 INFO spark.SecurityManager: Changing view acls to: root
17/08/15 22:30:30 INFO spark.SecurityManager: Changing modify acls to: root
17/08/15 22:30:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/08/15 22:30:31 INFO util.Utils: Successfully started service 'sparkDriver' on port 35518.
17/08/15 22:30:32 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/08/15 22:30:32 INFO Remoting: Starting remoting
17/08/15 22:30:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.183.100:57774]
17/08/15 22:30:32 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 57774.
17/08/15 22:30:32 INFO spark.SparkEnv: Registering MapOutputTracker
17/08/15 22:30:32 INFO spark.SparkEnv: Registering BlockManagerMaster
17/08/15 22:30:33 INFO storage.DiskBlockManager: Created local directory at /usr/local/src/spark-1.6.0-bin-hadoop2.6/blockmgr-3fb8c838-81df-4b7e-9a9c-f798c1e3306b
17/08/15 22:30:33 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
17/08/15 22:30:33 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/08/15 22:30:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/08/15 22:30:33 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/08/15 22:30:33 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/08/15 22:30:33 INFO ui.SparkUI: Started SparkUI at http://192.168.183.100:4040
17/08/15 22:30:33 INFO spark.HttpFileServer: HTTP File server directory is /usr/local/src/spark-1.6.0-bin-hadoop2.6/spark-9dedbb32-e6fa-45e6-8580-2014770cd644/httpd-d1a6373b-55e6-440b-b7ac-c6e6264af676
17/08/15 22:30:34 INFO spark.HttpServer: Starting HTTP Server
17/08/15 22:30:34 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/08/15 22:30:34 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:48923
17/08/15 22:30:34 INFO util.Utils: Successfully started service 'HTTP file server' on port 48923.
17/08/15 22:30:34 INFO spark.SparkContext: Added JAR file:/usr/local/spark/mycode/wordcount_2/target/scala-2.11/simple-project_2.11-1.0.jar at http://192.168.183.100:48923/jars/simple-project_2.11-1.0.jar with timestamp 1502861434086
17/08/15 22:30:34 INFO executor.Executor: Starting executor ID driver on host localhost
17/08/15 22:30:34 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46353.
17/08/15 22:30:34 INFO netty.NettyBlockTransferService: Server created on 46353
17/08/15 22:30:34 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/08/15 22:30:34 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:46353 with 517.4 MB RAM, BlockManagerId(driver, localhost, 46353)
17/08/15 22:30:34 INFO storage.BlockManagerMaster: Registered BlockManager
17/08/15 22:30:35 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.9 KB, free 153.9 KB)
17/08/15 22:30:35 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.2 KB, free 168.1 KB)
17/08/15 22:30:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:46353 (size: 14.2 KB, free: 517.4 MB)
17/08/15 22:30:35 INFO spark.SparkContext: Created broadcast 0 from textFile at SimpleApp.scala:10
17/08/15 22:30:37 INFO mapred.FileInputFormat: Total input paths to process : 1
17/08/15 22:30:37 INFO spark.SparkContext: Starting job: foreach at SimpleApp.scala:12
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Registering RDD 3 (map at SimpleApp.scala:11)
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Got job 0 (foreach at SimpleApp.scala:12) with 2 output partitions
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (foreach at SimpleApp.scala:12)
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SimpleApp.scala:11), which has no missing parents
17/08/15 22:30:37 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 172.1 KB)
17/08/15 22:30:37 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 174.4 KB)
17/08/15 22:30:37 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:46353 (size: 2.3 KB, free: 517.4 MB)
17/08/15 22:30:37 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/08/15 22:30:37 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SimpleApp.scala:11)
17/08/15 22:30:37 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2209 bytes)
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2209 bytes)
17/08/15 22:30:38 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/15 22:30:38 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/15 22:30:38 INFO executor.Executor: Fetching http://192.168.183.100:48923/jars/simple-project_2.11-1.0.jar with timestamp 1502861434086
17/08/15 22:30:38 INFO util.Utils: Fetching http://192.168.183.100:48923/jars/simple-project_2.11-1.0.jar to /usr/local/src/spark-1.6.0-bin-hadoop2.6/spark-9dedbb32-e6fa-45e6-8580-2014770cd644/userFiles-9b0ac9e4-3639-466c-a390-41690c515800/fetchFileTemp8766360412566628126.tmp
17/08/15 22:30:38 INFO executor.Executor: Adding file:/usr/local/src/spark-1.6.0-bin-hadoop2.6/spark-9dedbb32-e6fa-45e6-8580-2014770cd644/userFiles-9b0ac9e4-3639-466c-a390-41690c515800/simple-project_2.11-1.0.jar to class loader
17/08/15 22:30:38 INFO rdd.HadoopRDD: Input split: file:/usr/local/spark/mycode/wordcount/word.txt:0+29
17/08/15 22:30:38 INFO rdd.HadoopRDD: Input split: file:/usr/local/spark/mycode/wordcount/word.txt:29+29
17/08/15 22:30:38 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/08/15 22:30:38 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/08/15 22:30:38 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/08/15 22:30:38 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/08/15 22:30:38 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/08/15 22:30:38 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2254 bytes result sent to driver
17/08/15 22:30:38 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 2254 bytes result sent to driver
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 644 ms on localhost (1/2)
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 626 ms on localhost (2/2)
17/08/15 22:30:38 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/08/15 22:30:38 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at SimpleApp.scala:11) finished in 0.693 s
17/08/15 22:30:38 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/08/15 22:30:38 INFO scheduler.DAGScheduler: running: Set()
17/08/15 22:30:38 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/08/15 22:30:38 INFO scheduler.DAGScheduler: failed: Set()
17/08/15 22:30:38 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at SimpleApp.scala:11), which has no missing parents
17/08/15 22:30:38 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 176.9 KB)
17/08/15 22:30:38 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1574.0 B, free 178.4 KB)
17/08/15 22:30:38 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:46353 (size: 1574.0 B, free: 517.4 MB)
17/08/15 22:30:38 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
17/08/15 22:30:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at SimpleApp.scala:11)
17/08/15 22:30:38 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0,NODE_LOCAL, 1965 bytes)
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, partition 1,NODE_LOCAL, 1965 bytes)
17/08/15 22:30:38 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
17/08/15 22:30:38 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
17/08/15 22:30:38 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
17/08/15 22:30:38 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
17/08/15 22:30:38 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
17/08/15 22:30:38 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
(sb,2)
(sh,2)
(li,4)
(bi,1)
(ni,2)
(sha,1)
(,5)
(hao,3)
17/08/15 22:30:38 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1165 bytes result sent to driver
17/08/15 22:30:38 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1165 bytes result sent to driver
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 97 ms on localhost (1/2)
17/08/15 22:30:38 INFO scheduler.DAGScheduler: ResultStage 1 (foreach at SimpleApp.scala:12) finished in 0.104 s
17/08/15 22:30:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 108 ms on localhost (2/2)
17/08/15 22:30:38 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/08/15 22:30:38 INFO scheduler.DAGScheduler: Job 0 finished: foreach at SimpleApp.scala:12, took 1.181813 s
17/08/15 22:30:38 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/08/15 22:30:38 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/08/15 22:30:38 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.183.100:4040
17/08/15 22:30:38 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/15 22:30:39 INFO storage.MemoryStore: MemoryStore cleared
17/08/15 22:30:39 INFO storage.BlockManager: BlockManager stopped
17/08/15 22:30:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/08/15 22:30:39 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/15 22:30:39 INFO spark.SparkContext: Successfully stopped SparkContext
17/08/15 22:30:39 INFO util.ShutdownHookManager: Shutdown hook called
17/08/15 22:30:39 INFO util.ShutdownHookManager: Deleting directory /usr/local/src/spark-1.6.0-bin-hadoop2.6/spark-9dedbb32-e6fa-45e6-8580-2014770cd644
17/08/15 22:30:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/08/15 22:30:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/08/15 22:30:39 INFO util.ShutdownHookManager: Deleting directory /usr/local/src/spark-1.6.0-bin-hadoop2.6/spark-9dedbb32-e6fa-45e6-8580-2014770cd644/httpd-d1a6373b-55e6-440b-b7ac-c6e6264af676
[root@master wordcount_2]# 
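
One quirk in the output above: the (,5) pair has an empty string as its key. split(" ") emits an empty token for every extra space in a run of consecutive spaces (and for a leading space), so the empty string gets counted like any other word. A small variant of the counting line that splits on whitespace runs and drops empty tokens avoids this:

// Split on runs of whitespace and filter out empty tokens, so repeated
// spaces in word.txt no longer produce an empty-string "word".
val wordCount = textFile
  .flatMap(line => line.split("\\s+").filter(_.nonEmpty))
  .map(word => (word, 1))
  .reduceByKey(_ + _)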