WordCount in Spark (Scala)

A minimal Spark word count in local mode: read a text file, split each line into words, map each word to (word, 1), sum the counts with reduceByKey, and print the result. The console output from one run follows the code.

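If you build with sbt, the example only needs spark-core; a sketch of the relevant build.sbt lines (assumed, not from the original post; Spark 1.4.0 was built against Scala 2.10):

// build.sbt sketch (assumed, not part of the original post)
scalaVersion := "2.10.6"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"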
package com.ai.scala

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    println("wordCount")

    // Configure the application and run it in local mode.
    val conf = new SparkConf()
    conf.setAppName("First Spark App")
    conf.setMaster("local")
    val sc = new SparkContext(conf)

    // Read the input file as an RDD of lines (1 partition).
    // val lines = sc.textFile("D:/360Downloads/scala/spark-1.4.0-bin-hadoop2.6/README.md", 1)
    val lines = sc.textFile("D:/360Downloads/scala/README.txt", 1)

    // Split lines into words, pair each word with 1, and sum the counts per word.
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // Print each (word, count) pair; in local mode this prints to the driver console.
    wordCounts.foreach(pair => println(pair._1 + " : " + pair._2))

    sc.stop()
  }
}
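Note that foreach runs on the executors, so the plain println above only lands in this console because the master is local. A driver-side variant (a sketch assuming the same wordCounts RDD; collect and sortByKey are standard RDD operations) that also sorts the output by count:

// Swap to (count, word) so the pairs can be sorted by count, descending.
val sorted = wordCounts
  .map { case (word, count) => (count, word) }
  .sortByKey(ascending = false)

// collect() brings the results back to the driver before printing,
// which is also what you would need when running on a real cluster.
sorted.collect().foreach { case (count, word) => println(word + " : " + count) }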


---------------console--------------
wordCount
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/06 12:53:53 INFO SparkContext: Running Spark version 1.4.0
16/07/06 12:53:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/06 12:53:54 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2162)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2162)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:301)
    at com.ai.scala.WordCountCluster$.main(WordCountCluster.scala:12)
    at com.ai.scala.WordCountCluster.main(WordCountCluster.scala)
16/07/06 12:53:54 INFO SecurityManager: Changing view acls to: Administrator
16/07/06 12:53:54 INFO SecurityManager: Changing modify acls to: Administrator
16/07/06 12:53:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
16/07/06 12:53:55 INFO Slf4jLogger: Slf4jLogger started
16/07/06 12:53:55 INFO Remoting: Starting remoting
16/07/06 12:53:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.3.120:50816]
16/07/06 12:53:55 INFO Utils: Successfully started service 'sparkDriver' on port 50816.
16/07/06 12:53:55 INFO SparkEnv: Registering MapOutputTracker
16/07/06 12:53:55 INFO SparkEnv: Registering BlockManagerMaster
16/07/06 12:53:55 INFO DiskBlockManager: Created local directory at C:\Users\Administrator.USER-20160227BV\AppData\Local\Temp\spark-6002d4f2-d9be-4334-b7cb-9b6cdcd8f0d6\blockmgr-d64d9d89-4422-4b08-bf9e-26d96fd0e6b4
16/07/06 12:53:55 INFO MemoryStore: MemoryStore started with capacity 962.0 MB
16/07/06 12:53:55 INFO HttpFileServer: HTTP File server directory is C:\Users\Administrator.USER-20160227BV\AppData\Local\Temp\spark-6002d4f2-d9be-4334-b7cb-9b6cdcd8f0d6\httpd-df84a38a-0be3-49a9-aae0-e03c94b422b8
16/07/06 12:53:55 INFO HttpServer: Starting HTTP Server
16/07/06 12:53:55 INFO Utils: Successfully started service 'HTTP file server' on port 50817.
16/07/06 12:53:55 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/06 12:53:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/06 12:53:55 INFO SparkUI: Started SparkUI at http://192.168.3.120:4040
16/07/06 12:53:55 INFO Executor: Starting executor ID driver on host localhost
16/07/06 12:53:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50836.
16/07/06 12:53:56 INFO NettyBlockTransferService: Server created on 50836
16/07/06 12:53:56 INFO BlockManagerMaster: Trying to register BlockManager
16/07/06 12:53:56 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50836 with 962.0 MB RAM, BlockManagerId(driver, localhost, 50836)
16/07/06 12:53:56 INFO BlockManagerMaster: Registered BlockManager
16/07/06 12:53:57 INFO MemoryStore: ensureFreeSpace(130448) called with curMem=0, maxMem=1008740597
16/07/06 12:53:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 127.4 KB, free 961.9 MB)
16/07/06 12:53:57 INFO MemoryStore: ensureFreeSpace(14257) called with curMem=130448, maxMem=1008740597
16/07/06 12:53:57 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 961.9 MB)
16/07/06 12:53:57 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:50836 (size: 13.9 KB, free: 962.0 MB)
16/07/06 12:53:57 INFO SparkContext: Created broadcast 0 from textFile at WordCountCluster.scala:14
16/07/06 12:53:57 INFO FileInputFormat: Total input paths to process : 1
16/07/06 12:53:57 INFO SparkContext: Starting job: foreach at WordCountCluster.scala:16
16/07/06 12:53:57 INFO DAGScheduler: Registering RDD 3 (map at WordCountCluster.scala:15)
16/07/06 12:53:57 INFO DAGScheduler: Got job 0 (foreach at WordCountCluster.scala:16) with 1 output partitions (allowLocal=false)
16/07/06 12:53:57 INFO DAGScheduler: Final stage: ResultStage 1(foreach at WordCountCluster.scala:16)
16/07/06 12:53:57 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
16/07/06 12:53:57 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
16/07/06 12:53:57 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountCluster.scala:15), which has no missing parents
16/07/06 12:53:57 INFO MemoryStore: ensureFreeSpace(4040) called with curMem=144705, maxMem=1008740597
16/07/06 12:53:57 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.9 KB, free 961.9 MB)
16/07/06 12:53:57 INFO MemoryStore: ensureFreeSpace(2310) called with curMem=148745, maxMem=1008740597
16/07/06 12:53:57 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 961.9 MB)
16/07/06 12:53:57 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:50836 (size: 2.3 KB, free: 962.0 MB)
16/07/06 12:53:57 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
16/07/06 12:53:57 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCountCluster.scala:15)
16/07/06 12:53:57 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/07/06 12:53:57 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1400 bytes)
16/07/06 12:53:57 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/07/06 12:53:57 INFO HadoopRDD: Input split: file:/D:/360Downloads/scala/README.txt:0+245
16/07/06 12:53:57 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/07/06 12:53:57 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/07/06 12:53:57 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/07/06 12:53:57 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/07/06 12:53:57 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/07/06 12:53:58 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2001 bytes result sent to driver
16/07/06 12:53:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 230 ms on localhost (1/1)
16/07/06 12:53:58 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/07/06 12:53:58 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCountCluster.scala:15) finished in 0.256 s
16/07/06 12:53:58 INFO DAGScheduler: looking for newly runnable stages
16/07/06 12:53:58 INFO DAGScheduler: running: Set()
16/07/06 12:53:58 INFO DAGScheduler: waiting: Set(ResultStage 1)
16/07/06 12:53:58 INFO DAGScheduler: failed: Set()
16/07/06 12:53:58 INFO DAGScheduler: Missing parents for ResultStage 1: List()
16/07/06 12:53:58 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountCluster.scala:15), which is now runnable
16/07/06 12:53:58 INFO MemoryStore: ensureFreeSpace(2192) called with curMem=151055, maxMem=1008740597
16/07/06 12:53:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 961.9 MB)
16/07/06 12:53:58 INFO MemoryStore: ensureFreeSpace(1373) called with curMem=153247, maxMem=1008740597
16/07/06 12:53:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1373.0 B, free 961.9 MB)
16/07/06 12:53:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:50836 (size: 1373.0 B, free: 962.0 MB)
16/07/06 12:53:58 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
16/07/06 12:53:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountCluster.scala:15)
16/07/06 12:53:58 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
16/07/06 12:53:58 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1165 bytes)
16/07/06 12:53:58 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/07/06 12:53:58 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/07/06 12:53:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
ScalaTest : 1
0.4.0 : 1
Worksheet : 1
2.10.6 : 1
(Luna) : 1
0.10.0 : 1
Refactoring : 1
full : 1
Framework : 1
Eclipse : 1
to : 1
0.3.0 : 1
ecosystem : 1
0.6.0 : 1
4.4.1 : 1
2.10.0 : 1
IDE : 2
2.11.8 : 1
Scala : 7
0.13.8 : 1
support : 2
Access : 1
and : 1
4.4.2 : 1
Play : 1
Search : 1
Sbt : 1
the : 1
16/07/06 12:53:58 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 886 bytes result sent to driver
16/07/06 12:53:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 53 ms on localhost (1/1)
16/07/06 12:53:58 INFO DAGScheduler: ResultStage 1 (foreach at WordCountCluster.scala:16) finished in 0.053 s
16/07/06 12:53:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/07/06 12:53:58 INFO DAGScheduler: Job 0 finished: foreach at WordCountCluster.scala:16, took 0.522300 s
16/07/06 12:53:58 INFO SparkUI: Stopped Spark web UI at http://192.168.3.120:4040
16/07/06 12:53:58 INFO DAGScheduler: Stopping DAGScheduler
16/07/06 12:53:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/06 12:53:58 INFO Utils: path = C:\Users\Administrator.USER-20160227BV\AppData\Local\Temp\spark-6002d4f2-d9be-4334-b7cb-9b6cdcd8f0d6\blockmgr-d64d9d89-4422-4b08-bf9e-26d96fd0e6b4, already present as root for deletion.
16/07/06 12:53:58 INFO MemoryStore: MemoryStore cleared
16/07/06 12:53:58 INFO BlockManager: BlockManager stopped
16/07/06 12:53:58 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/06 12:53:58 INFO SparkContext: Successfully stopped SparkContext
16/07/06 12:53:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/06 12:53:58 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/06 12:53:58 INFO Utils: Shutdown hook called
16/07/06 12:53:58 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/06 12:53:58 INFO Utils: Deleting directory C:\Users\Administrator.USER-20160227BV\AppData\Local\Temp\spark-6002d4f2-d9be-4334-b7cb-9b6cdcd8f0d6
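
The ERROR near the top of the log (Could not locate executable null\bin\winutils.exe) is the usual Hadoop-on-Windows complaint: Hadoop's Shell class looks for winutils.exe under the hadoop.home.dir system property (or the HADOOP_HOME environment variable) and finds neither set. As the log shows, the job still runs to completion in local mode. To silence it, set hadoop.home.dir to a directory that contains bin\winutils.exe before creating the SparkContext; a minimal sketch (the D:/hadoop path is a placeholder, not from the original setup):

// Sketch: suppress the winutils error on Windows. Assumes winutils.exe
// has been placed in D:/hadoop/bin (the path here is a placeholder).
System.setProperty("hadoop.home.dir", "D:/hadoop")
val sc = new SparkContext(conf)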