Spark WordCount

1. This is the source, written in IDEA with the Spark jars on the classpath:

package main.scala

import org.apache.spark.{SparkConf,SparkContext}
/**
  * Created by root on 1/12/17.
  */
object WordCount {

  val conf =new SparkConf()
  val sc = new SparkContext(conf)
  val line=sc.textFile("hdfs://localhost:9000/user/root/input/data3.txt", 2)
  val counts = line.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  counts.saveAsTextFile("hdfs://localhost:9000/user/root/output2")

}

After building, package the jar and run it:
accumulate wordcount_jar # jps
5317 Worker
4574 SecondaryNameNode
7830 Main
7992 NailgunRunner
5389 SparkSubmit
5103 
4668 JobTracker
4802 TaskTracker
4445 DataNode
5237 Master
8537 Jps
4321 NameNode
accumulate wordcount_jar # ls
wordcount.jar
accumulate wordcount_jar # spark-submit --class "WordCount" --master local[4] wordcount.jar
java.lang.ClassNotFoundException: WordCount
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
accumulate wordcount_jar # spark-submit --class "main.scala.WordCount" --master local[4] wordcount.jar
Exception in thread "main" java.lang.NoSuchMethodException: main.scala.WordCount.main([Ljava.lang.String;)
        at java.lang.Class.getMethod(Class.java:1786)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:716)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
accumulate wordcount_jar # 

Yet in spark-shell the very same snippet:
  val line=sc.textFile("hdfs://localhost:9000/user/root/input/data3.txt", 2)
  val counts = line.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  counts.saveAsTextFile("hdfs://localhost:9000/user/root/output2")
runs without any problem — so where exactly is the error? (In spark-shell the statements execute inside an interactive driver that already provides sc, whereas spark-submit loads the named class and calls its main(Array[String]) method. An object whose logic lives only in field initializers has no main, and because it sits in package main.scala it also has to be referenced by its fully qualified name, main.scala.WordCount — which is exactly what the ClassNotFoundException and NoSuchMethodException above are complaining about.)
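A quick way to see what actually got packed into the jar — whether the object carries the main.scala package prefix and whether it exposes a main method — is to inspect it with the standard JDK tools (file names here are just illustrative):

  jar tf wordcount.jar | grep -i wordcount              # list the class files inside the jar
  javap -classpath wordcount.jar main.scala.WordCount   # show the compiled methods, including main (if any)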

2.
I had tried a different version before; let's try that one next:

package main.scala

import org.apache.spark.{SparkConf,SparkContext}
/**
  * Created by root on 1/12/17.
  */
object WordCount {

  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage:<file>")
      System.exit(1)
    }

    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(args(1))
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
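A small aside: the pipeline above is written out twice, so the input is read and shuffled twice. A sketch that computes the counts once and reuses them (same logic, just factored out and cached) would be:

  val counts = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).cache() // materialize once
  counts.saveAsTextFile(args(1))      // write the result to HDFS
  counts.collect().foreach(println)   // and print the same result on the driver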
Build and package it:

accumulate wordcount_jar # ls
wordcount.jar
accumulate wordcount_jar # spark-submit --class main.scala.WordCount --master local[2] woordcount.jar hdfs://localhost:9000/user/root/input/data3.txt hdfs://localhost: 9000/user/root/output3
Warning: Local jar /home/hongxin/Desktop/wordcount/out/artifacts/wordcount_jar/woordcount.jar does not exist, skipping.
java.lang.ClassNotFoundException: main.scala.WordCount
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
accumulate wordcount_jar # jps
4574 SecondaryNameNode
5317 Worker
9013 Jps
7830 Main
4802 TaskTracker
4445 DataNode
8932 Launcher
4321 NameNode
7992 NailgunRunner
5389 SparkSubmit
5103
4668 JobTracker
5237 Master
accumulate wordcount_jar # java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Server VM (build 24.45-b08, mixed mode)
accumulate wordcount_jar # spark-submit --class main.scala.WordCount --master local[2] wordcount.jar hdfs://localhost:9000/user/root/input/data3.txt hdfs://localhost: 9000/user/root/output3
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/12 17:29:44 INFO SparkContext: Running Spark version 1.6.1
17/01/12 17:29:45 WARN Utils: Your hostname, accumulate resolves to a loopback address: 127.0.0.1; using 192.168.1.6 instead (on interface wlan0)
17/01/12 17:29:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/01/12 17:29:45 INFO SecurityManager: Changing view acls to: root
17/01/12 17:29:45 INFO SecurityManager: Changing modify acls to: root
17/01/12 17:29:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/01/12 17:29:46 INFO Utils: Successfully started service 'sparkDriver' on port 41923.
17/01/12 17:29:47 INFO Slf4jLogger: Slf4jLogger started
17/01/12 17:29:47 INFO Remoting: Starting remoting
17/01/12 17:29:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.6:33579]
17/01/12 17:29:47 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 33579.
17/01/12 17:29:48 INFO SparkEnv: Registering MapOutputTracker
17/01/12 17:29:48 INFO SparkEnv: Registering BlockManagerMaster
17/01/12 17:29:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-105b0cb2-7774-467b-b46d-af21ce760d0a
17/01/12 17:29:48 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/01/12 17:29:48 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/12 17:29:48 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 17:29:49 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/01/12 17:29:49 INFO SparkUI: Started SparkUI at http://192.168.1.6:4041
17/01/12 17:29:49 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3cf867f2-68d6-4826-9ed4-94ff9c0d3e76/httpd-ab14640d-3b7a-4f42-9238-61f8c26fa0b2
17/01/12 17:29:49 INFO HttpServer: Starting HTTP Server
17/01/12 17:29:49 INFO Utils: Successfully started service 'HTTP file server' on port 44229.
17/01/12 17:29:56 INFO SparkContext: Added JAR file:/home/hongxin/Desktop/wordcount/out/artifacts/wordcount_jar/wordcount.jar at http://192.168.1.6:44229/jars/wordcount.jar with timestamp 1484213396693
17/01/12 17:29:56 INFO Executor: Starting executor ID driver on host localhost
17/01/12 17:29:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35804.
17/01/12 17:29:56 INFO NettyBlockTransferService: Server created on 35804
17/01/12 17:29:56 INFO BlockManagerMaster: Trying to register BlockManager
17/01/12 17:29:56 INFO BlockManagerMasterEndpoint: Registering block manager localhost:35804 with 511.1 MB RAM, BlockManagerId(driver, localhost, 35804)
17/01/12 17:29:56 INFO BlockManagerMaster: Registered BlockManager
17/01/12 17:29:59 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
17/01/12 17:29:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 38.8 KB, free 38.8 KB)
17/01/12 17:29:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.2 KB, free 42.9 KB)
17/01/12 17:29:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:35804 (size: 4.2 KB, free: 511.1 MB)
17/01/12 17:29:59 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:17
17/01/12 17:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/12 17:30:00 WARN LoadSnappy: Snappy native library not loaded
17/01/12 17:30:01 INFO FileInputFormat: Total input paths to process : 1
17/01/12 17:30:05 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:06 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:07 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:08 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:09 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:10 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:11 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:12 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)                                                                                             
17/01/12 17:30:13 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:30:14 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Exception in thread "main" java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy13.getProtocolVersion(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy13.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
        at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:170)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1059)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1457)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1436)
        at main.scala.WordCount$.main(WordCount.scala:18)
        at main.scala.WordCount.main(WordCount.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
        at org.apache.hadoop.ipc.Client.call(Client.java:1093)
        ... 52 more
17/01/12 17:30:14 INFO SparkContext: Invoking stop() from shutdown hook
17/01/12 17:30:14 INFO SparkUI: Stopped Spark web UI at http://192.168.1.6:4041
17/01/12 17:30:14 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/12 17:30:14 INFO MemoryStore: MemoryStore cleared
17/01/12 17:30:14 INFO BlockManager: BlockManager stopped
17/01/12 17:30:14 INFO BlockManagerMaster: BlockManagerMaster stopped
17/01/12 17:30:14 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/12 17:30:14 INFO SparkContext: Successfully stopped SparkContext
17/01/12 17:30:14 INFO ShutdownHookManager: Shutdown hook called
17/01/12 17:30:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-3cf867f2-68d6-4826-9ed4-94ff9c0d3e76
17/01/12 17:30:14 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/01/12 17:30:14 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/01/12 17:30:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-3cf867f2-68d6-4826-9ed4-94ff9c0d3e76/httpd-ab14640d-3b7a-4f42-9238-61f8c26fa0b2
accumulate wordcount_jar #

So the earlier failures really were a matter of "not found" — the class name was wrong in one run and the jar path in the other; either way, spark-submit simply could not locate what it was told to run.
3. Now the connection to Hadoop's HDFS is the problem!! (Note that the input on port 9000 is read fine — FileInputFormat reports "Total input paths to process : 1" — and the job only dies when it tries to write the output. The submit command contains a stray space in the output URI, "hdfs://localhost: 9000/user/root/output3", so the output argument is effectively just "hdfs://localhost:" and Hadoop falls back to the default NameNode port 8020, which would explain the "Connection refused" on 127.0.0.1:8020.)
accumulate wordcount_jar # spark-submit --class main.scala.WordCount --master spark://accumulate:7077 wordcount.jar hdfs://localhost:9000/user/root/input/data3.txt hdfs://localhost: 9000/user/root/output3                       
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/12 17:38:06 INFO SparkContext: Running Spark version 1.6.1
17/01/12 17:38:06 WARN Utils: Your hostname, accumulate resolves to a loopback address: 127.0.0.1; using 192.168.1.6 instead (on interface wlan0)
17/01/12 17:38:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/01/12 17:38:06 INFO SecurityManager: Changing view acls to: root
17/01/12 17:38:06 INFO SecurityManager: Changing modify acls to: root
17/01/12 17:38:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/01/12 17:38:07 INFO Utils: Successfully started service 'sparkDriver' on port 34379.
17/01/12 17:38:07 INFO Slf4jLogger: Slf4jLogger started
17/01/12 17:38:07 INFO Remoting: Starting remoting
17/01/12 17:38:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.6:43385]
17/01/12 17:38:08 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43385.
17/01/12 17:38:08 INFO SparkEnv: Registering MapOutputTracker
17/01/12 17:38:08 INFO SparkEnv: Registering BlockManagerMaster
17/01/12 17:38:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-3f3e80f6-1c7b-4187-9a08-fb9a87a92876
17/01/12 17:38:08 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/01/12 17:38:08 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/12 17:38:10 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 17:38:10 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/01/12 17:38:10 INFO SparkUI: Started SparkUI at http://192.168.1.6:4041
17/01/12 17:38:10 INFO HttpFileServer: HTTP File server directory is /tmp/spark-e27ffd0c-ac15-4020-ac2a-919544085774/httpd-4dcfa4c9-bb5e-431f-95d3-c55a622af325
17/01/12 17:38:10 INFO HttpServer: Starting HTTP Server
17/01/12 17:38:10 INFO Utils: Successfully started service 'HTTP file server' on port 49924.
17/01/12 17:38:21 INFO SparkContext: Added JAR file:/home/hongxin/Desktop/wordcount/out/artifacts/wordcount_jar/wordcount.jar at http://192.168.1.6:49924/jars/wordcount.jar with timestamp 1484213900095
17/01/12 17:38:26 INFO AppClient$ClientEndpoint: Connecting to master spark://accumulate:7077...
17/01/12 17:38:32 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170112173831-0002
17/01/12 17:38:32 INFO AppClient$ClientEndpoint: Executor added: app-20170112173831-0002/0 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 17:38:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112173831-0002/0 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 17:38:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39099.
17/01/12 17:38:32 INFO NettyBlockTransferService: Server created on 39099
17/01/12 17:38:32 INFO BlockManagerMaster: Trying to register BlockManager
17/01/12 17:38:32 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.6:39099 with 511.1 MB RAM, BlockManagerId(driver, 192.168.1.6, 39099)
17/01/12 17:38:32 INFO BlockManagerMaster: Registered BlockManager
17/01/12 17:38:32 INFO AppClient$ClientEndpoint: Executor updated: app-20170112173831-0002/0 is now RUNNING
17/01/12 17:38:33 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/01/12 17:38:37 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
17/01/12 17:38:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 38.8 KB, free 38.8 KB)
17/01/12 17:38:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.2 KB, free 42.9 KB)
17/01/12 17:38:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.6:39099 (size: 4.2 KB, free: 511.1 MB)
17/01/12 17:38:37 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:17
17/01/12 17:38:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/12 17:38:38 WARN LoadSnappy: Snappy native library not loaded
17/01/12 17:38:39 INFO FileInputFormat: Total input paths to process : 1
17/01/12 17:38:40 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:41 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:42 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:43 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:44 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:45 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:46 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:47 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:48 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
17/01/12 17:38:49 INFO Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Exception in thread "main" java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy13.getProtocolVersion(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy13.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
        at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:170)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1059)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1457)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
        at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1436)
        at main.scala.WordCount$.main(WordCount.scala:18)
        at main.scala.WordCount.main(WordCount.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
        at org.apache.hadoop.ipc.Client.call(Client.java:1093)
        ... 52 more
17/01/12 17:38:49 INFO SparkContext: Invoking stop() from shutdown hook
17/01/12 17:38:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.6:4041
17/01/12 17:38:49 INFO SparkDeploySchedulerBackend: Shutting down all executors
17/01/12 17:38:49 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
17/01/12 17:38:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/12 17:38:49 INFO MemoryStore: MemoryStore cleared
17/01/12 17:38:49 INFO BlockManager: BlockManager stopped
17/01/12 17:38:50 INFO BlockManagerMaster: BlockManagerMaster stopped
17/01/12 17:38:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/12 17:38:50 INFO SparkContext: Successfully stopped SparkContext
17/01/12 17:38:50 INFO ShutdownHookManager: Shutdown hook called
17/01/12 17:38:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-e27ffd0c-ac15-4020-ac2a-919544085774
17/01/12 17:38:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-e27ffd0c-ac15-4020-ac2a-919544085774/httpd-4dcfa4c9-bb5e-431f-95d3-c55a622af325
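If that stray space really is the culprit, the fix is simply to keep the output URI in one piece (same command as above, ports as configured):

  spark-submit --class main.scala.WordCount --master spark://accumulate:7077 wordcount.jar hdfs://localhost:9000/user/root/input/data3.txt hdfs://localhost:9000/user/root/output3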



4. Build the Hadoop native libraries, or fall back to local paths

 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

spark-submit --class main.scala.WordCount --master spark://accumulate:7077 wordcount.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/12 20:47:16 INFO SparkContext: Running Spark version 1.6.1
17/01/12 20:47:16 WARN Utils: Your hostname, accumulate resolves to a loopback address: 127.0.0.1; using 192.168.1.6 instead (on interface wlan0)
17/01/12 20:47:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/01/12 20:47:16 INFO SecurityManager: Changing view acls to: root
17/01/12 20:47:16 INFO SecurityManager: Changing modify acls to: root
17/01/12 20:47:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/01/12 20:47:16 INFO Utils: Successfully started service 'sparkDriver' on port 41774.
17/01/12 20:47:17 INFO Slf4jLogger: Slf4jLogger started
17/01/12 20:47:17 INFO Remoting: Starting remoting
17/01/12 20:47:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.6:46940]
17/01/12 20:47:17 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 46940.
17/01/12 20:47:17 INFO SparkEnv: Registering MapOutputTracker
17/01/12 20:47:17 INFO SparkEnv: Registering BlockManagerMaster
17/01/12 20:47:17 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-8be7029a-f241-41af-8666-56c4aa2e3f87
17/01/12 20:47:17 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/01/12 20:47:17 INFO SparkEnv: Registering OutputCommitCoordinator
17/01/12 20:47:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 20:47:18 INFO Utils: Successfully started service 'SparkUI' on port 4041.
17/01/12 20:47:18 INFO SparkUI: Started SparkUI at http://192.168.1.6:4041
17/01/12 20:47:18 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ea4f6824-36c3-4c92-bb26-5f5dfddad4bc/httpd-973efbc1-48bc-46b5-a3f5-14ad878e29ec
17/01/12 20:47:18 INFO HttpServer: Starting HTTP Server
17/01/12 20:47:18 INFO Utils: Successfully started service 'HTTP file server' on port 49399.
17/01/12 20:47:22 INFO SparkContext: Added JAR file:/home/hongxin/Desktop/wordcount/out/artifacts/wordcount_jar/wordcount.jar at http://192.168.1.6:49399/jars/wordcount.jar with timestamp 1484225242671
17/01/12 20:47:23 INFO AppClient$ClientEndpoint: Connecting to master spark://accumulate:7077...
17/01/12 20:47:23 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170112204723-0007
17/01/12 20:47:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59777.
17/01/12 20:47:23 INFO NettyBlockTransferService: Server created on 59777
17/01/12 20:47:23 INFO BlockManagerMaster: Trying to register BlockManager
17/01/12 20:47:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.6:59777 with 511.1 MB RAM, BlockManagerId(driver, 192.168.1.6, 59777)
17/01/12 20:47:23 INFO BlockManagerMaster: Registered BlockManager
17/01/12 20:47:23 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/0 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:23 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/0 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:23 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/0 is now RUNNING
17/01/12 20:47:24 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/01/12 20:47:25 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
17/01/12 20:47:25 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 38.8 KB, free 38.8 KB)
17/01/12 20:47:25 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.2 KB, free 42.9 KB)
17/01/12 20:47:25 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.6:59777 (size: 4.2 KB, free: 511.1 MB)
17/01/12 20:47:25 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:14
17/01/12 20:47:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/12 20:47:26 WARN LoadSnappy: Snappy native library not loaded
17/01/12 20:47:26 INFO FileInputFormat: Total input paths to process : 1
17/01/12 20:47:27 INFO SparkContext: Starting job: saveAsTextFile at WordCount.scala:16
17/01/12 20:47:27 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:15)
17/01/12 20:47:27 INFO DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:16) with 2 output partitions
17/01/12 20:47:27 INFO DAGScheduler: Final stage: ResultStage 1 (saveAsTextFile at WordCount.scala:16)
17/01/12 20:47:27 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/01/12 20:47:27 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/01/12 20:47:27 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:15), which has no missing parents
17/01/12 20:47:27 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 47.0 KB)
17/01/12 20:47:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 49.3 KB)
17/01/12 20:47:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.6:59777 (size: 2.3 KB, free: 511.1 MB)
17/01/12 20:47:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/01/12 20:47:27 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:15)
17/01/12 20:47:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/01/12 20:47:31 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/0 is now EXITED (Command exited with code 1)
17/01/12 20:47:31 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/0 removed: Command exited with code 1
17/01/12 20:47:31 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
17/01/12 20:47:32 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/1 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:32 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/1 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:32 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/1 is now RUNNING
17/01/12 20:47:38 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/1 is now EXITED (Command exited with code 1)
17/01/12 20:47:38 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/1 removed: Command exited with code 1
17/01/12 20:47:38 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
17/01/12 20:47:38 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/2 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:38 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/2 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:38 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/2 is now RUNNING
17/01/12 20:47:42 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/2 is now EXITED (Command exited with code 1)
17/01/12 20:47:42 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/2 removed: Command exited with code 1
17/01/12 20:47:42 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
17/01/12 20:47:42 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/3 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/3 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:43 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/3 is now RUNNING
17/01/12 20:47:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/01/12 20:47:47 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/3 is now EXITED (Command exited with code 1)
17/01/12 20:47:47 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/3 removed: Command exited with code 1
17/01/12 20:47:47 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
17/01/12 20:47:47 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/4 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:47 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/4 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:47 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/4 is now RUNNING
17/01/12 20:47:49 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/4 is now EXITED (Command exited with code 1)
17/01/12 20:47:49 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/4 removed: Command exited with code 1
17/01/12 20:47:49 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
17/01/12 20:47:49 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/5 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:49 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/5 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:49 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/5 is now RUNNING
17/01/12 20:47:53 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/5 is now EXITED (Command exited with code 1)
17/01/12 20:47:53 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/5 removed: Command exited with code 1
17/01/12 20:47:53 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
17/01/12 20:47:54 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/6 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:47:54 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/6 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:47:54 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/6 is now RUNNING
17/01/12 20:47:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/01/12 20:48:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
^C^C17/01/12 20:48:15 INFO SparkContext: Invoking stop() from shutdown hook
17/01/12 20:48:15 INFO AppClient$ClientEndpoint: Executor updated: app-20170112204723-0007/6 is now EXITED (Command exited with code 1)
17/01/12 20:48:15 INFO SparkDeploySchedulerBackend: Executor app-20170112204723-0007/6 removed: Command exited with code 1
17/01/12 20:48:15 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
17/01/12 20:48:15 INFO SparkUI: Stopped Spark web UI at http://192.168.1.6:4041
17/01/12 20:48:16 INFO AppClient$ClientEndpoint: Executor added: app-20170112204723-0007/7 on worker-20170112093721-10.97.230.34-33502 (10.97.230.34:33502) with 4 cores
17/01/12 20:48:16 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170112204723-0007/7 on hostPort 10.97.230.34:33502 with 4 cores, 1024.0 MB RAM
17/01/12 20:48:16 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:15) failed in 48.568 s
17/01/12 20:48:16 INFO DAGScheduler: Job 0 failed: saveAsTextFile at WordCount.scala:16, took 48.977962 s
Exception in thread "main" org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down



