Running Your Own Code on Spark
For details, see the official quick start: http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
1. Directory structure
The directory layout must look exactly like this. XX is my project directory; under XX, everything except the two files simple.sbt and SimpleApp.scala is a directory. sbt expects sources under src/main/scala by convention, so the nesting matters:
$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
2. SimpleApp.scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    // Should be some file that exists on your system
    val logFile = "/usr/local/spark/README.md"
    // Master URL, app name, Spark home, and the jar that `sbt package` produces
    val sc = new SparkContext("local", "Simple App", "/usr/local/spark",
      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
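The four-argument SparkContext constructor bundles the master URL, app name, Spark home, and jar list into one call. As a minimal sketch, assuming the SparkConf API introduced in Spark 0.9, the same setup could also be written in this style (same values, different construction):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the same configuration expressed through SparkConf;
// each setter mirrors one argument of the constructor above.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("Simple App")
  .setSparkHome("/usr/local/spark")
  .setJars(List("target/scala-2.10/simple-project_2.10-1.0.jar"))
val sc = new SparkContext(conf)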
3. simple.sbt (mind the whitespace: sbt of this vintage needs a blank line between settings and spaces around the operators)

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
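A side note on the dependency line: the %% operator asks sbt to append the Scala binary version to the artifact name, so with scalaVersion set to 2.10.3 it resolves to spark-core_2.10. Spelled out by hand, the line above is equivalent to:

// Equivalent to the %% form: the Scala binary version (_2.10)
// is written into the artifact name explicitly.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"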
4. Build
In the XX directory, run sbt package. The first build apparently has to be done online: I spent an afternoon offline and it simply would not pass; as soon as I connected, sbt downloaded a few jars and the build succeeded. A successful build produces target/scala-2.10/simple-project_2.10-1.0.jar, which is exactly the jar path listed in SimpleApp.scala.
5. Run
In the XX directory, run sbt run. If everything works, the output ends with a line of the form "Lines with a: ..., Lines with b: ..." with the two counts filled in.
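To sanity-check the counts interactively, the same filters can be run in the Spark shell (bin/spark-shell under /usr/local/spark), where a SparkContext named sc is created for you at startup; a minimal sketch:

// In bin/spark-shell a SparkContext `sc` already exists,
// so the two counts can be reproduced line by line:
val logData = sc.textFile("/usr/local/spark/README.md").cache()
logData.filter(line => line.contains("a")).count() // lines containing "a"
logData.filter(line => line.contains("b")).count() // lines containing "b"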