scala
Linux environment
Spark 2.4.2
Scala 2.12.8 (the Scala version Spark was built with can be checked by running spark-shell)
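The spark-shell startup banner reports the Scala version; the same information can also be printed without starting a shell:

$ spark-submit --version    # the output includes a "Using Scala version ..." line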
A simple run
Directory structure
find .
.
./hw.scala
Create the hw.scala file:
object Hi {
  def main(args: Array[String]): Unit = println("Hello world")
}
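An equivalent variant, as a sketch using the standard App trait, which supplies main automatically:

object Hi extends App {
  println("Hello world")  // the object body runs as the program
}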
Run sbt directly in the current directory:
sbt
then type run at the sbt> prompt to get the result.
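Alternatively, sbt can compile and run the program non-interactively in one shot:

$ sbt run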
Building a project
Directory structure
$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
The contents of SimpleApp.scala (change the logFile path to a file that exists on your machine):
/* SimpleApp.scala */
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md"  // point this at an existing file
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}
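For reference, the idiomatic entry point in Spark 2.x is SparkSession rather than a bare SparkContext; a minimal sketch of the same program (note that SparkSession lives in the spark-sql artifact, which would have to be added as a dependency):

/* SimpleApp.scala, SparkSession variant */
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    // read.textFile returns a Dataset[String] instead of an RDD
    val logData = spark.read.textFile("YOUR_SPARK_HOME/README.md").cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}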
simple.sbt is the build configuration file (note: very old sbt versions required a blank line between settings):
name := "Simple Project"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
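Since spark-submit supplies the Spark classes at runtime, the dependency is commonly marked provided so it is not bundled into the jar (this only matters when building a fat jar with a plugin such as sbt-assembly; plain sbt package does not bundle dependencies anyway):

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2" % "provided"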
Then, from the project root, run
sbt package
Finally, run:
spark-submit --class "SimpleApp" target/scala-2.12/simple-project_2.12-1.0.jar
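If no cluster is configured, a master can be set explicitly on the command line, e.g. to run locally with two worker threads:

spark-submit --class SimpleApp --master local[2] target/scala-2.12/simple-project_2.12-1.0.jar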
Other: declaring several dependencies at once (note: Spark 1.6.2 artifacts were only published for Scala 2.10/2.11, so with %% these versions must match the project's scalaVersion):
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2",
"mysql" % "mysql-connector-java" % "5.1.12"
)
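To keep the Spark artifacts on one version, a shared value can be factored out; a sketch, where sparkVersion is a name introduced here and must be a release published for the project's Scala version:

val sparkVersion = "2.4.2"  // assumption: a release built for your scalaVersion

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
  "mysql" % "mysql-connector-java" % "5.1.12"
)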
Ref:
- https://www.jianshu.com/p/454cb5318372
- https://www.cnblogs.com/codingexperience/p/5372617.html