Learn how to use sbt to build and package a simple word-count example.
Step 1: Create the Scala word count application
WordCount.scala
/**
 * Illustrates flatMap + reduceByKey for word count.
 */
import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]) {
    val inputFile = args(0)
    val outputFile = args(1)
    val conf = new SparkConf().setAppName("wordCount")
    // Create a Scala Spark Context.
    val sc = new SparkContext(conf)
    // Load our input data.
    val input = sc.textFile(inputFile)
    // Split up into words.
    val words = input.flatMap(line => line.split(" "))
    // Transform into (word, count) pairs and sum the counts per word.
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile(outputFile)
  }
}
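As an alternative, Spark also offers the countByValue action, which computes the same counts but returns them to the driver as a Map, so it is only appropriate when the number of distinct words fits in driver memory. A minimal sketch, assuming the same inputFile and outputFile variables as above:
// Alternative sketch: countByValue is an action that returns Map[String, Long]
// to the driver, so it suits only small result sets.
val wordCounts = sc.textFile(inputFile).flatMap(_.split(" ")).countByValue()
// Re-distribute the driver-side map so it can still be written as a text file.
sc.parallelize(wordCounts.toSeq).saveAsTextFile(outputFile)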
Step 2: Create the sbt build configuration
build.sbt
// Project name, version, and the Scala version used to compile the sources.
name := "learning-spark-mini-example"
version := "1.0"
scalaVersion := "2.11.8"
// spark-core is "provided": available at compile time but not packaged into the
// jar, because spark-submit supplies the Spark classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"
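Note that the %% operator tells sbt to append the Scala binary version to the artifact name, so the dependency above resolves to spark-core_2.11. Written with a plain %, the equivalent line would look like this (a sketch, not an additional dependency to add):
// Equivalent form with the Scala binary suffix spelled out explicitly.
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0" % "provided"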
Step 3: Project structure
./
./build.sbt
./src/
./src/main
./src/main/scala
./src/main/scala/WordCount.scala
Step 4: Build and run
From the root of the project directory (sbt package writes the jar to target/scala-2.11/), run:
sbt package
$SPARK_HOME/bin/spark-submit \
  --class "WordCount" \
  --master local \
  ./target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar \
  ../opt/modules/spark-2.2.1-bin-hadoop2.7/README.md \
  ./wordcounts
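Note that saveAsTextFile will fail if the ./wordcounts output directory already exists, so remove it before re-running the job. To sanity-check the pipeline without repackaging, the same logic can also be run interactively in spark-shell; the following is a minimal sketch that assumes a local spark-shell session (which already provides a SparkContext named sc) and uses README.md as a placeholder input path:
// Sketch for spark-shell: sc is pre-defined; "README.md" is a placeholder path.
val counts = sc.textFile("README.md")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
// Print a few (word, count) pairs instead of writing an output directory.
counts.take(5).foreach(println)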
Step 5: Word count results
[elon@hadoop scala]$ cd wordcounts/
[elon@hadoop wordcounts]$ ls
part-00000 _SUCCESS
[elon@hadoop wordcounts]$ cat part-00000
(package,1)
(For,3)
(Programs,1)
(processing.,1)
(Because,1)
(The,1)
(page](http://spark.apache.org/documentation.html).,1)
......
Please credit the original source when reposting: http://blog.csdn.net/coder__cs/article/details/78992764
This post is from elon33's blog.