1. Install the software:
jdk-8u92-windows-x64.exe
sbt-0.13.13.1.msi
scala-SDK-4.4.1-vfinal-2.11-win32.win32.x86_64.zip
2. Set the download paths for dependency libraries:
By default the sbt base directory is ~/.sbt,
the sbt boot directory is ~/.sbt/boot,
and dependencies are downloaded to ~/.ivy2.
Setting your own paths means a system reinstall does not force re-downloading all dependencies, and the directory can even live on a USB drive shared across machines:
Edit the sbt configuration file [sbt install dir]\conf\sbtconfig.txt and append:
-Dsbt.global.base=D:/sbt
-Dsbt.boot.directory=D:/sbt/boot/
-Dsbt.ivy.home=D:/sbt/ivy/
3. At run time sbt often needs to download a large number of jar files; by default it resolves them from the official Maven servers, which is usually slow.
In the sbt base directory (~/.sbt, or D:/sbt after the change above) create a file named `repositories` with the following content:
[repositories]
local
repox-maven: http://repox.gtan.com:8078/
repox-ivy: http://repox.gtan.com:8078/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
aliyun: http://maven.aliyun.com/nexus/content/groups/public/
typesafe: http://dl.bintray.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
ivy-sbt-plugin: http://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
sonatype-oss-releases
maven-central
4. Create the Scala project directory structure:
├── src
│ ├── main
│ │ ├── java
│ │ ├── resources
│ │ └── scala
├── build.sbt
├── project
│ ├── build.properties
│ ├── plugins.sbt
sbt uses a directory layout similar to Maven's: Scala sources go under src/main/scala, and configuration files under src/main/resources.
Declare the library dependencies in path_to_project/build.sbt:
lazy val root = (project in file(".")).
  settings(
    name := "My Project",
    version := "1.0",
    scalaVersion := "2.11.8",
    libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.17",
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0",
    resolvers += "Akka 2.4.17 Repository" at "http://repo.akka.io/2.4.17/"
  )
When you need to add a new dependency, Maven Central Repository Search is a convenient way to look up its coordinates.
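As a sketch of how a coordinate found there maps onto the sbt DSL (scalatest is just an illustrative choice, not a dependency of this project):

```scala
// Maven coordinates: groupId=org.scalatest, artifactId=scalatest_2.11, version=3.0.1
// %% appends the project's Scala binary version to the artifact name automatically:
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test

// equivalent to the explicit single-% form with the Scala version spelled out:
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "3.0.1" % Test
```

Prefer `%%` for Scala libraries so the artifact stays in sync with `scalaVersion`; plain `%` is for Java artifacts or when you pin the Scala suffix by hand, as with spark-core_2.11 above.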
In path_to_project/project/plugins.sbt add:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.0.1")
7. Configure the Spark runtime environment:
Download winutils.exe from https://github.com/steveloughran/winutils.
Save it to a directory, e.g. c:\hadoop\bin.
Set the HADOOP_HOME environment variable to the parent of the directory containing winutils.exe, e.g. c:\hadoop above.
Add the directory containing winutils.exe, c:\hadoop\bin, to the system PATH environment variable.
Create the directory c:\tmp\hive.
Open a command prompt and run winutils.exe chmod -R 777 \tmp\hive to grant permissions on the directory.
Run winutils.exe ls \tmp\hive to check that the permissions were set successfully.
Open cmd and change into the path_to_project directory.
Type sbt to start the sbt console.
In the sbt console, run update to download the required libraries into the local library repository.
In the sbt console, run eclipse to generate the Eclipse project / refresh the classpath settings.
9. Write the SparkWordCount example:
Open the project in Scala IDE.
Create sparkWordCount.scala under src/main/scala:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object sparkWordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("myapp").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(List("Hello World", "Hello"))
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
The example uses setMaster("local[2]") so that it can run locally inside Eclipse, which is convenient for debugging.
Select sparkWordCount.scala and choose Run As -> Scala Application to run it in Eclipse; check the output in the Console view.
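For comparison, here is the same word count on plain Scala collections, with groupBy standing in for reduceByKey. It is handy for checking what the Spark job should produce without starting a SparkContext; PlainWordCount is an illustrative name, not part of the project above:

```scala
// Same stages as the RDD pipeline: flatMap into words, then reduce by key.
object PlainWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" "))                              // "Hello World", "Hello" -> Hello, World, Hello
      .groupBy(identity)                                     // Hello -> Seq(Hello, Hello), World -> Seq(World)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    println(count(List("Hello World", "Hello")))             // Hello -> 2, World -> 1
}
```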
10. Write the AkkaWordCount example:
Open the project in Scala IDE.
Create akkaWordCount.scala under src/main/scala:
import scala.language.postfixOps
import akka.actor._
import akka.routing._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.IndexedSeq
import scala.collection.mutable.HashMap
import akka.util.Timeout
import scala.concurrent.duration._
import scala.concurrent.Await
import akka.pattern.ask
// define the messages
sealed trait MapReduceMessage
case class WordCount(word: String, count: Int) extends MapReduceMessage
case class MapData(dataList: ArrayBuffer[WordCount]) extends MapReduceMessage
case class ReduceData(reduceDataMap: Map[String, Int]) extends MapReduceMessage
case object Result extends MapReduceMessage
//mapActor
class MapActor extends Actor {
  val STOP_WORDS_LIST = List("a", "am", "an", "and", "are", "as", "at",
    "be", "do", "go", "if", "in", "is", "it", "of", "on", "the", "to")

  def receive: Receive = {
    case message: String =>
      sender ! evaluateExpression(message)
  }

  def evaluateExpression(line: String): MapData = MapData {
    line.split("""\s+""").foldLeft(ArrayBuffer.empty[WordCount]) { (index, word) =>
      if (!STOP_WORDS_LIST.contains(word.toLowerCase))
        index += WordCount(word.toLowerCase, 1)
      else
        index
    }
  }
}
//reduceActor
class ReduceActor extends Actor {
  def receive: Receive = {
    case MapData(dataList) =>
      sender ! reduce(dataList)
  }

  def reduce(words: IndexedSeq[WordCount]): ReduceData = ReduceData {
    words.foldLeft(Map.empty[String, Int]) { (index, wordCount) =>
      if (index contains wordCount.word)
        index + (wordCount.word -> (index(wordCount.word) + wordCount.count))
      else
        index + (wordCount.word -> wordCount.count)
    }
  }
}
//AggregateActor
class AggregateActor extends Actor {
  val finalReduceMap = new HashMap[String, Int]

  def receive: Receive = {
    case ReduceData(reduceDataMap) =>
      aggregateInMemoryReduce(reduceDataMap)
    case Result =>
      sender ! finalReduceMap.toString()
  }

  def aggregateInMemoryReduce(reduceList: Map[String, Int]): Unit = {
    for ((key, value) <- reduceList) {
      if (finalReduceMap contains key)
        finalReduceMap(key) = value + finalReduceMap(key)
      else
        finalReduceMap += (key -> value)
    }
  }
}
//MasterActor
class MasterActor extends Actor {
  val mapActor = context.actorOf(Props(new MapActor), name = "map")
  val reduceActor: ActorRef = context.actorOf(Props(new ReduceActor), name = "reduce")
  val aggregateActor = context.actorOf(Props(new AggregateActor), name = "aggregate")

  def receive: Receive = {
    case line: String => mapActor ! line
    case mapData: MapData => reduceActor ! mapData
    case reduceData: ReduceData => aggregateActor ! reduceData
    case Result => aggregateActor forward Result
  }
}
object akkaWordCount {
  def main(args: Array[String]) {
    val _system = ActorSystem("MapReduceApp")
    val master = _system.actorOf(Props(new MasterActor), name = "master")
    implicit val timeout = Timeout(5 seconds)
    master ! "The quick brown fox tried to jump over the lazy dog and fell on the dog"
    master ! "Dog is man's best friend"
    master ! "Dog and Fox belong to the same family"
    Thread.sleep(500) // give the actors time to process the lines before asking
    val future = (master ? Result).mapTo[String]
    val result = Await.result(future, timeout.duration)
    println(result)
    _system.terminate()
  }
}
Select akkaWordCount.scala and choose Run As -> Scala Application to run it in Eclipse; check the output in the Console view.
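The three actor stages above can also be sketched as plain functions, which makes the MapReduce logic easy to check without starting an ActorSystem; MapReduceSketch and its abbreviated stop-word list are illustrative only, not part of the project above:

```scala
object MapReduceSketch {
  val stopWords = List("a", "an", "and", "the", "to", "is") // abbreviated stop list for the sketch

  // MapActor stage: split a line, drop stop words, emit (word, 1) pairs
  def mapStage(line: String): Seq[(String, Int)] =
    line.split("""\s+""").toSeq
      .map(_.toLowerCase)
      .filterNot(w => stopWords.contains(w))
      .map(w => (w, 1))

  // ReduceActor stage: fold the pairs of one line into per-word counts
  def reduceStage(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (w, n)) =>
      acc + (w -> (acc.getOrElse(w, 0) + n))
    }

  // AggregateActor stage: merge the per-line maps into a running total
  def aggregate(maps: Seq[Map[String, Int]]): Map[String, Int] =
    maps.foldLeft(Map.empty[String, Int]) { (total, m) =>
      m.foldLeft(total) { case (t, (w, n)) => t + (w -> (t.getOrElse(w, 0) + n)) }
    }
}
```

Feeding each input line through mapStage, reduceStage, and then aggregate reproduces what AggregateActor accumulates in finalReduceMap, minus the actor plumbing.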