1. Install the software:
jdk-8u92-windows-x64.exe
sbt-0.13.13.1.msi
scala-SDK-4.4.1-vfinal-2.11-win32.win32.x86_64.zip
2. Set the download paths for dependency libraries:
By default the sbt base directory is ~/.sbt,
the sbt boot directory is ~/.sbt/boot,
and dependencies are downloaded to ~/.ivy2.
Setting your own paths means a system reinstall does not force re-downloading all dependencies, and the directory can even live on a USB drive shared across machines:
Edit the sbt configuration file [sbt install dir]\conf\sbtconfig.txt and append:
-Dsbt.global.base=D:/sbt
-Dsbt.boot.directory=D:/sbt/boot/
-Dsbt.ivy.home=D:/sbt/ivy/
3. At run time sbt often needs to download a large number of jar files; by default it resolves them from the official Maven servers, which is usually slow.
In the sbt base directory (~/.sbt, or D:/sbt after the change above) create a file named `repositories` with the following content:
[repositories]
local
repox-maven: http://repox.gtan.com:8078/
repox-ivy: http://repox.gtan.com:8078/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
aliyun: http://maven.aliyun.com/nexus/content/groups/public/
typesafe: http://dl.bintray.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
ivy-sbt-plugin: http://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
sonatype-oss-releases
maven-central
4. Create the Scala project directory structure:
├── src
│ ├── main
│ │ ├── java
│ │ ├── resources
│ │ └── scala
├── build.sbt
├── project
│ ├── build.properties
│ ├── plugins.sbt
sbt uses a directory layout similar to Maven's: Scala sources go under src/main/scala, and configuration files under src/main/resources.
Declare the library dependencies in path_to_project/build.sbt:
lazy val root = (project in file(".")).
  settings(
    name := "My Project",
    version := "1.0",
    scalaVersion := "2.11.8",
    libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.17",
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0",
    resolvers += "Akka 2.4.17 Repository" at "http://repo.akka.io/2.4.17/"
  )
When you need to add a new dependency, Maven Central Repository Search is a convenient way to look up its coordinates.
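As a sketch of how a coordinate found there maps onto the sbt DSL (scalatest is just an illustrative choice, not a dependency of this project):

```scala
// Maven coordinates: groupId=org.scalatest, artifactId=scalatest_2.11, version=3.0.1
// %% appends the project's Scala binary version to the artifact name automatically:
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % Test

// equivalent to the explicit single-% form with the Scala version spelled out:
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "3.0.1" % Test
```

Prefer `%%` for Scala libraries so the artifact stays in sync with `scalaVersion`; plain `%` is for Java artifacts or when you pin the Scala suffix by hand, as with spark-core_2.11 above.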
In path_to_project/project/plugins.sbt add:
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.0.1")
7. Configure the Spark runtime environment:
Download winutils.exe from https://github.com/steveloughran/winutils.
Save it to a directory, e.g. c:\hadoop\bin.
Set the HADOOP_HOME environment variable to the parent of the directory containing winutils.exe, e.g. c:\hadoop above.
Add the directory containing winutils.exe, c:\hadoop\bin, to the system PATH environment variable.
Create the directory c:\tmp\hive.
Open a command prompt and run winutils.exe chmod -R 777 \tmp\hive to grant permissions on the directory.
Run winutils.exe ls \tmp\hive to check that the permissions were set successfully.
Open cmd and change into the path_to_project directory.
Type sbt to start the sbt console.
In the sbt console, run update to download the required libraries into the local library repository.
In the sbt console, run eclipse to generate the Eclipse project / refresh the classpath settings.
9. Write the SparkWordCount example:
Open the project in Scala IDE.
Create sparkWordCount.scala under src/main/scala:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object sparkWordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("myapp").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(List("Hello World", "Hello"))
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
The example uses setMaster("local[2]") so that it can run locally inside Eclipse, which is convenient for debugging.
Select sparkWordCount.scala and choose Run As -> Scala Application to run it in Eclipse; check the output in the Console view.
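For comparison, here is the same word count on plain Scala collections, with groupBy standing in for reduceByKey. It is handy for checking what the Spark job should produce without starting a SparkContext; PlainWordCount is an illustrative name, not part of the project above:

```scala
// Same stages as the RDD pipeline: flatMap into words, then reduce by key.
object PlainWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" "))                              // "Hello World", "Hello" -> Hello, World, Hello
      .groupBy(identity)                                     // Hello -> Seq(Hello, Hello), World -> Seq(World)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    println(count(List("Hello World", "Hello")))             // Hello -> 2, World -> 1
}
```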
10. Write the AkkaWordCount example:
Open the project in Scala IDE.
Create akkaWordCount.scala under src/main/scala:
import scala.language.postfixOps
import akka.actor._
import akka.routing._
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.IndexedSeq
import scala.collection.mutable.HashMap
import akka.util.Timeout
import scala.concurrent.duration._
import scala.concurrent.Await
import akka.pattern.ask
// define the messages
sealed trait MapReduceMessage
case class WordCount(word: String, count: Int) extends MapReduceMessage
case class MapData(dataList: ArrayBuffer[WordCount]) extends MapReduceMessage
case class ReduceData(reduceDataMap: Map[String, Int]) extends MapReduceMessage
case object Result extends MapReduceMessage
//mapActor
class MapActor extends Actor {
  val STOP_WORDS_LIST = List("a", "am", "an", "and", "are", "as", "at",
    "be", "do", "go", "if", "in", "is", "it", "of", "on", "the", "to")

  def receive: Receive = {
    case message: String =>
      sender ! evaluateExpression(message)
  }

  def evaluateExpression(line: String): MapData = MapData {
    line.split("""\s+""").foldLeft(ArrayBuffer.empty[WordCount]) { (index, word) =>
      if (!STOP_WORDS_LIST.contains(word.toLowerCase))
        index += WordCount(word.toLowerCase, 1)
      else
        index
    }
  }
}
//reduceActor
class ReduceActor extends Actor {
  def receive: Receive = {
    case MapData(dataList) =>
      sender ! reduce(dataList)
  }

  def reduce(words: IndexedSeq[WordCount]): ReduceData = ReduceData {
    words.foldLeft(Map.empty[String, Int]) { (index, wordCount) =>
      if (index contains wordCount.word)
        index + (wordCount.word -> (index(wordCount.word) + wordCount.count))
      else
        index + (wordCount.word -> wordCount.count)
    }
  }
}
//AggregateActor
class AggregateActor extends Actor {
  val finalReduceMap = new HashMap[String, Int]

  def receive: Receive = {
    case ReduceData(reduceDataMap) =>
      aggregateInMemoryReduce(reduceDataMap)
    case Result =>
      sender ! finalReduceMap.toString()
  }

  def aggregateInMemoryReduce(reduceList: Map[String, Int]): Unit = {
    for ((key, value) <- reduceList) {
      if (finalReduceMap contains key)
        finalReduceMap(key) = value + finalReduceMap(key)
      else
        finalReduceMap += (key -> value)
    }
  }
}
//MasterActor
class MasterActor extends Actor {
  val mapActor = context.actorOf(Props(new MapActor), name = "map")
  val reduceActor: ActorRef = context.actorOf(Props(new ReduceActor), name = "reduce")
  val aggregateActor = context.actorOf(Props(new AggregateActor), name = "aggregate")

  def receive: Receive = {
    case line: String => mapActor ! line
    case mapData: MapData => reduceActor ! mapData
    case reduceData: ReduceData => aggregateActor ! reduceData
    case Result => aggregateActor forward Result
  }
}
object akkaWordCount {
  def main(args: Array[String]) {
    val _system = ActorSystem("MapReduceApp")
    val master = _system.actorOf(Props(new MasterActor), name = "master")
    implicit val timeout = Timeout(5 seconds)
    master ! "The quick brown fox tried to jump over the lazy dog and fell on the dog"
    master ! "Dog is man's best friend"
    master ! "Dog and Fox belong to the same family"
    Thread.sleep(500) // give the actors time to process the lines before asking
    val future = (master ? Result).mapTo[String]
    val result = Await.result(future, timeout.duration)
    println(result)
    _system.terminate()
  }
}
Select akkaWordCount.scala and choose Run As -> Scala Application to run it in Eclipse; check the output in the Console view.
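The three actor stages above can also be sketched as plain functions, which makes the MapReduce logic easy to check without starting an ActorSystem; MapReduceSketch and its abbreviated stop-word list are illustrative only, not part of the project above:

```scala
object MapReduceSketch {
  val stopWords = List("a", "an", "and", "the", "to", "is") // abbreviated stop list for the sketch

  // MapActor stage: split a line, drop stop words, emit (word, 1) pairs
  def mapStage(line: String): Seq[(String, Int)] =
    line.split("""\s+""").toSeq
      .map(_.toLowerCase)
      .filterNot(w => stopWords.contains(w))
      .map(w => (w, 1))

  // ReduceActor stage: fold the pairs of one line into per-word counts
  def reduceStage(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (w, n)) =>
      acc + (w -> (acc.getOrElse(w, 0) + n))
    }

  // AggregateActor stage: merge the per-line maps into a running total
  def aggregate(maps: Seq[Map[String, Int]]): Map[String, Int] =
    maps.foldLeft(Map.empty[String, Int]) { (total, m) =>
      m.foldLeft(total) { case (t, (w, n)) => t + (w -> (t.getOrElse(w, 0) + n)) }
    }
}
```

Feeding each input line through mapStage, reduceStage, and then aggregate reproduces what AggregateActor accumulates in finalReduceMap, minus the actor plumbing.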