Learning Spark from the example source code (2): SparkPi

In this second part we keep computing Pi, but this time with the SparkPi example that ships with the Spark package, which is clearly a step up!

As before, the simple.sbt file:

name := "SparkPi"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

The SparkPi.scala file:

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: SparkPi <master> [<slices>]")
      System.exit(1)
    }
    val spark = new SparkContext(args(0), "SparkPi",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
    val slices = if (args.length > 1) args(1).toInt else 2
    val n = 100000 * slices
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
Directory layout:

$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SparkPi.scala

Run it (note that this time it takes an argument, "local[3]"):

$ sbt "project sparkpi" "run local[3]"
[info] Set current project to SparkPi (in build file:/home/jpan/Mywork/spark-example/exspark/SparkPi/)
[info] Set current project to SparkPi (in build file:/home/jpan/Mywork/spark-example/exspark/SparkPi/)
[info] Updating {file:/home/jpan/Mywork/spark-example/exspark/SparkPi/}sparkpi...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Compiling 1 Scala source to /home/jpan/Mywork/spark-example/exspark/SparkPi/target/scala-2.10/classes...
[info] Running SparkPi local[3]
..............................................
Pi is roughly 3.14652
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/static,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/metrics/json,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/executors,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/environment,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/pool,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/stage,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage,null}
14/05/09 16:10:46 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/05/09 16:10:48 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/05/09 16:10:48 INFO network.ConnectionManager: Selector thread was interrupted!
14/05/09 16:10:48 INFO network.ConnectionManager: ConnectionManager stopped
14/05/09 16:10:48 INFO storage.MemoryStore: MemoryStore cleared
14/05/09 16:10:48 INFO storage.BlockManager: BlockManager stopped
14/05/09 16:10:48 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/05/09 16:10:48 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/05/09 16:10:48 INFO spark.SparkContext: Successfully stopped SparkContext
14/05/09 16:10:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/05/09 16:10:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
[success] Total time: 12 s, completed May 9, 2014 4:10:48 PM
Source code analysis

    if (args.length == 0) {
      System.err.println("Usage: SparkPi <master> [<slices>]")
      System.exit(1)
    }

This block requires the user to supply command-line arguments. Angle brackets <> mark a required argument: <master> is the master URL of the cluster the Spark program runs on. Since I am running locally, passing local (here local[3]) is enough.

Square brackets [ ] mark an optional argument: <slices> is the number of slices (partitions) the work is split into, defaulting to 2 when omitted. Note that the 3 in local[3] is not this argument; it is part of the master URL and tells Spark to run with 3 local worker threads.
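For example (a hypothetical invocation of my own, following the same pattern as the run shown above), you could pass a second argument to split the work into 10 slices while still running on 3 local threads:

$ sbt "project sparkpi" "run local[3] 10"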

    val spark = new SparkContext(args(0), "SparkPi",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))

SparkContext is one of the most important classes in Spark; it sets up the runtime configuration and environment. Its API documentation describes it as:

Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
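As an aside, a SparkContext can also be built from a SparkConf object (available from Spark 0.9 onwards). The sketch below is mine, not part of the example; it just shows an equivalent way to express the same configuration:

import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch, assuming the SparkConf API of Spark 0.9.x:
// the master URL and application name play the same roles as
// args(0) and "SparkPi" in the constructor call above.
val conf = new SparkConf()
  .setMaster("local[3]")
  .setAppName("SparkPi")
val spark = new SparkContext(conf)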

Next comes the parallel computation of Pi, which is a Monte Carlo estimate: random points are drawn uniformly from the square [-1, 1] x [-1, 1] (area 4), and the unit circle inside it has area π, so the fraction of points that land inside the circle, multiplied by 4, approximates π. The snippet uses parallelize, whose declaration follows below:

    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)

def parallelize[T](seq: Seq[T], numSlices: Int = defaultParallelism)(implicit arg0: ClassTag[T]): RDD[T]

Distribute a local Scala collection to form an RDD.
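To make the pattern concrete, here is a tiny sketch of my own (not from the example) that reuses the spark context created above with the same parallelize / map / reduce shape, just summing squares instead of counting hits:

// A minimal sketch: split 1 to 10 into 2 slices, square each element
// within its slice, then reduce the partial results into a single sum.
val squares = spark.parallelize(1 to 10, 2).map(i => i * i)
val total = squares.reduce(_ + _)
println(total)  // prints 385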

OK, that wraps up this section. By now you should have a basic feel for how Spark programs are written and run.
