Spark is currently a very hot distributed computing framework and the one most likely to replace MapReduce. I have started following Spark and am trying to move from Hadoop + Mahout over to it.
1. Local environment
My local setup is Ubuntu 14.04 with JDK 1.7.
2. Downloading the source
http://spark.apache.org/downloads.html
The version I used is 0.9.1; the source package is spark-0.9.1.tgz.
3. Building
Unpack the source package and run the sbt assembly build:
$ tar xzvf spark-0.9.1.tgz
$ cd spark-0.9.1
$ sbt/sbt assembly
Once the build succeeds, you are ready to play!
4. Running an example
The simplest example computes Pi:
$ ./bin/run-example org.apache.spark.examples.SparkPi local[3]
Here `local` means run locally, and `[3]` means use 3 worker threads.
The output looks like this:
Pi is roughly 3.13486
14/05/08 10:26:15 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/static,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/metrics/json,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/executors,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/environment,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/pool,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/stages/stage,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage,null}
14/05/08 10:26:15 INFO handler.ContextHandler: stopped o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/05/08 10:26:16 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
14/05/08 10:26:16 INFO network.ConnectionManager: Selector thread was interrupted!
14/05/08 10:26:16 INFO network.ConnectionManager: ConnectionManager stopped
14/05/08 10:26:16 INFO storage.MemoryStore: MemoryStore cleared
14/05/08 10:26:16 INFO storage.BlockManager: BlockManager stopped
14/05/08 10:26:16 INFO storage.BlockManagerMasterActor: Stopping BlockManagerMaster
14/05/08 10:26:16 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
14/05/08 10:26:16 INFO spark.SparkContext: Successfully stopped SparkContext
14/05/08 10:26:16 INFO Remoting: Remoting shut down
14/05/08 10:26:16 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
There it is: Pi ≈ 3.13486.
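SparkPi estimates Pi with a Monte Carlo method: it samples random points in the unit square and counts how many land inside the unit circle; that fraction approaches π/4. A minimal single-threaded sketch of the same idea in plain Python (no Spark involved; function name and sample count are my own choices), just to illustrate why each run prints a slightly different "roughly" value:

```python
import random

def estimate_pi(n_samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of uniform random points
    in the unit square that fall inside the quarter unit circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print("Pi is roughly", estimate_pi(1_000_000))
```

Spark parallelizes exactly this loop: each of the `local[3]` threads counts hits over its own slice of samples, and the driver sums the counts.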
5. Running interactively
$ ./bin/spark-shell
This starts the interactive shell.
An interactive example is in the quick start guide; it processes Spark's README.md file:
http://spark.apache.org/docs/latest/quick-start.html
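The quick start loads README.md as a dataset, counts its lines, and then counts the lines containing "Spark". The same computation in plain Python (a hypothetical three-line sample stands in for the real README.md; this only shows what the shell session computes, not the Spark API):

```python
# Hypothetical sample text standing in for Spark's README.md.
readme_lines = [
    "Apache Spark",
    "Spark is a fast and general cluster computing system.",
    "It provides high-level APIs in Scala, Java, and Python.",
]

# Equivalent of counting all lines in the file.
line_count = len(readme_lines)

# Equivalent of filtering to lines that contain "Spark", then counting.
spark_line_count = sum(1 for line in readme_lines if "Spark" in line)

print(line_count, spark_line_count)
```

In the shell the same steps run as transformations on an RDD backed by the file, so they scale past what fits in one process's memory.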