
Compiling the Spark Source: Spark Study Notes 1


The best way to learn a framework is to debug its source code.

Building Spark 0.8.1 with Hadoop 2.2.0

Local environment:

1. Eclipse Kepler

2. Maven 3.1

3. Scala 2.9.3

4. Ubuntu 12.04

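Steps:

1. First download the Spark 0.8.1 source code. Download method: _

2.  unzip v0.8.1-incubating.zip

3.  export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"  (Set -Xmx to suit your machine. I used 1g because my machine is fairly old; 2g is recommended, but if the JVM crashes, fall back to 1g and accept a slower build.)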

victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ export MAVEN_OPTS="-Xmx1g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

4.  Maven makes the build easy: mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0  -Pnew-yarn -DskipTests package

victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0  -Pnew-yarn -DskipTests package

5. The build eventually succeeds:

[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM .......................... SUCCESS [5.742s]
[INFO] Spark Project Core ................................ SUCCESS [6:55.638s]
[INFO] Spark Project Bagel ............................... SUCCESS [57.687s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:59.625s]
[INFO] Spark Project ML Library .......................... SUCCESS [1:12.154s]
[INFO] Spark Project Examples ............................ SUCCESS [4:01.735s]
[INFO] Spark Project Tools ............................... SUCCESS [18.163s]
[INFO] Spark Project REPL ................................ SUCCESS [59.977s]
[INFO] Spark Project YARN Support ........................ SUCCESS [1:24.402s]
[INFO] Spark Project Assembly ............................ SUCCESS [47.046s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 18:42.710s
[INFO] Finished at: Fri Mar 28 00:47:06 CST 2014
[INFO] Final Memory: 64M/560M
[INFO] ------------------------------------------------------------------------
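
The heap-size choice in step 3 can be sketched as a small shell helper. This is only an illustration: the function name pick_maven_opts and the 4096 MB cutoff are assumptions of mine, not part of the original build. The JVM flags are the ones used in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose MAVEN_OPTS from the machine's total memory in MB.
# The 4096 MB cutoff is an assumed threshold, not a value from the Spark docs.
pick_maven_opts() {
  local total_mb="$1"
  local heap="1g"                      # conservative default for older machines
  if [ "$total_mb" -ge 4096 ]; then
    heap="2g"                          # the recommended setting when memory allows
  fi
  echo "-Xmx${heap} -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
}

# Example (Linux): read total memory from /proc/meminfo, then export the result.
# total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
# export MAVEN_OPTS="$(pick_maven_opts "$total_mb")"
```

If the JVM still crashes with the larger heap, exporting the 1g variant by hand, as in step 3, remains the fallback.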

Next, build with sbt (Simple Build Tool):

victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating$ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt assembly
Getting org.scala-sbt sbt 0.12.4 ...


[info] Checking every *.class/*.jar file's SHA-1.
[info] SHA-1: 040d65230771f2da5c90328a4e4ea844a489f39e
[info] Packaging /home/victor/software/incubator-spark-0.8.1-incubating/examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar ...
[info] Done packaging.



[info] Done packaging.
[success] Total time: 4488 s, completed Mar 28, 2014 2:18:46 AM

After packaging completes, two jars matter for what follows: assembly/target/scala-2.9.3/ contains spark-assembly-0.8.1-incubating-hadoop2.2.0.jar, and examples/target/scala-2.9.3/ contains spark-examples-assembly-0.8.1-incubating.jar. The next steps rely mainly on these two jars.

victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating/assembly/target/scala-2.9.3$ ll
total 90504
drwxrwxr-x 3 victor victor     4096  3月 28 21:43 ./
drwxrwxr-x 9 victor victor     4096  3月 28 01:27 ../
drwxrwxr-x 3 victor victor     4096  3月 28 01:27 cache/
-rw-rw-r-- 1 victor victor 92659663  3月 28 02:06 spark-assembly-0.8.1-incubating-hadoop2.2.0.jar



victor@victor-ubuntu:~/software/incubator-spark-0.8.1-incubating/examples/target/scala-2.9.3$ ll
total 179004
drwxrwxr-x 5 victor victor      4096  3月 28 01:59 ./
drwxrwxr-x 8 victor victor      4096  3月 28 01:26 ../
drwxrwxr-x 3 victor victor      4096  3月 28 01:23 cache/
drwxrwxr-x 4 victor victor      4096  3月 28 00:40 classes/
-rw-rw-r-- 1 victor victor  59982904  3月 28 00:43 spark-examples_2.9.3-assembly-0.8.1-incubating.jar
-rw-rw-r-- 1 victor victor 123286056  3月 28 02:18 spark-examples-assembly-0.8.1-incubating.jar
drwxrwxr-x 3 victor victor      4096  3月 28 00:41 test-classes/

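Copy the following into a single folder, spark_client, to serve as the client:

conf/
assembly/target/scala-2.9.3/ (only the jar needs to be copied)
examples/target/scala-2.9.3/ (only the jar needs to be copied)
the spark-class file

Make sure spark_client contains the conf directory, the spark-class file, the assembly directory (with its target directory inside), and the examples directory (with its target directory inside).

The next step is to write a script that runs a Spark program, using one of the bundled examples. See my next post on running Spark: Spark Study Notes 2: Computing Pi.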
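
The client layout described above could be assembled with a short script along these lines. It is a minimal sketch: the function name make_spark_client and its two arguments are hypothetical, and it copies every .jar found in the two scala-2.9.3 directories rather than picking a specific one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the client files out of a built Spark tree.
# Usage: make_spark_client <spark-source-dir> <client-dir>
make_spark_client() {
  local src="$1"
  local dest="$2"
  local sub
  # Recreate the directory layout, because the client keeps the target directories.
  mkdir -p "$dest/assembly/target/scala-2.9.3" \
           "$dest/examples/target/scala-2.9.3" || return 1
  cp -r "$src/conf" "$dest/" || return 1
  cp "$src/spark-class" "$dest/" || return 1
  for sub in assembly examples; do
    # Only the jar files are copied, not cache/ or classes/.
    cp "$src/$sub/target/scala-2.9.3/"*.jar "$dest/$sub/target/scala-2.9.3/" || return 1
  done
}

# Example:
# make_spark_client ~/software/incubator-spark-0.8.1-incubating ~/spark_client
```

Keeping the assembly/target and examples/target paths intact in the copy matches the layout the post asks for.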


A Note About Hadoop Versions

Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported storage systems. Because the HDFS protocol has changed in different versions of Hadoop, you must build Spark against the same version that your cluster uses. By default, Spark links to Hadoop 1.0.4. You can change this by setting the SPARK_HADOOP_VERSION variable when compiling:

SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

In addition, if you wish to run Spark on YARN, set SPARK_YARN to true:

SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

Note that on Windows, you need to set the environment variables on separate lines, e.g., set SPARK_HADOOP_VERSION=1.2.1.

For this version of Spark (0.8.1) Hadoop 2.2.x (or newer) users will have to build Spark and publish it locally. See Launching Spark on YARN. This is needed because Hadoop 2.2 has non backwards compatible API changes.
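
Since the build has to match the cluster's Hadoop version, one way to avoid typing the version by hand is to read it from the output of the hadoop version command. The sketch below assumes the first line of that output has the form "Hadoop 2.2.0" and that the local hadoop command is the same release as the cluster; the function name parse_hadoop_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop version` output on stdin and print the version number.
# Assumes the first line looks like "Hadoop 2.2.0"; prints nothing otherwise.
parse_hadoop_version() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Example (run from the Spark source directory, with hadoop on the PATH):
# SPARK_HADOOP_VERSION="$(hadoop version | parse_hadoop_version)" SPARK_YARN=true sbt/sbt assembly
```

If the helper prints nothing, set SPARK_HADOOP_VERSION explicitly, as in the commands above.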


<Original article; please credit the source when reposting: http://blog.csdn.net/oopsoom/article/details/22345777>
